Probabilistic forecasts give you insights into possible futures. In this post, I suggest that the best plot type to show them all at a glance is a ridgeline plot.
Areas of doubt and uncertainty
In data-driven operations, decisions are based on numeric expectations about what the future holds. Usually, these expectations come from point forecasts or from the expected value of probabilistic forecasts (the average of multiple opinions). Sometimes, decisions need to take into account risk, and so our uncertainty about the future is expressed by adding a prediction interval around these forecasts. What we need to see really depends on how we plan to use the information.
For example, I usually pack a raincoat if there’s more than a 1% chance of rain during my trip. So I’m interested in the uncertainty of the forecaster, not just in the most likely weather.
However, when scoping out opportunities in data, we often do not know beforehand what we’re looking for. In those cases, taking a look at all possible outcomes may be exactly what we need. For example, we might find out that plotting expected values and prediction intervals (the conventional way of plotting forecasts) is useless when only two distinct outcomes are likely to occur. In such a case, the conventional way doesn’t cut it.
Probability distribution showing two distinct possible outcomes: around +10 or around -10. The expected value is 0, but there is a very small chance of that actually happening.
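To make the two-outcome case concrete, here is a minimal sketch using only Python's standard library. The modes at -10 and +10 come from the figure; the equal weights, the spread of 1 and the sample size are assumptions I picked for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical bimodal forecast: outcomes cluster around -10 or +10
# with equal probability (assumed weights and spread).
samples = [random.gauss(random.choice([-10, 10]), 1.0) for _ in range(10_000)]

expected_value = statistics.fmean(samples)

# 90% prediction interval from the 5th and 95th percentiles.
cut_points = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
lower, upper = cut_points[0], cut_points[-1]

# Share of sampled outcomes that land anywhere near the expected value.
near_mean = sum(abs(s - expected_value) < 2 for s in samples) / len(samples)

print(f"expected value: {expected_value:.2f}")
print(f"90% prediction interval: [{lower:.2f}, {upper:.2f}]")
print(f"share of outcomes within 2 of the expected value: {near_mean:.2%}")
```

The expected value lands close to 0 and the interval spans both modes, yet almost no sampled outcome sits near the expected value. Neither summary tells you that there are two separate clusters.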
Fortune favours the ridge
In comes the ridgeline plot, a way to visualise density distributions over another dimension (such as time). It is still relatively new, given that its Wikipedia page simply redirects to its origin story as a music album cover.
At Seita, we use ridgeline plots to show every possible outcome of probabilistic forecasts. In the example below on the left, we show how the confidence of fluctuating temperature forecasts deteriorates as we look further into the future. On the right, we show how the confidence of temperature forecasts improves as we approach the hour we forecast.
Ridgeline plot looking forwards into the future, showing forecasts made at one time about different times in the future. The confidence of fluctuating temperature forecasts deteriorates as we look further into the future.
Ridgeline plot looking backwards into the past, showing forecasts made at different times about one specific time in the future. The confidence of temperature forecasts improves as we approach the hour we forecast.
From such graphs, you can quickly gain a lot of insights:
- Detect risks, for example, of overnight frost.
- Distinguish scenarios, in case forecasts show multimodal distributions (not shown here).
- Know when things become more certain, maybe because relevant information with predictive power becomes available.
- Know when things are more certain. For example, this forecasting model seems to be more confident in predicting daily minima and maxima than intermediate temperatures (see left plot).
With every decision, you place a bet on what the future holds. Ridgeline plots help you see what to bet and when to bet it. If you’re interested in plotting probabilistic forecasts this way, check out our open-source package for handling time series forecasts.
Developer links
Ridgeline plots first gained popularity through an R package. A few popular Python visualisation libraries now support them too, such as:
- Altair
- Bokeh
- Seaborn
- Matplotlib (through this extension)
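If you just want a feel for the mechanics before picking a library, a ridgeline plot can be sketched by stacking density curves on evenly spaced baselines. The sketch below is not how any of the libraries above implement it, and it is not our package's API. The function names `ridgeline_curves` and `plot_ridgeline`, the Gaussian forecast distributions and the temperature numbers are all made up for illustration. Only the drawing step needs Matplotlib.

```python
import math
from statistics import NormalDist


def ridgeline_curves(forecasts, grid, overlap=1.5):
    """Return one (baseline, heights) pair per forecast, ready to be stacked.

    forecasts: list of (mean, std) pairs, one per forecast horizon. Gaussians
               are an assumed stand-in for real forecast distributions.
    grid:      values (e.g. temperatures) at which each density is evaluated.
    overlap:   height of the tallest ridge, in units of ridge spacing.
    """
    densities = [[NormalDist(mu, sigma).pdf(x) for x in grid]
                 for mu, sigma in forecasts]
    peak = max(max(d) for d in densities)  # shared scale keeps ridges comparable
    curves = []
    for i, density in enumerate(densities):
        baseline = float(i)  # each ridge sits one unit above the previous one
        heights = [baseline + overlap * d / peak for d in density]
        curves.append((baseline, heights))
    return curves


def plot_ridgeline(forecasts, grid, labels):
    """Draw the stacked ridges with Matplotlib and return the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    curves = ridgeline_curves(forecasts, grid)
    fig, ax = plt.subplots()
    for i, (baseline, heights) in enumerate(curves):
        z = len(curves) - i  # lower ridges get a higher zorder, so they sit in front
        ax.fill_between(grid, heights, baseline, color="white", zorder=z)
        ax.plot(grid, heights, color="black", zorder=z + 0.5)
    ax.set_yticks([baseline for baseline, _ in curves])
    ax.set_yticklabels(labels)
    ax.set_xlabel("Temperature (°C)")
    return fig


# Hypothetical forecasts: a daily temperature cycle whose uncertainty
# grows with the forecast horizon (in hours).
horizons = list(range(0, 48, 6))
forecasts = [(12 + 4 * math.sin(2 * math.pi * h / 24), 0.5 + 0.05 * h)
             for h in horizons]
grid = [i * 0.1 for i in range(251)]  # 0 to 25 °C in steps of 0.1
labels = [f"+{h}h" for h in horizons]
curves = ridgeline_curves(forecasts, grid)
```

Calling `plot_ridgeline(forecasts, grid, labels).savefig("ridgeline.png")` should give a forward-looking plot in which the ridges flatten out as the horizon grows. Scaling all ridges by one shared peak is a design choice: it keeps the heights comparable, so a flatter ridge really does mean a less confident forecast.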
“That’s right! We demand guaranteed rigidly defined areas of doubt and uncertainty!” ~ Douglas Adams