The Forecasting Proficiency Test

And what it teaches us about forecasting

Feb 11, 2025

Nine college-age students at individual desks taking an exam.

A recent paper from the Forecasting Research Institute develops an easy-to-administer test for forecasting ability. While the best indicator of forecasting ability is performance on forecasting questions, performance on certain cognitive tasks involved in forecasting may be a decent preliminary indicator of forecasting ability.

The best way to tell how good a forecaster someone is is to see how well they do in practice; the best predictor of how accurate their forecasts will be is how accurate their forecasts have been. The original Good Judgment Project—which I participated in—was essentially a massive forecasting test: participants who were exceptionally accurate were identified as superforecasters. But it takes on the order of 100 unique questions to reliably distinguish between consistently accurate forecasters and people who just happened to make some lucky guesses. We had to spend nine months in a formal forecasting setting to establish a substantial enough track record to demonstrate forecasting skill.

We can potentially identify capable forecasters more quickly by using “intersubjective measures” like proxy scoring. Rather than waiting to see how forecasting questions resolve, we can score forecasters against one another. One way is to score individual forecasts against the aggregate forecast produced by a large group of forecasters. Crowd forecasts are generally fairly accurate since the individual forecasters’ errors tend to cancel one another out, so we can use the crowd aggregate as a proxy for actual outcomes. Another way is to ask people to make meta-predictions about what other forecasters will forecast. Forecasters who can accurately estimate the crowd aggregate demonstrate that they can reproduce a fairly accurate forecast. These intersubjective scoring techniques have been shown to be almost as effective at evaluating forecasters as scoring them against actual outcomes. While intersubjective scoring has the advantage that it can be done in real time, without waiting to see forecasting questions resolve, you’d still normally need access to a large sample of formal forecasts to assess forecasters with these techniques.
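To make the idea of proxy scoring concrete, here is a minimal sketch in Python. It assumes an unweighted mean as the crowd aggregate and a squared-error (Brier-style) distance. Both are illustrative choices rather than the method used in any particular study, and the forecaster names and numbers are made up.

```python
from statistics import mean

# Hypothetical data: each list holds one forecaster's probabilities
# for the same three unresolved questions.
forecasts = {
    "alice": [0.80, 0.30, 0.60],
    "bob":   [0.95, 0.05, 0.10],
    "carol": [0.70, 0.40, 0.65],
}

def crowd_aggregate(forecasts):
    """Unweighted mean forecast per question (a simple stand-in aggregate)."""
    return [mean(question) for question in zip(*forecasts.values())]

def proxy_scores(forecasts):
    """Mean squared distance of each forecaster from the crowd aggregate.

    Lower is better. The crowd here includes the forecaster being scored,
    which a real implementation might want to exclude.
    """
    crowd = crowd_aggregate(forecasts)
    return {
        name: mean((p - c) ** 2 for p, c in zip(probs, crowd))
        for name, probs in forecasts.items()
    }

scores = proxy_scores(forecasts)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.4f}")
```

No question has to resolve before the scores are available, which is the appeal of the approach. The catch is visible too: the score is only as good as the crowd it is measured against.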

My forecasting track record with Good Judgment from 2015-2024. The line y=x represents perfect calibration (events occurring exactly as often as predicted).
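For readers who want to see how a calibration chart like this is built, the following is a rough sketch: it groups forecasts into probability bins and compares the average forecast in each bin with how often those events actually happened. The function name, the bin width, and the sample data are assumptions for illustration and are not drawn from my actual track record.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin.

    forecasts: probabilities in [0, 1]; outcomes: 1 if the event occurred, else 0.
    """
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx].append((p, o))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table

# Hypothetical, perfectly calibrated forecaster: 25% forecasts come true
# a quarter of the time, 75% forecasts three quarters of the time.
sample_forecasts = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
sample_outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
table = calibration_table(sample_forecasts, sample_outcomes)
```

Plotting the first column against the second gives the calibration curve. Points on the line y=x mean the forecaster's stated probabilities matched the observed frequencies.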

A Forecasting Research Institute paper led by Mark Himmelstein develops an easy-to-administer test to identify forecasting skill outside of a large formal forecasting setting.[1] Their Forecasting Proficiency Test predicts more than 60% of the variation in forecasting performance in other settings, making it a potentially useful tool for identifying forecasting talent. Since performance on actual forecasting questions is clearly the best indicator of forecasting skill, the paper calls for a forecasting proficiency test to include at least some actual forecasting questions. The authors find that they can meaningfully score forecasts without having a large sample of contemporaneous forecasts by comparing them to normalized historical forecasts. That means the test can be administered outside of a large formal forecasting project, although they still have to wait a month or more for forecasting questions to resolve.

But forecasting skill is also associated with a number of traits a test can measure. The authors found that they can obtain good preliminary results (good enough to explain something like 40% of the variation in forecasting performance in other settings) by testing people on a range of cognitive tasks. These included pattern extrapolation and probabilistic reasoning tasks on which performance has previously been shown to be associated with forecasting ability. The authors also found that they were able to identify forecasting ability by including several novel tasks. One tested the ability to avoid denominator neglect bias, which is the tendency to treat proportions as larger if the absolute value of the numerator is larger, regardless of the value of the denominator (to treat 9-in-100 as larger than 1-in-10, for example). Another tested the ability to update probabilistic judgments based on new information (often called “Bayesian updating” because it involves applying Bayes’ rule). A third tested the ability to apply decision-making rules to evaluate outcomes according to normative preferences.
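Two of these tasks are easy to illustrate with a few lines of arithmetic. The sketch below is not taken from the test itself. The prior and likelihood values are invented, and `bayes_update` is a hypothetical helper name. It first compares the two proportions from the denominator neglect example, then applies Bayes' rule to a single piece of evidence.

```python
from fractions import Fraction

# Denominator neglect: compare the proportions, not the numerators.
nine_in_100 = Fraction(9, 100)
one_in_10 = Fraction(1, 10)
print(nine_in_100 < one_in_10)  # 9-in-100 is the smaller chance

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence.

    Bayes' rule: P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Hypothetical numbers: a 30% prior, and evidence that is four times as
# likely if the hypothesis is true (80%) as if it is false (20%).
posterior = bayes_update(Fraction(3, 10), Fraction(8, 10), Fraction(2, 10))
print(posterior, float(posterior))
```

With these made-up inputs the probability roughly doubles, from 30% to about 63%. Exact fractions are used only to keep the arithmetic transparent. Ordinary floats would work just as well.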

February 3, 2025 tweet by Jessica Riedl reading, "I see a lot of people want to throw out 230 years of constitutional government and replace it with an authoritarian dictator because they have big feelings about the budget and can't be bothered to work through Congress. That will surely work out well."

The value of these cognitive tasks in identifying forecasting ability—this is what interests me the most—hints at the role those tasks play in forecasting. Forecasting requires more than just general fluid intelligence and numerical reasoning ability. It’s specifically associated with the ability to recognize and extrapolate patterns, to reason probabilistically, and to apply decision-making rules. This shouldn’t be too surprising. Forecasting is to a large extent extrapolating the patterns of the past into the future in the same way you might identify the next number in a sequence. Because the future is uncertain, forecasting also requires the ability to think probabilistically. Interestingly, the Forecasting Proficiency Test is better at singling out the worst forecasters than it is at identifying elite forecasters. That suggests that what distinguishes the best forecasters from merely competent forecasters may still not be captured by any of these measures.

I’ll write about the extraordinary damage the Trump administration is already doing to American democracy soon. Telling the Future depends entirely on the support of readers, so if you found this interesting, please consider buying a paid subscription. You can also always help spread the word by sharing Telling the Future with others.

January 29, 2025 tweet by Scorpio Baby consisting of a picture of a woman wearing a Mexico football jersey holding a sign saying, "I drink my horchata warm cause FUCK I.C.E." The image is captioned with a middle finger emoji and an ice cube emoji.

[1] Mark Himmelstein, Sophie Ma Zhu, Nikolay Petrov, et al., “The Forecasting Proficiency Test: A General Use Assessment of Forecasting Ability” (2024).
