There are a lot of bad forecasts because people are not generally held accountable for the accuracy of their forecasts. In some cases, they’re not really making forecasts in good faith at all. I’m going to try to do better. I’ve made fifteen forecasts—only six of which have definitively resolved—in the nearly four months since I started this newsletter. That may not be enough to assess my accuracy, but I want you to have the opportunity to judge for yourselves.
“There’s no recession coming. The pessimistas were wrong. It’s not going to happen.”—Larry Kudlow, December 2007
One reason there are so many bad forecasts is that we aren’t generally held accountable for the accuracy of our forecasts. We often have incentives that are more compelling than the incentive to be accurate. Many of the people who make predictions are primarily entertainers whose main incentive is to attract viewers or get clicks. Others have an agenda—like promoting a policy or winning an election—that matters more to them than accurately predicting the future.
Collectively, we’re not very good at recognizing accurate forecasts anyway. Forecasts that sound accurate may not in fact be accurate. We’re more likely to be convinced by a forecast when it predicts what we already expected to happen. And selective memory allows forecasters to take credit for good forecasts without necessarily being held responsible for bad ones.
I have most of the same incentives any other writer or pundit has. When I was competing for the lowest Brier score in forecasting tournaments, I had a strong incentive to produce accurate forecasts. Presumably many of you are reading this precisely because I have that record of prior accuracy. But my success now depends on whether I attract readers and subscribers—please tell everyone you know to subscribe!—rather than on whether my forecasts are accurate. If I’m sufficiently interesting or entertaining, I could probably attract and retain readers without actually being accurate. And if I promote myself well enough, readers might not even notice when my forecasts aren’t accurate.
The only reliable way to evaluate forecast accuracy is to compare a large sample of forecasts to outcomes. In 1950, Glenn Brier proposed a method for doing this with weather forecasts.1 Brier proposed in effect that the forecast probabilities of events be scored according to how closely they track what ultimately happens. In mathematical terms, a Brier score measures the mean squared error of a set of forecasts: the average of the squared difference between the forecast probability of each event and either 0% or 100%, depending on whether the event ultimately occurred. Because Brier scores measure how far forecasts are off from outcomes—that is, forecasting error—lower scores represent greater accuracy. And because Brier scores are what mathematicians call a “strictly proper” scoring rule—which means the only sure way to improve the score is to be more accurate—they allow us to hold forecasters accountable specifically for their accuracy.
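To make that arithmetic concrete, here is a minimal Python sketch of Brier scoring. The function names and example numbers are mine, for illustration, not anything from Brier’s paper:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities (0 to 1)
    and outcomes (1 if the event occurred, 0 if it didn't)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A 70% forecast scores (0.7 - 1)^2 = 0.09 if the event happens and
# (0.7 - 0)^2 = 0.49 if it doesn't. A perfect forecaster scores 0.0;
# always answering 50% scores 0.25.
print(brier_score([0.7, 0.7], [1, 0]))  # 0.29

# "Strictly proper" means that if your true belief is q, your expected score,
#     q * (1 - p)**2 + (1 - q) * p**2,
# is uniquely minimized by reporting p = q, so shading your forecast up or
# down from what you actually believe can only hurt you in expectation.
def expected_score(p, q):
    return q * (1 - p) ** 2 + (1 - q) * p ** 2

assert all(expected_score(0.7, 0.7) <= expected_score(i / 100, 0.7)
           for i in range(101))
```

Because the error is squared, a confident miss (say, 90% on something that doesn’t happen, costing 0.81) is penalized far more heavily than a cautious one (60% on the same event, costing 0.36).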

But I haven’t produced nearly enough forecasts in the four months since I started this newsletter to meaningfully score my accuracy. Only six of my forecasts—on just three distinct questions—have resolved definitively one way or the other. With such a small sample of forecasts, my score depends on whether I hit or miss on just a few questions. In any case, it’s hard to say how impressive my forecasts are on those particular questions—which might be hard or easy—without comparing them to other forecasts on the same or similar questions.
Nevertheless, I want to be accountable for my forecasts, so you can use your own judgment about how accurate they are. I can assure you, for what it’s worth, that when I do make a bad forecast—as I definitely sometimes do—I find it extremely embarrassing. Here are the six specific forecasts I’ve made that have already resolved definitively (the date I made each forecast is in parentheses; I score them in a sketch after the list):
65% chance Russia invades Ukraine before April (February 12) YES
25% chance Russia occupies territory or cities outside Eastern Ukraine (February 12) YES
30% chance Russia occupies territory or cities outside Eastern Ukraine (February 22) YES
8% chance of a bilateral ceasefire in Ukraine before June (March 24) NO
10% chance of a bilateral ceasefire in Ukraine before June (March 31) NO
3% chance of a bilateral ceasefire in Ukraine before June (April 28) NO
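As a worked example (my own arithmetic applying the sketch above, with YES as 1 and NO as 0), these six resolved forecasts score as follows:

```python
forecasts = [0.65, 0.25, 0.30, 0.08, 0.10, 0.03]
outcomes  = [1,    1,    1,    0,    0,    0]  # YES = 1, NO = 0

# ((0.65-1)^2 + (0.25-1)^2 + (0.30-1)^2 + 0.08^2 + 0.10^2 + 0.03^2) / 6
score = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
print(round(score, 3))  # 0.199
```

That beats the 0.25 an unvarying 50% forecaster would earn, and most of the error comes from the two occupation forecasts that resolved YES. But, as I said above, six forecasts are far too few to read much into the number.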
I was “on the right side of maybe”—meaning I thought what ended up happening was more likely than not—on four out of six forecasts. At the time I made those predictions, I thought a Russian invasion was more likely than many other observers did. But, in hindsight, I probably should have forecast a higher probability of a full-scale invasion.
For the record, here are the nine outstanding predictions I have made that have yet to resolve:
4% chance Russia kills someone using a nuclear weapon before July 1 (March 1)
1% chance Russia kills someone using a nuclear weapon before July 1 (March 31)
6% chance Putin ceases to be President of Russia before January 2023 (March 15)
54% chance of a bilateral ceasefire in Ukraine before January 2023 (March 24)
63% chance of a bilateral ceasefire in Ukraine before January 2023 (March 31)
41% chance of a bilateral ceasefire in Ukraine before January 2023 (April 28)
50% chance PCE inflation is over 4.7% in the US in 2022 (April 7)
11% chance PCE inflation is under 4.0% in the US in 2022 (April 7)
68% chance the US goes into recession before January 2024 (May 20)
I will, in any case, continue to do my best to make accurate forecasts. I hope my forecasts can help you understand what’s going to happen a little better. But I’m counting on you to hold me accountable.
I’m very grateful to the hundreds of you who have already subscribed to this newsletter. If you enjoy my work, I hope you will support me by sharing it with others.