Google's Breakthrough Weather AI
The weather may be getting worse, but we're getting better at forecasting it
Google DeepMind recently rolled out a new weather forecasting model that outperforms the best current models and pushes the limits of what we can predict about weather systems.
Right now the best weather forecaster may be an AI model. Google DeepMind has developed a new generative AI model called GenCast, trained on four decades of historical data, that provides more accurate probabilistic forecasts than ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the best operational medium-range weather forecast model. GenCast is a diffusion model similar to the models that have recently had breakthrough success producing images, sounds, and video. While it has some limitations, GenCast significantly outperformed ENS on more than 97% of target combinations of weather features, and it better predicts extreme weather, tropical cyclone tracks, and wind power production.1 Rémi Lam, one of the scientists on the GenCast project, said in an interview that “it’s like we’ve made decades worth of improvements in one year.”
Weather forecasting is hard largely because weather systems are chaotic. Weather is “chaotic” not just in the ordinary sense of the word, but also in the technical sense that small differences in initial conditions can result in wildly different outcomes. Chaos theory originally grew out of meteorologist Edward Lorenz’s observation that rounding errors that would normally be trivial could cause weather models to produce dramatically different outputs. Lorenz showed that nonlinear systems like the weather can produce irregular, shifting patterns. Small perturbations can have large effects; the movement of a butterfly’s wings could in theory determine where and when tornadoes form. While the weather may not in practice be as chaotic as Lorenz initially thought, and although weather forecasting has improved dramatically over the last 60 years, a 2019 paper argued that midlatitude weather may be “intrinsically unpredictable” more than about 15 days in advance. The chaotic nature of weather systems may make them effectively impossible to forecast past that point even with nearly perfect knowledge of initial conditions.2
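To make the idea of sensitivity to initial conditions concrete, here is a minimal sketch using the classic three-variable Lorenz system, a toy model and not a real weather model. The parameter values (sigma = 10, rho = 28, beta = 8/3), the starting point, the size of the perturbation, and the function names are all illustrative assumptions. The code runs two copies of the system whose starting points differ by about one part in a hundred million and records how far apart they drift.

```python
import math


def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz system at a given (x, y, z) state."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one time step with fourth-order Runge-Kutta."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_deriv(state)
    k2 = lorenz_deriv(shifted(state, k1, dt / 2))
    k3 = lorenz_deriv(shifted(state, k2, dt / 2))
    k4 = lorenz_deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(perturbation=1e-8, dt=0.01, steps=10000):
    """Distance between two runs that start almost identically.

    The starting point and perturbation size are arbitrary choices
    made for illustration.
    """
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)
    separations = [math.dist(a, b)]
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        separations.append(math.dist(a, b))
    return separations


if __name__ == "__main__":
    seps = separation_over_time()
    print("initial separation:", seps[0])
    print("largest separation:", max(seps))
```

The two runs track each other closely at first, then drift apart until the gap between them is as large as the system itself. That is the same behavior that makes a tiny measurement error matter so much in a long-range forecast.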
Since weather forecasting is fundamentally uncertain, weather forecasters use ensemble models to produce probabilistic forecasts. Rather than producing a single best guess about what will happen, ensemble models generate a range of plausible outcomes by introducing small random variations in the equations modeling weather processes. Ensemble models can use the range of outcomes to estimate the probability that a particular outcome will occur. If most of the outcomes generated by the ensemble are similar—if they all show a storm making landfall at the same place, for example—then the model’s uncertainty will be low, but if they show a wide range of different outcomes, its uncertainty will be high.
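The logic of turning an ensemble into a probability can be sketched in a few lines. This is a toy illustration, not how ENS or GenCast work internally: the "model" here is just a hypothetical random draw of a storm's landfall position along a coast, and the function names, the 100 km baseline, and the noise scales are all assumptions made for the example.

```python
import random
import statistics


def run_toy_ensemble(n_members, noise_scale, seed=0):
    """Hypothetical stand-in for a weather model.

    Each ensemble member is one plausible landfall position, in km along
    a coastline, produced by adding random variation to a baseline.
    """
    rng = random.Random(seed)
    return [100.0 + rng.gauss(0.0, noise_scale) for _ in range(n_members)]


def event_probability(members, event):
    """Fraction of ensemble members in which the event occurs."""
    return sum(1 for m in members if event(m)) / len(members)


def ensemble_spread(members):
    """Standard deviation across members, a simple measure of uncertainty."""
    return statistics.pstdev(members)


if __name__ == "__main__":
    def hits_city(position_km):
        # Hypothetical city occupying the stretch from 90 km to 110 km.
        return 90.0 <= position_km <= 110.0

    for label, scale in [("members agree", 5.0), ("members disagree", 80.0)]:
        members = run_toy_ensemble(50, scale)
        print(label,
              "| P(landfall at city) =", event_probability(members, hits_city),
              "| spread (km) =", round(ensemble_spread(members), 1))
```

When the members cluster together, the spread is small and the estimated probability of the event is close to 0 or 1. When they scatter, the spread is large and the probability moves toward the middle.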
GenCast’s ensemble is able to produce well-calibrated probabilistic forecasts of the weather as far as 15 days ahead (in other words, close to the theoretical limit of predictability) and to do it more quickly and more cheaply than current state-of-the-art models. Predicting the weather more accurately and further in advance is not just an impressive technical accomplishment; it also has the potential to save lives, reduce economic losses from storms and extreme weather, and improve the management of renewable energy generation.
GenCast’s success makes me optimistic about the prospects for an AI model that can outperform human geopolitical forecasters. The modern discipline of probabilistic forecasting, including tools like the Brier scores we use to assess the accuracy of forecasts, was developed by weather forecasters. Politics and social behavior are chaotic and irregular in something like the way weather systems are, although forecasting politics is probably more challenging in some ways. I am skeptical of attempts to use large language models (LLMs) like GPT-4 to forecast politics by simulating the reasoning of skilled human forecasters. As impressive as these models are, they aren’t really general-purpose reasoning engines; they struggle with logical consistency and probabilistic thinking. While GenCast was purpose-built to model atmospheric physics on the surface of a globe, language models weren’t designed to forecast political behavior. Nevertheless, it’s probably only a matter of time before AI can identify patterns in politics that even the best human forecasters can’t readily see.
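For readers unfamiliar with the Brier score mentioned above, here is a minimal sketch of the common binary version: the mean squared difference between the probabilities a forecaster assigned and what actually happened (1 if the event occurred, 0 if it did not). The function name and the sample numbers are illustrative assumptions, not data from any real forecaster.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect score.
    """
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty forecasts and outcomes")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical forecasts: 90% for an event that happened,
    # 20% for an event that did not.
    print(brier_score([0.9, 0.2], [1, 0]))
```

A forecaster who says 50% for every yes-or-no question scores 0.25 no matter what happens, so a score meaningfully below that is a sign of real skill.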
I’m on the most recent episode of the new podcast Hyperfixed, alongside Vox’s Dylan Matthews and Our World in Data’s Bastian Herre, talking about whether it makes sense to bring kids into the world. Happy holidays and thanks for reading Telling the Future!