Machine learning techniques are becoming important tools for answering questions in the Earth system sciences. In recent years, the volume of Earth system data has grown dramatically. These data come from a variety of sources, such as remote sensors, in situ observations, citizen science, and increasingly high-quality computer simulations of the Earth. Scientists use these data to make better predictions (such as weather forecasts or epidemic projections), test new hypotheses, improve existing models, and develop new theories. However, scientists cannot yet fully benefit from these large volumes of data, because our ability to collect and generate data surpasses our ability to process, understand, and use them.
This sharp increase in data volumes has been accompanied by recent advances in machine learning algorithms from the computer science community. “Machine learning” is a generic term for a variety of emerging data science algorithms that use data to learn to perform tasks without being explicitly programmed to do so. Such machine learning models can perform a variety of tasks with near-human-level skill, including image recognition, classification, prediction, and pattern recognition. These algorithms have already revolutionized several domains, such as computer vision (He et al., 2015), natural language processing (Devlin et al., 2018), and video games (Justesen et al., 2019).
Building on this success, atmospheric scientists, oceanographers, and climate scientists are now using machine learning algorithms to better predict, process, analyze, and learn from large volumes of Earth system data.
For example, machine learning has become a common approach for identifying patterns in, and predicting, weather and climate phenomena such as synoptic fronts (Lagerquist et al., 2019) and El Niño (Ham et al., 2019). It is possible that, in the near future, a machine learning algorithm could provide a more accurate weather forecast than traditional numerical weather models. The idea that a purely data-driven algorithm could outperform state-of-the-art numerical weather models, which are based on our physical knowledge, is remarkable and could change our approach to the simulation of weather and climate (Balaji, 2021). To this end, community benchmark datasets that formalize intercomparisons of data-driven weather prediction are emerging (Rasp et al., 2020).
For longer-term climate projection, another modeling approach that is gaining popularity combines machine learning with traditional climate models to create hybrid climate models. This approach relies on (a) mathematical equations derived from physical laws to simulate processes that are accurately captured in climate models and (b) machine learning to simulate processes, such as clouds, that are not well captured by current climate models but for which high-fidelity training datasets can be generated (Gentine et al., 2018; Brenowitz and Bretherton, 2018; Rasp et al., 2018; Yuval and O’Gorman, 2020). This hybrid approach has attracted scientists because it might answer some pressing questions regarding the climate sensitivity of Earth. However, there are still nontrivial issues with this approach. For example, using out-of-the-box machine learning algorithms as part of a climate model can lead to physically inconsistent results, such as the violation of conservation laws (Brenowitz and Bretherton, 2019).
The failure of machine learning algorithms to produce physically consistent results is one reason for the growing recognition that machine learning approaches should be used in nuanced ways that incorporate existing scientific knowledge.
Such approaches are commonly referred to as knowledge-guided machine learning. The idea is to find ways to integrate scientific knowledge into machine learning algorithms, for example, by designing algorithms that can enforce physical constraints (Beucler et al., 2021) or by tailoring training data so that subprocesses are emulated strategically in ways that also enable constraints to be satisfied (Yuval et al., 2021).
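To make the idea of enforcing a physical constraint concrete, here is a minimal sketch in Python with NumPy. It is a generic illustration and not the method of any of the papers cited above: it assumes the constraint is linear (a weighted sum of the model outputs must equal a known value, as in a simple column budget) and corrects the raw outputs with the smallest possible adjustment. The function name, the layer weights, and the output values are all hypothetical.

```python
import numpy as np


def project_onto_constraint(y, C, b):
    """Return the vector closest to y (least squares) that satisfies C @ y = b.

    y: raw model outputs, shape (n,)
    C: constraint matrix, shape (m, n), assumed to have full row rank
    b: required constraint values, shape (m,)
    """
    residual = C @ y - b  # how badly the raw outputs violate the constraint
    correction = C.T @ np.linalg.solve(C @ C.T, residual)
    return y - correction


# Hypothetical setup: five vertical layers with made-up mass weights.
weights = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
C = weights[np.newaxis, :]  # a single linear constraint, shape (1, 5)
b = np.array([0.0])         # the weighted sum of tendencies must vanish

# Made-up raw outputs standing in for an unconstrained emulator.
raw = np.array([0.5, -0.2, 0.1, 0.3, -0.4])
corrected = project_onto_constraint(raw, C, b)

print("violation before:", (C @ raw - b).item())
print("violation after: ", (C @ corrected - b).item())
```

A correction applied after the fact like this one is only one option. Constraints can also be built into the algorithm itself or encouraged through the training procedure, and which choice works best in a hybrid climate model is an empirical question.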
Much remains to be explored in this new subfield, including which combination of algorithmic and knowledge-guided approaches will lead to reliably robust operational process emulators. Because machine learning is empirical, and because decisions about how to optimize neural networks can have a major effect on their performance in hybrid climate models (Ott et al., 2020), a healthy technical debate is emerging in the literature on machine-learning-assisted climate simulation.
One promising thread is the advent of explainable artificial intelligence. Most machine learning applications improve our predictive abilities but provide little information about how the algorithms arrive at accurate predictions, and many scientists perceive machine learning algorithms as uninterpretable “black boxes.” However, in recent years, atmospheric scientists, oceanographers, and climate scientists have been adapting methods that help to interpret machine learning algorithms. For example, these interpretability methods can help us understand when and why these algorithms can provide reliable subseasonal forecasts (Mayer and Barnes, 2021) and can assist in discovering unknown equations for ocean turbulence (Zanna and Bolton, 2020). These ideas give us hope that machine learning algorithms, together with the abundance of Earth system data, could lead to scientific breakthroughs.
To accelerate these applications and to foster communication among the atmosphere, ocean, and land communities, a new special collection in the Journal of Advances in Modeling Earth Systems (JAMES), entitled “Machine Learning Application to Earth System Modeling,” aims to bring together new research that uses machine learning to advance Earth system modeling.
The collection is open to manuscripts covering the use of new machine learning methodologies developed for advancing Earth system science (for example, interpretability of machine learning algorithms, physics-guided algorithms, causal inference, and hybrid modeling) and applications of machine learning to Earth system modeling (for example, predictability of weather and climate, machine learning parameterizations, and uncertainty quantification). Manuscripts should be submitted via the GEMS website for JAMES.
—Janni Yuval (ORCID 0000-0001-7519-0118), Massachusetts Institute of Technology, USA; Mike Pritchard (ORCID 0000-0002-0340-6327), University of California Irvine, USA; Pierre Gentine (ORCID 0000-0002-0845-8345), Columbia University, USA; Laure Zanna (ORCID 0000-0002-8472-4828), New York University, USA; Jiwen Fan (ORCID 0000-0001-5280-4391), Pacific Northwest National Laboratory, USA