Using machine learning to derive cloud condensation nuclei number concentrations from commonly available measurements

Nair, A.A., and F. Yu (2020), Using machine learning to derive cloud condensation nuclei number concentrations from commonly available measurements, Atmos. Chem. Phys., doi:10.5194/acp-2020-509.
Abstract

Cloud condensation nuclei (CCN) number concentrations are an important aspect of aerosol–cloud interactions and the subsequent climate effects; however, their measurements are very limited. We use a machine learning tool, random decision forests, to develop a Random Forest Regression Model (RFRM) to derive CCN at 0.4% supersaturation ([CCN0.4]) from commonly available measurements. The RFRM is trained on the long-term simulations in a global size-resolved particle microphysics model. Using atmospheric state and composition variables as predictors, through associations of their variabilities, the RFRM is able to learn the underlying dependence of [CCN0.4] on these predictors, which are: 8 fractions of PM2.5 (NH4, SO4, NO3, secondary organic aerosol (SOA), black carbon (BC), primary organic carbon (POC), dust, and salt), 7 gaseous species (NOx, NH3, O3, SO2, OH, isoprene, and monoterpene), and 4 meteorological variables (temperature (T), relative humidity (RH), precipitation, and solar radiation). The RFRM is highly robust: median mean fractional bias (MFB) of 4.4% with ∼96.33% of the derived [CCN0.4] within a good agreement range of −60% < MFB < +60% and strong correlation of Kendall's τ coefficient ≈ 0.88. The RFRM demonstrates its robustness over 4 orders of magnitude of [CCN0.4] over varying spatial (such as continental to oceanic, clean to polluted, and near surface to upper troposphere) and temporal (from the hourly to the decadal) scales. At the Atmospheric Radiation Measurement Southern Great Plains observatory (ARM SGP) in Lamont, Oklahoma, United States, long-term measurements for PM2.5 speciation (NH4, SO4, NO3, and organic carbon (OC)), NOx, O3, SO2, T, and RH, as well as [CCN0.4] are available. We modify, optimise, and retrain the developed RFRM to make predictions from 9 of these available predictors rather than the full set of 19. This retrained RFRM (RFRM-ShortVars) shows a reduction in performance due to the unavailability and sparsity of measurements (predictors); it captures the [CCN0.4] variability and magnitude at SGP with ∼67.02% of the derived values in the good agreement range. This work shows the potential of using the more commonly available measurements of PM2.5 speciation to alleviate the sparsity of CCN number concentration measurements.
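
The evaluation statistics quoted in the abstract (fractional bias, the share of values inside the ±60% agreement range, and Kendall's τ) can be illustrated with a minimal Python sketch. It assumes the common definition of fractional bias, 2(derived − reference)/(derived + reference), and uses the simple tau-a form of Kendall's coefficient; the abstract does not state the exact aggregation or tie handling used in the paper, so those choices are assumptions. The function names and sample values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import median


def fractional_bias(derived, reference):
    """Per-sample fractional bias 2*(d - r)/(d + r); assumes positive inputs."""
    return [2.0 * (d - r) / (d + r) for d, r in zip(derived, reference)]


def fraction_in_range(biases, limit=0.6):
    """Share of samples whose fractional bias lies strictly within (-limit, +limit)."""
    return sum(1 for b in biases if -limit < b < limit) / len(biases)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical [CCN0.4] values in cm^-3, for illustration only (not from the paper).
reference = [100.0, 400.0, 1500.0, 50.0]
derived = [110.0, 380.0, 3000.0, 45.0]

fb = fractional_bias(derived, reference)
print("median fractional bias:", median(fb))
print("fraction within +/-60%:", fraction_in_range(fb))
print("Kendall tau-a:", kendall_tau_a(derived, reference))
```

In this toy sample, one derived value is twice its reference, which places it outside the ±60% range even though the rank ordering of the two series is identical. This shows why an agreement fraction and a rank correlation are reported side by side.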

Research Program
Atmospheric Composition Modeling and Analysis Program (ACMAP)