- To provide daily and annual PM2.5 concentration data for the contiguous U.S. at a resolution of 1 km (about 30 arc-seconds), so that public health researchers can estimate the short-term and long-term effects of PM2.5 on human health, respectively, and to support other related research.
- The Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-km Grids, Version 1.10 (2000-2016) data set includes predictions of PM2.5 concentration in grid cells at a resolution of 1 km for the years 2000-2016. A generalized additive model that accounted for geographic differences was used to ensemble the daily predictions of three machine learning models: neural network, random forest, and gradient boosting. The three machine learners incorporated multiple predictors, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis data sets, and others. The annual predictions were calculated by averaging the daily predictions for each year in each grid cell. The ensemble model demonstrated better predictive performance than the individual machine learners, with 10-fold cross-validated R-squared values of 0.86 for daily predictions and 0.89 for annual predictions. In version 1.10, the completeness of the daily PM2.5 predictions has been improved by using linear interpolation to impute missing values. Specifically, for days with small spatial patches of missing data covering fewer than 100 grid cells, inverse distance weighting interpolation was used to fill the missing grid cells. Other missing daily PM2.5 predictions were interpolated from the nearest days with available data. Annual predictions were then updated by averaging the imputed daily predictions for each year in each grid cell. These daily and annual PM2.5 predictions allow public health researchers to estimate the short-term and long-term effects of PM2.5 exposure on human health, respectively, in support of the U.S. Environmental Protection Agency (EPA) revision of the National Ambient Air Quality Standards for 24-hour average and annual average concentrations of PM2.5. The data are available in RDS and GeoTIFF formats for statistical research and geospatial analysis.
- Recommended Citation(s)*:
Di, Q., Y. Wei, A. Shtein, X. Xing, E. Castro, H. Amini, C. Hultquist, L. Shi, I. Kloog, R. Silvern, J. Kelly, M. B. Sabath, C. Choirat, P. Koutrakis, A. Lyapustin, Y. Wang, L. J. Mickley, Y. Daouk, and J. Schwartz. 2024. Daily and Annual PM2.5 Concentrations for the Contiguous United States, 1-km Grids, Version 1.10 (2000-2016). Palisades, New York: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/g2n9-ca10. Accessed DAY MONTH YEAR.
Di, Q., H. Amini, L. Shi, I. Kloog, R. Silvern, J. Kelly, M. B. Sabath, C. Choirat, P. Koutrakis, A. Lyapustin, Y. Wang, L. J. Mickley, and J. Schwartz. 2019. An Ensemble-based Model of PM2.5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution. Environment International 130: 104909. https://doi.org/10.1016/j.envint.2019.104909.
* Authors who use the data should cite both the data set and the scientific publication, if available. This practice gives credit to data set producers and advances the principles of transparency and reproducibility. Please visit the data citations page for details. Users who want to format the citation(s) for this data set in another style can copy the DOI and paste it into Crosscite's website.
† For EndNote users, please check the Research Note field for issues with importing authors that are organizations when using the ENW file format.
- Available Formats:
- raster, tabular, vector
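
The version 1.10 gap-filling and annual averaging described in the abstract can be illustrated with a minimal Python sketch. This is not the data set authors' implementation. The function names, the distance power of 2, the use of all valid cells as neighbors, and the per-day count of missing cells (rather than a per-patch count) are assumptions made for illustration only. The array `daily` is a hypothetical stack of daily grids with shape (days, rows, columns), where NaN marks a missing prediction.

```python
import numpy as np


def idw_fill(grid, power=2.0):
    """Fill NaN cells with an inverse-distance-weighted mean of the valid cells.

    Assumption: every valid cell in the grid contributes, weighted by
    1 / distance**power, with distance measured in grid cells.
    """
    filled = grid.copy()
    valid = ~np.isnan(grid)
    if valid.all() or not valid.any():
        return filled  # nothing to fill, or nothing to fill from
    valid_rows, valid_cols = np.nonzero(valid)
    values = grid[valid]  # same row-major order as np.nonzero(valid)
    for r, c in zip(*np.nonzero(~valid)):
        dist = np.hypot(valid_rows - r, valid_cols - c)  # always > 0 here
        weights = 1.0 / dist ** power
        filled[r, c] = np.sum(weights * values) / np.sum(weights)
    return filled


def interpolate_in_time(daily):
    """Linearly interpolate remaining NaNs along the day axis, cell by cell.

    Before the first or after the last valid day, np.interp repeats the
    nearest valid value.
    """
    out = daily.copy()
    days = np.arange(out.shape[0])
    flat = out.reshape(out.shape[0], -1)  # view onto `out`
    for j in range(flat.shape[1]):
        series = flat[:, j]
        missing = np.isnan(series)
        if missing.any() and not missing.all():
            series[missing] = np.interp(
                days[missing], days[~missing], series[~missing]
            )
    return out


def fill_and_average(daily, max_missing_cells=100):
    """Impute missing daily values, then average over days for each cell.

    Simplification: a day is treated as having a "small patch" when its
    total number of missing cells is below max_missing_cells.
    """
    filled = daily.copy()
    for t in range(daily.shape[0]):
        n_missing = int(np.isnan(daily[t]).sum())
        if 0 < n_missing < min(max_missing_cells, daily[t].size):
            filled[t] = idw_fill(daily[t])
    filled = interpolate_in_time(filled)
    return filled, filled.mean(axis=0)


# Toy example: three days on a 2 x 2 grid.
daily = np.full((3, 2, 2), 10.0)
daily[2] = 20.0
daily[0, 0, 0] = np.nan  # small spatial gap, filled by IDW
daily[1] = np.nan        # whole day missing, filled from neighboring days
filled, annual = fill_and_average(daily)
```

In this toy case the single missing cell on the first day is filled from its three neighbors, the fully missing second day is interpolated between the first and third days, and `annual` holds the per-cell mean of the imputed daily values. Reading the actual RDS or GeoTIFF files into an array is outside the scope of this sketch.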