- To provide annual concentration data for PM2.5 trace element components for the contiguous U.S., at resolutions of 50m in urban areas and 1km in non-urban areas, for use in public health research that estimates effects on human health and in other related research.
- The Annual Mean PM2.5 Components Trace Elements (TEs) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019, v1 data set contains annual predictions of trace element concentrations at a hyper resolution (50m x 50m grid cells) in urban areas and a high resolution (1km x 1km grid cells) in non-urban areas, for the years 2000 to 2019. Particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5) is a silent killer of millions of people worldwide, and it contains many trace elements (TEs). Understanding of the relative toxicity of these TEs is limited largely by a lack of data. In this work, ensembles of machine learning models were used to generate approximately 163 billion predictions of annual mean PM2.5 TEs, namely Bromine (Br), Calcium (Ca), Copper (Cu), Iron (Fe), Potassium (K), Nickel (Ni), Lead (Pb), Silicon (Si), Vanadium (V), and Zinc (Zn). Monitored data from approximately 600 locations were integrated with more than 160 predictors, such as time and location, satellite observations, composite predictors, meteorological covariates, and many novel land use variables, using several machine learning algorithms and ensemble methods. Multiple machine learning models were developed to cover urban and non-urban areas. Their predictions were then combined using either Generalized Additive Model (GAM) Ensemble Geographically-Weighted-Averaging (GAM-ENWA) or Super-Learners. In non-urban areas, the overall best model R-squared values for the test sets ranged from 0.79 for Copper to 0.88 for Zinc. In urban areas, the model R-squared values ranged from 0.80 for Copper to 0.88 for Zinc. The Coordinate Reference System (CRS) used in the predictions is the World Geodetic System 1984 (WGS84), and the units for the PM2.5 Components TEs are ng/m^3. The data are provided in RDS tabular format, a file format native to the R programming language, but the files can also be opened in other languages, such as Python.
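Because the abstract notes that the RDS files can be opened from Python, here is a minimal sketch of one way to do that. It assumes the third-party `pyreadr` package, which this page does not name; the helper names `load_rds` and `ng_to_ug` are hypothetical and exist only for illustration. The column layout of the files is not described here, so the sketch does not assume any column names.

```python
from pathlib import Path

# Trace elements listed in the abstract (symbol -> name).
TRACE_ELEMENTS = {
    "Br": "Bromine", "Ca": "Calcium", "Cu": "Copper", "Fe": "Iron",
    "K": "Potassium", "Ni": "Nickel", "Pb": "Lead", "Si": "Silicon",
    "V": "Vanadium", "Zn": "Zinc",
}


def load_rds(path):
    """Read one tabular RDS file into a pandas DataFrame.

    Assumption: pyreadr is installed. It is imported lazily so the
    other helper in this sketch works without it.
    """
    import pyreadr  # third-party package, not part of the data set

    result = pyreadr.read_r(str(Path(path)))
    # An RDS file holds a single unnamed object, which pyreadr
    # stores under the key None.
    return result[None]


def ng_to_ug(concentration_ng_m3):
    """Convert a concentration from ng/m^3 (the data set's units) to ug/m^3."""
    return concentration_ng_m3 / 1000.0
```

A call such as `df = load_rds("example_file.rds")` (a placeholder file name) should return a DataFrame; inspecting `df.columns` and `df.head()` is the safest first step before any analysis, since the files' structure should be confirmed against the data set documentation.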
- Recommended Citation(s)*:
Amini, H., M. Danesh-Yazdi, Q. Di, W. Requia, Y. Wei, Y. AbuAwad, L. Shi, M. Franklin, C.-M. Kang, J. M. Wolfson, P. James, R. Habre, Q. Zhu, J. S. Apte, Z. J. Andersen, X. Xing, C. Hultquist, I. Kloog, F. Dominici, P. Koutrakis, and J. Schwartz. 2023. Annual Mean PM2.5 Components Trace Elements (TEs) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019, v1. Palisades, New York: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/1x94-mv38. Accessed DAY MONTH YEAR.
* When authors make use of data, they should cite both the data set and the scientific publication, if available. This practice gives credit to data set producers and advances principles of transparency and reproducibility. Please visit the data citations page for details. Users who would like to format the citation(s) for this data set in an alternate style can copy the DOI and paste it into Crosscite's website.
† For EndNote users, please check the Research Note field for issues with importing authors that are organizations when using the ENW file format.
- Available Formats: