Efficient and Flexible Aggregation and Distribution of MODIS Atmospheric Products Based on Climate Analytics as a Service Framework
Figure 1. Illustration of data partitioning-based parallel aggregation with one-month data.
Figure 2. Integration of Stratus service framework with parallel MODIS aggregation.
Figure 3. The comparison of the mean value and pixel counts of cloud top temperature between the Python-based flexible aggregation algorithm and the MYD08_D3 product on 1 January 2008.
Figure 4. The comparison of the cloud fraction between the Python-based algorithm and the MYD08_D3 product on 1 January 2008.
Figure 5. The flow chart of the aggregation on the desired cloud top temperature by using the flexible aggregation framework (left) and the current L3 product (right). The dashed box indicates the internal process in the Python-based flexible aggregation method.
Figure 6. Execution time results (in seconds) for scalability evaluation by (a) increasing the number of processes within a node and (b) increasing the number of nodes with 32 processes per node.
Abstract
1. Introduction
2. Data and Methods
2.1. General Procedures of the Flexible Aggregation
- (1) Read all input options from the user, including the spatiotemporal resolutions, the defined sampling rate, the selected variables with the desired statistics, and the list of variable names of the L2 products.
- (2) Create an empty Python dictionary that stores data values as key-value pairs, with designated key names given as strings. This provides a flexible way to store and refer to the aggregated values of each combination of L2 variable and statistic selected by the user. For instance, if a user requests two variables with five statistics as the aggregation output, ten keys are assigned in the Python dictionary. Each key is an L3 variable name formed by combining the requested L2 variable and statistic (e.g., “cloud_fraction_minimum”).
- (3) Initialize the data array corresponding to each key name, which represents one combination of selected L2 variable and statistic in the Python dictionary. Note that different combinations of L2 variables and statistics have different numbers of dimensions. L2 variables with statistics other than the 1D and 2D histograms have two dimensions (i.e., latitude × longitude). Any L2 variable with a 1D histogram statistic has three dimensions: latitude, longitude, and the variable intervals of the 1D histogram. Similarly, any L2 variable with a 2D histogram statistic has four dimensions. Taking the example in step (2), “cloud_fraction_minimum” is a 2D array, while “cloud_fraction_histogram_counts” is a 3D array.
- (4) As described in the user guide of the Collection 6 MODIS L3 product [8], the definition of Day used by the daily L3 product differs from the natural definition, which runs from 00:00 to 24:00 UTC. Consequently, as explained further below, the time range is adjusted to this definition of Day after the Python dictionary is initialized.
- (5) Read all values and the associated attributes (e.g., fill values, offsets, scale factors, and physical units) of the selected variables from the L2 products, using the defined sampling rate and the defined time range as adjusted by the new definition of Day.
- (6) Note that the L2 MODIS atmosphere products include runtime QA information for all variables. For instance, the cloud optical property QA flags (e.g., “Quality_Assurance_1km”) in the MOD06/MYD06 products describe the product quality, retrieval processing, and scene characteristics [20]. Data with no confidence are assigned the associated fill value. Because the Collection 6 and 6.1 L3 atmosphere products no longer provide QA-weighted aggregations of the L2 cloud properties (MOD06/MYD06) for the seven statistics selected in this study [8,9], we developed our flexible algorithm for statistics without QA weighting. Therefore, invalid L2 input data are filtered out using the associated fill value, and all valid L2 input data are assigned equal weights in the statistics calculation.
- (7) Aggregate the variables into the defined grid boxes over the defined region and store them in the Python dictionary. Then calculate the requested statistics for each variable under the corresponding key name in the Python dictionary. The calculation of each of the seven statistics is described below:
- The minimum and maximum: The entries for the minimum and maximum are initialized to positive and negative infinity, respectively. During the aggregation, each entry is replaced whenever a smaller value (for the minimum) or a larger value (for the maximum) is encountered, until the end of the process.
- The pixel counts: For cloud fraction, the pixel count of each grid is the total number of L2 pixels within the grid. For other cloud properties, the pixel count of each grid is the number of confident cloudy pixels derived from the “cloud mask” in the MYD06_L2 product.
- The mean value: For the cloud fraction of each grid, the mean value is the number of confident cloudy pixels divided by the number of total L2 pixels within the grid. For other cloud properties, the mean value of each grid is the summation of L2 pixel values within the grid divided by the number of total L2 pixels within the grid. Note that the pixel count of each selected variable is still calculated even if the user requests only the mean value.
- The standard deviation: Normally, the standard deviation in each grid is calculated as $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $x_i$ is the L2 value of each pixel within the grid, $\bar{x}$ is the grid mean value, and $N$ is the grid pixel count. However, it is impossible to accumulate $\sum_{i=1}^{N}(x_i - \bar{x})^2$ during the aggregation because the final grid mean value is unknown until the end of the aggregation. Instead, we rewrite the standard deviation as $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}x_i^2 - \bar{x}^2}$. With this formula, we can accumulate $\sum_{i=1}^{N}x_i^2$ during the aggregation and obtain $\sigma$ once $\bar{x}$ and $N$ are calculated at the end of the process. Note that the pixel count and the mean value of each selected variable are still calculated even if the user requests only the standard deviation.
- The 1D histogram: In each grid, we count the number of pixels that fall in each bin of the user-defined interval range for the 1D histogram.
- The 2D histogram: In each grid, we count the number of pixels that fall in each joint bin of the user-defined interval ranges for the 2D histogram.
- (8) Generate an HDF file (version 4 and/or version 5) to store the aggregated output values and attributes.
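The accumulation in steps (2), (3), (6), and (7) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the helper keys `_sum` and `_sum_sq`, the fill value of -9999.0, and the global 1° grid are all assumptions made for illustration. The sketch starts the running minimum at +inf and the running maximum at -inf, divides by the pixel count N (population form) for the standard deviation, and divides sums by the count of valid pixels.

```python
import numpy as np


def new_accumulator(var, n_lat, n_lon):
    """Steps (2)-(3): a dictionary of 2D arrays keyed by '<variable>_<statistic>'."""
    shape = (n_lat, n_lon)
    return {
        f"{var}_pixel_counts": np.zeros(shape, dtype=np.int64),
        f"{var}_sum": np.zeros(shape),              # running sum of x_i (helper key)
        f"{var}_sum_sq": np.zeros(shape),           # running sum of x_i**2 (helper key)
        f"{var}_minimum": np.full(shape, np.inf),   # any valid value replaces +inf
        f"{var}_maximum": np.full(shape, -np.inf),  # any valid value replaces -inf
    }


def accumulate(acc, var, lat, lon, values, fill_value,
               lat0=-90.0, lon0=-180.0, res=1.0):
    """Steps (6)-(7): drop fill values, bin pixels into grid boxes, update statistics."""
    valid = values != fill_value
    lat, lon = lat[valid], lon[valid]
    values = values[valid].astype(float)
    n_lat, n_lon = acc[f"{var}_pixel_counts"].shape
    # Grid-box indices; pixels on the upper boundary go into the last box.
    i = np.clip(np.floor((lat - lat0) / res).astype(int), 0, n_lat - 1)
    j = np.clip(np.floor((lon - lon0) / res).astype(int), 0, n_lon - 1)
    # ufunc.at accumulates correctly when several pixels share one grid box.
    np.add.at(acc[f"{var}_pixel_counts"], (i, j), 1)
    np.add.at(acc[f"{var}_sum"], (i, j), values)
    np.add.at(acc[f"{var}_sum_sq"], (i, j), values ** 2)
    np.minimum.at(acc[f"{var}_minimum"], (i, j), values)
    np.maximum.at(acc[f"{var}_maximum"], (i, j), values)


def finalize(acc, var):
    """End of step (7): mean and standard deviation from the running sums."""
    n = acc[f"{var}_pixel_counts"]
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = acc[f"{var}_sum"] / n                    # NaN where no pixels fell
        variance = acc[f"{var}_sum_sq"] / n - mean ** 2
    acc[f"{var}_mean"] = mean
    # Clamp tiny negative values caused by floating-point round-off.
    acc[f"{var}_standard_deviation"] = np.sqrt(np.maximum(variance, 0.0))
    return acc


var = "cloud_top_temperature"
acc = new_accumulator(var, n_lat=180, n_lon=360)
# Two hypothetical batches of L2 pixels that fall in the same grid box.
accumulate(acc, var, np.array([10.2, 10.7]), np.array([20.1, 20.9]),
           np.array([250.0, -9999.0]), fill_value=-9999.0)
accumulate(acc, var, np.array([10.5, 10.9]), np.array([20.5, 20.3]),
           np.array([260.0, 270.0]), fill_value=-9999.0)
finalize(acc, var)
print(acc[f"{var}_mean"][100, 200], acc[f"{var}_standard_deviation"][100, 200])
```

Because only counts, sums, sums of squares, and running extremes are stored, batches can be processed in any order and the per-batch dictionaries could in principle be merged afterwards. Note that the sum-of-squares form can lose precision when the mean is large relative to the spread.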
2.2. Parallelization of the Aggregation Algorithm
2.3. Service-Based MODIS Aggregation Framework
3. Results
3.1. Comparison with Level-3 MODIS Cloud Product in Collection 6.1 (MYD08_D3)
3.2. Comparison of Usage Difference of L3 Data and Flexible Aggregation
3.3. Speed-Up Improvement of the Parallelization
4. Summaries and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Salomonson, V.V.; Barnes, W.; Xiong, J.; Kempler, S.; Masuoka, E. An overview of the Earth Observing System MODIS instrument and associated data systems performance. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Toronto, ON, Canada, 24–28 June 2002; pp. 1174–1176.
2. Lee, J.; Kim, J.; Yang, P.; Hsu, N.C. Improvement of aerosol optical depth retrieval from MODIS spectral reflectance over the global ocean using new aerosol models archived from AERONET inversion data and tri-axial ellipsoidal dust database. Atmos. Chem. Phys. 2012, 12, 7087–7102.
3. Xiong, X.; Chiang, K.; Esposito, J.; Guenther, B.; Barnes, W. MODIS on-orbit calibration and characterization. Metrologia 2003, 40, S89.
4. Justice, C.; Townshend, J.; Vermote, E.; Masuoka, E.; Wolfe, R.; Saleous, N.; Roy, D.; Morisette, J. An overview of MODIS Land data processing and product status. Remote Sens. Environ. 2002, 83, 3–15.
5. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
6. Righi, M.; Andela, B.; Eyring, V.; Lauer, A.; Predoi, V.; Schlund, M.; Vegas-Regidor, J.; Bock, L.; Brötz, B.; de Mora, L. Earth System Model Evaluation Tool (ESMValTool) v2.0 – technical overview. Geosci. Model Dev. 2020, 13, 1179–1199.
7. Masuoka, E.; Fleig, A.; Wolfe, R.E.; Patt, F. Key characteristics of MODIS data products. IEEE Trans. Geosci. Remote 1998, 36, 1313–1323.
8. Platnick, S.; King, M.D.; Meyer, K.G.; Wind, G.; Amarasinghe, N.; Marchant, B.; Arnold, G.T.; Zhang, Z.; Hubanks, P.A.; Ridgway, B.; et al. MODIS Cloud Optical Properties: User Guide for the Collection 6/6.1 Level-2 MOD06/MYD06 Product and Associated Level-3 Datasets. Version 2018, 1, 150.
9. Hubanks, P.A.; Platnick, S.; King, M.D.; Ridgway, B. MODIS Atmosphere L3 Global Gridded Product User’s Guide & algorithm theoretical basis document (ATBD) for C6.1 products: 08_D3, 08_E3, 08_M3. Version 2020, 1, 129.
10. Savtchenko, A.; Ouzounov, D.; Ahmad, S.; Acker, J.; Leptoukh, G.; Koziana, J.; Nickless, D. Terra and Aqua MODIS products available from NASA GES DAAC. Adv. Space Res. 2004, 34, 710–714.
11. Park, S.S.; Kim, J.; Lee, J.; Lee, S.; Kim, J.S.; Chang, L.S.; Ou, S. Combined dust detection algorithm by using MODIS infrared channels over East Asia. Remote Sens. Environ. 2014, 141, 24–39.
12. Zhang, P.; Lu, N.-M.; Hu, X.-Q.; Dong, C.-H. Identification and physical retrieval of dust storm using three MODIS thermal IR channels. Glob. Planet Chang. 2006, 52, 197–206.
13. Platnick, S.; Heidinger, A.; Ackerman, S.; Amarasinghe, N.; Dutcher, S.; Frey, R.; Hubanks, P.; Li, Y.; Marchant, B.; Meyer, K. EOS MODIS and SNPP VIIRS Cloud Properties: User Guide for the Climate Data Record Continuity Level-2 Cloud Top and Optical Properties Product (CLDPROP); Technical Report; NASA Goddard Space Flight Center: Greenbelt, MD, USA, 2019.
14. Ruiz-Arias, J.; Dudhia, J.; Gueymard, C.; Pozo-Vázquez, D. Assessment of the Level-3 MODIS daily aerosol optical depth in the context of surface solar radiation and numerical weather modeling. Atmos. Chem. Phys. 2013, 13, 675–692.
15. NASA LAADS Distributed Active Archive Center. Available online: https://ladsweb.modaps.eosdis.nasa.gov/search/ (accessed on 30 June 2021).
16. Acker, J.G.; Leptoukh, G. Online analysis enhances use of NASA earth science data. Eos Trans. Am. Geophys. Union 2007, 88, 14–17.
17. Wang, J.; Huang, X.; Zheng, J.; Rajapakshe, C.; Kay, S.; Kandoor, L.; Maxwell, T.; Zhang, Z. Scalable Aggregation Service for Satellite Remote Sensing Data. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, New York, NY, USA, 2–4 October 2020; pp. 184–199.
18. Chang, I.; Gao, L.; Burton, S.P.; Chen, H.; Diamond, M.S.; Ferrare, R.A.; Flynn, C.J.; Kacenelenbogen, M.; LeBlanc, S.E.; Meyer, K.G.; et al. Spatiotemporal Heterogeneity of Aerosol and Cloud Properties Over the Southeast Atlantic: An Observational Analysis. Geophys. Res. Lett. 2021, 48, e2020GL091469.
19. Cai, Y.; Guan, K.; Lobell, D.; Potgieter, A.B.; Wang, S.; Peng, J.; Xu, T.; Asseng, S.; Zhang, Y.; You, L.; et al. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric. For. Meteorol. 2019, 274, 144–159.
20. Hubanks, P.A. MODIS Atmosphere QA Plan for Collection 061. Version 2017, 1, 67.
21. Sayer, A.M.; Knobelspiesse, K.D. How should we aggregate data? Methods accounting for the numerical distributions, with an assessment of aerosol optical depth. Atmos. Chem. Phys. 2019, 19, 15023–15048.
22. Levy, R.C.; Leptoukh, G.G.; Kahn, R.; Zubko, V.; Gopalan, A.; Remer, L.A. A critical look at deriving monthly aerosol optical depth from satellite data. IEEE T. Geosci. Remote 2009, 47, 2942–2956.
23. Colarco, P.R.; Kahn, R.A.; Remer, L.A.; Levy, R.C. Impact of satellite viewing-swath width on global and regional aerosol optical thickness statistics and trends. Atmos. Meas. Tech. 2014, 7, 2313–2335.
24. Li, R.; Hu, H.; Li, H.; Wu, Y.; Yang, J. MapReduce parallel programming model: A state-of-the-art survey. Int. J. Parallel Program. 2016, 44, 832–866.
25. Maitrey, S.; Jha, C. MapReduce: Simplified data analysis of big data. Procedia Comput. Sci. 2015, 57, 563–571.
26. Dean, J.; Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 2008, 51, 107–113.
27. Rocklin, M. Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; p. 136.
28. Pacheco, P. Parallel Programming with MPI; Morgan Kaufmann: Burlington, MA, USA, 1997.
29. Dalcín, L.; Paz, R.; Storti, M. MPI for Python. J. Parallel Distrib. Comput. 2005, 65, 1108–1115.
30. Perrey, R.; Lycett, M. Service-oriented architecture. In Proceedings of the 2003 Symposium on Applications and the Internet Workshops, Orlando, FL, USA, 27–31 January 2003; pp. 116–119.
31. Laskey, B.; Laskey, K. Service oriented architecture. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 101–105.
32. Schnase, J.L. Climate analytics as a service. In Cloud Computing in Ocean and Atmospheric Sciences; Elsevier: Amsterdam, The Netherlands, 2016; pp. 187–219.
33. Stratus: Synchronization Technology Relating Analytic Transparently Unified Services. Available online: https://github.com/nasa-nccs-cds/stratus/ (accessed on 30 June 2021).
34. Hintjens, P. ZeroMQ: Messaging for Many Applications; O’Reilly Media Inc.: Newton, MA, USA, 2013.
35. Fengping, P.; Jianzheng, C. Distributed system based on ZeroMQ. Electron. Test 2012, 7, 24–29.
36. ZeroMQ: An Open-Source Universal Messaging Library. Available online: https://zeromq.org/ (accessed on 30 June 2021).
37. OpenAPI Initiative (OAI). Available online: www.openapis.org (accessed on 30 June 2021).
38. Pautasso, C.; Wilde, E.; Alarcon, R. REST: Advanced Research Topics and Practical Applications; Springer: Berlin/Heidelberg, Germany, 2013.
39. Slurm Workload Manager. Available online: https://slurm.schedmd.com/overview.html (accessed on 30 June 2021).
40. Scalable MODIS Data Aggregation Platform. Available online: https://github.com/big-data-lab-umbc/MODIS_Aggregation (accessed on 30 June 2021).
41. Amazon Web Services (AWS). Available online: https://aws.amazon.com (accessed on 30 June 2021).
User-Defined Inputs | Formats |
---|---|
Input Data Path | [data input path, file prefix] |
Output Data Path | [data output path, file prefix] |
Range of time | Start Date and End Date (yyyy/mm/dd) |
Regional Boundaries | [latitude1, latitude2, longitude1, longitude2] |
Spatial resolution | [Latitude degree; Longitude degree] |
Sampling rates | Positive integer |
Statistics | 7 statistics with value of True/False |
Input file of user-defined variables | Variable names with the histogram intervals |
Variable with Joint 2D histogram | Variable names with the joint histogram intervals |
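As a hedged illustration of how the user-defined inputs in this table might be represented and checked in Python, the sketch below uses a small dataclass. The class name `AggregationRequest`, its field names, the statistic key names, and the example path and prefix are all hypothetical and are not taken from the authors' code. The joint 2D histogram entry is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical names for the seven True/False statistic flags.
STATISTICS = ("minimum", "maximum", "mean", "pixel_counts",
              "standard_deviation", "histogram_1d", "histogram_2d")


def parse_day(text):
    """Parse a date given in the yyyy/mm/dd format listed in the table."""
    return datetime.strptime(text, "%Y/%m/%d").date()


@dataclass
class AggregationRequest:
    input_path: tuple    # (data input path, file prefix)
    output_path: tuple   # (data output path, file prefix)
    start: date
    end: date
    bounds: tuple        # (latitude1, latitude2, longitude1, longitude2)
    resolution: tuple    # (latitude degree, longitude degree)
    sampling_rate: int   # positive integer
    statistics: dict     # statistic name -> True/False
    variables: dict      # variable name -> histogram intervals

    def validate(self):
        lat1, lat2, lon1, lon2 = self.bounds
        if self.start > self.end:
            raise ValueError("start date must not be after end date")
        if not (lat1 < lat2 and lon1 < lon2):
            raise ValueError("bounds must be ordered (lat1 < lat2, lon1 < lon2)")
        if not isinstance(self.sampling_rate, int) or self.sampling_rate < 1:
            raise ValueError("sampling rate must be a positive integer")
        if set(self.statistics) != set(STATISTICS):
            raise ValueError("all seven statistics need a True/False flag")
        return self

    def grid_shape(self):
        """Number of grid boxes along latitude and longitude."""
        lat1, lat2, lon1, lon2 = self.bounds
        dlat, dlon = self.resolution
        return round((lat2 - lat1) / dlat), round((lon2 - lon1) / dlon)


requested = {"mean", "pixel_counts", "minimum", "maximum", "standard_deviation"}
request = AggregationRequest(
    input_path=("/path/to/input/", "MYD06_L2"),   # hypothetical path and prefix
    output_path=("/path/to/output/", "agg"),      # hypothetical path and prefix
    start=parse_day("2008/01/01"),
    end=parse_day("2008/01/01"),
    bounds=(-90.0, 90.0, -180.0, 180.0),
    resolution=(1.0, 1.0),
    sampling_rate=1,
    statistics={name: name in requested for name in STATISTICS},
    variables={"cloud_fraction": None},           # no histogram intervals requested
).validate()
print(request.grid_shape())
```

Validating the request once, before any file is read, keeps malformed inputs from surfacing later as indexing errors inside the aggregation loop.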
User-Defined Input | Input Value |
---|---|
Range of time | 1 January 2008 to 1 January 2008 |
Regional Boundaries | −90° N~90° N; 180° W~180° E |
Spatial resolution | 1° by 1° |
Sampling rates | 1 |
Statistics | Mean, Pixel Counts, Minimum, Maximum, Standard Deviation |
Input variables for aggregation | 5 km Cloud Top Temperature and 5 km Cloud Fraction |
Variable | Statistics | Difference (This Study − MYD08_D3) |
---|---|---|
Cloud Fraction | Minimum | 97.7% |
Cloud Fraction | Maximum | 99.5% |
Cloud Fraction | Pixel Counts | 98.6% |
Cloud Fraction | Mean | 98.6% |
Cloud Fraction | Standard Deviation | 98.9% |
Cloud Top Temperature | Minimum | 96.5% |
Cloud Top Temperature | Maximum | 87.4% |
Cloud Top Temperature | Pixel Counts | 98.9% |
Cloud Top Temperature | Mean | 97.3% |
Cloud Top Temperature | Standard Deviation | 99.2% |
User-Defined Input | Input Value |
---|---|
Range of time | 1 January 2008 to 1 January 2008 |
Regional Boundaries | 20° N~40° N; 120° E~150° E |
Spatial resolution | 0.5° by 0.5° |
Sampling rates | 1 |
Statistics | All seven statistics |
Input variables for aggregation | 1 km Cloud Top Temperature |
User-Defined Input | Input Value |
---|---|
Range of time | 1 January 2008 to 31 January 2008 |
Regional Boundaries | −90° N~90° N; 180° W~180° E |
Spatial resolution | 1° by 1° |
Sampling rates | 5 |
Statistics | All seven statistics |
Input variables for aggregation | 1 km Cloud Top Temperature, 1 km Cloud Top Pressure, 1 km Cloud Top Height, 1 km Cloud Emissivity and 1 km Cloud Mask Flag |
Number of Nodes | Number of Processes | Dask Execution Time (s) | Dask Speedup | MPI Execution Time (s) | MPI Speedup |
---|---|---|---|---|---|
1 | 2 | 19,105 | 1.89 | 18,368 | 1.97 |
1 | 4 | 10,036 | 3.60 | 9459 | 3.82 |
1 | 8 | 5143 | 7.03 | 4919 | 7.35 |
1 | 16 | 3076 | 11.75 | 2646 | 13.67 |
1 | 32 | 1586 | 22.80 | 1579 | 22.90 |
2 | 64 | 811 | 44.59 | 803 | 45.03 |
4 | 128 | 540 | 66.97 | 564 | 64.12 |
6 | 192 | 484 | 74.71 | 416 | 86.93 |
8 | 256 | 418 | 86.50 | 408 | 88.63 |
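To make the scaling trend in this table easier to read, the short sketch below derives parallel efficiency from the listed speedups, assuming the conventional definition of efficiency as speedup divided by the number of processes. The variable and function names are illustrative, and the rows are copied from the table above.

```python
# (nodes, processes, dask_time_s, dask_speedup, mpi_time_s, mpi_speedup)
ROWS = [
    (1, 2, 19105, 1.89, 18368, 1.97),
    (1, 4, 10036, 3.60, 9459, 3.82),
    (1, 8, 5143, 7.03, 4919, 7.35),
    (1, 16, 3076, 11.75, 2646, 13.67),
    (1, 32, 1586, 22.80, 1579, 22.90),
    (2, 64, 811, 44.59, 803, 45.03),
    (4, 128, 540, 66.97, 564, 64.12),
    (6, 192, 484, 74.71, 416, 86.93),
    (8, 256, 418, 86.50, 408, 88.63),
]


def efficiency(speedup, processes):
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup / processes


print(f"{'procs':>5} {'Dask eff.':>10} {'MPI eff.':>10}")
for _nodes, procs, _dask_t, dask_s, _mpi_t, mpi_s in ROWS:
    print(f"{procs:>5} {efficiency(dask_s, procs):>10.2f} {efficiency(mpi_s, procs):>10.2f}")
```

Under this definition, efficiency stays close to 1 for small process counts and drops as more nodes are added, which matches the flattening of the speedup columns beyond 128 processes.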
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zheng, J.; Huang, X.; Sangondimath, S.; Wang, J.; Zhang, Z. Efficient and Flexible Aggregation and Distribution of MODIS Atmospheric Products Based on Climate Analytics as a Service Framework. Remote Sens. 2021, 13, 3541. https://doi.org/10.3390/rs13173541