research-article

Scaling Properties of Common Statistical Operators for Gridded Datasets

Published: 01 November 2007

Abstract

An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite or climate-model origin. For these dataset geometries, our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g., dataset differencing). Dataset structure can reduce analysis throughput ten-fold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations that substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. These scaling properties can help to estimate costs of distribution strategies for data reduction in cluster and grid environments.



Published In

International Journal of High Performance Computing Applications, Volume 21, Issue 4
November 2007
111 pages

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. computational model
  2. data analysis
  3. geoscience
  4. netCDF
  5. scaling
