research-article

Hardware/software co-design for energy-efficient seismic modeling

Authors:

David Donofrio,

Marghoob Mohiyuddin,

Samuel Williams,

Franz-Josef Pfreund

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 73, Pages 1 - 12

https://doi.org/10.1145/2063384.2063482

Published: 12 November 2011

Abstract

Reverse Time Migration (RTM) has become the standard for high-quality imaging in the seismic industry. RTM relies on PDE solutions using stencils that are 8th order or larger, which require large-scale HPC clusters to meet the computational demands. However, the rising power consumption of conventional cluster technology has prompted investigation of architectural alternatives that offer higher computational efficiency. In this work, we compare the performance and energy efficiency of three architectural alternatives: the Intel Nehalem X5530 multicore processor, the NVIDIA Tesla C2050 GPU, and a general-purpose manycore chip design optimized for high-order wave equations called "Green Wave." We have developed an FPGA-accelerated architectural simulation platform to accurately model the power and performance of the Green Wave design. Results show that across highly tuned high-order RTM stencils, the Green Wave implementation can offer up to 8x and 3.5x better energy efficiency per node than the Nehalem and GPU platforms, respectively. These results point to the enormous potential energy advantages of our hardware/software co-design methodology.
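To make "8th-order stencil" concrete, the following is a minimal one-dimensional sketch, not the paper's kernel. It assumes the standard 8th-order central-difference coefficients for a second derivative and a simple leapfrog time update for the acoustic wave equation; production RTM kernels are three-dimensional and heavily tuned. The names `COEFFS`, `laplacian_1d`, and `wave_step` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Standard 8th-order central-difference coefficients for d^2/dx^2.
# COEFFS[0] is the center weight; COEFFS[k] applies to both neighbors at distance k.
COEFFS = [
    Fraction(-205, 72),
    Fraction(8, 5),
    Fraction(-1, 5),
    Fraction(8, 315),
    Fraction(-1, 560),
]
RADIUS = len(COEFFS) - 1  # 4 points on each side, 9 points in total


def laplacian_1d(u, dx):
    """Approximate the second derivative of u on a uniform grid.

    Only interior points (at least RADIUS away from each end) are computed;
    the remaining entries are left at 0.0 as a crude boundary treatment.
    """
    n = len(u)
    out = [0.0] * n
    for i in range(RADIUS, n - RADIUS):
        acc = float(COEFFS[0]) * u[i]
        for k in range(1, RADIUS + 1):
            acc += float(COEFFS[k]) * (u[i - k] + u[i + k])
        out[i] = acc / (dx * dx)
    return out


def wave_step(u_prev, u_curr, velocity, dt, dx):
    """One leapfrog step of u_tt = velocity^2 * u_xx (assumed constant velocity)."""
    lap = laplacian_1d(u_curr, dx)
    scale = (velocity * dt) ** 2
    return [2.0 * c - p + scale * l for p, c, l in zip(u_prev, u_curr, lap)]
```

Each output point reads nine input values in one dimension; a comparable 3D stencil reads 25, which hints at why memory traffic and data reuse, rather than raw arithmetic alone, tend to shape the energy cost of such kernels.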

Index Terms

Hardware/software co-design for energy-efficient seismic modeling
1. Applied computing
  1. Physical sciences and engineering
    1. Earth and atmospheric sciences
2. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
    2. Simulation types and techniques

Information & Contributors

Information

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

November 2011

866 pages

ISBN:9781450307710

DOI:10.1145/2063384

Conference Chair: Scott Lathrop, University of Chicago

Program Chairs: Jim Costa, Sandia National Laboratories; William Kramer, National Center for Supercomputing Applications

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Advanced Scientific Computing Research

Conference

SC '11

SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 18, 2011

Seattle, Washington

Acceptance Rates

SC '11 Paper Acceptance Rate: 74 of 352 submissions, 21%

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 428
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 1

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Zoschke K, Oppermann H, Schiffer M, Ndip I, Becker K, Adler M, Gäbler A, Maaß U, Paulin G, Kocon W (2024). Key Technologies and Design Aspects for Wafer Level Packaging of High Performance Computing Modules. 2024 IEEE 74th Electronic Components and Technology Conference (ECTC), pages 433-440. Online publication date: 28-May-2024.
https://doi.org/10.1109/ECTC51529.2024.00340
Newkirk A, Hanus N, Payne C (2024). Expert and operator perspectives on barriers to energy efficiency in data centers. Energy Efficiency, 17:6. Online publication date: 17-Jul-2024.
https://doi.org/10.1007/s12053-024-10244-7
Gourounas D, Hanindhito B, Fathi A, Trenev D, John L, Gerstlauer A (2023). FAWS: FPGA Acceleration of Large-Scale Wave Simulations. 2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 76-84. Online publication date: Jul-2023.
https://doi.org/10.1109/ASAP57973.2023.00025
Jacquelin M, Araya-Polo M, Meng J, Wolf F, Shende S, Culhane C, Alam S, Jagode H (2022). Scalable distributed high-order stencil computations. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 1-13. Online publication date: 13-Nov-2022.
https://dl.acm.org/doi/10.5555/3571885.3571924
Jacquelin M, Araya-Polo M, Meng J (2022). Scalable Distributed High-Order Stencil Computations. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. Online publication date: Nov-2022.
https://doi.org/10.1109/SC41404.2022.00035
Haindl C, Leng K, Nissen-Meyer T (2021). A 3D complexity-adaptive approach to explore sparsity in elastic wave propagation. GEOPHYSICS, 86:5, pages T321-T335. Online publication date: 16-Aug-2021.
https://doi.org/10.1190/geo2020-0490.1
Weis C, Gimmler-Dumont C, Jung M, Wehn N (2020). Design of Efficient, Dependable SoCs Based on a Cross-Layer-Reliability Approach with Emphasis on Wireless Communication as Application and DRAM Memories. Dependable Embedded Systems, pages 435-455. Online publication date: 10-Dec-2020.
https://doi.org/10.1007/978-3-030-52017-5_18
Schuiki F, Schaffner M, Benini L (2019). NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22 nm FD-SOI. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 662-667. Online publication date: Mar-2019.
https://doi.org/10.23919/DATE.2019.8715007
Pauget B, Pearce D, Potanin A, Kell S, Marr S (2018). Towards compilation of an imperative language for FPGAs. Proceedings of the 10th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, pages 47-56. Online publication date: 4-Nov-2018.
https://dl.acm.org/doi/10.1145/3281287.3281291
Jagtap R, Jung M, Elsasser W, Weis C, Hansson A, Wehn N, Jacob B (2017). Integrating DRAM power-down modes in gem5 and quantifying their impact. Proceedings of the International Symposium on Memory Systems, pages 86-95. Online publication date: 2-Oct-2017.
https://dl.acm.org/doi/10.1145/3132402.3132444
