Abstract
Using the Barnes-Hut algorithm as an example we deal with the design of parallel algorithms that are able to exploit multicore CPUs and GPUs conjointly. Specifically, we demonstrate how to modularize a parallel application according to specific aspects of parallel execution. This allows for a flexible assignment of individual modules to the two parallel architectures based on their actual performance characteristics. Furthermore, we discuss a hybrid module for the most time consuming part of the algorithm that utilizes CPU and GPU simultaneously employing a novel load balancing heuristic. Our experimental evaluation shows that our method greatly increases overall efficiency by allowing to deploy the optimal configuration of modules for each individual computer system.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Anderson, D.P.: Boinc: A system for public-resource computing and storage. In: 5th IEEE/ACM International Workshop on Grid Computing, pp. 4–10 (2004)
Barnes, J.E., Hut, P.: A hierarchical O(N log N) force-calculation algorithm. Nature 324(6096), 446–449 (1986)
Barnes, J.E.: A modified tree code: Don’t laugh; it runs. Journal of Computational Physics 87(1), 161–170 (1990)
Bédorf, J., Gaburov, E., Zwart, S.P.: A sparse octree gravitational n-body code that runs entirely on the GPU processor. Journal of Computational Physics 231(7), 2825–2839 (2012)
Billings, J.J.: Starscream – An open source galaxy modeling and simulation tool, http://code.google.com/p/starscream/ (accessed in February 2013)
Blochinger, W., Dangelmayr, C., Schulz, S.: Aspect-oriented parallel discrete optimization on the cohesion desktop grid platform. In: Proc. of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), pp. 49–56 (2006)
Burtscher, M., Pingali, K.: An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In: Hwu, W.-M.W. (ed.) GPU Computing Gems Emerald Edition, ch. 6, pp. 75–92. Morgan Kaufmann Publishers Inc. (2011)
Gaburov, E., Bédorf, J., Zwart, S.P.: Gravitational tree-code on graphics processing units: Implementation in CUDA. Procedia CS 1(1), 1119–1127 (2010)
Hannak, H., Blochinger, W., Trieflinger, S.: A desktop grid enabled parallel Barnes-hut algorithm. In: Proceedings of the 31st IEEE International Performance Computing and Communications Conference (IPCCC 2012), pp. 120–129 (2012)
Heien, E., Kondo, D., Anderson, D.: A correlated resource model of internet end hosts. IEEE Transactions on Parallel and Distributed Systems 23(6), 977–984 (2012)
Jetley, P., Wesolowski, L., Gioachin, F., Kalé, L.V., Quinn, T.R.: Scaling hierarchical n-body simulations on GPU clusters. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11 (2010)
Keckeisen, M., Blochinger, W.: Parallel implicit integration for cloth animations on distributed memory architectures. In: Proc. of Eurographics Symposium on Parallel Graphics and Visualization 2004, pp. 119–126 (2004)
Makino, J.: Vectorization of a treecode. Journal of Computational Physics 87(1), 148–160 (1990)
Meißner, M., Hüttner, T., Blochinger, W., Weber, A.: Parallel direct volume rendering on PC networks. In: Arabnia, H.R. (ed.) Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications, PDPTA 1998. CSREA Press (1998)
Nakasato, N.: Implementation of a parallel tree method on a GPU. Journal of Computational Science 3(3), 132–141 (2012)
Schulz, S., Blochinger, W., Held, M., Dangelmayr, C.: Cohesion - a microkernel based desktop grid platform for irregular task-parallel applications. Future Generation Computer Systems - The International Journal of Grid Computing: Theory, Methods and Applications 24(5), 354–370 (2008)
Singh, J.P., Holt, C., Totsuka, T., Gupta, A., Hennessy, J.: Load balancing and data locality in adaptive hierarchical n-body methods: Barnes-hut, fast multipole, and radiosity. Journal of Parallel and Distributed Computing 27(2), 118–141 (1995)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hannak, H., Hochstetter, H., Blochinger, W. (2013). A Hybrid Parallel Barnes-Hut Algorithm for GPU and Multicore Architectures. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_57
Download citation
DOI: https://doi.org/10.1007/978-3-642-40047-6_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer ScienceComputer Science (R0)