  • Alaei M and Yazdanpanah F. (2024). A Survey on Heterogeneous CPU–GPU Architectures and Simulators. Concurrency and Computation: Practice and Experience. 10.1002/cpe.8318. 37:1. Online publication date: 10-Jan-2025.

    https://onlinelibrary.wiley.com/doi/10.1002/cpe.8318

  • Gralka P, Müller C, Heinemann M, Reina G, Weiskopf D and Ertl T. (2024). Power overwhelming: the one with the oscilloscopes. Journal of Visualization. 27:6. (1171-1193). Online publication date: 1-Dec-2024.

    https://doi.org/10.1007/s12650-024-01001-0

  • Alavani G, Desai J, Saha S and Sarkar S. (2023). Program Analysis and Machine Learning–based Approach to Predict Power Consumption of CUDA Kernel. ACM Transactions on Modeling and Performance Evaluation of Computing Systems. 8:4. (1-24). Online publication date: 31-Dec-2024.

    https://doi.org/10.1145/3603533

  • Saed M, Chou Y, Liu L, Nowicki T and Aamodt T. Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture. (263-281).

    https://doi.org/10.1109/MICRO56248.2022.00027

  • Müller C, Heinemann M, Weiskopf D and Ertl T. (2022). Power Overwhelming: Quantifying the Energy Cost of Visualisation. 2022 IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV). 10.1109/BELIV57783.2022.00009. 979-8-3503-9629-4. (38-46).

    https://ieeexplore.ieee.org/document/9978510/

  • Gubran A and Aamodt T. Emerald. Proceedings of the 46th International Symposium on Computer Architecture. (169-182).

    https://doi.org/10.1145/3307650.3322221

  • Kenzel M, Kerbl B, Tatzgern W, Ivanchenko E, Schmalstieg D and Steinberger M. (2018). On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing. Proceedings of the ACM on Computer Graphics and Interactive Techniques. 1:2. (1-17). Online publication date: 24-Aug-2018.

    https://doi.org/10.1145/3233303

  • Kerbl B, Kenzel M, Ivanchenko E, Schmalstieg D and Steinberger M. (2018). Revisiting The Vertex Cache. Proceedings of the ACM on Computer Graphics and Interactive Techniques. 1:2. (1-16). Online publication date: 24-Aug-2018.

    https://doi.org/10.1145/3233302

  • Bridges R, Imam N and Mintz T. (2016). Understanding GPU Power. ACM Computing Surveys. 49:3. (1-27). Online publication date: 13-Dec-2016.

    https://doi.org/10.1145/2962131

  • Jung Y and Carloni L. ΣVP. Proceedings of the 52nd Annual Design Automation Conference. (1-6).

    https://doi.org/10.1145/2744769.2744913

  • Ma J, Yu L, Ye J and Chen T. (2014). MCMG simulator. Journal of Computer and System Sciences. 81:1. (57-71). Online publication date: 15-Feb-2015.

    https://doi.org/10.1016/j.jcss.2014.06.017

  • Guerrero G, Cebrián J, Pérez-Sánchez H, García J, Ujaldón M and Cecilia J. (2014). Toward energy efficiency in heterogeneous processors. Concurrency and Computation: Practice & Experience. 26:10. (1832-1846). Online publication date: 1-Jul-2014.

    https://doi.org/10.1002/cpe.3119

  • Pathania A, Jiao Q, Prakash A and Mitra T. Integrated CPU-GPU Power Management for 3D Mobile Games. Proceedings of the 51st Annual Design Automation Conference. (1-6).

    https://doi.org/10.1145/2593069.2593151

  • Lama P, Li Y, Aji A, Balaji P, Dinan J, Xiao S, Zhang Y, Feng W, Thakur R and Zhou X. pVOCL. Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems. (145-154).

    https://doi.org/10.1109/ICDCS.2013.51

  • Nath R, Carmean D and Rosing T. (2013). Power modeling and thermal management techniques for manycores. 2013 IEEE Symposium on Computers and Communications (ISCC). 10.1109/ISCC.2013.6755037. 978-1-4799-3755-4. (000740-000746).

    http://ieeexplore.ieee.org/document/6755037/

  • Arnau J, Parcerisa J and Xekalakis P. TEAPOT. Proceedings of the 27th ACM International Conference on Supercomputing. (37-46).

    https://doi.org/10.1145/2464996.2464999

  • Bakos J and Gao Y. Sparse matrix-vector multiply on the Texas Instruments C6678 Digital Signal Processor. Proceedings of the 2013 IEEE 24th International Conference on Application-specific Systems, Architectures and Processors (ASAP). (168-174).

    https://doi.org/10.1109/ASAP.2013.6567571

  • Wang B and Yu W. Performance and Power Simulation for Versatile GPGPU Global Memory. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. (2254-2257).

    https://doi.org/10.1109/IPDPSW.2013.70

  • Song S, Su C, Rountree B and Cameron K. A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. (673-686).

    https://doi.org/10.1109/IPDPS.2013.73

  • Ma X, Deng Z, Dong M and Zhong L. (2013). Characterizing the Performance and Power Consumption of 3D Mobile Games. Computer. 46:4. (76-82). Online publication date: 1-Apr-2013.

    https://doi.org/10.1109/MC.2012.190

  • Issa J and Figueira S. (2012). Graphics Processor performance analysis for 3D applications. 2012 2nd International Conference on Advances in Computational Tools for Engineering Applications (ACTEA). 10.1109/ICTEA.2012.6462881. 978-1-4673-2489-2. (269-272).

    http://ieeexplore.ieee.org/document/6462881/

  • Jararweh Y and Hariri S. (2012). Power and Performance Management of GPUs Based Cluster. International Journal of Cloud Applications and Computing. 2:4. (16-31). Online publication date: 1-Oct-2012.

    https://doi.org/10.4018/ijcac.2012100102

  • Arnau J, Parcerisa J and Xekalakis P. (2012). Boosting mobile GPU performance with a decoupled access/execute fragment processor. ACM SIGARCH Computer Architecture News. 40:3. (84-93). Online publication date: 5-Sep-2012.

    https://doi.org/10.1145/2366231.2337169

  • Johnsson B, Ganestam P, Doggett M and Akenine-Möller T. Power efficiency for software algorithms running on graphics processors. Proceedings of the Fourth ACM SIGGRAPH / Eurographics conference on High-Performance Graphics. (67-75).

    https://dl.acm.org/doi/10.5555/2383795.2383806

  • Arnau J, Parcerisa J and Xekalakis P. Boosting mobile GPU performance with a decoupled access/execute fragment processor. Proceedings of the 39th Annual International Symposium on Computer Architecture. (84-93).

    https://dl.acm.org/doi/10.5555/2337159.2337169

  • Arnau J, Parcerisa J and Xekalakis P. (2012). Boosting mobile GPU performance with a decoupled access/execute fragment processor. 2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA). 10.1109/ISCA.2012.6237008. 978-1-4673-0476-4. (84-93).

    http://ieeexplore.ieee.org/document/6237008/

  • Cebrián J, Guerrero G and Garcia J. Energy Efficiency Analysis of GPUs. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. (1014-1022).

    https://doi.org/10.1109/IPDPSW.2012.124

  • Jia W, Shaw K and Martonosi M. Stargazer. Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software. (2-13).

    https://doi.org/10.1109/ISPASS.2012.6189201

  • Jararweh Y, Alzubi S and Hariri S. (2011). An optimal multi-processor allocation algorithm for high performance GPU accelerators. 2011 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT). 10.1109/AEECT.2011.6132516. 978-1-4577-1084-1. (1-6).

    http://ieeexplore.ieee.org/document/6132516/

  • Matsumoto T, Yamaguchi S and Sakai T. A Study on Improving Power-Consumption Performance Ratio in GPGPU Computing. Proceedings of the 2011 Second International Conference on Networking and Computing. (288-290).

    https://doi.org/10.1109/ICNC.2011.53

  • Wang P, Yang C, Chen Y and Cheng Y. (2011). Power gating strategies on GPUs. ACM Transactions on Architecture and Code Optimization. 8:3. (1-25). Online publication date: 1-Oct-2011.

    https://doi.org/10.1145/2019608.2019612

  • Meng J and Skadron K. (2011). A reconfigurable simulator for large-scale heterogeneous multicore architectures. Software (ISPASS). 10.1109/ISPASS.2011.5762722. 978-1-61284-367-4. (119-120).

    http://ieeexplore.ieee.org/document/5762722/

  • Silpa B, Krishnaiah G and Panda P. Rank based dynamic voltage and frequency scaling for tiled graphics processors. Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis. (3-12).

    https://doi.org/10.1145/1878961.1878965

  • Pool J, Lastra A and Singh M. (2010). An energy model for graphics processing units. 2010 IEEE International Conference on Computer Design (ICCD 2010). 10.1109/ICCD.2010.5647678. 978-1-4244-8936-7. (409-416).

    http://ieeexplore.ieee.org/document/5647678/

  • Collange S, Daumas M, Defour D and Parello D. (2010). Barra: A Parallel Functional Simulator for GPGPU. Simulation of Computer and Telecommunication Systems (MASCOTS). 10.1109/MASCOTS.2010.43. 978-1-4244-8181-1. (351-360).

    http://ieeexplore.ieee.org/document/5581577/

  • Hong S and Kim H. (2010). An integrated GPU power and performance model. ACM SIGARCH Computer Architecture News. 38:3. (280-289). Online publication date: 19-Jun-2010.

    https://doi.org/10.1145/1816038.1815998

  • Hong S and Kim H. An integrated GPU power and performance model. Proceedings of the 37th annual international symposium on Computer architecture. (280-289).

    https://doi.org/10.1145/1815961.1815998

  • Wu J, Pan X, Liu G and Yang X. SEMCS. Proceedings of the 2009 WASE International Conference on Information Engineering - Volume 02. (192-196).

    https://doi.org/10.1109/ICIE.2009.220

  • Fung W, Sham I, Yuan G and Aamodt T. (2009). Dynamic warp formation. ACM Transactions on Architecture and Code Optimization. 6:2. (1-37). Online publication date: 1-Jun-2009.

    https://doi.org/10.1145/1543753.1543756

  • Bakhoda A, Yuan G, Fung W, Wong H and Aamodt T. (2009). Analyzing CUDA workloads using a detailed GPU simulator. Software (ISPASS). 10.1109/ISPASS.2009.4919648. 978-1-4244-4184-6. (163-174).

    http://ieeexplore.ieee.org/document/4919648/

  • Po-Han Wang, Yen-Ming Chen, Chia-Lin Yang and Yu-Jung Cheng. A Predictive Shutdown Technique for GPU Shader Processors. IEEE Computer Architecture Letters. 10.1109/L-CA.2009.1. 8:1. (9-12).

    http://ieeexplore.ieee.org/document/4758617/

  • Silpa B, Patney A, Krishna T, Panda P and Visweswaran G. Texture filter memory. Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design. (559-564).

    https://dl.acm.org/doi/10.5555/1509456.1509581

  • Silpa B, Patney A, Krishna T, Panda P and Visweswaran G. (2008). Texture Filter Memory — a power-efficient and scalable texture memory architecture for mobile graphics processors. 2008 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 10.1109/ICCAD.2008.4681631. 978-1-4244-2819-9. (559-564).

    http://ieeexplore.ieee.org/document/4681631/

  • Kyöstilä S, Kangas K and Pulli K. Tracy. Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware. (1-11).

    https://dl.acm.org/doi/10.5555/1413957.1413959

  • Nam B, Lee J, Kim K, Lee S and Yoo H. A low-power handheld GPU using logarithmic arithmetic and triple DVFS power domains. Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware. (73-80).

    https://dl.acm.org/doi/10.5555/1280094.1280106

  • Dale K, Sheaffer J, Vijay Kumar V, Luebke D, Humphreys G and Skadron K. (2007). Small-scale reconfigurability for improved performance and double-precision in graphics hardware. International Journal of Electronics. 10.1080/00207210701308500. 94:5. (549-561). Online publication date: 1-May-2007.

    http://www.tandfonline.com/doi/abs/10.1080/00207210701308500

  • Lee W, Park W, Srini V and Han T. (2007). Simulation and development environment for mobile 3D graphics architectures. IET Computers & Digital Techniques. 10.1049/iet-cdt:20050205. 1:5. (501).

    http://digital-library.theiet.org/content/journals/10.1049/iet-cdt_20050205

  • Shi W, Lee H, Yoo R and Boldyreva A. A digital rights enabled graphics processing system. Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware. (17-26).

    https://doi.org/10.1145/1283900.1283903

  • Mochocki B, Lahiri K, Cadambi S and Hu X. Signature-based workload estimation for mobile 3D graphics. Proceedings of the 43rd annual Design Automation Conference. (592-597).

    https://doi.org/10.1145/1146909.1147062

  • Mochocki B, Lahiri K and Cadambi S. Power analysis of mobile 3D graphics. Proceedings of the conference on Design, automation and test in Europe: Proceedings. (502-507).

    https://dl.acm.org/doi/10.5555/1131481.1131617

  • del Barrio V, Gonzalez C, Roca J and Fernandez A. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures. 2006 IEEE International Symposium on Performance Analysis of Systems and Software. 10.1109/ISPASS.2006.1620807. 1-4244-0186-0. (231-241).

    http://ieeexplore.ieee.org/document/1620807/

  • Mochocki B, Lahiri K and Cadambi S. (2006). Power Analysis of Mobile 3D Graphics. 2006 Design, Automation and Test in Europe. 10.1109/DATE.2006.243859. 3-9810801-1-4. (1-6).

    http://ieeexplore.ieee.org/document/1656933/

  • Dale K, Sheaffer J, Kumar V, Luebke D, Humphreys G and Skadron K. (2006). Applications of Small-Scale Reconfigurability to Graphics Processors. Reconfigurable Computing: Architectures and Applications. 10.1007/11802839_14. (99-108).

    http://link.springer.com/10.1007/11802839_14

  • Moya V, González C, Roca J, Fernández A and Espasa R. A single (unified) shader GPU microarchitecture for embedded systems. Proceedings of the First international conference on High Performance Embedded Architectures and Compilers. (286-301).

    https://doi.org/10.1007/11587514_19

  • Moya V, Gonzalez C, Roca J, Fernandez A and Espasa R. Shader Performance Analysis on a Modern GPU Architecture. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture. (355-364).

    https://doi.org/10.1109/MICRO.2005.30

  • Sheaffer J, Skadron K and Luebke D. Fine-grained graphics architectural simulation with Qsilver. ACM SIGGRAPH 2005 Posters. (118-es).

    https://doi.org/10.1145/1186954.1187089

  • Sheaffer J, Skadron K and Luebke D. Studying Thermal Management for Graphics-Processor Architectures. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005. (54-65).

    https://doi.org/10.1109/ISPASS.2005.1430559

  • Tack N, Lafruit G, Catthoor F and Lauwereins R. A content quality driven energy management system for mobile 3D graphics. IEEE Workshop on Signal Processing Systems Design and Implementation, 2005. 10.1109/SIPS.2005.1579879. 0-7803-9333-3. (278-283).

    http://ieeexplore.ieee.org/document/1579879/