Abstract
Random forests and decision trees are increasingly interesting candidates for resource-constrained machine learning. To make the execution of these models efficient under resource limitations, various optimized implementations have been proposed in the literature, usually realizing either native trees or if-else trees. While one motivation for optimizing if-else trees is to exploit dedicated instruction caches, in this work we highlight that if-else trees may also depend strongly on data caches. We identify a crucial issue of if-else tree implementations and propose an optimized implementation that keeps the logical tree structure untouched, and thus does not influence accuracy, but eliminates the need to load comparison values from the data cache. Experimental evaluation of this implementation shows that we can reduce the number of data cache misses by up to \(\approx 99\%\), while not increasing the number of instruction cache misses in comparison to the state of the art. We additionally highlight various scenarios where the reduction of data cache misses yields a substantial benefit for the overall execution time.
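To make the distinction concrete, the following is a minimal, self-contained C sketch, not the paper's generated code; the node layout, function names, feature indices, and split values are illustrative assumptions. It contrasts a native tree, where nodes are loaded from a data array, with a plain if-else tree, and then shows one way split values can be encoded as integer instruction immediates, assuming non-negative IEEE 754 values whose ordering matches the ordering of their unsigned bit patterns:

```c
#include <stdint.h>
#include <string.h>

/* Native tree: nodes live in an array, so every comparison loads
 * the split value and the child indices through the data cache. */
struct node {
    int   feature;      /* index into the feature vector          */
    float split;        /* comparison value, fetched from memory  */
    int   left, right;  /* child indices; negative encodes a leaf */
};

static int classify_native(const struct node *tree, const float *x)
{
    int i = 0;
    while (i >= 0)
        i = (x[tree[i].feature] <= tree[i].split) ? tree[i].left
                                                  : tree[i].right;
    return -i - 1; /* decode leaf label */
}

/* If-else tree: the same logical tree unrolled into code. The tree
 * structure now lives in the instruction stream, but float literals
 * such as 0.5f are typically still placed in a constant pool and
 * loaded through the data cache on x86 and ARMv8. */
static int classify_ifelse(const float *x)
{
    if (x[2] <= 0.5f) {
        if (x[0] <= 1.25f) return 0;
        return 1;
    }
    if (x[7] <= 3.0f) return 1;
    return 2;
}

/* Immediate-encoded if-else tree (sketch): for non-negative IEEE 754
 * floats, value order equals bit-pattern order, so the comparison can
 * be done on integers. Integer constants such as 0x3F000000u (the bit
 * pattern of 0.5f) are emitted as instruction immediates, eliminating
 * the data load for the split value. */
static inline uint32_t f2u(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u); /* bit-level reinterpretation */
    return u;
}

static int classify_immediate(const float *x)
{
    if (f2u(x[2]) <= 0x3F000000u) {       /* x[2] <= 0.5f  */
        if (f2u(x[0]) <= 0x3FA00000u)     /* x[0] <= 1.25f */
            return 0;
        return 1;
    }
    if (f2u(x[7]) <= 0x40400000u)         /* x[7] <= 3.0f  */
        return 1;
    return 2;
}
```

In the native variant every node visit is a data load; in the plain if-else variant the tree structure moves into the instruction stream, yet the float literals are commonly materialized from a constant pool, which is exactly the data-cache dependence described above. The integer-immediate variant avoids that load at the cost of a bit-level reinterpretation of the feature value, leaving the logical tree structure, and hence the prediction, unchanged.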
Notes
- 1.
This observation is not necessarily bound to the x86 architecture; the ARMv8 architecture does not offer such a feature either.
- 2.
The optimization targets the memory layout such that cache misses are reduced. Effects can therefore likely only be observed when the model size exceeds the cache capacity, which is the case only for larger models.
- 3.
- 4.
Our generator also supports double-precision floating-point values; the code is generated accordingly on demand.
- 5.
The ARMv8 system we use does not allow tracking instruction cache (icache) misses with perf.
Acknowledgement
This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the project OneMemory (project number 405422836), the SFB876 A1 (project number 124020371), and Deutscher Akademischer Austauschdienst (DAAD) within the Programme for Project-Related Personal Exchange (PPP) (project number 57559723).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hakert, C., Chen, K.H., Chen, J.J. (2023). Immediate Split Trees: Immediate Encoding of Floating Point Split Values in Random Forests. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_32
DOI: https://doi.org/10.1007/978-3-031-26419-1_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26418-4
Online ISBN: 978-3-031-26419-1
eBook Packages: Computer Science, Computer Science (R0)