research-article

Machine learning for financial transaction classification across companies using character‐level word embeddings of text fields

Published: 28 October 2021

Abstract

An important initial step in accounting is mapping financial transfers to the corresponding accounts. We devised machine‐learning‐based systems that automate this process. They use word embeddings with character‐level features to process transaction texts. When considering 473 companies independently, our approach achieved an average top‐1 accuracy of 80.50%, outperforming baselines that exclude the transaction texts or rely on a lexical bag‐of‐words text representation. We extended the approach to generalize across companies and even across different corporate sectors. After standardization of the account structures and careful feature engineering, a single classifier trained on 44 companies from 28 sectors achieved a test accuracy of more than 80%. When trained on 43 companies and tested on the remaining one, the system achieved an average performance of 64.62%. This rate increased to nearly 70% when considering only the largest sector.
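
The ideas in the abstract (character‐level features of transaction texts, top‐1 accuracy, and training on all companies but one) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `char_ngrams`, `leave_one_company_out`, and `top1_accuracy`, the n‐gram range of 3 to 6, and the toy records are all assumptions made for illustration, and no embedding model or classifier is trained here.

```python
from collections import defaultdict


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word padded with boundary markers.

    The '<' and '>' markers and the 3-6 range follow a common subword
    convention; they are assumptions, not settings taken from the paper.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def leave_one_company_out(records):
    """Yield (held_out_company, train, test) for every company.

    Each record is a hypothetical (company, text, account) tuple.
    """
    by_company = defaultdict(list)
    for record in records:
        by_company[record[0]].append(record)
    for held_out in sorted(by_company):
        train = [r for c, rs in by_company.items() if c != held_out for r in rs]
        yield held_out, train, by_company[held_out]


def top1_accuracy(predicted, actual):
    """Fraction of transactions whose single best guess is the true account."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data, invented purely for illustration.
records = [
    ("A", "office rent march", "rent"),
    ("A", "invoice 1001", "sales"),
    ("B", "rent april", "rent"),
    ("C", "invoices paid", "sales"),
]

# Spelling variants share most of their n-grams, which is why
# character-level features cope with noisy transaction texts.
shared = set(char_ngrams("invoice", 3, 3)) & set(char_ngrams("invoices", 3, 3))

splits = list(leave_one_company_out(records))
```

In the cross‐company setting, one classifier would be fitted per split on `train` and scored on `test` with `top1_accuracy`, and the per‐company scores would then be averaged.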

Cited By

  • (2024) Multilabel Classification of Account Code in Double-Entry Bookkeeping. Proceedings of the 2024 10th International Conference on Computer Technology Applications, pp. 209-214. DOI: 10.1145/3674558.3674587. Online publication date: 15-May-2024
  • (2024) Evaluating Deep Neural Networks in Deployment: A Comparative Study (Replicability Study). Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1300-1311. DOI: 10.1145/3650212.3680401. Online publication date: 11-Sep-2024
  • (2024) PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation. Proceedings of the ACM on Management of Data, 2:1, pp. 1-24. DOI: 10.1145/3639268. Online publication date: 26-Mar-2024
  • (2022) Detection of temporality at discourse level on financial news by combining Natural Language Processing and Machine Learning. Expert Systems with Applications: An International Journal, 197:C. DOI: 10.1016/j.eswa.2022.116648. Online publication date: 18-May-2022

Published In

International Journal of Intelligent Systems in Accounting and Finance Management, Volume 28, Issue 3
July/September 2021
58 pages
ISSN: 1055-615X
EISSN: 2160-0074
DOI: 10.1002/isaf.v28.3

Publisher

John Wiley and Sons Ltd.
United Kingdom

Author Tags

1. accounting
2. finance
3. financial transactions
4. multiclass classification
5. random forest
6. word embedding