Abstract
When assessing companies' performance, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. Such textual data can provide valuable weak signals, for example through stylistic features, which complement quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we apply various multi-task learning methods to financial text classification, focusing on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose several methods for combining the information extracted by training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during multi-task training. We then use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
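The best-performing setting described above feeds auxiliary task predictions back in as input features for the target task. A minimal sketch of that idea, with plain NumPy linear heads standing in for the paper's actual transformer-based classifiers (the dimensions, task names and head shapes below are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

hidden = 8                 # size of the shared sentence representation
aux_classes = [3, 2, 2]    # e.g. sentiment, objectivity, forward-looking
target_classes = 2         # e.g. ESG-content vs. not

# Shared representation for a batch of 4 sentences
# (stand-in for the output of a shared text encoder).
h = rng.standard_normal((4, hidden))

# One linear classification head per auxiliary task.
aux_heads = [(rng.standard_normal((hidden, c)), np.zeros(c)) for c in aux_classes]
aux_probs = [softmax(linear(h, w, b)) for w, b in aux_heads]

# The target-task head sees the shared representation *plus* the
# auxiliary tasks' predicted class probabilities as extra features.
target_in = np.concatenate([h] + aux_probs, axis=1)
w_t = rng.standard_normal((target_in.shape[1], target_classes))
target_probs = softmax(linear(target_in, w_t, np.zeros(target_classes)))

print(target_in.shape)    # (4, 15): hidden dims plus 3 + 2 + 2 auxiliary outputs
print(target_probs.shape) # (4, 2)
```

In a real multi-task setup all heads would be trained jointly, with the auxiliary predictions recomputed each forward pass; the sketch only shows how the feature concatenation wires the tasks together.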
Notes
2. For a sample of 19,426 PDF annual reports published by 3,252 firms listed on the London Stock Exchange.
4. Code and details to re-create the dataset are available at osf.io/rqgp4.
5. Note that N/As can only appear in the joint and weighted settings, where there is no explicit final target task.
Acknowledgements
This work was supported by the Slovenian Research Agency (ARRS) grants for the core programme Knowledge Technologies (P2-0103) and the project Quantitative and Qualitative Analysis of the Unregulated Corporate Financial Reporting (J5-2554). We also thank the students of the SBE for their effort in data annotation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Montariol, S. et al. (2023). Multi-task Learning for Features Extraction in Financial Annual Reports. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1753. Springer, Cham. https://doi.org/10.1007/978-3-031-23633-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23632-7
Online ISBN: 978-3-031-23633-4
eBook Packages: Computer Science (R0)