Abstract
When assessing companies' performance, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. Such textual data can provide valuable weak signals, for example through stylistic features, which complement quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we apply various multi-task learning methods to financial text classification, focusing on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose several methods for combining the information extracted by training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during multi-task training. We then use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
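The best-performing setting described above feeds auxiliary task predictions back in as input features for the target task. A minimal sketch of that idea, with plain NumPy linear heads standing in for the paper's actual transformer-based classifiers (the dimensions, task names and head shapes below are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

hidden = 8                 # size of the shared sentence representation
aux_classes = [3, 2, 2]    # e.g. sentiment, objectivity, forward-looking
target_classes = 2         # e.g. ESG-content vs. not

# Shared representation for a batch of 4 sentences
# (stand-in for the output of a shared text encoder).
h = rng.standard_normal((4, hidden))

# One linear classification head per auxiliary task.
aux_heads = [(rng.standard_normal((hidden, c)), np.zeros(c)) for c in aux_classes]
aux_probs = [softmax(linear(h, w, b)) for w, b in aux_heads]

# The target-task head sees the shared representation *plus* the
# auxiliary tasks' predicted class probabilities as extra features.
target_in = np.concatenate([h] + aux_probs, axis=1)
w_t = rng.standard_normal((target_in.shape[1], target_classes))
target_probs = softmax(linear(target_in, w_t, np.zeros(target_classes)))

print(target_in.shape)    # (4, 15): hidden dims plus 3 + 2 + 2 auxiliary outputs
print(target_probs.shape) # (4, 2)
```

In a real multi-task setup all heads would be trained jointly, with the auxiliary predictions recomputed each forward pass; the sketch only shows how the feature concatenation wires the tasks together.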
Notes
2. For a sample of 19,426 PDF annual reports published by 3,252 firms listed on the London Stock Exchange.
4. Code and details to re-create the dataset are available at osf.io/rqgp4.
5. Note that N/As can only appear in the joint and weighted settings, where there is no explicit final target task.
Acknowledgements
This work was supported by the Slovenian Research Agency (ARRS) grants for the core programme Knowledge Technologies (P2-0103) and the project Quantitative and Qualitative Analysis of the Unregulated Corporate Financial Reporting (J5-2554). We also thank the students of the SBE for their effort in data annotation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Montariol, S. et al. (2023). Multi-task Learning for Features Extraction in Financial Annual Reports. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1753. Springer, Cham. https://doi.org/10.1007/978-3-031-23633-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23632-7
Online ISBN: 978-3-031-23633-4
eBook Packages: Computer Science (R0)