[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3442442.3451385acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article
Open access

GOAT at the FinSim-2 task: Learning Word Representations of Financial Data with Customized Corpus

Published: 03 June 2021 Publication History

Abstract

In this paper, we present our approaches for the FinSim 2021 Shared Task on Learning Semantic Similarities for the Financial Domain. The aim of the FinSim shared task is to automatically classify a given list of terms from the financial domain into the most relevant hypernym (or top-level) concept in an external ontology. Two different word representations have been compared in our study, i.e., customized word2vec provided by the shared task and FinBERT. We first create a customized corpus from the given prospectuses and relevant articles from Investopedia. Then we train the domain-specific word2vec embeddings using the customized data with customized word2vec and FinBERT as the initialized embeddings respectively. Our experimental results demonstrate that these customized word embeddings can effectively improve the classification performance and achieve better results than the direct utilization of the provided word embeddings. The class imbalance issue of the given data is also explored. We empirically study the classification performance by employing several different strategies for imbalanced classification problems. Our system ranks 2nd on both Average Accuracy and Mean Rank metrics.

References

[1]
Dogu Araci. 2019. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063(2019).
[2]
Georgeta Bordea, Els Lefever, and Paul Buitelaar. 2016. Semeval-2016 task 13: Taxonomy extraction evaluation (texeval-2). In Proceedings of the 10th international workshop on semantic evaluation (semeval-2016). 1081–1091.
[3]
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, and Horacio Saggion. 2018. SemEval-2018 task 9: Hypernym discovery. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018); 2018 Jun 5-6; New Orleans, LA. Stroudsburg (PA): ACL; 2018. p. 712–24. ACL (Association for Computational Linguistics).
[4]
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16 (2002), 321–357.
[5]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.
[6]
Ismail El Maarouf, Youness Mansar, Virginie Mouilleron, and Dialekti Valsamou-Stanislawski. 2021. The finsim 2020 shared task: Learning semantic representations for the financial domain. In Proceedings of the Second Workshop on Financial Technology and Natural Language Processing. 81–86.
[7]
Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, 1322–1328.
[8]
Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65, 4(2014), 782–796.
[9]
Youness Mansar, Juyeon Kang, and Ismail El Maarouf. 2021. FinSim-2 2021 Shared Task: Learning Semantic Similarities for the Financial Domain. In Proceedings of The Web Conference 2021 (Virtual Edition).
[10]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781(2013).
[11]
Yanmin Sun, Andrew KC Wong, and Mohamed S Kamel. 2009. Classification of imbalanced data: A review. International journal of pattern recognition and artificial intelligence 23, 04(2009), 687–719.
[12]
Chengyu Wang and Xiaofeng He. 2020. Birre: learning bidirectional residual relation embeddings for supervised hypernymy detection. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 3630–3640.
[13]
Zheng Yu, Haixun Wang, Xuemin Lin, and Min Wang. 2015. Learning term embeddings for hypernymy identification. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WWW '21: Companion Proceedings of the Web Conference 2021
April 2021
726 pages
ISBN:9781450383134
DOI:10.1145/3442442
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. BERT
  2. Word representations
  3. imbalance classification
  4. word2vec

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '21
Sponsor:
WWW '21: The Web Conference 2021
April 19 - 23, 2021
Ljubljana, Slovenia

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)113
  • Downloads (Last 6 weeks)19
Reflects downloads up to 03 Mar 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media