GOAT at the FinSim-2 task: Learning Word Representations of Financial Data with Customized Corpus

Authors: Yulong Pei, Qian Zhang

WWW '21: Companion Proceedings of the Web Conference 2021

Pages 307 - 310

https://doi.org/10.1145/3442442.3451385

Published: 03 June 2021

Abstract

In this paper, we present our approaches for the FinSim 2021 Shared Task on Learning Semantic Similarities for the Financial Domain. The aim of the FinSim shared task is to automatically classify each term in a given list of financial-domain terms into its most relevant hypernym (or top-level) concept in an external ontology. We compare two word representations: the customized word2vec embeddings provided by the shared task and FinBERT. We first create a customized corpus from the given prospectuses and relevant articles from Investopedia. We then train domain-specific word2vec embeddings on this corpus, initializing them with the provided word2vec embeddings and with FinBERT embeddings, respectively. Our experimental results demonstrate that these customized word embeddings improve classification performance and achieve better results than using the provided word embeddings directly. We also explore the class imbalance in the given data and empirically study classification performance under several strategies for imbalanced classification. Our system ranks 2nd on both the Average Accuracy and Mean Rank metrics.
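To make the task and its two metrics concrete, the following is a minimal sketch, not the authors' system. It swaps in a simple cosine-similarity baseline that ranks candidate hypernym labels for each term, then computes top-1 accuracy and mean rank. It assumes that accuracy counts a term as correct when the gold label is ranked first, and that mean rank is the average 1-based position of the gold label in the ranked list. The two-dimensional vectors, the label names, and the helper functions `rank_hypernyms` and `evaluate` are hypothetical placeholders for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_hypernyms(term_vec, label_vecs):
    """Return label names sorted from most to least similar to the term vector."""
    return sorted(
        label_vecs,
        key=lambda label: cosine(term_vec, label_vecs[label]),
        reverse=True,
    )


def evaluate(rankings, gold):
    """Return (top-1 accuracy, mean 1-based rank of the gold label)."""
    ranks = [ranking.index(label) + 1 for ranking, label in zip(rankings, gold)]
    accuracy = sum(rank == 1 for rank in ranks) / len(ranks)
    mean_rank = sum(ranks) / len(ranks)
    return accuracy, mean_rank


# Toy data (assumed, not from the paper): three labels and two terms.
label_vecs = {"Bonds": [1.0, 0.0], "Funds": [0.0, 1.0], "Swaps": [1.0, 1.0]}
term_vecs = [[0.9, 0.1], [0.2, 0.8]]
gold = ["Bonds", "Swaps"]

rankings = [rank_hypernyms(vec, label_vecs) for vec in term_vecs]
accuracy, mean_rank = evaluate(rankings, gold)
print(rankings)
print(f"accuracy={accuracy}, mean_rank={mean_rank}")
```

In this toy run the first term's gold label is ranked first and the second term's gold label is ranked second, so a lower mean rank and a higher accuracy both indicate better predictions. In practice the vectors would come from trained embeddings, and a learned classifier could replace the cosine ranking.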

Published In

WWW '21: Companion Proceedings of the Web Conference 2021

April 2021

726 pages

ISBN: 9781450383134

DOI: 10.1145/3442442

Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '21: The Web Conference 2021

April 19 - 23, 2021

Ljubljana, Slovenia

Sponsor: SIGWEB

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
