Topic-aware hierarchical multi-attention network for text classification

480 Accesses
1 Altmetric
Explore all metrics

Abstract

Neural networks, primarily recurrent and convolutional Neural networks, have been proven successful in text classification. However, convolutional models could be limited when classification tasks are determined by long-range semantic dependency. While the recurrent ones can capture long-range dependency, the sequential architecture of which could constrain the training speed. Meanwhile, traditional networks encode the entire document in a single pass, which omits the hierarchical structure of the document. To address the above issues, this study presents T-HMAN, a Topic-aware Hierarchical Multiple Attention Network for text classification. A multi-head self-attention coupled with convolutional filters is developed to capture long-range dependency via integrating the convolution features from each attention head. Meanwhile, T-HMAN combines topic distributions generated by Latent Dirichlet Allocation (LDA) with sentence-level and document-level inputs respectively in a hierarchical architecture. The proposed model surpasses the accuracies of the current state-of-the-art hierarchical models on five publicly accessible datasets. The ablation study demonstrates that the involvement of multiple attention mechanisms brings significant improvement. The current topic distributions are fixed vectors generated by LDA, the topic distributions will be parameterized and updated simultaneously with the model weights in future work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

TAE: Topic-aware encoder for large-scale multi-label text classification

Article 01 April 2024

Hierarchical Convolutional Attention Networks Using Joint Chinese Word Embedding for Text Classification

CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

References

Rubin V, Conroy N, Chen Y, Cornwell S (2016) Fake news or truth? using satirical cues to detect potentially misleading news. In: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, pp 7–17
Zhao R, Mao K (2018) Fuzzy bag-of-words model for document representation. IEEE Trans Fuzzy Syst 26(2):794–804
Article Google Scholar
Fortuna B, Galleguillos C, Cristianini N (2009) Detection of bias in media outlets with statistical learning methods. In: Text Mining, pp 57–80. Chapman and Hall/CRC
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
MATH Google Scholar
Lin C, Ibeke E, Wyner A, Guerin F (2015) Sentiment-topic modeling in text mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(5):246–254
Ibeke E, Lin C, Wyner A, Barawi MH (2017) Extracting and understanding contrastive opinion through topic relevant sentences. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp 395–400
Li Z, Shang W, Yan M (2016) News text classification model based on topic model. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp 1–5 . IEEE
Steinberger J, Křišt’an M (2007) Lsa-based multi-document summarization. In: Proceedings of 8th International PhD Workshop on Systems and Control, vol. 7
Hosseinalipour A, Gharehchopogh FS, Masdari M, Khademi A (2021) Toward text psychology analysis using social spider optimization algorithm. Concurr Comput Pract Exp 33(17):6325
Article Google Scholar
Lu Y, Mei Q, Zhai C (2011) Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA. Inf Retrieval 14(2):178–203
Article Google Scholar
Khataei Maragheh H, Gharehchopogh FS, Majidzadeh K, Sangar AB (2022) A new hybrid based on long short-term memory network with spotted hyena optimization algorithm for multi-label text classification. Mathematics 10(3):488
Article Google Scholar
Jiang Y, Song X, Harrison J, Quegan S, Maynard D (2017) Comparing attitudes to climate change in the media using sentiment analysis based on latent dirichlet allocation. In: Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism, pp 25–30
Keller M, Bengio S (2004) Theme topic mixture model: A graphical model for document representation. In: PASCAL Workshop on Text Mining and Understanding
Zheng J, Cai F, Chen W, Feng C, Chen H (2019) Hierarchical neural representation for document classification. Cognit Comput 11(2):317–327
Article Google Scholar
Ma J, Gao W, Mitra P, Kwon S, Jansen BJ, Won K-F, Cha M (2016) Detecting rumors from microblogs with recurrent neural networks. Ijcai
Wei W, Zhang X, Liu X, Chen W, Wang T (2016) pkudblab at semeval-2016 task 6 : A specific convolutional neural network system for effective stance detection. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). https://doi.org/10.18653/v1/s16-1062
Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
Kim Y, Jernite Y, Sontag D, Rush AM (2016) Character-aware neural language models. In: Thirtieth AAAI Conference on Artificial Intelligence
Wang Y, Liu J, Jiang Y, Erdélyi R (2019) Cme arrival time prediction using convolutional neural network. Astrophys J 881(1):15
Article Google Scholar
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 1480–1489
Lin Z, Feng M, Santos CNd, Yu M, Xiang B, Zhou B, Bengio Y (2017) A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Xu S, Li H, Yuan P, Wu Y, He X, Zhou B (2020) Self-attention guided copy mechanism for abstractive summarization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 1355–1362
Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C (2018) Disan: Directional self-attention network for rnn/cnn-free language understanding. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32
Ambartsoumian A, Popowich F (2018) Self-attention: A better building block for sentiment analysis neural network classifiers. arXiv preprint arXiv:1812.07860
Dosovitskiy A, Beyer L, Kolesnikov, A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Lu J, Yang J, Batra D, Parikh D (2016) Hierarchical question-image co-attention for visual question answering. arXiv preprint arXiv:1606.00061
Yin W, Schütze H (2016) Multichannel variable-size convolution for sentence classification. arXiv preprint arXiv:1603.04513
Conneau A, Schwenk H, Barrault L, Lecun Y (2017) Very deep convolutional networks for text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 1107–1116. Association for Computational Linguistics, Valencia, Spain (2017). https://www.aclweb.org/anthology/E17-1104
Gao S, Ramanathan A, Tourassi G (2018) Hierarchical convolutional attention networks for text classification. Technical report, Oak Ridge National Lab.(ORNL), Oak Ridge, TN (United States) (2018)
Abreu J, Fred L, Macêdo D, Zanchettin C (2019) Hierarchical attentional hybrid neural networks for document classification. arXiv preprint arXiv:1901.06610 (2019)
Ruchansky N, Seo S, Liu Y (2017) Csi: A hybrid deep model for fake news detection. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM 17 . https://doi.org/10.1145/3132847.3132877
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015). Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, pp 2048–2057. PMLR
Cheng J, Dong L, Lapata M (2016) Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733
Kokkinos F, Potamianos A (2017) Structural attention neural networks for improved sentiment analysis. arXiv preprint arXiv:1701.01811
Daniluk M, Rocktäschel T, Welbl J, Riedel S (2017) Frustratingly short attention spans in neural language modeling. arXiv preprint arXiv:1702.04521
Zhou Y, Zhou J, Liu L, Feng J, Peng H, Zheng X (2018) Rnn-based sequence-preserved attention for dependency parsing. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Lukasz Kaiser Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems
Jiang Y, Petrak J, Song X, Bontcheva K, Maynard D (2019) Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan News Detection using ELMo Sentence Representation Convolutional Network. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 840–844
Shu K, Cui L, Wang S, Lee D, Liu H (2019) defend: Explainable fake news detection. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp 395–405
Tian B, Zhang Y, Wang J, Xing C (2019) Hierarchical inter-attention network for document classification with multi-task learning. In: IJCAI, pp 3569–3575
Liu T, Hu Y, Wang B, Sun Y, Gao J, Yin B (2022) Hierarchical graph convolutional networks for structured long document classification. IEEE Transactions on Neural Networks and Learning Systems
Li J, Wang C, Fang X, Yu K, Zhao J, Wu X, Gong J (2022) Multi-label text classification via hierarchical transformer-cnn. In: 2022 14th International Conference on Machine Learning and Computing (ICMLC), pp 120–125
Ibeke E, Lin C, Wyner A, Barawi MH (2020) A unified latent variable model for contrastive opinion mining. Front Comput Sci 14(2):404–416. https://doi.org/10.1007/s11704-018-7073-5
Article Google Scholar
Lin C, Ibeke E, Wyner A, Guerin F (2015) Sentiment-topic modeling in text mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(5):246–254. https://doi.org/10.1002/widm.1161
Article Google Scholar
Wu X, Fang L, Wang P, Yu N (2015) Performance of using LDA for Chinese news text classification. In: 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), pp 1260–1264 . IEEE
Kim D, Seo D, Cho S, Kang P (2019) Multi-co-training for document classification using various document representations: Tf-idf, lda, and doc2vec. Inf Sci 477:15–29
Article Google Scholar
Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp 375–384
Liu Y, Liu Z, Chua T-S, Sun M (2015) Topical word embeddings. In: Twenty-Ninth AAAI Conference on Artificial Intelligence
Xu H, Dong M, Zhu D, Kotov A, Carcone AI, Naar-King S (2016) Text classification with topic-based word embedding and convolutional neural networks. In: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp 88–97. ACM
Wang Y, Xu W (2018) Leveraging deep learning with lda-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95
Article Google Scholar
Narayan S, Cohen SB, Lapata M (2018) Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. arXiv preprint arXiv:1808.08745
Jiang Y, Wang Y, Maynard XSD (2020) Comparing topic-aware neural networks for bias detection of news. In: Proceedings of 24th European Conference on Artificial Intelligence (ECAI 2020). International Joint Conferences on Artificial Intelligence (IJCAI)
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In: International Conference on Machine Learning, pp 1243–1252. PMLR
Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv preprint arXiv:1607.06450
Clevert D-A, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289
Kiesel J, Mestre M, Shukla R, Vincent E, Adineh P, Corney D, Stein B, Potthast M (2019) Semeval-2019 task 4: Hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp 829–839
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
Article Google Scholar
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

Download references

Author information

Authors and Affiliations

School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, China
Ye Jiang & Yimin Wang

Authors

Ye Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Yimin Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yimin Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Jiang, Y., Wang, Y. Topic-aware hierarchical multi-attention network for text classification. Int. J. Mach. Learn. & Cyber. 14, 1863–1875 (2023). https://doi.org/10.1007/s13042-022-01734-0

Download citation

Received: 25 April 2022
Accepted: 26 November 2022
Published: 14 December 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s13042-022-01734-0

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

TAE: Topic-aware encoder for large-scale multi-label text classification

Hierarchical Convolutional Attention Networks Using Joint Chinese Word Embedding for Text Classification

CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Topic-aware hierarchical multi-attention network for text classification

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

TAE: Topic-aware encoder for large-scale multi-label text classification

Hierarchical Convolutional Attention Networks Using Joint Chinese Word Embedding for Text Classification

CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification

Explore related subjects

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now