Abstract
Neural networks, primarily recurrent and convolutional neural networks, have proven successful in text classification. However, convolutional models can be limited when the classification task depends on long-range semantic dependencies. Recurrent models can capture long-range dependencies, but their sequential architecture can constrain training speed. Moreover, traditional networks encode the entire document in a single pass, which ignores the document's hierarchical structure. To address these issues, this study presents T-HMAN, a Topic-aware Hierarchical Multiple Attention Network for text classification. Multi-head self-attention coupled with convolutional filters is developed to capture long-range dependencies by integrating the convolutional features from each attention head. In addition, T-HMAN combines topic distributions generated by Latent Dirichlet Allocation (LDA) with the sentence-level and document-level inputs, respectively, in a hierarchical architecture. The proposed model surpasses the accuracy of current state-of-the-art hierarchical models on five publicly accessible datasets. The ablation study demonstrates that the use of multiple attention mechanisms brings significant improvement. The topic distributions are currently fixed vectors generated by LDA; in future work, they will be parameterized and updated simultaneously with the model weights.
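To make the hierarchical, topic-aware idea concrete, the following is a minimal pure-Python sketch, not the T-HMAN implementation. It rests on several assumptions that the abstract does not confirm: topic distributions are combined with the inputs by simple concatenation, plain dot-product attention pooling stands in for the multi-head self-attention with convolutional filters, and the names `with_topics`, `attention_pool`, `encode_document`, `word_query`, and `sent_query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight each vector by softmax(vector . query) and return the weighted sum.

    Stand-in for the paper's attention layers (assumption, not the real model).
    """
    scores = [sum(a * b for a, b in zip(vec, query)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def with_topics(vectors, topic_dist):
    """Append one fixed topic distribution to every vector (assumed: concatenation)."""
    return [list(vec) + list(topic_dist) for vec in vectors]


def encode_document(doc, sentence_topics, doc_topics, word_query, sent_query):
    """Two-level encoding: words -> sentence vectors -> one document vector."""
    sentence_vecs = []
    for words, topics in zip(doc, sentence_topics):
        # Sentence level: word vectors plus that sentence's topic distribution.
        sentence_vecs.append(attention_pool(with_topics(words, topics), word_query))
    # Document level: sentence vectors plus the document's topic distribution.
    return attention_pool(with_topics(sentence_vecs, doc_topics), sent_query)


# Toy input: 2 sentences, 2-dimensional word vectors, 3 topics (made-up values).
doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
sentence_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
doc_topics = [0.4, 0.3, 0.3]
word_query = [0.0] * 5   # 2 word dims + 3 topic dims
sent_query = [0.0] * 8   # 5 sentence dims + 3 topic dims

doc_vec = encode_document(doc, sentence_topics, doc_topics, word_query, sent_query)
print(len(doc_vec))  # 8
```

In this sketch the topic vectors are fixed inputs, matching the abstract's description of the current LDA-generated distributions; in a trained model the query vectors would be learned rather than set to zero.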
About this article
Cite this article
Jiang, Y., Wang, Y. Topic-aware hierarchical multi-attention network for text classification. Int. J. Mach. Learn. & Cyber. 14, 1863–1875 (2023). https://doi.org/10.1007/s13042-022-01734-0
DOI: https://doi.org/10.1007/s13042-022-01734-0