Topic-aware hierarchical multi-attention network for text classification

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Neural networks, primarily recurrent and convolutional neural networks, have proven successful in text classification. However, convolutional models can be limited when the classification task is determined by long-range semantic dependencies, and while recurrent models can capture such dependencies, their sequential architecture constrains training speed. Meanwhile, traditional networks encode the entire document in a single pass, which ignores the document's hierarchical structure. To address these issues, this study presents T-HMAN, a Topic-aware Hierarchical Multiple Attention Network for text classification. A multi-head self-attention mechanism coupled with convolutional filters captures long-range dependencies by integrating the convolution features from each attention head. In addition, T-HMAN combines topic distributions generated by Latent Dirichlet Allocation (LDA) with sentence-level and document-level inputs, respectively, in a hierarchical architecture. The proposed model surpasses the accuracy of current state-of-the-art hierarchical models on five publicly accessible datasets, and an ablation study demonstrates that the multiple attention mechanisms bring significant improvement. The current topic distributions are fixed vectors generated by LDA; in future work, they will be parameterized and updated jointly with the model weights.
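The two mechanisms the abstract describes, per-head convolution over self-attention outputs and fusion with a fixed LDA topic distribution, can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the random projection matrices, the depthwise convolution, the filter width `k`, the mean-pooling step, and the hard-coded topic vector are all assumptions standing in for learned parameters and a real LDA model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_same(x, w):
    """Depthwise 1-D convolution with 'same' padding.
    x: (seq_len, d), w: (k, d) -- one filter tap per channel."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])

def multi_head_attn_conv(X, n_heads=4, k=3):
    """Multi-head self-attention in which each head's context vectors pass
    through a 1-D convolution before the heads are concatenated."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        # random projections stand in for learned weight matrices
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_head))   # (seq_len, seq_len) attention
        ctx = A @ V                              # (seq_len, d_head) context
        Wc = rng.standard_normal((k, d_head)) / k
        heads.append(conv1d_same(ctx, Wc))       # convolution features per head
    return np.concatenate(heads, axis=-1)        # (seq_len, d_model)

# Topic-aware fusion (sketch): concatenate a fixed LDA-style topic
# distribution with the pooled representation, as the hierarchical
# model does at the sentence and document levels.
X = rng.standard_normal((5, 16))                 # 5 tokens, d_model = 16
topic_dist = np.array([0.7, 0.2, 0.1])           # hypothetical LDA output
doc_repr = np.concatenate([multi_head_attn_conv(X).mean(axis=0), topic_dist])
```

Concatenating the heads' convolution outputs keeps the output width equal to `d_model`, so the block is a drop-in replacement for plain multi-head attention in a hierarchical encoder.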




Notes

  1. https://github.com/huggingface/datasets.

  2. https://www.yelp.com/dataset

  3. https://pan.webis.de/semeval19/semeval19-web/



Author information

Correspondence to Yimin Wang.


Cite this article

Jiang, Y., Wang, Y. Topic-aware hierarchical multi-attention network for text classification. Int. J. Mach. Learn. & Cyber. 14, 1863–1875 (2023). https://doi.org/10.1007/s13042-022-01734-0
