
Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval

Published: 01 April 2016

Abstract

This paper develops a model that addresses sentence embedding, an active topic in current natural language processing research, using recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term dependencies, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. The automatic keyword detection and topic allocation abilities of the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, in which the similarity between a query and documents is measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method significantly outperforms the Paragraph Vector method on the web document retrieval task.
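The retrieval scheme the abstract describes can be sketched in a few lines: an LSTM reads a sentence word by word, the hidden state after the last word serves as the sentence embedding, and query-document similarity is the cosine between embeddings. The sketch below is purely illustrative, not the paper's trained model: the vocabulary, dimensions, word vectors, and LSTM weights are all random toy values, and the gate layout is the standard LSTM formulation rather than the exact parameterization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 10, 8, 6  # toy sizes, hypothetical

# Random word vectors and LSTM parameters (illustration only; the paper
# learns these from click-through data).
E = rng.normal(scale=0.1, size=(VOCAB, EMB))
W = rng.normal(scale=0.1, size=(4 * HID, EMB))  # input weights for i, f, o, g
U = rng.normal(scale=0.1, size=(4 * HID, HID))  # recurrent weights
b = np.zeros(4 * HID)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def embed_sentence(word_ids):
    """Run the LSTM over the word sequence; the final hidden state is the embedding."""
    h = np.zeros(HID)
    c = np.zeros(HID)
    for wid in word_ids:
        z = W @ E[wid] + U @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g    # memory cell accumulates sentence information
        h = o * np.tanh(c)   # hidden state: running sentence representation
    return h

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Query-document similarity as cosine between sentence embeddings.
query_emb = embed_sentence([1, 4, 2])
doc_emb = embed_sentence([1, 4, 2, 7])
similarity = cosine(query_emb, doc_emb)
```

In the paper's setting, documents would be ranked by this similarity score against the query embedding; here the score is meaningless since the weights are untrained.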



    Published In

    IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 24, Issue 4 (April 2016), 211 pages
    ISSN: 2329-9290
    EISSN: 2329-9304
    Editor: H. Li

    Publisher

    IEEE Press

    Publication History

    Published: 01 April 2016
    Published in TASLP Volume 24, Issue 4

    Author Tags

    1. deep learning
    2. long short-term memory
    3. sentence embedding

    Qualifiers

    • Research-article
