Abstract
As social media data rapidly grows, sentiment analysis plays an increasingly more important role in classifying users’ opinions, attitudes and feelings expressed in text. However, most studies have been focused on the effectiveness of sentiment analysis, while ignoring the storage efficiency when processing large-scale high-dimensional text data. In this paper, we incorporate the machine learning based sentiment analysis with our proposed Locality Sensitive One-Bit Min-Hash (BitHash) method. BitHash compresses each data sample into a compact binary hash code while preserving the pairwise similarity of the original data. The binary code can be used as a compressed and informative representation in replacement of the original data for subsequent processing, for example, it can be naturally integrated with a classifier like SVM. By using the compact hash code, the storage space is significantly reduced. Experiment on the popular open benchmark dataset shows that, as the hash code length increases, the classification accuracy of our proposed method could approach the state-of-the-art method, while our method only requires a significantly smaller storage space.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Broder, A.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences 1997, p. 21 (1997)
Charikar, M.: Similarity estimation techniques from rounding algorithms. In: ACM Symposium on Theory of Computing, pp. 380–388 (2002)
Hsu, C.W., Chang, C.C., Lin, C.J., et al.: A practical guide to support vector classification (2003)
Li, P., Shrivastava, A., Moore, J.L., König, A.C.: Hashing algorithms for large-scale learning. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2011)
Liu, B.: Sentiment analysis and subjectivity. Handb. Nat. Lang. Process. 2, 627–666 (2010)
Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland (2011). http://www.aclweb.org/anthology/P11-1015
Mesnil, G., Ranzato, M., Mikolov, T., Bengio, Y.: Ensemble of generative and discriminative techniques for sentiment analysis of movie reviews (2014). arXiv preprint arXiv:1412.5335
Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: INTERSPEECH, pp. 1045–1048 (2010)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)
Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 34(12), 2393–2406 (2012)
Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2, pp. 90–94. Association for Computational Linguistics (2012)
Zhu, J., Xing, E.P.: Conditional topic random fields. In: Proceedings of the 27th International Conference on Machine Learning (2010)
Acknowledgments
The work was supported by the National Basic Research Program (973 Program) of China (No. 2013CB329403), National Natural Science Foundation of China (Nos. 61322308, 61332007), and the Tsinghua National Laboratory for Information Science and Technology Big Data Initiative.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, W., Ji, J., Zhu, J., Xu, H., Zhang, B. (2015). Large Scale Sentiment Analysis with Locality Sensitive BitHash. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science(), vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-28940-3_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer ScienceComputer Science (R0)