Abstract
Online supervised learning with L1-regularization has attracted attention recently because it generally requires less computation time and lower space complexity than batch learning methods. However, naive L1-regularization in an online setting has the side effect that rare features tend to be truncated more than necessary, even though feature frequency is highly skewed in many applications. We developed a new family of L1-regularization methods that base the penalty on the previous loss-minimization updates in linear online learning settings. Our methods can identify and retain low-frequency but informative features at the same computational cost and convergence rate as previous methods. Moreover, we combined our methods with a cumulative penalty model to derive models that are more robust to noisy data. We applied our methods to several datasets and empirically evaluated their performance. Experimental results showed that our frequency-aware truncated models improved prediction accuracy.
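The abstract does not give the paper's exact update rule, but the general idea of frequency-aware truncation can be sketched as follows: run an online gradient step, then apply an L1 soft-threshold whose per-feature strength is scaled by how often that feature has appeared, so rarely observed features are shrunk more gently. The scaling rule and class name below are illustrative assumptions, not the authors' method; the truncation itself follows the standard soft-thresholding operator used in truncated-gradient approaches.

```python
import numpy as np

def truncate(w, penalty):
    # Soft-thresholding: shrink each weight toward zero by its penalty,
    # clipping at zero (the standard L1 truncation operator).
    return np.sign(w) * np.maximum(np.abs(w) - penalty, 0.0)

class FreqAwareTruncatedSGD:
    """Illustrative online learner (not the paper's exact algorithm):
    SGD on squared loss, followed by an L1 truncation whose strength
    per feature is scaled by that feature's observed frequency, so
    rare features are truncated less aggressively."""

    def __init__(self, dim, eta=0.1, lam=0.01):
        self.w = np.zeros(dim)
        self.counts = np.zeros(dim)  # how often each feature has fired
        self.t = 0
        self.eta, self.lam = eta, lam

    def update(self, x, y):
        self.t += 1
        self.counts += (x != 0)
        # Gradient step on squared loss for one example (x, y).
        pred = self.w @ x
        self.w -= self.eta * (pred - y) * x
        # Frequency-aware L1 penalty: a hypothetical scaling in which
        # frequent features receive the full penalty and rare ones less.
        freq = self.counts / self.t
        self.w = truncate(self.w, self.eta * self.lam * freq)
        return pred
```

With a plain (frequency-blind) penalty, every feature would be thresholded by `eta * lam` each round, so a feature that fires once and then goes silent is quickly driven to zero; the frequency scaling above is one simple way to slow that decay for rare features.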
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oiwa, H., Matsushima, S., Nakagawa, H. (2011). Frequency-Aware Truncated Methods for Sparse Online Learning. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_34
Print ISBN: 978-3-642-23782-9
Online ISBN: 978-3-642-23783-6