Abstract
Multi-label problems are challenging because each instance may be associated with an unknown number of categories, and the relationships among the categories are not always known. A large amount of data is necessary to infer the required information about the categories, but these data are normally available only in small batches distributed over a period of time. In this work, multi-label problems are tackled using an incremental neural network known as the evolving Probabilistic Neural Network (ePNN). This neural network is capable of continuous learning while maintaining a reduced architecture, so that it can incorporate training data whenever they become available without drastic growth of its structure. We carried out a series of experiments on web page data sets and compared the performance of ePNN to that of other multi-label categorizers. On average, ePNN outperformed the other categorizers on four of the five evaluation metrics used, and its structure was less complex than those of the other algorithms evaluated.
Notes
Times were obtained using a PC with an Intel Dual Core 2.30 GHz processor and 4 GB of RAM.
Data set available at http://www.inf.ufes.br/alberto/vitoria.tar.gz.
Data available at http://mulan.sourceforge.net/datasets.html.
Acknowledgments
We would like to thank Min-Ling Zhang for all the help with the ML-kNN categorization tool and web page data sets. P.M. Ciarelli thanks PPGEE (Programa de Pós-Graduação da Engenharia Elétrica) of UFES (Universidade Federal do Espírito Santo).
Appendix
Suppose that a training instance x is presented to the neural network and the outputs of all components of each GMM are calculated. If the component s is the most activated and does not belong to the GMM assigned to the class of x, then it is desirable to reduce the output of s and to increase the output of the component r, the most activated component in the GMM assigned to the class of x. In other words, when the output of s exceeds that of r, new values of the receptive field sizes of the components s (\(\varphi_{s_{\rm new}}\)) and r (\(\varphi_{r_{\rm new}}\)) are sought such that the output of r becomes at least as large as that of s.
The value of \(\varphi_{r_{\rm new}}\) is computed using Eq. (6). To obtain the value of \(\varphi_{s_{\rm new}}\), a constant η, with 0 < η ≤ 1, is introduced into Eq. (17) to turn the inequality into an equality. Therefore, from Eq. (2) one obtains an expression for the output of s in terms of \(\varphi_{s_{\rm new}}\),
where \(D_s = \mathbf{x}'^{T}\boldsymbol{\mu}'_{s} - 1\).
To solve this equation, a first-order Taylor expansion is applied to the exponential function to linearize it. The expansion is valid for small values of \(\epsilon = \varphi_{s_{\rm new}} - \varphi_{s}\). After further manipulation, the value of \(\varphi_{s_{\rm new}}\) is obtained, as given by Eq. (18).
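Since Eqs. (2), (17), and (18) are not reproduced in this section, the following sketch illustrates the linearization step under the assumption, made here only for illustration, that each component output has the Gaussian kernel form \(\exp(D_j/\varphi_j^{2})\) commonly used in PNNs with normalized input vectors; the paper's exact expressions may differ:
\[
\exp\!\left(\frac{D_s}{\varphi_{s_{\rm new}}^{2}}\right) \approx \exp\!\left(\frac{D_s}{\varphi_{s}^{2}}\right)\left(1 - \frac{2D_s}{\varphi_{s}^{3}}\,\epsilon\right), \qquad \epsilon = \varphi_{s_{\rm new}} - \varphi_{s}.
\]
Under this assumption, equating the right-hand side to \(\eta\exp\!\left(D_r/\varphi_{r_{\rm new}}^{2}\right)\) gives a linear equation in \(\epsilon\), whose solution yields \(\varphi_{s_{\rm new}} = \varphi_{s} + \epsilon\).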
To prevent the receptive field size from being significantly altered by a single training instance, two thresholds are employed. The first is applied through a saturated linear function that limits the correction and keeps the Taylor expansion applicable [parameter α in Eq. (7)]. The second bounds the change when the receptive field size is updated [parameter ρ in Eqs. (6)–(7)].
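As a rough numerical companion to the derivation above, the following Python sketch implements the update under the same assumed kernel \(\exp(D/\varphi^{2})\). The function update_receptive_fields and the way α and ρ clamp the changes are illustrative assumptions, not the paper's exact Eqs. (6)–(7).

import numpy as np

def update_receptive_fields(x, mu_s, phi_s, mu_r, phi_r,
                            eta=0.9, alpha=0.1, rho=0.05):
    """Sketch of the receptive-field update, assuming the kernel
    exp(D / phi**2) with D = x'^T mu' - 1 for unit-norm vectors;
    eta, alpha, and rho mirror the symbols in the text, but the
    clamping rules below are illustrative assumptions."""
    D_s = float(x @ mu_s) - 1.0   # D_s = x'^T mu'_s - 1 (<= 0 for unit vectors)
    D_r = float(x @ mu_r) - 1.0
    if D_s >= 0.0:                # x coincides with mu_s: the output of s is
        return phi_s, phi_r       # already maximal and cannot be reduced

    # Stand-in for Eq. (6): enlarge phi_r so that the correct-class
    # component r responds more strongly to x; rho bounds the change.
    phi_r_new = phi_r * (1.0 + rho)

    # Target output for s: eta times the updated output of r (0 < eta <= 1),
    # so that r wins after the update.
    act_s = np.exp(D_s / phi_s**2)
    target = eta * np.exp(D_r / phi_r_new**2)

    # First-order Taylor expansion around phi_s:
    # exp(D_s/phi**2) ~ act_s * (1 - 2*D_s*eps/phi_s**3), eps = phi_new - phi_s.
    # Solving act_s * (1 - 2*D_s*eps/phi_s**3) = target for eps:
    eps = (1.0 - target / act_s) * phi_s**3 / (2.0 * D_s)

    # Stand-in for the saturated linear function (parameter alpha): clamp
    # eps so that the Taylor expansion stays within its validity region.
    eps = float(np.clip(eps, -alpha * phi_s, alpha * phi_s))

    return phi_s + eps, phi_r_new

Because \(D_s \le 0\) for unit-norm vectors, the solved \(\epsilon\) is negative whenever the target output is smaller than the current one, so the receptive field of s shrinks, as intended.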