Abstract
Multi-label problems are challenging because each instance may be associated with an unknown number of categories, and the relationships among the categories are not always known. A large amount of data is necessary to infer the required information about the categories, but these data are normally available only in small batches distributed over a period of time. In this work, multi-label problems are tackled using an incremental neural network known as the evolving Probabilistic Neural Network (ePNN). This neural network is capable of continuous learning while maintaining a reduced architecture, so that it can incorporate training data whenever they become available without drastic growth of its structure. We carried out a series of experiments on web page data sets and compared the performance of ePNN to that of other multi-label categorizers. On average, ePNN outperformed the other categorizers on four of the five evaluation metrics used, and its structure was less complex than those of the other algorithms evaluated.
Notes
Times were obtained using a PC with an Intel Dual Core 2.30 GHz processor and 4 GB of RAM.
Data set available at http://www.inf.ufes.br/alberto/vitoria.tar.gz.
Data available at http://mulan.sourceforge.net/datasets.html.
Acknowledgments
We would like to thank Min-Ling Zhang for all the help with the ML-kNN categorization tool and web page data sets. P.M. Ciarelli thanks PPGEE (Programa de Pós-Graduação da Engenharia Elétrica) of UFES (Universidade Federal do Espírito Santo).
Appendix
Suppose that a training instance x is presented to the neural network and the outputs of all components of each GMM are calculated. If the component s is the most activated and does not belong to the GMM assigned to the class of x, then it is desirable to reduce the output of s and to increase the output of the component r, the most activated component in the GMM assigned to the class of x. In other words, when the output of s exceeds that of r, new values of the receptive field sizes of the components s (\(\varphi_{s_{\rm new}}\)) and r (\(\varphi_{r_{\rm new}}\)) are sought such that the output of r becomes at least as large as that of s.
The value of \(\varphi_{r_{\rm new}}\) is computed using Eq. (6). To obtain the value of \(\varphi_{s_{\rm new}}\), a constant η, with 0 < η ≤ 1, is introduced into Eq. (17) to turn the inequality into an equality. Therefore, from Eq. (2) one obtains an expression for the output of s in terms of \(\varphi_{s_{\rm new}}\),
where \(D_s = \mathbf{x}'^{T}\boldsymbol{\mu}'_{s} - 1\).
To solve this equation, a first-order Taylor expansion is applied to the exponential function to linearize it. The expansion is valid for small values of \(\epsilon = \varphi_{s_{\rm new}} - \varphi_{s}\). After further manipulation, the value of \(\varphi_{s_{\rm new}}\) is obtained, as given by Eq. (18).
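Since Eqs. (2), (17), and (18) are not reproduced in this section, the following sketch illustrates the linearization step under the assumption, made here only for illustration, that each component output has the Gaussian kernel form \(\exp(D_j/\varphi_j^{2})\) commonly used in PNNs with normalized input vectors; the paper's exact expressions may differ:
\[
\exp\!\left(\frac{D_s}{\varphi_{s_{\rm new}}^{2}}\right) \approx \exp\!\left(\frac{D_s}{\varphi_{s}^{2}}\right)\left(1 - \frac{2D_s}{\varphi_{s}^{3}}\,\epsilon\right), \qquad \epsilon = \varphi_{s_{\rm new}} - \varphi_{s}.
\]
Under this assumption, equating the right-hand side to \(\eta\exp\!\left(D_r/\varphi_{r_{\rm new}}^{2}\right)\) gives a linear equation in \(\epsilon\), whose solution yields \(\varphi_{s_{\rm new}} = \varphi_{s} + \epsilon\).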
To prevent the receptive field size from being significantly altered by a single training instance, two thresholds are employed. The first is applied through a saturated linear function that limits the correction and keeps the Taylor expansion applicable [parameter α in Eq. (7)]. The second bounds the change when the receptive field size is updated [parameter ρ in Eqs. (6)–(7)].
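As a rough numerical companion to the derivation above, the following Python sketch implements the update under the same assumed kernel \(\exp(D/\varphi^{2})\). The function update_receptive_fields and the way α and ρ clamp the changes are illustrative assumptions, not the paper's exact Eqs. (6)–(7).

import numpy as np

def update_receptive_fields(x, mu_s, phi_s, mu_r, phi_r,
                            eta=0.9, alpha=0.1, rho=0.05):
    """Sketch of the receptive-field update, assuming the kernel
    exp(D / phi**2) with D = x'^T mu' - 1 for unit-norm vectors;
    eta, alpha, and rho mirror the symbols in the text, but the
    clamping rules below are illustrative assumptions."""
    D_s = float(x @ mu_s) - 1.0   # D_s = x'^T mu'_s - 1 (<= 0 for unit vectors)
    D_r = float(x @ mu_r) - 1.0
    if D_s >= 0.0:                # x coincides with mu_s: the output of s is
        return phi_s, phi_r       # already maximal and cannot be reduced

    # Stand-in for Eq. (6): enlarge phi_r so that the correct-class
    # component r responds more strongly to x; rho bounds the change.
    phi_r_new = phi_r * (1.0 + rho)

    # Target output for s: eta times the updated output of r (0 < eta <= 1),
    # so that r wins after the update.
    act_s = np.exp(D_s / phi_s**2)
    target = eta * np.exp(D_r / phi_r_new**2)

    # First-order Taylor expansion around phi_s:
    # exp(D_s/phi**2) ~ act_s * (1 - 2*D_s*eps/phi_s**3), eps = phi_new - phi_s.
    # Solving act_s * (1 - 2*D_s*eps/phi_s**3) = target for eps:
    eps = (1.0 - target / act_s) * phi_s**3 / (2.0 * D_s)

    # Stand-in for the saturated linear function (parameter alpha): clamp
    # eps so that the Taylor expansion stays within its validity region.
    eps = float(np.clip(eps, -alpha * phi_s, alpha * phi_s))

    return phi_s + eps, phi_r_new

Because \(D_s \le 0\) for unit-norm vectors, the solved \(\epsilon\) is negative whenever the target output is smaller than the current one, so the receptive field of s shrinks, as intended.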