Abstract
This paper presents a state-of-the-art texture analysis method called “randomized neural network based signature” applied to the classification of pap-smear cell images for the Papanicolaou test. For this purpose, we used a well-known benchmark dataset composed of 917 images and compared the aforementioned image signature to other texture analysis methods. The obtained results were promising, with an accuracy of 87.57% using LDA and an AUC of 0.8983 using SVM. These performance values confirm that the randomized neural network based signature can be applied successfully to this important medical problem.
1 Introduction
Texture is among the most important attributes in computer vision and has been the focus of intensive research throughout the years. Concisely, we can define texture as an arrangement of sub-patterns, which can be pixels, regions or other visual attributes [1]. However, such a definition is quite restrictive and does not encompass a great variety of images (for instance, smoke, mammograms, fire, water etc.), which present a persistent stochastic pattern with a cloud-like appearance [2].
Even though texture lacks a formal definition, it is a feature easily understood by the human visual system. Such importance has motivated the development of many techniques for the analysis and recognition of texture patterns, making this a field of intense research [3]. Among the many techniques available, there are those that describe the image texture using second-order statistics [4, 5], spectral analysis (e.g., Fourier and Gabor filters) [6,7,8], local binary patterns [9], gravitational systems [10] and agents walking over the texture pattern [1].
Medical image analysis is a field of intense research, with many approaches being developed over the years. For instance, [11] proposed LBP variants as texture descriptors for medical image analysis, which were evaluated on different medical datasets covering tasks such as cell phenotype image classification, classification of pain states in neonatal facial images and detection of abnormal smear cells. In [12], a fuzzy clustering algorithm was proposed for brain tumor segmentation. The authors stated that fuzzy clustering enables many cases of uncertainty to be considered during the segmentation process. Breast cancer risk assessment has been the focus of many studies on texture analysis. In [13], two types of texture features were proposed to assess breast cancer risk: textons based on local pixel intensities and features based on oriented tissue structures. In [14], background intensity independent texture features were proposed for mammogram classification. Another topic of intense research is the identification and segmentation of melanocytic skin lesions. Machine learning techniques were used in [15] to select the parameters of a classification framework for melanocytic lesions. The paper [16] presented an approach using a feature learning scheme and normalized graph cuts for skin lesion image segmentation.
This paper proposes to apply a recent and very discriminative texture analysis method to a relevant medical problem, which consists of classifying pap-smear cells to discover pre-cancerous or cancerous stages in the cervix. Section 2 briefly describes the randomized neural network and how to use it to obtain an image signature from its neuron weights. Section 3 presents the pap-smear database, the other texture analysis methods used for comparison and the classification procedure. Section 4 discusses the obtained results, and, finally, Sect. 5 presents some remarks about this work.
2 Randomized Neural Network and its Texture Signature
A randomized neural network [17,18,19,20] is a type of neural network that has only two neuron layers and a very fast training procedure. In the hidden layer, the weights of the neurons are randomly determined according to a uniform or Gaussian distribution. These weights can be arranged in a matrix
\[ W=\begin{pmatrix} w_{1,0} & w_{1,1} & \dots & w_{1,p}\\ \vdots & \vdots & \ddots & \vdots \\ w_{Q,0} & w_{Q,1} & \dots & w_{Q,p} \end{pmatrix}, \tag{1} \]
where each row represents the weights of a given hidden neuron q (the column of index 0 holds the bias weights), p is the number of attributes of an input vector \({{\varvec{x}}}\), and Q is the total number of hidden neurons.
Let \(X=\left[ {{{\varvec{x}}}_\mathbf{1}}, {{{\varvec{x}}}_\mathbf{2}}, \dots , {{{\varvec{x}}}_{{\varvec{N}}}}\right] \) and \(D=\left[ {{{\varvec{d}}}_\mathbf{1}}, {{{\varvec{d}}}_\mathbf{2}}, \dots , {{{\varvec{d}}}_{{\varvec{N}}}}\right] \) be matrices representing the input vectors \({{{\varvec{x}}}_{{\varvec{i}}}}\) and their respective labels \({{{\varvec{d}}}_{{\varvec{i}}}}\) (N is the number of feature vectors). Then, after inserting a new first row composed of \(-1\) into X (for bias), we can compute the output of the hidden neurons according to the equation \(Z=\phi (WX)\), where \(\phi (\cdot )\) is a transfer function (in general, the logistic or hyperbolic tangent function).
The resulting matrix \(Z=\left[ {{{\varvec{z}}}_\mathbf{1}}, {{{\varvec{z}}}_\mathbf{2}}, \dots , {{{\varvec{z}}}_{{\varvec{N}}}}\right] \) holds the output of the hidden neurons for each input feature vector \({{{\varvec{x}}}_{{\varvec{i}}}}\). Again, we insert a new first row composed of \(-1\) into Z (for bias) and the objective is to solve \(D=MZ\), where M represents the weights of the output neurons. The matrix M can be easily obtained after some simple matrix operations, according to the following equation
\[ M=DZ^{T}(ZZ^{T})^{-1}, \tag{2} \]
where \(Z^{T}(ZZ^{T})^{-1}\) is the Moore-Penrose pseudoinverse of Z when \(ZZ^{T}\) is invertible.
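The training procedure described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the toy dimensions and the use of NumPy's random generator for W are assumptions, not part of the original method description. The output weights are obtained by solving the linear system \((ZZ^{T})M^{T}=ZD^{T}\), which is equivalent to \(M=DZ^{T}(ZZ^{T})^{-1}\) without forming the inverse explicitly.

```python
import numpy as np

def hidden_output(X, W):
    """Return the hidden-layer output with a bias row of -1 on top.

    X: (p, N) input vectors stored as columns.
    W: (Q, p + 1) random hidden weights (first column multiplies the bias).
    """
    N = X.shape[1]
    Xb = np.vstack([-np.ones((1, N)), X])       # insert bias row into X
    Z = 1.0 / (1.0 + np.exp(-(W @ Xb)))         # logistic transfer function
    return np.vstack([-np.ones((1, N)), Z])     # insert bias row into Z

def output_weights(Zb, D):
    """Solve D = M Zb in the least-squares sense: M = D Zb^T (Zb Zb^T)^{-1}."""
    A = Zb @ Zb.T                               # symmetric (Q + 1, Q + 1) matrix
    return np.linalg.solve(A, Zb @ D.T).T

# Toy data (hypothetical sizes, chosen only for illustration).
rng = np.random.default_rng(0)
p, N, Q = 4, 200, 5
X = rng.normal(size=(p, N))
D = rng.normal(size=(1, N))
W = rng.uniform(-1.0, 1.0, size=(Q, p + 1))

Zb = hidden_output(X, W)
M = output_weights(Zb, D)
print(M.shape)  # (1, 6)
```

Only the output weights M are learned; W stays fixed after being drawn, which is why training reduces to a single linear solve.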
2.1 Randomized Neural Network Texture Signature
The randomized neural network texture signature was proposed in [21] and consists of using image pixels as input and label data in order to train a randomized neural network. The weights of the output neuron layer of this trained network are then used as the image signature. For this purpose, the image is divided into overlapping \(K \times K\) windows (\(K=\{3,5,7\}\)). For each window, its border pixels are used as the input feature vector \({{{\varvec{x}}}_{{\varvec{i}}}}\) and its central pixel is used as the respective scalar label \(d_i\). Thus, we have 8-, 16- and 24-dimensional feature vectors \({{{\varvec{x}}}_{{\varvec{i}}}}\) for the aforementioned window sizes, respectively.
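As an illustration of the window step, the sketch below collects the border pixels and the central pixel of every \(K \times K\) window. It assumes a stride of one pixel between windows and a row-major ordering of the border pixels; neither detail is stated in the text above, and the function name is hypothetical.

```python
import numpy as np

def border_and_center(image, K):
    """Collect border pixels (inputs) and central pixels (labels) of all
    overlapping K x K windows, assuming a stride of one pixel.

    Returns X with shape (4K - 4, N) and d with shape (N,).
    """
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    ring = np.ones((K, K), dtype=bool)
    ring[1:-1, 1:-1] = False                    # True only on the outer ring
    inputs, labels = [], []
    for i in range(height - K + 1):
        for j in range(width - K + 1):
            window = image[i:i + K, j:j + K]
            inputs.append(window[ring])         # border pixels, row-major order
            labels.append(window[K // 2, K // 2])
    return np.array(inputs).T, np.array(labels)

image = np.arange(49).reshape(7, 7)
for K in (3, 5, 7):
    X, d = border_and_center(image, K)
    print(K, X.shape, d.shape)
```

For K = 3, 5 and 7 the outer ring has 4K - 4 = 8, 16 and 24 pixels, which matches the feature vector dimensions given above.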
The next step is to determine the values of the matrix W. For this, the paper [21] adopted the Linear Congruential Generator (LCG) [22, 23] to produce pseudorandom values in a uniform distribution. The “seed” and the other adjustment parameters of the generator are derived from the values Q (number of hidden neurons) and p (dimensionality of the input feature vector). All the values of W and each row of the matrix X are normalized to have zero mean and unit variance. Finally, the logistic transfer function is used in all the neurons.
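A generic LCG and the two normalization steps can be sketched as follows. The recurrence \(v_{n+1}=(a\,v_n+b) \bmod c\) is the standard LCG form; the parameter values used in the example are the "minimal standard" ones discussed by Park and Miller [23] and are chosen here only for illustration. They are not the Q- and p-dependent settings of [21], which this text does not give. All function names are hypothetical.

```python
import numpy as np

def lcg_sequence(n, seed, a, b, c):
    """Generate n values with the recurrence v <- (a * v + b) mod c."""
    values = []
    v = seed
    for _ in range(n):
        v = (a * v + b) % c                     # Python integers never overflow
        values.append(v)
    return np.array(values, dtype=float)

def random_weight_matrix(Q, p, seed, a, b, c):
    """Fill a (Q, p + 1) matrix with LCG values, then normalize all of its
    entries together to zero mean and unit variance."""
    raw = lcg_sequence(Q * (p + 1), seed, a, b, c)
    raw = (raw - raw.mean()) / raw.std()
    return raw.reshape(Q, p + 1)

def standardize_rows(X):
    """Normalize each row of X to zero mean and unit variance.
    Rows with zero variance (e.g., a constant image) are not handled here."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Illustrative parameters only (Park-Miller "minimal standard" generator).
W = random_weight_matrix(Q=19, p=8, seed=1, a=16807, b=0, c=2**31 - 1)
print(W.shape)  # (19, 9)
```

Because the generator is deterministic, every image is processed with the same W, so the resulting output weights are comparable across images.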
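Once these fundamental procedures are determined, it is possible to construct two signatures based on Eq. 2, which becomes a vector \({{\varvec{f}}}=DZ^{T}(ZZ^{T})^{-1}\) because D is also a vector. The first signature considers only one value Q for multiple values of K, as follows
\[ {{\varvec{\Theta }}}(Q)=\left[ {{\varvec{f}}}_{K_1}, {{\varvec{f}}}_{K_2}, \dots , {{\varvec{f}}}_{K_n}\right] , \tag{3} \]
where \({{\varvec{f}}}_{K_i}\) is the vector \({{\varvec{f}}}\) obtained with the window size \(K_i\) and Q hidden neurons.
The second signature, which consists of the concatenation of the previous signature for different values of Q, is determined according to the following equation
\[ {{\varvec{\Psi }}}(Q_1, Q_2, \dots , Q_m)=\left[ {{\varvec{\Theta }}}(Q_1), {{\varvec{\Theta }}}(Q_2), \dots , {{\varvec{\Theta }}}(Q_m)\right] . \tag{4} \]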
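The two signatures can be illustrated with the self-contained sketch below, which concatenates the output weight vectors over the window sizes K and then over the numbers of hidden neurons Q. Several choices are assumptions made for the sake of a runnable example: NumPy's seeded generator replaces the LCG of [21], windows use a stride of one pixel, and the output weights are computed with a least-squares solver, which coincides with \(DZ^{T}(ZZ^{T})^{-1}\) when \(ZZ^{T}\) is invertible. The function names are hypothetical.

```python
import numpy as np

def border_windows(image, K):
    """Border pixels (columns of X) and central pixels (d) of all K x K windows."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    ring = np.ones((K, K), dtype=bool)
    ring[1:-1, 1:-1] = False
    X, d = [], []
    for i in range(height - K + 1):
        for j in range(width - K + 1):
            window = image[i:i + K, j:j + K]
            X.append(window[ring])
            d.append(window[K // 2, K // 2])
    return np.array(X).T, np.array(d)

def signature_f(image, K, Q):
    """Output weight vector f (length Q + 1) for one window size and one Q."""
    X, d = border_windows(image, K)
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    p, N = X.shape
    # Assumption: a seeded NumPy generator stands in for the LCG, so that
    # the same W is used for every image with the same (Q, p).
    rng = np.random.default_rng(1000 * Q + p)
    W = rng.uniform(-1.0, 1.0, size=(Q, p + 1))
    W = (W - W.mean()) / W.std()
    bias = -np.ones((1, N))
    Z = 1.0 / (1.0 + np.exp(-(W @ np.vstack([bias, X]))))
    Zb = np.vstack([bias, Z])
    f, *_ = np.linalg.lstsq(Zb.T, d, rcond=None)   # least-squares solution of d = f Zb
    return f

def signature_theta(image, Q, Ks=(3, 5, 7)):
    """First signature: one Q, several window sizes K."""
    return np.concatenate([signature_f(image, K, Q) for K in Ks])

def signature_psi(image, Qs=(19, 39), Ks=(3, 5, 7)):
    """Second signature: concatenation of the first one over several Q."""
    return np.concatenate([signature_theta(image, Q, Ks) for Q in Qs])

image = np.random.default_rng(0).integers(0, 256, size=(24, 24))
psi = signature_psi(image)
print(psi.shape)  # (180,)
```

In this sketch each vector f has Q + 1 entries (Q hidden neurons plus the bias), so Q = {19, 39} and K = {3, 5, 7} give 3 · 20 + 3 · 40 = 180 values.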
A detailed description of the randomized neural network based signature can be found in the paper [21].
3 Experiments
3.1 Pap-smear database
The pap-smear database [24] is a collection of 917 cell images extracted from cervices. The images were obtained at the Herlev University Hospital and were classified into 7 groups, which are: normal superficial squamous epithelial (74 cells); normal intermediate squamous epithelial (70 cells); normal columnar epithelial (98 cells); mild squamous non-keratinizing dysplasia (182 cells); abnormal moderate squamous non-keratinizing dysplasia (146 cells); abnormal severe squamous non-keratinizing dysplasia (197 cells); and abnormal squamous cell carcinoma in situ intermediate (150 cells). These cell images can also be classified into two groups: normal cells (242 images) and abnormal cells (675 images). In our experiments, all the images were converted into grayscale. Moreover, we addressed only the 2-class problem, since the 7-class problem is still a challenge for texture analysis methods. Figure 1 shows one sample of each class.
3.2 Classification Procedure
In the randomized neural network texture signature, we used the same parameter values adopted in the paper [21], that is, \(Q=\{19,39\}\), \(K=\{3,5,7\}\) for the second signature (Eq. 4) in order to establish a fair comparison with the other texture methods, in which we used parameter values according to either their respective papers or the common use. At this point, it is important to mention that, even though the paper [21] proposes a strategy to make the method more robust to rotation, we did not use it for two reasons: first, there is no orientation in the pap-smear cells; second, the method is faster without this strategy.
In order to assess the performance of the method, we compare it to other classical and recent texture analysis methods. They are: Co-occurrence matrices [5], Wavelet descriptors [25, 26], Tourist Walk [27], Discrete Cosine Transform (DCT) [28], Lacunarity 3D [29], Local binary patterns (LBP) [9], Gray Level Difference Matrix (GLDM) [30, 31] and Complex Network Texture Descriptor (CNTD) [32].
For classification, we used the Linear Discriminant Analysis [33], which is a classical statistical classifier that creates hyperplanes among the groups based on their centroid vectors and the covariance matrix of all the samples. As validation strategy, we adopted the leave-one-out cross-validation, which uses one sample for testing and the remainder for training. This process is repeated N times (N is the number of samples), each time with a different sample for testing. The performance measure is the average of the N accuracies.
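The leave-one-out procedure with a linear discriminant classifier can be sketched as follows. This is a NumPy-only illustration, not the implementation used in the experiments: it assumes a pooled within-class covariance matrix, class priors estimated from the training frequencies, and a pseudoinverse in case the covariance matrix is singular. All function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, pooled covariance (inverted) and priors.
    X: (n, d) samples as rows; y: (n,) class labels."""
    classes = np.unique(y)
    n = X.shape[0]
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (n - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(cov), priors

def lda_predict(model, x):
    """Assign x to the class with the highest linear discriminant score."""
    classes, means, cov_inv, priors = model
    projected = means @ cov_inv                         # one row per class
    scores = projected @ x - 0.5 * np.sum(projected * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores)]

def leave_one_out_accuracy(X, y):
    """Test each sample once, training on the remaining n - 1 samples."""
    n = len(y)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        model = lda_fit(X[train], y[train])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / n

# Toy data: two synthetic groups (not the pap-smear signatures).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(10.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(leave_one_out_accuracy(X, y))
```

Since each of the N test folds contains a single sample, every per-fold accuracy is either 0 or 1, and their average is simply the fraction of correctly classified samples.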
We also obtained the AUC (Area Under the ROC Curve) [34] to compare it to the highest AUC values obtained in two recent papers [11, 35], which compared several LBP variants applied to the pap-smear database. To assess the randomized neural network signature, we used the same procedure adopted in these two works: a linear Support Vector Machine (SVM) as classifier and 5-fold cross-validation. The paper [11] does not mention the parameter values used, but the paper [35] uses the default parameter values of LIBSVM [36], which is a public library for SVM. Thus, for a fair comparison, we also used the default parameter values of this library. Moreover, because 5-fold cross-validation is not a deterministic strategy, we performed 101 validation runs and adopted the median AUC value as the performance measure of the randomized neural network signature.
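The repeated validation protocol can be sketched as below. Several points are assumptions made only to keep the example self-contained: the classifier is a simple stand-in that projects samples onto the difference of the class means (it is not an SVM and not LIBSVM), the folds are drawn by plain random permutation, and one AUC is computed per run from the pooled out-of-fold scores. The AUC itself is computed with the standard rank-based (Mann-Whitney) formulation. All function names are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a positive sample scores higher than a negative one
    (ties count as one half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def centroid_scorer(X_train, y_train, X_test):
    """Stand-in linear scorer (NOT an SVM): projection onto the difference
    between the two class means."""
    w = X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0)
    return X_test @ w

def median_auc(X, y, scorer, n_runs=101, n_folds=5, seed=0):
    """Repeat k-fold cross-validation with different random splits and
    return the median AUC over the runs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    for _ in range(n_runs):
        order = rng.permutation(n)
        scores = np.empty(n)
        for fold in np.array_split(order, n_folds):
            train = np.setdiff1d(order, fold)   # all indices outside the fold
            scores[fold] = scorer(X[train], y[train], X[fold])
        aucs.append(auc(scores, y))
    return float(np.median(aucs))

# Toy data: two synthetic groups (not the pap-smear signatures).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(median_auc(X, y, centroid_scorer))
```

With an odd number of runs such as 101, the median is always one of the AUC values actually observed rather than an average of two of them.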
4 Results and Discussion
Table 1 shows the comparison of the randomized neural network signature with other grayscale texture analysis methods. As one can see, the neural network approach surpasses all the compared methods in terms of accuracy. One disadvantage of the method is its excessively large number of descriptors. However, it is important to notice that its accuracy is 1.20 percentage points higher than that of the second best method (wavelet descriptors). This difference corresponds to 11 more images correctly classified by the method, thus corroborating its efficiency and ability to discriminate pap-smear samples, a challenging database in which any improvement is desirable.
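The correspondence between the accuracy difference and the number of images can be checked with a short calculation, assuming that both accuracies are fractions of the same 917 images, as the leave-one-out procedure implies:

```latex
\[
\frac{1}{917}\approx 0.109\%, \qquad
0.0120 \times 917 \approx 11.0, \qquad
\frac{803}{917}\approx 87.57\%.
\]
```

Under this assumption, each image accounts for about 0.109 percentage points, so a difference of 1.20 percentage points corresponds to 11 images, and an accuracy of 87.57% corresponds to 803 correctly classified images.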
Table 2 shows the median AUC obtained by the neural network signature and all the compared approaches, as well as the highest AUC values reported in two recent papers. As can be seen, the randomized neural network signature obtained the second best result among all the methods. Although this performance is already impressive, it is important to emphasize that ENS and MAG1 provide the highest AUC values of the papers [11] and [35], respectively. Thus, considering that the paper [11] applied nine LBP variants to the pap-smear database, and the paper [35] performed more than 50 tests on this same dataset, our result becomes even more significant and demonstrates that the randomized neural network descriptors are very discriminative in pap-smear cell images.
5 Conclusion
This paper presented the application of a very discriminative texture analysis method to the highly relevant medical problem of classifying pap-smear cells. The randomized neural network texture signature obtained a high performance in this problem, surpassing all the compared methods (LDA experiment) and presenting the second best AUC value, which is comparable to the highest results of two recent papers that address the same problem. Thus, it is possible to affirm that the randomized neural network signature is suitable for the pap-smear problem, and, therefore, adds a new tool to the computer vision research focused on the Papanicolaou test.
References
1. Backes, A.R., Martinez, A.S., Bruno, O.M.: Texture analysis based on maximum contrast walker. Pattern Recogn. Lett. 31(12), 1701–1707 (2010)
2. Kaplan, L.M.: Extended fractal analysis for texture classification and segmentation. IEEE Trans. Image Process. 8(11), 1572–1585 (1999)
3. Bhosle, V.V., Pawar, V.P.: Texture segmentation: different methods. Int. J. Soft Comput. Eng. 3, 69–74 (2013)
4. Zwiggelaar, R.: Texture based segmentation: automatic selection of co-occurrence matrices. In: ICPR, vol. I, pp. 588–591 (2004)
5. Haralick, R.M.: Statistical and structural approaches to texture. Proc. IEEE 67(5), 786–804 (1979)
6. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
7. Dawood, H., Dawood, H., Guo, P.: Efficient texture classification using short-time Fourier transform with spatial pyramid matching. In: SMC, pp. 2275–2279. IEEE (2013)
8. Li, C., Huang, Y., Zhu, L.: Color texture image retrieval based on Gaussian copula models of Gabor wavelets. Pattern Recogn. 64, 118–129 (2017)
9. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
10. Sá Junior, J.J.M., Backes, A.R.: A simplified gravitational model to analyze texture roughness. Pattern Recogn. 45(2), 732–741 (2012)
11. Nanni, L., Lumini, A., Brahnam, S.: Local binary patterns variants as texture descriptors for medical image analysis. Artif. Intell. Med. 49(2), 117–125 (2010)
12. Ananthi, V.P., Balasubramaniam, P., Kalaiselvi, T.: A new fuzzy clustering algorithm for the segmentation of brain tumor. Soft. Comput. 20(12), 4859–4879 (2016)
13. Li, X.Z., Williams, S., Bottema, M.J.: Texture and region dependent breast cancer risk assessment from screening mammograms. Pattern Recogn. Lett. 36, 117–124 (2014)
14. Li, X.Z., Williams, S., Bottema, M.J.: Background intensity independent texture features for assessing breast cancer risk in screening mammograms. Pattern Recogn. Lett. 34(9), 1053–1062 (2013)
15. Capdehourat, G., Corez, A., Bazzano, A., Alonso, R., Musé, P.: Toward a combined tool to assist dermatologists in melanoma detection from dermoscopic images of pigmented skin lesions. Pattern Recogn. Lett. 32(16), 2187–2196 (2011)
16. Flores, E.S., Scharcanski, J.: Segmentation of melanocytic skin lesions using feature learning and dictionaries. Expert Syst. Appl. 56, 300–309 (2016)
17. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feedforward neural networks with random weights. In: Proceedings of the 11th IAPR International Conference on Pattern Recognition, Conference B: Pattern Recognition Methodology and Systems, vol. II, pp. 1–4 (1992)
18. Pao, Y.H., Takefuji, Y.: Functional-link net computing: theory, system architecture, and functionalities. IEEE Comput. J. 25(5), 76–79 (1992)
19. Pao, Y.H., Park, G.H., Sobajic, D.J.: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2), 163–180 (1994)
20. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)
21. Sá Junior, J.J.M., Backes, A.R.: ELM based signature for texture classification. Pattern Recogn. 51, 395–401 (2016)
22. Lehmer, D.H.: Mathematical methods in large scale computing units. Ann. Comput. Lab. Harvard Univ. 26, 141–146 (1951)
23. Park, S.K., Miller, K.W.: Random number generators: good ones are hard to find. Commun. ACM 31(10), 1192–1201 (1988)
24. Jantzen, J., Norup, J., Dounias, G., Bjerregaard, B.: Pap-smear benchmark data for pattern classification. In: Proceedings of the NiSIS 2005, NiSIS, pp. 1–9 (2005)
25. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)
26. Chang, T., Kuo, C.J.: Texture analysis and classification with tree-structured wavelet transform. IEEE Trans. Image Process. 2(4), 429–441 (1993)
27. Backes, A.R., Gonçalves, W.N., Martinez, A.S., Bruno, O.M.: Texture analysis and classification using deterministic tourist walk. Pattern Recogn. 43(3), 685–694 (2010)
28. Ng, I., Tan, T., Kittler, J.: On local linear transform and Gabor filter representation of texture. In: International Conference on Pattern Recognition, pp. 627–631 (1992)
29. Backes, A.R.: A new approach to estimate lacunarity of texture images. Pattern Recogn. Lett. 34(13), 1455–1461 (2013)
30. Weszka, J.S., Dyer, C.R., Rosenfeld, A.: A comparative study of texture measures for terrain classification. IEEE Trans. Syst. Man. Cybern. 6(4), 269–285 (1976)
31. Kim, J.K., Park, H.W.: Statistical textural features for detection of microcalcifications in digitized mammograms. IEEE Trans. Med. Imaging 18(3), 231–238 (1999)
32. Backes, A.R., Casanova, D., Bruno, O.M.: Texture analysis and classification: a complex network-based approach. Inf. Sci. 219, 168–180 (2013)
33. Webb, A.R.: Statistical Pattern Recognition, 2nd edn. Wiley, Chichester (2002)
34. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
35. Nanni, L., Lumini, A., Brahnam, S.: Survey on LBP based texture descriptors for image classification. Expert Syst. Appl. 39(3), 3634–3641 (2012)
36. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
Acknowledgments
Jarbas Joaci de Mesquita Sá Junior thanks CNPq (National Council for Scientific and Technological Development, Brazil) (Grant: 152054/2016-2 and 453835/2017-1) for the financial support of this work. André R. Backes gratefully acknowledges the financial support of CNPq (Grant #302416/2015-3) and FAPEMIG (Foundation to the Support of Research in Minas Gerais) (Grant #APQ-03437-15). Odemir M. Bruno gratefully acknowledges the financial support of CNPq (307797/2014-7 and 484312/2013-8) and FAPESP (14/08026-1).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
de Mesquita Sá Junior, J.J., Backes, A.R., Bruno, O.M. (2018). Pap-smear Image Classification Using Randomized Neural Network Based Signature. In: Mendoza, M., Velastín, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2017. Lecture Notes in Computer Science(), vol 10657. Springer, Cham. https://doi.org/10.1007/978-3-319-75193-1_81
DOI: https://doi.org/10.1007/978-3-319-75193-1_81
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75192-4
Online ISBN: 978-3-319-75193-1
eBook Packages: Computer Science, Computer Science (R0)