Abstract
Colonoscopy is the most recommended test for preventing and detecting colorectal cancer. Nowadays, digital videos can be recorded during colonoscopy procedures in order to develop diagnostic support tools. Once video frames are annotated, machine learning algorithms are commonly used for the classification of normal-vs-abnormal frames. However, automatic analysis of colonoscopy videos is a challenging problem since segments of a video annotated as abnormal, such as cancer or polyps, may contain blurry, sharp and bright frames. In this paper, a method based on texture analysis, using Local Binary Patterns in the frequency domain, is presented. The method aims to automatically classify colonoscopy video frames as either informative or non-informative. The proposed method is evaluated using videos annotated by gastroenterologists for training a support vector machines classifier. Experimental evaluation showed accuracy values over 97%.
1 Introduction
Automatic detection of polyps and cancer in colonoscopy videos is frequently based on machine learning algorithms using supervised learning [3, 12, 13]. Machine learning algorithms build models based on training sets, and those training sets are created from annotations manually made by gastroenterologists. Those annotations are done by segmenting colonic video files into shots representing endoscopic findings (lesions). For instance, a file annotation of an observed lesion (cancer) is 10:14:30 (time begin) and 10:16:27 (time end). It is highly likely that shots annotated as lesions contain frames with blur, low contrast, noise and/or brightness, called non-informative frames.
In fact, machine learning is an inverse and ill-posed problem: there is a set of assumptions that a learning algorithm makes about the true function it is trying to learn a model of. In general, the training of a classifier should include a sufficiently large number of frames representing most of the possible frame configurations. Otherwise, the classifier is poorly trained and does not provide the expected results. In our case, annotations of cancer or polyps contain non-informative frames that may affect the learning of a model and, consequently, produce a classifier with low accuracy. Thus, a preprocessing step is needed before annotating colonoscopy videos and building machine learning models for the classification of normal-vs-abnormal frames.
Research has been conducted on identifying non-informative frames in colonoscopy videos. Methods based on edge detection and brightness segmentation in order to remove non-informative frames from colonoscopy videos are presented in [14, 19]. Other techniques based on color transformations [10], lumen detection [6, 9], video tracking framework [11], global features [8, 20], and texture analysis [14] were proposed to address this problem.
Results reported in [14] indicate that precision, sensitivity, specificity, and accuracy for the edge-based and the clustering-based classification techniques are greater than 90% and 95%, respectively, when the specular reflection detection technique is used. However, the comparison done by Rangseekajee and Phongsuphap in [18] between their proposed approach and the edge-based classification in [14] yields lower values for the latter: a precision of 90%, sensitivity of 61%, specificity of 50%, and accuracy of 60% are reported in [18].
In [2], we proposed a method based on edge detection using the hypothesis that non-informative frames usually do not contain many edges. Experimental evaluation showed values of accuracy and precision over 95%. In this paper, we explore the use of texture analysis to automatically classify informative and non-informative frames in colonoscopy videos. The Local Binary Pattern (LBP) operator [15] is used as texture descriptor, calculated in the frequency domain, and Support Vector Machines (SVM) [5] are used for building a classifier. The proposed method aims to preprocess data sets, by eliminating non-informative frames, before they are used in machine learning algorithms. Experimental evaluation showed accuracy values over 97%.
2 Automatic Classification of Non-Informative Frames
For the sake of completeness, definitions of non-informative and informative frames are presented in order to clarify their meaning in the application domain. The former corresponds to a frame that is out of focus, contains bubbles and/or light reflection artifacts (due to wall contact and/or light reflections on water used to clean the colon wall) and/or motion blur. The latter corresponds to a frame with well-defined content spread over the whole frame.
A description of the proposed automatic identification of non-informative frames is as follows. Given a video sequence, each frame is converted into gray-level scale and the gray-level frame is transformed into the frequency domain using the Discrete Fourier Transform (DFT). Then, the LBP operator is applied at each pixel and a histogram is built to represent the content of each frame. Finally, a classifier is created using the linear-SVM algorithm.
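To make the pipeline concrete, a minimal C++ outline is sketched below. OpenCV is an assumption (the paper only states a C++ implementation), and the helper function names are illustrative; they are expanded in the sketches of the following subsections.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include "svm.h"   // LIBSVM [4]

// Helpers sketched in the following subsections (hypothetical names).
cv::Mat magnitudeSpectrum(const cv::Mat& gray);               // Sect. 2.1
std::vector<float> lbpHistogram(const cv::Mat& spectrum);     // Sect. 2.2
double classifyFrame(const svm_model* model,
                     const std::vector<float>& hist);         // Sect. 2.3 / 3

// End-to-end classification of a single colour frame.
double classifyColonoscopyFrame(const cv::Mat& frame, const svm_model* model)
{
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);     // gray-level conversion
    cv::Mat spectrum = magnitudeSpectrum(gray);        // frequency-domain representation
    std::vector<float> hist = lbpHistogram(spectrum);  // LBP histogram as frame descriptor
    return classifyFrame(model, hist);                 // 1 = informative, -1 = non-informative
}
```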
2.1 Frequency Domain Transform
The results shown in Fig. 1 indicate that the informative frame contains more low frequencies than high ones. The non-informative frame contains components of all frequencies, with smaller magnitudes for higher frequencies. The non-informative frame also contains a dominating direction in the Fourier image, passing vertically through the centre; it originates from the regular patterns in the original frame. We can observe that the frequency domain provides discriminant information about the frame content.
Each frame is converted into gray-scale and then transformed into the Fourier domain using the two-dimensional Discrete Fourier Transform [1]:

\[
F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2 \pi \left( \frac{u x}{M} + \frac{v y}{N} \right)},
\]

where \(M \times N\) is the frame dimension, f(x, y) is the value at position (x, y) in the spatial domain and the exponential term is the basis function corresponding to each point F(u, v) in the Fourier space. The equation can be interpreted as follows: the value of each point F(u, v) is obtained by multiplying the spatial domain by the corresponding basis function and summing the result.
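A minimal sketch of this step, assuming OpenCV as the image processing library (not stated in the paper), is shown below. The log scaling, normalisation and quadrant shift are common conventions for working with the magnitude spectrum and are assumptions here rather than details taken from the paper.

```cpp
#include <opencv2/opencv.hpp>

// Log-scaled magnitude spectrum of a gray-level frame.
cv::Mat magnitudeSpectrum(const cv::Mat& gray)
{
    cv::Mat floatImg;
    gray.convertTo(floatImg, CV_32F);

    // Two channels: real and imaginary parts of F(u, v).
    cv::Mat planes[] = {floatImg, cv::Mat::zeros(gray.size(), CV_32F)};
    cv::Mat complexImg;
    cv::merge(planes, 2, complexImg);
    cv::dft(complexImg, complexImg);

    cv::split(complexImg, planes);
    cv::Mat mag;
    cv::magnitude(planes[0], planes[1], mag);   // |F(u, v)|
    mag += cv::Scalar::all(1);
    cv::log(mag, mag);                          // compress the dynamic range

    // Swap quadrants so that low frequencies lie at the centre,
    // matching the "central blocks" used later in Sect. 3.1.
    mag = mag(cv::Rect(0, 0, mag.cols & -2, mag.rows & -2));
    int cx = mag.cols / 2, cy = mag.rows / 2;
    cv::Mat q0(mag, cv::Rect(0, 0, cx, cy)), q1(mag, cv::Rect(cx, 0, cx, cy));
    cv::Mat q2(mag, cv::Rect(0, cy, cx, cy)), q3(mag, cv::Rect(cx, cy, cx, cy));
    cv::Mat tmp;
    q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
    q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);

    cv::normalize(mag, mag, 0, 255, cv::NORM_MINMAX);
    mag.convertTo(mag, CV_8U);                  // 8-bit image for the LBP step
    return mag;
}
```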
2.2 Texture Analysis
Initially, texture analysis of the frequency spectrum was conducted using Haralick features such as Angular Second Moment, Contrast, Correlation, Dissimilarity, Entropy, Energy and Uniformity [17], as was done in [14]. However, the results obtained in experiments were not discriminative enough for classification using SVM.
The Local Binary Pattern (LBP) is a common texture descriptor that offers several advantages, such as invariance to monotonic gray-level changes and low computational cost [15]. In our approach, the LBP operates on the frequency spectrum using a \(3\times 3\) kernel, where the pixels around the central pixel are thresholded with the value of the central pixel. The binary string obtained as the result of the LBP thresholding is converted into a decimal number and used to replace the central pixel value. Finally, a histogram of the LBP decimal numbers is calculated. The obtained histograms are used as frame content representations to classify colonoscopy frames. Figure 2 illustrates the LBP calculation.
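A minimal sketch of the basic \(3\times 3\) LBP and its 256-bin histogram is given below. The neighbour ordering, the use of a greater-or-equal comparison and the histogram normalisation are common conventions assumed here, not details taken from the paper.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Basic 3x3 LBP over a single-channel 8-bit image, followed by a
// 256-bin histogram. Border pixels are skipped for simplicity.
std::vector<float> lbpHistogram(const cv::Mat& img)
{
    CV_Assert(img.type() == CV_8UC1);
    std::vector<float> hist(256, 0.0f);

    for (int y = 1; y < img.rows - 1; ++y) {
        for (int x = 1; x < img.cols - 1; ++x) {
            const uchar c = img.at<uchar>(y, x);
            unsigned char code = 0;
            // Clockwise neighbours, starting at the top-left pixel.
            code |= (img.at<uchar>(y - 1, x - 1) >= c) << 7;
            code |= (img.at<uchar>(y - 1, x    ) >= c) << 6;
            code |= (img.at<uchar>(y - 1, x + 1) >= c) << 5;
            code |= (img.at<uchar>(y    , x + 1) >= c) << 4;
            code |= (img.at<uchar>(y + 1, x + 1) >= c) << 3;
            code |= (img.at<uchar>(y + 1, x    ) >= c) << 2;
            code |= (img.at<uchar>(y + 1, x - 1) >= c) << 1;
            code |= (img.at<uchar>(y    , x - 1) >= c) << 0;
            hist[code] += 1.0f;
        }
    }

    // Normalise so that histograms of different image sizes are comparable.
    const float total = static_cast<float>((img.rows - 2) * (img.cols - 2));
    for (float& bin : hist) bin /= total;
    return hist;
}
```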
2.3 Classification Using Support Vector Machines
The original algorithm for SVMs was proposed by Vapnik and Lerner in 1963 [21]. The algorithm solves a classification problem for linearly separable data by finding a separating hyperplane that has the maximum distance from the closest input points. This hyperplane, if it exists, is called the maximum-margin hyperplane. The decision rule depends only on the dot products between the training vectors and the unknown point.
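For reference, the standard dual form of this decision rule for a linear SVM can be written as

\[
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, \langle \mathbf{x}_i, \mathbf{x} \rangle + b\right),
\]

where \(\mathbf{x}_i\) are the training vectors, \(y_i \in \{-1, +1\}\) their labels, \(b\) the bias, and the multipliers \(\alpha_i\) are non-zero only for the support vectors; the classification of an unknown point \(\mathbf{x}\) therefore depends only on dot products with a subset of the training vectors.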
We chose Support Vector Machines (SVMs) for building the classifier based on the nature of the problem, i.e. a binary classification, and on the available training data, i.e. a limited training dataset size. Training datasets are annotated by gastroenterologists, who usually have little spare time for this task.
3 Experimental Evaluation
The performance of the proposed method is evaluated in this section. The data set, experimental settings and evaluation criteria are presented. Tests were conducted using a laptop with Windows 8 Pro x64, an Intel(R) Core(TM) i5 @ 2.60 GHz and 4.00 GB RAM, and the implementation was done using the C++ programming language. The classification of colonoscopy frames is performed using the LIBSVM library [4], which requires the following inputs: a feature matrix constructed with the histograms of LBP descriptors; a column vector of labels containing a class value, where we used the class value 1 for informative frames and \(-1\) for non-informative ones; and the cost value C, where a low cost value, such as \(C=1\), penalises classification errors less heavily than a large one.
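A minimal sketch of how these inputs can be passed to LIBSVM through its C API is shown below. The data layout, the helper names and the memory handling are illustrative assumptions, not the paper's actual implementation.

```cpp
#include <vector>
#include "svm.h"   // LIBSVM header [4]

// LIBSVM uses sparse, 1-based feature indices terminated by index = -1.
svm_node* toSvmNodes(const std::vector<float>& hist)
{
    svm_node* nodes = new svm_node[hist.size() + 1];
    for (size_t i = 0; i < hist.size(); ++i) {
        nodes[i].index = static_cast<int>(i) + 1;
        nodes[i].value = hist[i];
    }
    nodes[hist.size()].index = -1;
    return nodes;
}

// Train a linear C-SVM (C = 1, as in the experiments) on LBP histograms.
// The node arrays are intentionally kept alive (and leaked, for brevity)
// because the trained model keeps pointers into them.
svm_model* trainClassifier(const std::vector<std::vector<float>>& features,
                           const std::vector<double>& labels)   // +1 / -1
{
    svm_problem prob;
    prob.l = static_cast<int>(features.size());
    prob.y = const_cast<double*>(labels.data());
    prob.x = new svm_node*[prob.l];
    for (int i = 0; i < prob.l; ++i)
        prob.x[i] = toSvmNodes(features[i]);

    svm_parameter param = {};     // zero-initialise unused fields
    param.svm_type    = C_SVC;
    param.kernel_type = LINEAR;
    param.C           = 1.0;
    param.eps         = 1e-3;
    param.cache_size  = 100;

    return svm_train(&prob, &param);
}

// Predict the label of one test histogram: 1 = informative, -1 = non-informative.
double classifyFrame(const svm_model* model, const std::vector<float>& hist)
{
    svm_node* nodes = toSvmNodes(hist);
    double label = svm_predict(model, nodes);
    delete[] nodes;
    return label;
}
```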
Several videos of complete colonoscopic procedures were recorded at the Hospital Universitario del Valle and three videos were selected to present evaluation results. Those videos have lengths of 6, 12 and 14 min, respectively, a frame resolution of \(636 \times 480\), and were recorded at 10 fps using the MP4 format with H.264 compression. Frames are extracted from the video sequences using the FFmpeg Multimedia Framework [7]. A total of 600 frames (100 informative and 100 non-informative per video) were used. The selected frames were manually annotated by a gastroenterologist.
A set of metrics commonly used to evaluate the performance of a binary classification is employed [16]. The metrics are based on the confusion matrix and correspond to Sensitivity, Specificity, Accuracy, Precision and F-Measure.
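For completeness, taking the non-informative class as the positive class (following Sect. 3.2, where performance is assessed for the non-informative class) and denoting true/false positives and negatives by TP, TN, FP and FN, the standard definitions are

\[
\text{Sensitivity} = \frac{TP}{TP+FN}, \quad
\text{Specificity} = \frac{TN}{TN+FP}, \quad
\text{Precision} = \frac{TP}{TP+FP},
\]
\[
\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \quad
\text{F-Measure} = \frac{2\,\text{Precision}\cdot\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}}.
\]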
3.1 Evaluation Conditions
The proposed method was evaluated under three evaluation conditions, which are illustrated in Fig. 3 and described as follows.
- (A) The LBP is calculated using the whole frequency domain image. The obtained descriptor is an array of length 256.
- (B) The frequency domain image is divided into \(4\times 4\) blocks and the LBP is calculated on each block. This yields 16 histograms that are concatenated. The obtained descriptor is an array of length 4096.
- (C) The frequency domain image is divided into \(4\times 4\) blocks and only the \(2\times 2\) central blocks are used in the calculation of the LBP. This yields 4 histograms that are concatenated. The obtained descriptor is an array of length 1024 (a sketch of this block division is given after the list).
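A minimal sketch of the block division used in conditions B and C is given below. It reuses the lbpHistogram function from the Sect. 2.2 sketch, and the choice of block indices 1 and 2 as the central \(2\times 2\) blocks is an assumption about the layout.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// 256-bin LBP histogram from the Sect. 2.2 sketch.
std::vector<float> lbpHistogram(const cv::Mat& img);

// Concatenated per-block LBP histograms over the frequency-domain image.
// Condition B uses all 4x4 blocks (16 x 256 = 4096 values);
// condition C uses only the 2x2 central blocks (4 x 256 = 1024 values).
std::vector<float> blockDescriptor(const cv::Mat& spectrum, bool centralOnly)
{
    const int grid = 4;
    const int bw = spectrum.cols / grid;
    const int bh = spectrum.rows / grid;
    std::vector<float> descriptor;

    for (int by = 0; by < grid; ++by) {
        for (int bx = 0; bx < grid; ++bx) {
            // The central 2x2 blocks are those with indices 1 and 2.
            const bool central = (bx == 1 || bx == 2) && (by == 1 || by == 2);
            if (centralOnly && !central)
                continue;
            cv::Mat block = spectrum(cv::Rect(bx * bw, by * bh, bw, bh));
            std::vector<float> h = lbpHistogram(block);
            descriptor.insert(descriptor.end(), h.begin(), h.end());
        }
    }
    return descriptor;
}
```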
3.2 Results and Discussion
The SVM classification model is calculated using 80% of the data set (400 frames) as a training set and the remaining 200 frames as a test set.
Table 1 contains the confusion matrices calculated under the three evaluation conditions. It presents the number of frames correctly classified, using TI for True Informative and TN for True Non-Informative, and the number of frames incorrectly classified, using FI for False Informative and FN for False Non-Informative. Evaluation condition C yields the lowest number of false informative frames, whilst evaluation condition B yields the lowest number of false non-informative frames.
Bearing in mind that the performance measures are calculated to assess the capability of correctly classifying non-informative frames, Table 2 contains the performance measure values obtained under each evaluation condition. The highest values of accuracy and precision were obtained under evaluation condition C and the lowest values under evaluation condition B.
The obtained results may be interpreted by looking at Fig. 3, where frequency magnitudes get smaller for higher frequencies, which are located in the external blocks. In this way, the LBP calculated using the four central blocks is enough to encode the relevant content of the frequency domain.
Table 3 contains the results obtained using the edge-based approach presented in [2]. Our proposed approach, using the LBP computed on the central blocks in the frequency domain, outperforms the edge-based approach we proposed in [2] on the same training image set.
4 Final Remarks
In this paper, a method based on texture analysis was proposed to classify colonoscopy frames into two categories: informative and non-informative. The method uses the LBP descriptor as a texture feature, calculated in the frequency domain, and an SVM as the classifier.
The proposed classification method is able to correctly detect frames without relevant information, which should not be used for training machine learning algorithms for the classification of normal-vs-abnormal frames.
Moreover, the proposed method may be used to significantly reduce the duration of videos, by deleting frames classified as non-informative, before they are analysed by gastroenterologists.
The metrics used to evaluate the performance of the proposed method showed that accuracy and precision are over 95% when the central blocks in the frequency domain are used to calculate the LBP descriptor, and that, in general, the proposed method outperforms the approach proposed in [2].
References
Acharya, T., Ray, A.K.: Image Processing: Principles and Applications. Wiley Interscience, Reading (2005)
Ballesteros, C., Trujillo, M., Mazo, C.: Automatic classification of non-informative frames in colonoscopy videos. In: 6th Latin American Conference on Networked and Electronic Media. LACNEM, facultad de Minas, Universidad Nacional sede Medellín (2015)
Bernal, J., Gil, D., Sánchez, C., Sánchez, F.J.: Discarding non informative regions for efficient colonoscopy image analysis. In: Luo, X., Reichl, T., Mirota, D., Soper, T. (eds.) CARE 2014. LNCS, vol. 8899, pp. 1–10. Springer, Heidelberg (2014). doi:10.1007/978-3-319-13410-9_1
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software: http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Fan, Y., Meng, M.Q., Li, B.: A novel method for informative frame selection in wireless capsule endoscopy video. In: 33rd Annual International Conference of the IEEE EMBS, Boston, Massachusetts, USA, pp. 4864–4867 (2011)
FFMPEG: Ffmpeg, multimedia framework (2015). https://www.ffmpeg.org/. Accessed 06 Nov 2015
Grega, M., Leszczuk, M., Duplaga, M., Fraczek, R.: Algorithms for automatic recognition of non-informative frames in video recordings of bronchoscopic procedures. Inf. Technol. Biomed. 69, 535–545 (2010)
Hwang, S., Lee, J., Cao, Y., Liu, D., Tavanapong, W., Wong, J., Oh, J., de Groen, P.: Automatic measurement of quality metrics for colonoscopy videos. In: International Collegiate Programming Contest, Singapore, pp. 912–921 (2005)
Khun, P., Zhuo, Z., Yang, L., Liyuan, L., Jiang, L.: Feature Selection and Classification for Wireless Capsule Endoscopic Frames. Institute for Infocomm Research, Singapore (2009)
Liu, J., Subramanian, K.R., Yoo, T.S.: A robust method to track colonoscopy videos with non-informative images. Int. J. Comput. Assist. Radiol. Surg. 08(4), 575–592 (2013)
Manivannan, S., Wang, R., Trucco, E., Hood, A.: Automatic normal-abnormal video frame classification for colonoscopy. In: 2013 IEEE 10th International Symposium on Biomedical Imaging (ISBI), pp. 644–647 (2013)
Manivannan, S., Wang, R., Trujillo, M.P., Hoyos, J.A., Trucco, E.: Video-specific SVMs for colonoscopy image classification. In: Luo, X., Reichl, T., Mirota, D., Soper, T. (eds.) CARE 2014. LNCS, vol. 8899, pp. 11–21. Springer, Heidelberg (2014). doi:10.1007/978-3-319-13410-9_2
Oh, J., Hwang, S., Lee, J., Tavanapong, W., Wong, J., de Groen, P.C.: Informative frame classification for endoscopy video. Med. Image Anal. 11(1), 110–127 (2007)
Pietikäinen, M., Hadid, A., Zhao, G., Ahonen, T.: Computer Vision Using Local Binary Patterns. Springer, Dordrecht (2011)
Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 02(1), 37–63 (2011)
Puetz, A., Olsen, R.: Haralick texture feature expanded into the spectral domain. In: Proceedings of SPIE, vol. 6233 (2006)
Rangseekajee, N., Phongsuphap, S.: Endoscopy video frame classification using edge-based information analysis. In: Computing in Cardiology, pp. 549–552 (2011)
Rungseekajee, N., Lohvithee, M., Nilkhamhang, I.: Informative frame classification method for real-time analysis of colonoscopy video. In: 6th International Conference ECTI-CON 2009, vol. 02, no. 1, pp. 1076–1079 (2009)
Segui, S., Drozdzal, M., Vilarino, F., Malagelada, C., Azpiroz, F., Radeva, P., Vitria, J.: Categorization and segmentation of intestinal content frames for wireless capsule endoscopy. IEEE Trans. Inf. Technol. Biomed. 16(6), 2341–2352 (2006)
Vapnik, V., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Control 24, 774–780 (1963)