Abstract
This paper investigates improving clothing parsing with a superpixels features extractor network (SP-FEN). Clothing parsing with a fully convolutional network has two parts: an encoder and a decoder. The encoder lowers the dimensionality and produces a low-resolution prediction, while the decoder upscales that prediction back to the size of the input image. Typically, fine-grained details that are lost in the encoder are not recovered well by the decoder. To address this issue, skip connections are commonly used to recover fine-grained details and add them to the final prediction. A new method is proposed that introduces superpixel features to the decoder by adding a side network (SP-FEN), which extracts features from a superpixel representation of the input image computed with the SLIC algorithm. SP-FEN then produces meaningful superpixel features to be injected into the decoder, and it learns which features to feed to the decoder in order to boost the overall quality of the output. The proposed method is shown to improve MIoU on the refined Fashionista V1.0 dataset and the CFPD dataset. The results show that the proposed approach achieves superior performance in pixel-wise segmentation and clothing parsing.
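Since the reported gains are stated in terms of MIoU, a small worked example of that metric may help. The sketch below uses the standard definition of mean intersection over union (per-class intersection divided by per-class union, averaged over classes). The function name `mean_iou`, the toy label maps, and the choice to skip classes that appear in neither map are illustrative assumptions, not details taken from the paper's evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and target are 2-D lists of integer class labels with equal shape.
    Skipping absent classes is an assumption made for this sketch.
    """
    inter = [0] * num_classes  # pixels where both maps agree on the class
    union = [0] * num_classes  # pixels where either map carries the class
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 3x3 label maps with three hypothetical classes (0, 1, 2).
pred = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 1]]
target = [[0, 0, 1],
          [0, 0, 1],
          [2, 2, 2]]

# Per-class IoU here is 3/4, 2/4 and 2/3, so the mean is about 0.6389.
print(round(mean_iou(pred, target, 3), 4))
```

A prediction identical to the ground truth gives an MIoU of 1.0, and every mislabeled pixel lowers the score of two classes at once: the class it was assigned and the class it should have had.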
Ethics declarations
Conflict of Interest
The authors declare there is no conflict of interest.
About this article
Cite this article
Ihsan, A.M., Loo, C.K., Naji, S.A. et al. Superpixels Features Extractor Network (SP-FEN) for Clothing Parsing Enhancement. Neural Process Lett 51, 2245–2263 (2020). https://doi.org/10.1007/s11063-019-10173-y
DOI: https://doi.org/10.1007/s11063-019-10173-y