Abstract
In the Big Data era, there is a need for powerful visual-based analytics tools when pictures have replaced texts and become main contents on the Internet. Hence, in this study, we explore convolutional neural networks with a goal of resolving clothing style classification and retrieval tasks. To reduce training complexity, low-level and mid-level features were learned in the deep models on large-scale datasets and then transfer learning is incorporated by fine-tuning pre-trained models using the clothing dataset. However, a large amount of collected data needs huge computations for tuning parameters. Therefore, one architecture inspired from Adaboost is designed to use multiple deep nets that are trained with a sub-dataset. Thus, the training time can be accelerated if each net is computed in one client node in a distributed computing environment. Moreover, to increase system flexibility, two architectures with multiple deep nets with two outputs are proposed for binary-class classification. Therefore, when new classes are added, no additional computation is needed for all training data. In order to integrate output responses from multiple nets, classification rules are proposed as well. Experiments are performed to compare existing systems with hand-crafted features. According to the results, the proposed system can provide significant improvements on three public clothing datasets for style classifications, particularly on the large dataset with 80,000 images where an improvement of 18% in accuracy was recognized.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Arel I, Rose DC, Karnowski TP (2010) Deep machine learning—a new frontier in artificial intelligence research. IEEE Comput Intell Mag 5(4):13–18
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Recog Mach Intell 35(8):1798–1828
Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: International conference on neural information processing systems, pp 153–160
Bossard L, Dantone M, Leistner C, Wengert C, Quack T, Gool LV (2013) Apparel classification with style. In: Asia conference on computer vision, vol 7727, pp 321–335
Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access 2:514–525
Chen JC, Liu CF (2015) Visual-based deep learning for clothing from large database. In: ASE BigData & SocialInformatcis
Chen JC, Xue BF, Lin Kawuu W (2015a) Dictionary learning for discovering visual elements of fashion styles. In: CEC workshop
Chen Q, Huang J, Feris R, Brown LM, Dong J, Yan S (2015b) Deep domain adaptation for describing people based on fine-grained clothing attributes. In: IEEE conference on computer vision and pattern recognition, pp 5315–5324
Ciresan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep big simple neural nets excel on handwritten digit recognition. Neural Comput 22(12):3207–3220
Dahl GE, Yu D, Deng L, Acero A (2012) Context-dependent pretrained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 20(1):30–41
Dean J (2012) Large scale distributed deep networks. In: International conference on neural information processing systems, pp 1232–1240
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. ACM Mag 51(1):107–113
Deng J, Berg AC, Li FF (2011) Hierarchical semantic indexing for large scale image retrieval. In: IEEE conference on computer vision and pattern recognition, pp 785–792
Di W, Wah C, Bhardwaj A, Piramuthu R, Sundaresan N (2013) Style finder: fine-grained clothing style recognition and retrieval. In: IEEE conference on computer vision and pattern recognition workshops, pp 8–13
Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2013) DeCAF: a deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531
Efrati A (2013) How deep learning works at Apple. Information. https://www.theinformation.com/How-Deep-Learning-Works-at-Apple-Beyond
Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Recog Mach Intell 35(8)
Gantz J, Reinsel D (2011) Extracting value from chaos. IDC iView. https://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 580–587
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier networks. In: International conference on artificial intelligence and statistics, pp 315–323
Goodfellow IJ, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013) Maxout networks. arXiv preprint. arXiv:1302.4389
Hinton G, Osindero S (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hinton G, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R (2012) Improving neural networks by preventing coadaptation of feature detectors. arXiv:1207.0508
Huang J, Feris RS, Chen Q, Yan S (2015) Cross-domain image retrieval with a dual attribute-aware ranking network. arXiv preprint arXiv:1505.07922
Jagadeesh V, Piramuthu R, Bhardwaj A, Di W, Sundaresan N (2014) Large scale visual recommendations from street fashion images. In: ACM SIGKDD International conference on knowledge discovery and data mining, pp 1925–1934
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Caffe DT (2014) Caffe: convolutional architecture for fast feature embedding. In: International conference on multimedia, pp 675–678
Jones N (2014) Computer science: the learning machines. Nature 505(7482):146–148
Kalantidis Y, Kennedy L, Li LJ (2013) Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In: ACM international conference in multimedia retrieval, pp 105–112
Khosla N, Venkataraman V (2015) Building image-based shoe search using convolutional neural networks. In: CS231n course project reports
Kiapour MH, Yamaguchi K, Berg AC, Berg TL (2014) Hipster wars: discovering elements of fashion styles. In: European conference on computer vision, pp 472–488
Krizhevsky A (2012) Cuda-convnet. https://code.google.com/p/cuda-convnet/
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: International conference on neural information processing systems, pp 1106–1114
Le QV, Zou WY, Yeung SY, Ng AY (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: IEEE conference on computer vision and pattern recognition, pp 3361–3368
Le Q, Ranzato M, Monga R, Devin M, Chen K, Corrado G, Dean J, Ng A (2012) Building high-level features using large scale unsupervised learning. In: International conference on machine learning, pp 81–88
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. IEEE Proc 86(11):2278–2324
Lin M, Chen Q, Yan S (2013) Network in network. In: International conference on learning representations. arXiv:1312.4400
Lin K, Yang HF, Liu KH, Hsiao JH, Chen CS (2015) Rapid clothing retrieval via deep learning of binary codes and hierarchical search. In: ACM international conference in multimedia retrieval, pp 499–502
Liu C, Yuen J, Torralba A (2011) Nonparametric scene parsing via label transfer. IEEE Trans Pattern Recog Mach Intell 33(12):2368–2382
Liu S, Feng J, Song Z, Zhang T, Lu H, Xu C, Yan S (2012) Hi, magic closet, tell me what to wear! In: International conference on multimedia, pp 619–628
Liu S, Feng J, Domokos C, Xu H, Huang J, Hu Z, Yan S (2014) Fashion parsing with weak color-category labels. IEEE Trans Multimedia 16(1):253–265
Liu S, Liang X, Liu L, Shen X, Yang J, Xu C, Lin L, Cao X, Yan S (2015) Matching-CNN meets KNN: quasi-parametric human parsing. arXiv:1504.01220
Long J, Zhang N, Darrell T (2014) Do convnets learn correspondence. In: International conference on neural information processing systems, pp 1601–1609
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91110
Mohamed A, Dahl G, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1–21
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. TPAMI 24(7):971–987
Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: IEEE conference on computer vision and pattern recognition, pp 1717–1724
Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: IEEE conference on computer vision and pattern recognition workshops, pp 512–519
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229
Socher R, Huang EH, Pennington J, Ng AY, Manning CD (2011a) Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In: International conference on neural information processing systems, pp 801–809
Socher R, Lin C, Ng A (2011b) Parsing natural scenes and natural language with recursive neural Networks. In: International conference on machine learning, pp 129–136
Song Z, Wang, Hua MX, Yan S (2011) Predicting occupation via human clothing and contexts. In: International conference on computer vision, pp 1084–1091
Sukumar SR (2014) Machine learning in the big data era: are we there yet? In: ACM SIGKDD conference on knowledge discovery and data mining: workshop on data science for social good
Sun Y, Wang X, Tang X (2015) Deeply learned face representations are sparse, selective, and robust. In: IEEE conference on computer vision and pattern recognition. arXiv:1412.1265
Tung F, Little JJ (2014) Collage parsing: nonparametric scene parsing by adaptive overlapping windows. ECCV 8694:511–5252
Wang Y, Yu D, Ju Y, Acero A (2011) Voice search. In: Language understanding: systems for extracting semantic information from speech, pp 119–146
Yamaguchi K, Kiapour MH, Ortiz LE, Berg TL (2012) Parsing clothing in fashion photographs. In: IEEE conference on computer vision and pattern recognition, pp 3570–3577
Yamaguchi K, Kiapour MH, Berg TL (2013) Paper doll parsing: retrieving similar styles to parse clothing items. In: International conference on computer vision, pp 3519–3526
Yamaguchi K, Berg TL, Ortiz LE (2014) Chic or social: visual popularity analysis in online fashion networks. In: ACM conference on multimedia, pp 773–776
Yang W, Luo P, Lin L (2014) Clothing co-parsing by joint image segmentation and labeling. In: IEEE conference on computer vision and pattern recognition, pp 3182–3189
Zhang N, Paluri M, Ranzato M, Darrell T, Bourdev L (2014) PANDA: pose aligned networks for deep attribute modeling. In: IEEE conference on computer vision and pattern recognition, pp 1637–1644
Acknowledgements
This work is supported by National Science Council (NSC), Taiwan, under Contract of MOST 104-2221-E-151-028. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Communicated by C.-H. Chen.
Rights and permissions
About this article
Cite this article
Chen, JC., Liu, CF. Deep net architectures for visual-based clothing image recognition on large database. Soft Comput 21, 2923–2939 (2017). https://doi.org/10.1007/s00500-017-2585-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00500-017-2585-8