
Deep net architectures for visual-based clothing image recognition on large database

  • Focus
  • Soft Computing

Abstract

In the Big Data era, pictures have replaced text as the main content on the Internet, creating a need for powerful visual analytics tools. In this study, we explore convolutional neural networks for clothing style classification and retrieval. To reduce training complexity, low-level and mid-level features are learned by deep models on large-scale datasets, and transfer learning is then applied by fine-tuning the pre-trained models on the clothing dataset. However, tuning parameters on a large amount of collected data requires heavy computation. Therefore, one architecture inspired by AdaBoost is designed to use multiple deep nets, each trained on a sub-dataset. Training can thus be accelerated if each net is computed on one client node in a distributed computing environment. Moreover, to increase system flexibility, two architectures are proposed that use multiple deep nets, each with two outputs for binary classification. As a result, when new classes are added, no additional computation over all the training data is needed. Classification rules are also proposed to integrate the output responses of the multiple nets. Experiments compare the proposed system with existing systems based on hand-crafted features. According to the results, the proposed system provides significant improvements in style classification on three public clothing datasets, particularly on the large dataset of 80,000 images, where accuracy improved by 18%.
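
The idea of combining several two-output nets can be illustrated with a minimal sketch. The abstract does not state the classification rules, so the rule used here (pick the class whose net gives the largest positive response) is only an assumption, and the stub nets, weights, and style names ("casual", "formal", "sporty") are hypothetical stand-ins for fine-tuned deep nets.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# A "binary net" is modelled as any callable that maps a feature vector to
# two outputs: (negative_response, positive_response).
BinaryNet = Callable[[Sequence[float]], Tuple[float, float]]


def make_stub_net(weights: Sequence[float], bias: float = 0.0) -> BinaryNet:
    """Hypothetical stand-in for a fine-tuned deep net with two outputs."""
    def net(x: Sequence[float]) -> Tuple[float, float]:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        positive = 1.0 / (1.0 + math.exp(-z))  # logistic response in (0, 1)
        return (1.0 - positive, positive)
    return net


def classify(nets: Dict[str, BinaryNet],
             x: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Assumed integration rule: choose the class whose net responds
    most strongly on its positive output."""
    scores = {label: net(x)[1] for label, net in nets.items()}
    best = max(scores, key=scores.get)
    return best, scores


# One binary net per style class (hypothetical classes and weights).
nets = {
    "casual": make_stub_net([1.0, -1.0]),
    "formal": make_stub_net([-1.0, 1.0]),
}
label, scores = classify(nets, [2.0, 0.0])
print(label, scores)

# Adding a class only adds one net; the existing nets are left untouched.
nets["sporty"] = make_stub_net([0.0, 0.0], bias=3.0)
label, scores = classify(nets, [2.0, 0.0])
print(label, scores)
```

Because each class has its own net in this sketch, supporting a new class means training a single additional net rather than retraining one large multi-class model, which mirrors the flexibility argument made above.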

Acknowledgements

This work was supported by the National Science Council (NSC), Taiwan, under contract MOST 104-2221-E-151-028. The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the presentation.

Author information

Corresponding author

Correspondence to Ju-Chin Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Communicated by C.-H. Chen.

About this article

Cite this article

Chen, JC., Liu, CF. Deep net architectures for visual-based clothing image recognition on large database. Soft Comput 21, 2923–2939 (2017). https://doi.org/10.1007/s00500-017-2585-8

  • DOI: https://doi.org/10.1007/s00500-017-2585-8
