
research-article

Watch and Buy: A Practical Solution for Real-time Fashion Product Identification in Live Stream

Authors:

Xuan Wang

WAB'21: Proceedings of the 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge

Pages 23 - 31

https://doi.org/10.1145/3475956.3484482

Published: 22 October 2021

Abstract

The "Watch and Buy: Multimodal Product Identification (WAB)" challenge is a new task in cross-modal retrieval. It aims to retrieve the relevant products while users watch live streamers selling fashion products. In practice, identifying product items accurately and quickly is difficult because items in real-world live streams undergo large deformations, occlusions, and motion blur. In this paper, we present our solution to the WAB challenge. It includes the models and training methods for fashion product localization and identification, as well as detailed strategies for optimization, model ensembling, and post-processing ranking. Experiments show that our strategies for data augmentation, model fusion, and result ranking improve performance. Our final model is small and efficient while remaining competitive. It scores 0.4915 on test B in the final round, ranking 5th, and 0.5604 on test A, ranking 1st among late submissions.
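The abstract names model fusion and result ranking as key steps but does not state how scores from different models are combined. The sketch below shows one common way to do it and is not the authors' implementation. It assumes each identification model maps an image (a detected product crop from a live-stream frame, or a shop image) to an embedding vector. It also assumes that fusion means a weighted mean of per-model cosine similarities. The function names, product IDs, and weights are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # leave an all-zero feature as zeros
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def fused_ranking(
    query_feats: List[Vector],
    gallery: Dict[str, List[Vector]],
    weights: Sequence[float],
) -> List[Tuple[str, float]]:
    """Rank gallery products for one query crop (hypothetical fusion rule).

    query_feats[m] is the query embedding from model m, and gallery[pid][m]
    is product pid's embedding from the same model m. The fused score is the
    weighted mean of the per-model cosine similarities.
    """
    if len(query_feats) != len(weights):
        raise ValueError("expected one weight per model")
    total = sum(weights)
    scored = []
    for pid, feats in gallery.items():
        per_model = [cosine(q, g) for q, g in zip(query_feats, feats)]
        fused = sum(w * s for w, s in zip(weights, per_model)) / total
        scored.append((pid, fused))
    # Highest fused similarity first; ties broken by product id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Usage: two hypothetical models, two hypothetical gallery products.
query = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "dress_01": [[0.9, 0.1], [0.1, 0.9]],
    "shirt_07": [[0.0, 1.0], [1.0, 0.0]],
}
ranking = fused_ranking(query, gallery, weights=[0.5, 0.5])
print(ranking[0][0])  # dress_01
```

Averaging per-model similarities, rather than concatenating raw features, keeps every model's contribution on the same [-1, 1] scale. A production system would more likely normalize gallery features once and store them, instead of renormalizing on every comparison.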

Index Terms

1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Published In

WAB'21: Proceedings of the 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge

October 2021

38 pages

ISBN:9781450386777

DOI:10.1145/3475956

General Chairs:
Yueting Zhuang (Zhejiang University, China)
Xing Tang (Tao Technology, Alibaba Group, China)
Guilin Wu (Tao Technology, Alibaba Group, China)
Yahong Han (Tianjin University, China)
Haihong Tang (Tao Technology, Alibaba Group, China)
Xiaobo Li (Tao Technology, Alibaba Group, China)
Xiaohan Wang (Zhejiang University, China)
Baoming Yan (Tao Technology, Alibaba Group, China)
Bo Gao (Tao Technology, Alibaba Group, China)
Yi Yang (University of Technology Sydney, Australia)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
Natural Science Foundation of Guangdong
Shenzhen Foundational Research Funding Under Grant

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 24, 2021

Virtual Event, China

Bibliometrics

Total Citations: 0
Total Downloads: 216
Downloads (last 12 months): 19
Downloads (last 6 weeks): 2
Reflects downloads up to 06 Jan 2025
