DOI: 10.1145/3552485.3554939
Short Paper

Few-shot Food Recognition with Pre-trained Model

Published: 10 October 2022

Abstract

Food recognition is a challenging task due to the diversity of food. Moreover, conventionally training food recognition networks demands large amounts of labeled images, which are laborious and expensive to collect. In this work, we tackle the few-shot food recognition problem by leveraging the knowledge learned by pre-trained models such as CLIP. Although CLIP has shown remarkable zero-shot capability on a wide range of vision tasks, it performs poorly on the domain-specific food recognition task. To transfer CLIP's rich prior knowledge, we explore an adapter-based approach that fine-tunes CLIP with only a few samples, effectively combining CLIP's prior knowledge with the new knowledge extracted from the few-shot training set. In addition, we design appropriate prompts to facilitate more accurate identification of foods from different cuisines. Experiments demonstrate that our approach achieves promising performance on two public food datasets, VIREO Food-172 and UECFood-256.
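The abstract only summarizes the approach, and the paper's code is not reproduced on this page. As a rough illustration, the minimal sketch below shows one common way to realize an adapter/cache-style few-shot adaptation of a frozen CLIP model: zero-shot logits from text prompts are blended with logits computed from a small cache of few-shot support features (in the spirit of Tip-Adapter-like methods). The class names, prompt template, and the weights `alpha`/`beta` are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch, NOT the authors' released code: cache/adapter-style few-shot
# adaptation on top of a frozen CLIP model. Class names, the prompt template,
# and alpha/beta are illustrative assumptions.
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()

# Hypothetical food classes and a simple food-oriented prompt template.
class_names = ["mapo tofu", "sushi", "pad thai"]
text_tokens = clip.tokenize(
    [f"a photo of {c}, a kind of food." for c in class_names]
).to(device)

with torch.no_grad():
    text_feats = model.encode_text(text_tokens).float()
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)


def build_cache(support_images, support_labels, num_classes):
    """Encode the few-shot support set into (keys, one-hot values) with frozen CLIP."""
    with torch.no_grad():
        keys = model.encode_image(support_images.to(device)).float()
        keys = keys / keys.norm(dim=-1, keepdim=True)
    values = torch.nn.functional.one_hot(support_labels.to(device), num_classes).float()
    return keys, values


def predict(images, cache_keys, cache_values, alpha=1.0, beta=5.5):
    """Blend CLIP's zero-shot logits (prior knowledge) with cache logits (few-shot knowledge)."""
    with torch.no_grad():
        feats = model.encode_image(images.to(device)).float()
        feats = feats / feats.norm(dim=-1, keepdim=True)
    zero_shot_logits = 100.0 * feats @ text_feats.T   # CLIP text-prompt classifier
    affinity = feats @ cache_keys.T                    # similarity to support images
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    return zero_shot_logits + alpha * cache_logits     # fused prediction
```

In this training-free variant, `preprocess` would be applied to each support and query image before batching, the cache is built once from the labeled few-shot examples, and no CLIP parameters are updated; an adapter-based variant would instead make the cache keys (or a small bottleneck layer) learnable and fine-tune them on the few-shot set.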

Supplementary Material

MP4 File (Few-shot Food Recognition with Pre-trained Model.mp4)
Presentation video of "Few-shot Food Recognition with Pre-trained Model".


Cited By

  • (2025) Learning complementary visual information for few-shot food recognition by Regional Erasure and Reactivation. Expert Systems with Applications, 268, 126174. https://doi.org/10.1016/j.eswa.2024.126174. Online publication date: Apr-2025.
  • (2023) Ingredient Prediction via Context Learning Network With Class-Adaptive Asymmetric Loss. IEEE Transactions on Image Processing, 32, 5509-5523. https://doi.org/10.1109/TIP.2023.3318958. Online publication date: 1-Jan-2023.


      Published In

      CEA++ '22: Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and related APPlications
      October 2022
      66 pages
      ISBN:9781450395038
      DOI:10.1145/3552485

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 October 2022


      Author Tags

      1. few-shot learning
      2. food recognition
      3. transfer learning

      Qualifiers

      • Short-paper

      Funding Sources

      • Shanghai Pujiang Program
      • National Natural Science Foundation of China Project

      Conference

      MM '22

      Acceptance Rates

      Overall Acceptance Rate 20 of 33 submissions, 61%


      Article Metrics

      • Downloads (Last 12 months): 42
      • Downloads (Last 6 weeks): 3
      Reflects downloads up to 02 Mar 2025
