Multi-food detection using a modified swin-transfomer with recursive feature pyramid network

Chao-Yang Lee¹ (ORCID: orcid.org/0000-0003-3898-3551),
Abida Khanum² &
Pinninti Praneeth Kumar³

Abstract

Humans need food, and food detection is a fascinating research topic that is relevant to complex problems such as weight loss. Eating a healthy, balanced diet is crucial. Studies on food-related tasks have progressed over the last few years, but existing systems handle single- and multi-food items only with convolutional neural networks (CNNs). Recent Transformers in computer vision have outperformed well-known networks such as ResNet-50 and VGG-16, yet Transformer-based methods remain limited in food-related tasks. To address this, we improve a Swin Transformer-based method by combining the strengths of Transformers and CNNs. We propose a novel multi-food detection method, MFD-MST, built on a modified Swin-Transformer and a Recursive Feature Pyramid (RFP) network. A Swin-Transformer with spatial extraction block (STSE) serves as the backbone to recognize multiple food items in images and to improve the local and structural information of the image, and RFP serves as the neck to enhance the feature representation used to recognize multi-food items. STSE addresses the transformer's positional encoding, and RFP can look at the feature map twice, which helps the model build powerful representations. The method was extensively tested on three prominent datasets, UEC-FOOD 100, Indian Food 28, and UEC-FOOD 256, to construct a detection system that can distinguish the various objects in food photos and classify them into food categories. The model is evaluated with several measures. MFD-MST outperforms Swin-Transformer at AP[0.50] by 2.7%, 3.3%, and 1.4% on the three food datasets, respectively. Test results suggest that our system accurately detects food items.
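
The AP[0.50] figure quoted in the abstract refers to average precision computed at an intersection-over-union (IoU) threshold of 0.50. The sketch below is not the authors' evaluation code; it only illustrates, under the usual convention, how a single predicted box is judged correct at that threshold. The box format (x1, y1, x2, y2), the dictionary layout, the helper names, and the example coordinates and labels are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.50):
    """A detection counts as correct when the label matches and IoU >= threshold."""
    same_label = pred["label"] == truth["label"]
    return same_label and iou(pred["box"], truth["box"]) >= threshold


# Hypothetical example: one annotated dish and two predictions for it.
truth = {"label": "rice", "box": (10, 10, 110, 110)}
good_pred = {"label": "rice", "box": (20, 20, 120, 120)}
wrong_label = {"label": "soup", "box": (20, 20, 120, 120)}

overlap = iou(good_pred["box"], truth["box"])
print(round(overlap, 3), is_true_positive(good_pred, truth))
print(is_true_positive(wrong_label, truth))
```

Average precision then summarizes, per food category, how precision changes with recall as detections are ranked by confidence, with each detection marked correct or incorrect by a test of this kind.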

Data Availability

The datasets analyzed during the current study are publicly available. All needed references are provided in the document.

Code Availability

The code used to obtain the results of this study is available from the corresponding author upon reasonable request.

Acknowledgements

This work was financially supported in part by the Ministry of Science and Technology under grant MOST 112-2221-E-224-053.

Author information

Pinninti Praneeth Kumar contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Yunlin County, Taiwan
Chao-Yang Lee
Department of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City, 701, Taiwan
Abida Khanum
Department of Aeronautical Engineering, National Formosa University, Yunlin County, Taiwan
Pinninti Praneeth Kumar

Contributions

Chao-Yang Lee: methodology, writing (original draft preparation), conceptualization, review, and funding; Abida Khanum: methodology, conceptualization, review, and editing; Pinninti Praneeth Kumar: draft preparation, review, and editing. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Chao-Yang Lee.

Ethics declarations

Ethics approval

Not applicable.

Consent for Publication

All authors have approved the manuscript and agree with its publication in Multimedia Tools and Applications.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, CY., Khanum, A. & Kumar, P.P. Multi-food detection using a modified swin-transfomer with recursive feature pyramid network. Multimed Tools Appl 83, 57731–57757 (2024). https://doi.org/10.1007/s11042-023-17757-w

Received: 12 September 2023
Revised: 09 November 2023
Accepted: 25 November 2023
Published: 12 December 2023
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11042-023-17757-w
