Abstract
Food is essential to human life, and a healthy, balanced diet is crucial. Automatic food detection is therefore an active research topic with direct relevance to dietary monitoring and weight management. Studies on food-related tasks have progressed over the last few years, but existing systems handle single- and multi-food items mainly with convolutional neural networks (CNNs). Transformers have recently outperformed well-known networks such as ResNet-50 and VGG-16 in computer vision, yet Transformer-based methods remain scarce in food-related tasks. To address this gap, we combine the strengths of Transformers and CNNs and propose a novel multi-food detection method based on a modified Swin Transformer and a Recursive Feature Pyramid (RFP) network, called MFD-MST. A Swin Transformer with a spatial extraction block (STSE) serves as the backbone and improves the local and structural information extracted from images containing multiple food items, while the RFP serves as the neck and enhances the feature representation used to recognize those items. STSE addresses the Transformer's positional encoding, and RFP looks at the feature maps twice, which helps the model build more powerful representations. We evaluated the method extensively on three prominent datasets, UEC-FOOD 100, Indian Food 28, and UEC-FOOD 256, to build a detection system that can distinguish the various objects in food photos and classify them into food categories. The model is assessed with several evaluation measures. MFD-MST outperforms Swin Transformer at AP[0.50] by 2.7%, 3.3%, and 1.4% on the three food datasets, respectively. The test results suggest that our system detects food items accurately.
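The AP[0.50] figures above rest on the usual matching rule for object detection: a predicted box counts as correct when its class matches a ground-truth box and the two boxes overlap with an intersection over union (IoU) of at least 0.50. The sketch below illustrates only that matching criterion. It is not the authors' evaluation code, and the function names, the box layout (x_min, y_min, x_max, y_max), and the example labels are assumptions made for illustration. A full AP computation would additionally rank detections by confidence and integrate the precision-recall curve.

```python
from typing import Tuple

# Assumed box layout for this illustration: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, pred_label: str,
                     gt: Box, gt_label: str,
                     threshold: float = 0.50) -> bool:
    """Hypothetical helper: a detection matches a ground-truth box when
    the class labels agree and the IoU reaches the threshold."""
    return pred_label == gt_label and iou(pred, gt) >= threshold
```

For example, a prediction shifted by 2 units against a 10 by 10 ground-truth box of the same class has an IoU of 80/120, which clears the 0.50 threshold, whereas the same box with a different class label does not count as a match.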
Data Availability
The datasets analyzed during the current study are publicly available. All needed references are provided in the document.
Code Availability
The code used to obtain the results of this study is available from the corresponding author upon reasonable request.
References
Jiang L, Qiu B, Liu X, Huang C, Lin K (2020) Deepfood: food image analysis and dietary assessment via deep model. IEEE Access 8:47477–47489
Liang H, Wen G, Hu Y, Luo M, Yang P, Xu Y (2020) Mvanet: Multi-task guided multi-view attention network for chinese food recognition. IEEE Trans Multimed 23:3551–3561
Liu C, Liang Y, Xue Y, Qian X, Fu J (2020) Food and ingredient joint learning for fine-grained recognition. IEEE Trans Circ Syst Video Technol 31(6):2480–2493
Mandal B, Puhan NB, Verma A (2018) Deep convolutional generative adversarial network-based food recognition using partially labeled data. IEEE Sens Lett 3(2):1–4
Zhu B, Ngo C-W, Chan W-K (2021) Learning from web recipe-image pairs for food recognition: Problem, baselines and performance. IEEE Trans Multimed 24:1175–1185
Xiao G, Wu Q, Chen H, Cao D, Guo J, Gong Z (2019) A deep transfer learning solution for food material recognition using electronic scales. IEEE Trans Ind Inform 16(4):2290–2300
Arslan B, Memiş S, Sönmez EB, Batur OZ (2021) Fine-grained food classification methods on the uec food-100 database. IEEE Trans Artif Intell 3(2):238–243
Tan RZ, Chew X, Khaw KW (2020) Quantized deep residual convolutional neural network for image-based dietary assessment. IEEE Access 8:111875–111888
Song G, Tao Z, Huang X, Cao G, Liu W, Yang L (2020) Hybrid attention-based prototypical network for unfamiliar restaurant food image few-shot recognition. IEEE Access 8:14893–14900
Razali MN, Moung EG, Yahya F, Hou CJ, Hanapi R, Mohamed R, Hashem IAT (2021) Indigenous food recognition model based on various convolutional neural network architectures for gastronomic tourism business analytics. Information 12(8):322
Jiang S, Min W, Liu L, Luo Z (2019) Multi-scale multi-view deep feature aggregation for food recognition. IEEE Trans Image Process 29:265–276
Zhao H, Yap K-H, Kot AC, Duan L (2020) Jdnet: A joint-learning distilled network for mobile visual food recognition. IEEE J Sel Top Sign Process 14(4):665–675
Sainz-De-Abajo B, García-Alonso JM, Berrocal-Olmeda JJ, Laso-Mangas S, De La Torre-Díez I (2020) Foodscan: Food monitoring app by scanning the groceries receipts. IEEE Access 8:227915–227924
Lam MB, Nguyen T-H, Chung W-Y (2020) Deep learning-based food quality estimation using radio frequency-powered sensor mote. IEEE Access 8:88360–88371
Zhou P, Bai C, Xia J, Chen S (2020) Cmrdf: A real-time food alerting system based on multimodal data. IEEE Internet Things J 9(9):6335–6349
Ilyas T, Khan A, Umraiz M, Jeong Y, Kim H (2021) Multi-scale context aggregation for strawberry fruit recognition and disease phenotyping. IEEE Access 9:124491–124504
Liu Z, Wu J, Fu L, Majeed Y, Feng Y, Li R, Cui Y (2019) Improved kiwifruit detection using pre-trained vgg16 with rgb and nir information fusion. IEEE Access 8:2327–2336
Xu X, Wang L, Shu M, Liang X, Ghafoor AZ, Liu Y, Ma Y, Zhu J (2022) Detection and counting of maize leaves based on two-stage deep learning with uav-based rgb image. Remote Sens 14(21):5388
Cai Q, Li J, Li H, Weng Y (2019) Btbufood-60: Dataset for object detection in food field. In: 2019 IEEE International conference on big data and smart computing (BigComp), pp 1–4
Qi J, Liu X, Liu K, Xu F, Guo H, Tian X, Li M, Bao Z, Li Y (2022) An improved yolov5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput Electron Agric 194:106780
Rachakonda L, Mohanty SP, Kougianos E (2020) ilog: An intelligent device for automatic food intake monitoring and stress detection in the iomt. IEEE Trans Consum Electron 66(2):115–124
Li J, Xiong J, Chen Z (2021) Food-agnostic dish detection: A simple baseline. IEEE Access 9:125375–125383
Pandey D, Parmar P, Toshniwal G, Goel M, Agrawal V, Dhiman S, Gupta L, Bagler G (2022) Object detection in indian food platters using transfer learning with yolov4. In: 2022 IEEE 38th International conference on data engineering workshops (ICDEW), pp 101–106. https://doi.org/10.1109/ICDEW55742.2022.00021
Wang S, Liu Y, Qing Y, Wang C, Lan T, Yao R (2020) Detection of insulator defects with improved resnest and region proposal network. IEEE Access 8:184841–184850
Cui Y, Yan L, Cao Z, Liu D (2021) Tf-blender: Temporal feature blender for video object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8138–8147
Liu D, Liang J, Geng T, Loui A, Zhou T (2023) Tripartite feature enhanced pyramid network for dense prediction. IEEE Trans Image Process
Liu D, Cui Y, Chen Y, Zhang J, Fan B (2020) Video object detection for autonomous driving: Motion-aid feature calibration. Neurocomputing 409:1–11
Liu D, Cui Y, Yan L, Mousas C, Yang B, Chen Y (2021) Densernet: Weakly supervised visual localization using multi-scale feature aggregation. Proceedings of the AAAI conference on artificial intelligence 35:6101–6109
Wang W, Liang J, Liu D (2022) Learning equivariant segmentation with instance-unique querying. Adv Neural Inf Process Syst 35:12826–12840
Liu D, Cui Y, Tan W, Chen Y (2021) Sg-net: Spatial granularity network for one-stage video instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9816–9825
Liang J, Zhou T, Liu D, Wang W (2023) Clustseg: Clustering for universal segmentation. arXiv preprint arXiv:2305.02187
Liu D, Cui Y, Cao Z, Chen Y (2020) A large-scale simulation dataset: Boost the detection accuracy for special weather conditions. In: 2020 International joint conference on neural networks (IJCNN), pp 1–8. IEEE
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:1–14
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022. IEEE
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Qiao S, Chen L-C, Yuille A (2021) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10213–10224
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Liu Y-C, Onthoni DD, Mohapatra S, Irianti D, Sahoo PK (2022) Deep-learning-assisted multi-dish food recognition application for dietary intake reporting. Electronics 11(10):1626
Acknowledgements
This work was financially supported in part by the Ministry of Science and Technology under grant MOST 112-2221-E-224-053.
Author information
Contributions
Chao-Yang Lee: methodology, writing-original draft preparation, conceptualization review, and funding; Abida Khanum: methodology, conceptualization review, and editing; Pinninti Praneeth Kumar: draft preparation, review, and editing. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethics approval
Not applicable
Consent for Publication
All authors have approved the manuscript and agree to its publication in Multimedia Tools and Applications.
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lee, CY., Khanum, A. & Kumar, P.P. Multi-food detection using a modified swin-transfomer with recursive feature pyramid network. Multimed Tools Appl 83, 57731–57757 (2024). https://doi.org/10.1007/s11042-023-17757-w
DOI: https://doi.org/10.1007/s11042-023-17757-w