
Multi-food detection using a modified Swin-Transformer with recursive feature pyramid network

Published in Multimedia Tools and Applications

Abstract

Food detection is a fascinating research topic with applications in dietary assessment and weight management, since eating a healthy and balanced diet is crucial. Research on food-related tasks has progressed over the last few years, but existing systems handle single- and multi-food items only with convolutional neural networks (CNNs). Although recent Transformer architectures in computer vision have outperformed well-known networks such as ResNet-50 and VGG-16, Transformer-based methods remain limited in food-related tasks. To address this, we combine the strengths of Transformers and CNNs and propose a novel multi-food detection framework based on a modified Swin-Transformer with a Recursive Feature Pyramid (RFP) network, called MFD-MST. A Swin Transformer with a spatial extraction block (STSE) serves as the backbone to capture local and structural information in food images, while the RFP serves as the neck to enhance the feature representation for recognizing multiple food items. The STSE block compensates for the Transformer's positional encoding, and the RFP lets the model look at the feature maps twice, helping it build more powerful representations. We evaluated the model extensively on three prominent datasets, UEC-FOOD 100, Indian Food 28, and UEC-FOOD 256, to build a detection system that can localize the various objects in food photos and classify them into food categories. MFD-MST outperforms the Swin-Transformer baseline at AP[0.50] by 2.7%, 3.3%, and 1.4% on the three food datasets, respectively. The test results suggest that our system accurately detects food items.
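
To make the "look at the feature maps twice" behaviour of a recursive feature pyramid concrete, the sketch below shows a minimal PyTorch implementation in which the pyramid outputs are projected back to backbone widths and injected into a second backbone pass. This is a simplified illustration under stated assumptions, not the authors' MFD-MST code: the module names (TinyBackbone, RecursiveFPN), the stand-in convolutional backbone, and the exact feedback wiring are illustrative, and the STSE block and detection heads are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Stand-in for a multi-stage backbone (e.g., a Swin-style encoder).
    Returns three feature maps; an optional per-stage feedback tensor is
    added to each stage output, which is how the recursive pyramid
    re-enters the backbone on the second pass."""
    def __init__(self, dims=(64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, dims[0], kernel_size=7, stride=4, padding=3)
        self.stages = nn.ModuleList([
            nn.Conv2d(dims[0], dims[0], 3, stride=2, padding=1),  # stride 8
            nn.Conv2d(dims[0], dims[1], 3, stride=2, padding=1),  # stride 16
            nn.Conv2d(dims[1], dims[2], 3, stride=2, padding=1),  # stride 32
        ])

    def forward(self, x, feedback=None):
        x = F.relu(self.stem(x))
        outs = []
        for i, stage in enumerate(self.stages):
            x = F.relu(stage(x))
            if feedback is not None:
                x = x + feedback[i]  # inject pyramid features on the second look
            outs.append(x)
        return outs

class RecursiveFPN(nn.Module):
    """Top-down feature pyramid whose outputs are projected back to the
    backbone channel widths and fed into a second backbone pass."""
    def __init__(self, in_dims=(64, 128, 256), fpn_ch=128, looks=2):
        super().__init__()
        self.looks = looks
        self.lateral = nn.ModuleList([nn.Conv2d(d, fpn_ch, 1) for d in in_dims])
        self.smooth = nn.ModuleList([nn.Conv2d(fpn_ch, fpn_ch, 3, padding=1) for _ in in_dims])
        self.back = nn.ModuleList([nn.Conv2d(fpn_ch, d, 1) for d in in_dims])

    def fpn_pass(self, feats):
        lat = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(lat) - 2, -1, -1):              # top-down fusion
            lat[i] = lat[i] + F.interpolate(lat[i + 1], size=lat[i].shape[-2:])
        return [s(p) for s, p in zip(self.smooth, lat)]

    def forward(self, backbone, image):
        feats = backbone(image)                            # first look
        pyramid = self.fpn_pass(feats)
        for _ in range(self.looks - 1):                    # recursive look(s)
            feedback = [b(p) for b, p in zip(self.back, pyramid)]
            pyramid = self.fpn_pass(backbone(image, feedback))
        return pyramid

if __name__ == "__main__":
    backbone, neck = TinyBackbone(), RecursiveFPN()
    levels = neck(backbone, torch.randn(1, 3, 224, 224))
    print([tuple(t.shape) for t in levels])  # three pyramid levels, 128 channels each
```

In a full detector the pyramid levels produced after the second pass would feed the classification and box-regression heads; the point of the sketch is only that the neck's output is reused as backbone input, so every level is refined with context from the whole pyramid.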


Data Availability

The datasets analyzed during the current study are publicly available; the relevant references are provided in the document.

Code Availability

The code used to obtain the results in this study is available from the corresponding author upon reasonable request.


Acknowledgements

This work was financially supported in part by the Ministry of Science and Technology under grant MOST 112-2221-E-224-053.

Author information


Contributions

Chao-Yang Lee: methodology, writing (original draft preparation), conceptualization review, and funding acquisition; Abida Khanum: methodology, conceptualization review, and editing; Pinninti Praneeth Kumar: draft preparation, review, and editing. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Chao-Yang Lee.

Ethics declarations

Ethics approval

Not applicable.

Consent for Publication

All authors have approved the manuscript and agree with its publication in Multimedia Tools and Applications.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lee, CY., Khanum, A. & Kumar, P.P. Multi-food detection using a modified Swin-Transformer with recursive feature pyramid network. Multimed Tools Appl 83, 57731–57757 (2024). https://doi.org/10.1007/s11042-023-17757-w

