Foodborne Event Detection Based on Social Media Mining: A Systematic Review
Abstract
1. Introduction
2. Materials and Methods
2.1. Information Sources and Search Strategy
2.2. Selection Process
2.3. Data Extraction
2.4. Assessment of Risk of Bias
3. Results
3.1. Characteristics of the Studies Included
3.2. Characteristics of the Settings Considered
3.3. Keywords Used
3.4. Machine Learning Techniques Applied
3.5. Risk of Bias Assessment
4. Discussion
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pires, S.M.; Desta, B.N.; Mughini-Gras, L.; Mmbaga, B.T.; Fayemi, O.E.; Salvador, E.M.; Gobena, T.; Majowicz, S.E.; Hald, T.; Hoejskov, P.S.; et al. Burden of Foodborne Diseases: Think Global, Act Local. Curr. Opin. Food Sci. 2021, 39, 152–159.
- Lee, H.; Yoon, Y. Etiological Agents Implicated in Foodborne Illness World Wide. Food Sci. Anim. Resour. 2021, 41, 1–7.
- Foodborne Diseases. Available online: https://www.who.int/health-topics/foodborne-diseases (accessed on 16 October 2023).
- Estimating the Burden of Foodborne Diseases. Available online: https://www.who.int/activities/estimating-the-burden-of-foodborne-diseases (accessed on 16 October 2023).
- Monitoring of Foodborne Diseases | EFSA. Available online: https://www.efsa.europa.eu/en/topics/topic/monitoring-foodborne-diseases (accessed on 16 October 2023).
- The European Union One Health 2021 Zoonoses Report | EFSA. Available online: https://www.efsa.europa.eu/en/efsajournal/pub/7666 (accessed on 16 October 2023).
- Estimates of Foodborne Illness in the United States | Estimates of Foodborne Illness | CDC. Available online: https://www.cdc.gov/foodborneburden/index.html (accessed on 16 October 2023).
- Zheng, Y.; Gracia, A.; Hu, L. Predicting Foodborne Disease Outbreaks with Food Safety Certifications: Econometric and Machine Learning Analyses. J. Food Prot. 2023, 86, 100136.
- López-Campos, G.; Martínez-Suárez, J.V.; Aguado-Urda, M.; López-Alonso, V. Detection, Identification, and Analysis of Foodborne Pathogens. In Microarray Detection and Characterization of Bacterial Foodborne Pathogens; López-Campos, G., Martínez-Suárez, J.V., Aguado-Urda, M., López-Alonso, V., Eds.; SpringerBriefs in Food, Health, and Nutrition; Springer: Boston, MA, USA, 2012; pp. 13–32. ISBN 978-1-4614-3250-0.
- Quintela, I.A.; Vasse, T.; Lin, C.-S.; Wu, V.C.H. Advances, Applications, and Limitations of Portable and Rapid Detection Technologies for Routinely Encountered Foodborne Pathogens. Front. Microbiol. 2022, 13, 1054782.
- Oldroyd, R.A.; Morris, M.A.; Birkin, M. Identifying Methods for Monitoring Foodborne Illness: Review of Existing Public Health Surveillance Techniques. JMIR Public Health Surveill. 2018, 4, e57.
- Zatsu, V.; Shine, A.E.; Tharakan, J.M.; Peter, D.; Ranganathan, T.V.; Alotaibi, S.S.; Mugabi, R.; Muhsinah, A.B.; Waseem, M.; Nayik, G.A. Revolutionizing the Food Industry: The Transformative Power of Artificial Intelligence—A Review. Food Chem. X 2024, 24, 101867.
- Fallatah, D.I.; Adekola, H.A. Digital Epidemiology: Harnessing Big Data for Early Detection and Monitoring of Viral Outbreaks. Infect. Prev. Pract. 2024, 6, 100382.
- Budd, J.; Miller, B.S.; Manning, E.M.; Lampos, V.; Zhuang, M.; Edelstein, M.; Rees, G.; Emery, V.C.; Stevens, M.M.; Keegan, N.; et al. Digital Technologies in the Public-Health Response to COVID-19. Nat. Med. 2020, 26, 1183–1192.
- Ginsberg, J.; Mohebbi, M.H.; Patel, R.S.; Brammer, L.; Smolinski, M.S.; Brilliant, L. Detecting Influenza Epidemics Using Search Engine Query Data. Nature 2009, 457, 1012–1014.
- Syrowatka, A.; Kuznetsova, M.; Alsubai, A.; Beckman, A.L.; Bain, P.A.; Craig, K.J.T.; Hu, J.; Jackson, G.P.; Rhee, K.; Bates, D.W. Leveraging Artificial Intelligence for Pandemic Preparedness and Response: A Scoping Review to Identify Key Use Cases. NPJ Digit. Med. 2021, 4, 96.
- Chapman, B.; Raymond, B.; Powell, D. Potential of Social Media as a Tool to Combat Foodborne Illness. Perspect. Public Health 2014, 134, 225–230.
- Duzen, Z.; Riveni, M.; Aktas, M.S. Analyzing the Spread of Misinformation on Social Networks: A Process and Software Architecture for Detection and Analysis. Computers 2023, 12, 232.
- Gijsen, V.; Maddux, M.; Lavertu, A.; Gonzalez-Hernandez, G.; Ram, N.; Reeves, B.; Robinson, T.; Ziesenitz, V.; Shakhnovich, V.; Altman, R. #Science: The Potential and the Challenges of Utilizing Social Media and Other Electronic Communication Platforms in Health Care. Clin. Transl. Sci. 2020, 13, 26–30.
- Covidence—Better Systematic Review Management. Available online: https://www.covidence.org/ (accessed on 19 January 2024).
- Moons, K.G.M.; Wolff, R.F.; Riley, R.D.; Whiting, P.F.; Westwood, M.; Collins, G.S.; Reitsma, J.B.; Kleijnen, J.; Mallett, S. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann. Intern. Med. 2019, 170, W1–W33.
- Denecke, K.; Krieck, M.; Otrusina, L.; Smrz, P.; Dolog, P.; Nejdl, W.; Velasco, E. How to Exploit Twitter for Public Health Monitoring? Methods Inf. Med. 2013, 52, 326–339.
- Nsoesie, E.O.; Kluberg, S.A.; Brownstein, J.S. Online Reports of Foodborne Illness Capture Foods Implicated in Official Foodborne Outbreak Reports. Prev. Med. 2014, 67, 264–269.
- Kate, K.; Negi, S.; Kalagnanam, J. Monitoring Food Safety Violation Reports from Internet Forums. In e-Health—For Continuity of Care; IOS Press: Amsterdam, The Netherlands, 2014; pp. 1090–1094.
- Widener, M.J.; Li, W. Using Geolocated Twitter Data to Monitor the Prevalence of Healthy and Unhealthy Food References across the US. Appl. Geogr. 2014, 54, 189–197.
- Schomberg, J.P.; Haimson, O.L.; Hayes, G.R.; Anton-Culver, H. Supplementing Public Health Inspection via Social Media. PLoS ONE 2016, 11, e0152117.
- Wang, Z.; Balasubramani, B.S.; Cruz, I.F. Predictive Analytics Using Text Classification for Restaurant Inspections. In Proceedings of the 3rd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, Redondo Beach, CA, USA, 7–10 November 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1–4.
- Cui, W.; Wang, P.; Du, Y.; Chen, X.; Guo, D.; Li, J.; Zhou, Y. An Algorithm for Event Detection Based on Social Media Data. Neurocomputing 2017, 254, 53–58.
- Sadilek, A.; Caty, S.; DiPrete, L.; Mansour, R.; Schenk, T.; Bergtholdt, M.; Jha, A.; Ramaswami, P.; Gabrilovich, E. Machine-Learned Epidemiology: Real-Time Detection of Foodborne Illness at Scale. NPJ Digit. Med. 2018, 1, 36.
- Effland, T.; Lawson, A.; Balter, S.; Devinney, K.; Reddy, V.; Waechter, H.; Gravano, L.; Hsu, D. Discovering Foodborne Illness in Online Restaurant Reviews. J. Am. Med. Inform. Assoc. 2018, 25, 1586–1592.
- Șerban, O.; Thapen, N.; Maginnis, B.; Hankin, C.; Foot, V. Real-Time Processing of Social Media with SENTINEL: A Syndromic Surveillance System Incorporating Deep Learning for Health Classification. Inf. Process. Manag. 2019, 56, 1166–1184.
- Zhang, M.; Guo, D.; Hu, J.; Jin, W. Risk Prediction and Assessment of Foodborne Disease Based on Big Data. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on the Use of GIS in Emergency Management, Chicago, IL, USA, 5 November 2019; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–6.
- Mejia, J.; Mankad, S.; Gopal, A. A for Effort? Using the Crowd to Identify Moral Hazard in New York City Restaurant Hygiene Inspections. Inf. Syst. Res. 2019, 30, 1363–1386.
- Maharana, A.; Cai, K.; Hellerstein, J.; Hswen, Y.; Munsell, M.; Staneva, V.; Verma, M.; Vint, C.; Wijaya, D.; Nsoesie, E.O. Detecting Reports of Unsafe Foods in Consumer Product Reviews. JAMIA Open 2019, 2, 330–338.
- Glowacki, E.M.; Glowacki, J.B.; Chung, A.D.; Wilcox, G.B. Reactions to Foodborne Escherichia Coli Outbreaks: A Text-Mining Analysis of the Public’s Response. Am. J. Infect. Control 2019, 47, 1280–1282.
- Gao, W.; Fang, Y.; Li, L.; Tao, X. Event Detection in Social Media via Graph Neural Network. In Proceedings of the Web Information Systems Engineering—WISE 2021, Melbourne, VIC, Australia, 26–29 October 2021; Zhang, W., Zou, L., Maamar, Z., Chen, L., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 370–384.
- Rizzoli, V.; Mascarello, G.; Pinto, A.; Crovato, S.; Ruzza, M.; Tiozzo, B.; Ravarotto, L. ‘Don’t Worry, Honey: It’s Cooked’: Addressing Food Risk during Pregnancy on Facebook Italian Posts. Foods 2021, 10, 2484.
- Hu, R.; Zhang, D.; Tao, D.; Hartvigsen, T.; Feng, H.; Rundensteiner, E. TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks. arXiv 2022, arXiv:2205.10726.
- Tao, D.; Zhang, D.; Hu, R.; Rundensteiner, E.; Feng, H. Crowdsourcing and Machine Learning Approaches for Extracting Entities Indicating Potential Foodborne Outbreaks from Social Media. Sci. Rep. 2021, 11, 21678.
- Erraguntla, M.; Zapletal, J.; Lawley, M. Framework for Infectious Disease Analysis: A Comprehensive and Integrative Multi-Modeling Approach to Disease Prediction and Management. Health Inform. J. 2019, 25, 1170–1187.
- Vasanthakumar, U.; Bryna Goh, J.R.; Hui, S.C.; Lam, K.Y.; Er, B.; Fua’di, M.T.; Aung, K.T. Fine-Tuning Pre-Trained Language Model for Urgency Classification on Food Safety Feedback. In Proceedings of the 2023 10th International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 6–7 September 2023; pp. 1–10.
- Lee, C.K.H. Predicting Food Safety Violations via Social Media to Improve Public Health Surveillance. ECSM 2023, 10, 109–116.
- Tao, D.; Hu, R.; Zhang, D.; Laber, J.; Lapsley, A.; Kwan, T.; Rathke, L.; Rundensteiner, E.; Feng, H. A Novel Foodborne Illness Detection and Web Application Tool Based on Social Media. Foods 2023, 12, 2769.
- Hu, R.; Zhang, D.; Tao, D.; Zhang, H.; Feng, H.; Rundensteiner, E. UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023.
- Molenaar, A.; Lukose, D.; Brennan, L.; Jenkins, E.L.; McCaffrey, T.A. Using Natural Language Processing to Explore Social Media Opinions on Food Security: Sentiment Analysis and Topic Modeling Study. J. Med. Internet Res. 2024, 26, e47826.
- Harris, J.K.; Mansour, R.; Choucair, B.; Olson, J.; Nissen, C.; Bhatt, J. Health Department Use of Social Media to Identify Foodborne Illness—Chicago, Illinois, 2013–2014. MMWR Morb. Mortal Wkly. Rep. 2014, 63, 681–685.
- Harris, J.K.; Hawkins, J.B.; Nguyen, L.; Nsoesie, E.O.; Tuli, G.; Mansour, R.; Brownstein, J.S. Using Twitter to Identify and Respond to Food Poisoning: The Food Safety STL Project. J. Public Health Manag. Pract. 2017, 23, 577–580.
- Harrison, C.; Jorder, M.; Stern, H.; Stavinsky, F.; Reddy, V.; Hanson, H.; Waechter, H.; Lowe, L.; Gravano, L.; Balter, S. Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness—New York City, 2012–2013. MMWR Morb. Mortal Wkly. Rep. 2014, 63, 441–445.
- Joaristi, M.; Serra, E.; Spezzano, F. Evaluating the Impact of Social Media in Detecting Health-Violating Restaurants. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Davis, CA, USA, 18–21 August 2016; pp. 626–633.
- Sadilek, A.; Kautz, H.; DiPrete, L.; Labus, B.; Portman, E.; Teitel, J.; Silenzio, V. Deploying nEmesis: Preventing Foodborne Illness by Data Mining Social Media. AI Mag. 2017, 38, 37–48.
- Tegtmeyer, R.; Potts, L.; Hart-Davidson, W. Tracing and Responding to Foodborne Illness. In Proceedings of the 30th ACM International Conference on Design of Communication, Seattle, WA, USA, 3–5 October 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 369–370.
- Zou, B.; Lampos, V.; Gorton, R.; Cox, I.J. On Infectious Intestinal Disease Surveillance Using Social Media Content. In Proceedings of the 6th International Conference on Digital Health Conference, Montreal, QC, Canada, 11–13 April 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 157–161.
- Chen, J.; Wang, Y. Social Media Use for Health Purposes: Systematic Review. J. Med. Internet Res. 2021, 23, e17917.
- Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.B.M.; Gandomi, A.H. Deep Learning Modelling Techniques: Current Progress, Applications, Advantages, and Challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
- Meir, Y.; Tevet, O.; Tzach, Y.; Hodassman, S.; Gross, R.D.; Kanter, I. Efficient Shallow Learning as an Alternative to Deep Learning. Sci. Rep. 2023, 13, 5423.
- Stevens, L.M.; Mortazavi, B.J.; Deo, R.C.; Curtis, L.; Kao, D.P. Recommendations for Reporting Machine Learning Analyses in Clinical Research. Circ. Cardiovasc. Qual. Outcomes 2020, 13, e006556.
- Deiner, M.S.; Deiner, N.A.; Hristidis, V.; McLeod, S.D.; Doan, T.; Lietman, T.M.; Porco, T.C. Use of Large Language Models to Assess the Likelihood of Epidemics from the Content of Tweets: Infodemiology Study. J. Med. Internet Res. 2024, 26, e49139.
- Hasan, A.; Levene, M.; Weston, D.; Fromson, R.; Koslover, N.; Levene, T. Monitoring COVID-19 on Social Media: Development of an End-to-End Natural Language Processing Pipeline Using a Novel Triage and Diagnosis Approach. J. Med. Internet Res. 2022, 24, e30397.
- Espinosa, L.; Salathé, M. Use of Large Language Models as a Scalable Approach to Understanding Public Health Discourse. PLoS Digit. Health 2024, 3, e0000631.
- Narvaez Rojas, C.; Alomia Peñafiel, G.A.; Loaiza Buitrago, D.F.; Tavera Romero, C.A. Society 5.0: A Japanese Concept for a Superintelligent Society. Sustainability 2021, 13, 6567.
- Digital Health. Available online: https://www.aihw.gov.au/reports/australias-health/digital-health (accessed on 20 December 2024).
- Bisbee, J.; Munger, K. The Vibes Are Off: Did Elon Musk Push Academics Off Twitter? In PS: Political Science & Politics; Cambridge University Press: Cambridge, UK, 2024; pp. 1–8.
- Hassoun, A.; Jagtap, S.; Trollman, H.; Garcia-Garcia, G.; Abdullah, N.A.; Goksen, G.; Bader, F.; Ozogul, F.; Barba, F.J.; Cropotova, J.; et al. Food Processing 4.0: Current and Future Developments Spurred by the Fourth Industrial Revolution. Food Control 2023, 145, 109507.
Author | Type | Social Media Platform | Country | Analyzed Language | Involved Structure | Aim of the Study | Results |
---|---|---|---|---|---|---|---|
Cui et al., 2017 [28] | Article | | China | | Restaurant | Extract data from Weibo (Chinese Twitter) and develop an algorithm for foodborne disease event detection. | Built an SVM classifier to classify each tweet into positive class (foodborne disease case) and negative class (noise). |
Denecke et al., 2013 [22] | Article | Twitter, Blogs, Forums, TV, Radio, Online News | EU | English, German | Home, Mass Gathering, Trips, Soccer Game | Study the usefulness of the M-Eco medical system for supporting and monitoring a population’s health status through social media. | A system can reduce overwhelming information to a manageable amount of signals. Experiments yielded a proportion between 5 and 20% of signals regarded as ‘relevant’ by the users; signals were mainly generated from Twitter. |
Effland et al., 2018 [30] | Article | Yelp | USA | English | Restaurant | Development and evaluation of a system to identify mentions of foodborne illness in Yelp restaurant reviews. | Effective information extraction from social media sites. The DOHMH system for foodborne illness surveillance in online restaurant reviews from Yelp was instrumental in identifying ten outbreaks and 8523 reports of foodborne illness associated with NYC restaurants since July 2012. |
Erraguntla et al., 2019 [40] | Article | Facebook, Twitter, ProMED Mail, World Health Organization (WHO), BBC Health News, CDC Morbidity and Mortality Weekly Reports, The Lancet Infectious Diseases, BMC Infectious Disease | USA | English | | Develop the Framework for Infectious Disease Analysis (FIDA), which integrates data and provides situational awareness, visualizations, predictions, and intervention assessments to extract health-related information for health situation awareness. | All the predictive models performed similarly, with SVM performing slightly better. SVM was chosen as the preferred predictive model. |
Gao et al., 2021 [36] | Article | | China | | | Propose a novel GNN-based model for event detection in social media to evaluate the performance of EDGNN by comparing it with state-of-the-art baseline models over a real-world foodborne disease event dataset. | A new Event Detection model is proposed based on GNN (EDGNN), which showed effectiveness in experiments on a real-world dataset. The presented method does not rely on the unique features of the dataset so that it can be generalized for event detection on various social media platforms. |
Glowacki et al., 2019 [35] | Article | | USA | English | | Text-mining analysis to look at tweets from two foodborne Escherichia coli outbreaks. | Demonstration that social media sites such as Twitter can be used as a tool by public health agencies that wish to identify concerns about foodborne disease outbreaks. |
Harris et al., 2014 [46] | CDC Weekly Report | | USA | English | Restaurant | On 23 March 2013, the Chicago Department of Public Health (CDPH) and its civic partners launched FoodBorne Chicago, a website to improve food safety in Chicago by identifying and responding to complaints on Twitter about possible foodborne illnesses. | The effectiveness of social media for foodborne illness surveillance suggests mining tweets and restaurant reviews might aid in identifying and taking action on localized foodborne illness complaints that would otherwise go unreported. |
Harris et al., 2017 [47] | Report | | USA | English | Restaurant | Pilot study for evaluation of a Web-based Dashboard (HealthMap Foodborne Dashboard) to identify and respond to tweets about food poisoning from St Louis City residents. | The Web-based Dashboard captured 193 relevant tweets. Replies to relevant tweets resulted in more filed reports than several previously existing foodborne illness reporting mechanisms in St Louis during the same time frame. |
Harrison et al., 2014 [48] | CDC Weekly Report | Yelp | USA | English | Restaurant | Identify restaurant reviews on Yelp that refer to unreported foodborne illnesses. | The program identified 893 reviews that required further evaluation by a foodborne disease epidemiologist. Of the 893 reviews, 499 (56%) described an event consistent with foodborne illness, and 468 represented a disease within four weeks of the review or did not provide a period. |
Hu et al., 2022 [38] | Article | | USA | English | | Develop machine learning-based models for foodborne outbreak detection using Twitter’s publicly available annotated dataset, TWEET-FID. | The construction of TWEET-FID for multiple foodborne illness detection tasks, evaluation of single-task and multi-task models, and the observation of learning from weak labels. |
Hu et al., 2023 [44] | Conference | | USA | English | Restaurants | Develop EGAL, a deep learning framework to detect foodborne illness from social media posts. | EGAL achieved 86.3% accuracy and 87.9% balanced accuracy (bACC). |
Joaristi et al., 2016 [49] | Conference | Yelp | USA | English | Restaurant | Propose a new method to detect health-violating restaurants based on Yelp reviews and user behavior. | The proposed classification method improves the inspector’s ability and outperforms previous solutions. |
Kate et al., 2014 [24] | Article | Internet Forums | USA | English | | Apply text mining techniques to identify and mine food safety complaints posted by citizens on web data sources. | The platform is a valuable tool to monitor content related to food safety complaints on internet forums and can help improve food safety practices and standards. |
Lee et al., 2023 [42] | Conference | Yelp | USA | English | Restaurants | Predict food safety violations using Yelp reviews to identify high-risk establishments. | SVM achieved 86% recall, identifying high-risk establishments for inspection. |
Maharana et al., 2019 [34] | Article | Amazon reviews | USA | English | Online retail platform | Develop a framework for early identification of unsafe food products using Amazon product reviews and FDA recall data. | The approach can improve food safety by enabling early identification of unsafe foods, leading to timely recall and limiting the health and economic impact on the public. |
Mejia et al., 2019 [33] | Article | Yelp | USA | English | Restaurant | Examine how online reviews of restaurants can be used to identify hygiene violations and provide insights into restaurants’ hygiene practices between inspections. | The study demonstrates the potential benefits of online review data in informing inspection strategies and outcomes. Social media content and machine learning can address persistent social issues like restaurant hygiene. |
Molenaar et al., 2024 [45] | Article | | Australia | English | General Public | Use NLP tools for sentiment analysis and topic modeling to explore food security discussions. | VADER showed 0.478 coherence score; negative sentiment received the highest engagement. |
Nsoesie et al., 2014 [23] | Article | Yelp | USA | English | Restaurant | Assess whether crowdsourcing via food service reviews can be used as a surveillance tool with the potential to support efforts by local public health departments. | Online illness reports from platforms like Yelp could complement traditional surveillance systems by providing near real-time information on foodborne diseases, implicated foods, and locations. |
Rizzoli et al., 2021 [37] | Article | | EU | Italian | | Explore how and to what extent knowledge and perceptions of food risks during pregnancy are shared on social networks (Facebook). | The main results show that food risk is not among the most discussed topics, and the least known and debated food risks are the most widespread (e.g., campylobacteriosis). Sometimes, food risks, when addressed, were minimized or denied, and the belief of being ‘less at risk’ than peers for such risk (i.e., optimistic bias) was observed. |
Sadilek et al., 2017 [50] | Article | | USA | English | Restaurant | Development of nEmesis, which applies machine learning to real-time Twitter data and analyzes the text of these tweets to estimate the probability that the user suffers from foodborne illness. | The adaptive inspection process is 64% more effective at identifying problematic venues than the current state of the art. |
Sadilek et al., 2018 [29] | Article | Google Search | USA | English | Restaurant | Build FINDER, a machine-learned model for real-time detection of foodborne illness using anonymous and aggregated web search and location data. | FINDER improves the accuracy of health inspections; restaurants identified by FINDER are 3.1 times as likely to be deemed unsafe during the inspection as restaurants identified by existing methods. |
Schomberg et al., 2016 [26] | Article | Yelp | USA | English | Restaurant | Present a method able to form robust predictions of health code violation prevalence, identify restaurants with a high risk of health code violation, and validate increased surveillance coverage by using free text and tags created by Yelp reviewers. | The predictive model predicted health code violations in 78% of the restaurants receiving serious citations in our pilot study of 440 restaurants. |
Serban et al., 2019 [31] | Article | | USA | English | Restaurant | Real-time processing of social media (Twitter) with SENTINEL: a syndromic surveillance system incorporating deep learning for classifying health-related tweets. | The preliminary results are promising, with the system able to detect outbreaks of influenza-like illness symptoms, which existing official sources could then confirm. The Nowcasting module shows that using social media data can improve prediction for multiple diseases over simply using traditional data sources. |
Tao et al., 2021 [39] | Article | | USA | English | Restaurant | Employ Twitter as the data source and modify the language model BERTweet not only to predict if a consumer’s post (a tweet) indicates an incidence of foodborne illness but also to extract critical entities related to the foodborne illness incidence automatically. | Trends in Twitter data can be indicative of real-world foodborne outbreaks. The dual-task BERTweet model effectively extracted food entities, but challenges remain due to the inherent noisiness of social media data. |
Tao et al., 2023 [43] | Article | | USA | English | Restaurants | Develop a web-based tool for real-time foodborne illness outbreak detection using Twitter data. | Pretrained BERT models (BERTweet) showed robust performance for foodborne illness detection. |
Tegtmeyer et al., 2012 [51] | Conference | | USA | English | | Determine how to use social web tools to track, trace, and respond to foodborne illness using data streams, analytical tools, bots, and dashboards. | The primary focus of the work is a dashboard system that allows for different levels of participation from users: curious browsers, affected reporters, and active reviewers. |
Vasanthakumar et al., 2023 [41] | Conference | Other: feedback data from SFA’s CRMS | Singapore | English | Public feedback | Develop a method to automate the classification of urgency in food safety reports. | Fine-tuned BERT outperformed DistilBERT and XLNet for urgency classification tasks. |
Wang et al., 2017 [27] | Article | Yelp | USA | English | Restaurant | Retrieve and analyze Yelp reviews to predict foodborne illnesses in restaurants to prevent more people from being affected. | After performing text classification, we used the models to predict whether the review indicates foodborne illness with high probabilities. SVM and RNN perform better than others, with higher accuracy and F-scores. |
Widener et al., 2014 [25] | Article | | USA | English | Restaurant, Supermarket | Understand how geolocated tweets can be used to explore the prevalence of healthy and unhealthy food across the contiguous United States. Examine whether tweets about unhealthy foods are more common in these areas. | The results show that these disadvantaged census tracts tend to have a lower proportion of tweets about healthy foods with a positive sentiment and a higher proportion of unhealthy tweets in general. These findings substantiate the methods used by the USDA to identify regions that are at risk of having low access to healthy foods. |
Zhang et al., 2019 [32] | Article | | China | Chinese | Restaurant | Improve the temporal and spatial precision of foodborne disease prediction based on big data. | The results determined the scientific issues regarding how to improve the temporal and spatial accuracy of foodborne disease outbreak risk prediction in Beijing. |
Zou et al., 2016 [52] | Conference | | EU | English | Restaurant | Infer IID (infectious intestinal diseases) occurrences from Twitter in England. | Our experimental results regarding predictive performance and semantic interpretation indicate that Twitter data contain a signal that could be strong enough to complement conventional methods for IID surveillance. |
Symptoms | Bacteria/Viruses | General Terms |
---|---|---|
puke, diarrhea, nausea, Vomit*, Throw* up, Tummy upset, Tummy pain, The runs, vomiting, stomach ache, Rat-bite fever, hiccups, wiping nose, vomit, Puk*, Abdominal ache, Abdominal pain, heartburn, #stomachache, Stomach pain, indigestion | Escherichia, Giardia, Listeri*, Norovirus, E.coli/Ecoli/E coli, Rotavirus, Salmonell*, Shigell*, Staphylococcus, Helicobacter, shiga toxin-producing E. coli, o157:h7, ehec, enterohemorrhagic E. coli serotype o157:h7, Cyclospora, Cronobacter | stench, employees, humid, septic, Jesus, hell, dishes, the_best, high_quality, adorable, fabulous, craving, favorite, excellent, service, recommend, professional, delicious, wash_hands, burnt, ache, pain, cigarette, asshole, awful, rotten, bathroom, toilet, fuck, microwaved, shit, bitch, sucks, mold, mice, spider, exclaim, filthy, roach, DIRTY, I_found_a, clean, food_poisoning, dirty, truck + sick, stomach + hospital, fish, terrible, horrible, Sick, Food poisoning, Cryptococcosis, Spit*, Heartburn, Acidosis, Gastro*, Cryptosporidiosis, Diarrhea, Amebiasis, Anisakiasis, Ascariasis, Anthrax, Botulism, Brucellosis, Campylobacteriosis, Ciguatoxin, Cysticercosis, Hepatitis A, Roundworm, Diphyllobothr*, Isosporiasis, Leptospirosis, Toxoplasmosis, Viral, food poisoning, #foodpoisoning, I_love, healthy foods, vomiting, Pregnancy, negative and hygiene-related words, prevention, FDA, food reservoirs, food safety, foodborne illness, foodborne transmission, hemolytic uremic syndrome, hus, out-break, phac, Public Health Agency of Canada, romaine lettuce, shiga toxin, stec, United States Food and Drug Administration, flu, cold, pesticide, Christ, Centers for Disease Control, affordable, reflux, CDC, stomach, rotten, pungency, foul, pool, poison, Being mothers in…., bacterium, ill, food poison, pushy, Weaning, bacteria, foodpoisoning, label, oldschool, Mothers, advisory |
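
Several keywords in the table above end in an asterisk, which reads as a truncation wildcard (a stem followed by any word ending). The following is a minimal sketch of how such a keyword list could be applied to a post, assuming that interpretation; the function names, the short keyword sample, and the example post are hypothetical and are not taken from any of the included studies.

```python
import re

def compile_keyword(term: str) -> re.Pattern:
    """Compile one keyword into a case-insensitive regular expression.

    Assumption: a trailing '*' marks a stem that may be followed by any
    word characters; whitespace before the '*' is ignored.
    """
    term = term.strip()
    if term.endswith("*"):
        stem = term[:-1].strip()
        # Stem at the start of a token, followed by any word ending.
        pattern = r"(?<!\w)" + re.escape(stem) + r"\w*"
    else:
        # Exact term, not embedded inside a longer token.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.compile(pattern, re.IGNORECASE)

def matched_keywords(text: str, terms: list[str]) -> list[str]:
    """Return the keywords (in list order) that occur in the text."""
    return [t for t in terms if compile_keyword(t).search(text)]

# Hypothetical sample of keywords and a made-up post.
KEYWORDS = ["Vomit*", "Salmonell*", "food poisoning", "nausea"]
post = "Got salmonellosis after dinner, vomiting all night."
print(matched_keywords(post, KEYWORDS))
```

Keyword matching of this kind only selects candidate posts; the studies summarized here generally pass such candidates to a classifier or a human reviewer, since terms like "sick" or "hell" are highly ambiguous.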
Author (Year) | Use of Machine Learning | Tested ML Models | Best ML Model | Best Performance | Value (%) |
---|---|---|---|---|---|
Cui et al., 2017 [28] | Yes, shallow | SVM | SVM | ||
Denecke et al., 2013 [22] | Yes, shallow | SVM, K-means algorithm | SVM | Precision | 92 |
Effland et al., 2018 [30] | Yes, shallow | J4.810 decision tree, Logistic Regression, RF, SVM | Logistic Regression | Precision | 96 |
Erraguntla et al., 2019 [40] | Yes, shallow | SVM, Linear Regression (with nonlinear and interaction terms), decision tree-based boosting | SVM | Average Mean Square Error | 14 |
Gao et al., 2021 [36] | Yes, deep | GNN, EDGNN, BiGRU, BERT, CRFTM, CNN, LSTM | |||
Harris et al., 2014 [46] | Yes | Supervised Learning Algorithm | Supervised Learning Algorithm | ||
Harris et al., 2017 [47] | Yes, shallow | | | Accuracy | 66.4 |
Hu et al., 2022 [38] | Yes, deep | RoBERTa, BiLSTM, MGADE, MV, BSC | RoBERTa | Accuracy | 84.7 |
Hu et al., 2023 [44] | Yes, deep | EGAL | EGAL | Accuracy | 86.3 |
Joaristi et al., 2016 [49] | Yes, shallow | SVM, Logistic Regression, and Random Forest | | F1 Score | 95 |
Kate et al., 2014 [24] | Yes, shallow | Multinomial Naive Bayes, k-NN, SVM | SVM | Recall | 67.3 |
Lee et al., 2023 [42] | Yes, shallow | Decision Tree, Random Forest, SVM | SVM | Recall | 86 |
Maharana et al., 2019 [34] | Yes, deep | SVM, Multinomial Naive Bayes, Weighted Logistic Regression, BERT, Autoencoder Neural Network | BERT | Precision | 78 |
Mejia et al., 2019 [33] | Yes, shallow | Naive Bayes classifier, Linear Regressions | |||
Molenaar et al., 2024 [45] | Yes, shallow | VADER (Sentiment Analysis), Latent Dirichlet Allocation (LDA) | VADER | Coherence Score | 0.47 |
Sadilek et al., [50] | Yes, shallow | SVM | SVM | Recall | 96 |
Sadilek et al., 2018 [29] | Yes, shallow | Supervised Machine-learned Classifier | | ROC | 85 |
Schomberg et al., 2016 [26] | Yes, shallow | Logistic Regression | Logistic Regression | AUC | 98 |
Serban et al., 2019 [31] | Yes, deep | CNN, SVM, Multinomial Naive Bayes | CNN | Accuracy | 85.4 |
Tao et al., 2021 [39] | Yes, deep | Dual-task BERTweet | Dual-task BERTweet model | Recall | 88.6 |
Tao et al., 2023 [43] | Yes, deep | BERTweet, RoBERTa, BiLSTM, MGADE, LDA | BERTweet | Balanced Accuracy (bACC) | 87.9 |
Vasanthakumar et al., 2023 [41] | Yes, deep | BERT, DistilBERT, XLNet | BERT | Precision | 86 |
Wang et al., 2017 [27] | Yes, shallow | NB, SVM, RF, RNN, generalized linear model | |||
Widener et al., 2014 [25] | Yes, shallow | Logistic Regression | Logistic Regression | ||
Zhang et al., 2019 [32] | Yes, shallow | Bayesian Regression, Linear Regression, ElasticNet Regression, SVR, and GBR | |||
Zou et al., 2016 [52] | Yes, deep | Elastic Net, GP, Skip-gram for word embeddings | | Pearson Correlation (R) | 71.1 |
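The "Best Performance" column above mixes precision, recall, accuracy, F1 score and balanced accuracy, so values in different rows are not directly comparable. As a rough illustration, the sketch below computes all five from one binary confusion matrix. The counts are invented for this example (a hypothetical classifier where only 50 of 1000 posts are truly relevant) and do not come from any reviewed study; the function assumes every denominator is non-zero.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes all denominators are non-zero (at least one predicted positive,
    one actual positive and one actual negative).
    """
    precision = tp / (tp + fp)      # share of flagged posts that are truly relevant
    recall = tp / (tp + fn)         # share of relevant posts that were flagged
    specificity = tn / (tn + fp)    # share of irrelevant posts correctly ignored
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Hypothetical counts: 50 relevant posts among 1000, 30 of them detected.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=940)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.97 while recall is only 0.60, which shows why accuracy alone can look favorable when relevant posts are rare.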
Author (Year) | Domain 1: Participants | Domain 2: Predictors | Domain 3: Outcome | Domain 4: Analysis |
---|---|---|---|---|
Cui et al., 2017 [28] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Denecke et al., 2013 [22] | High Risk | High Risk | High Risk | Unclear Risk |
Effland et al., 2018 [30] | High Risk | High Risk | High Risk | Unclear Risk |
Erraguntla et al., 2019 [40] | High Risk | High Risk | High Risk | Unclear Risk |
Gao et al., 2021 [36] | High Risk | High Risk | High Risk | Unclear Risk |
Glowacki et al., 2019 [35] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Harris et al., 2014 [46] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Harris et al., 2017 [47] | High Risk | High Risk | High Risk | Unclear Risk |
Harrison et al., 2014 [48] | Unclear Risk | Low Risk | Unclear Risk | Unclear Risk |
Hu et al., 2022 [38] | High Risk | High Risk | High Risk | Unclear Risk |
Hu et al., 2023 [44] | Low Risk | Unclear Risk | High Risk | High Risk |
Joaristi et al., 2016 [49] | High Risk | High Risk | High Risk | High Risk |
Kate et al., 2014 [24] | High Risk | High Risk | High Risk | Unclear Risk |
Lee et al., 2023 [42] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Maharana et al., 2019 [34] | High Risk | High Risk | High Risk | High Risk |
Mejia et al., 2019 [33] | High Risk | High Risk | High Risk | Unclear Risk |
Molenaar et al., 2024 [45] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Nsoesie et al., 2014 [23] | High Risk | High Risk | High Risk | High Risk |
Rizzoli et al., 2021 [37] | High Risk | High Risk | High Risk | High Risk |
Sadilek et al., [50] | Unclear Risk | High Risk | High Risk | Unclear Risk |
Sadilek et al., 2018 [29] | Unclear Risk | High Risk | High Risk | Unclear Risk |
Schomberg et al., 2016 [26] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Serban et al., 2019 [31] | High Risk | High Risk | High Risk | High Risk |
Tao et al., 2021 [39] | High Risk | High Risk | High Risk | High Risk |
Tao et al., 2023 [43] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Tegtmeyer et al., 2012 [51] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Vasanthakumar et al., 2023 [41] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk |
Wang et al., 2017 [27] | Unclear Risk | High Risk | High Risk | Unclear Risk |
Widener et al., 2014 [25] | High Risk | High Risk | High Risk | High Risk |
Zhang et al., 2019 [32] | Unclear Risk | High Risk | High Risk | Unclear Risk |
Zou et al., 2016 [52] | Unclear Risk | High Risk | High Risk | Unclear Risk |
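The per-domain judgements in the table above can be summarized by counting how often each rating appears in each domain. The following is a small sketch of that tally, assuming the rows are available as pipe-delimited text in the same layout as the table. Only three rows are copied here as an excerpt, so the printed counts describe this excerpt and not the full set of studies; the function and variable names are hypothetical.

```python
from collections import Counter

DOMAINS = ["Participants", "Predictors", "Outcome", "Analysis"]

# Excerpt: three rows copied from the risk-of-bias table above.
# Append the remaining rows in the same format to tally the full table.
rows = """
Cui et al., 2017 [28] | Unclear Risk | Unclear Risk | Unclear Risk | Unclear Risk
Denecke et al., 2013 [22] | High Risk | High Risk | High Risk | Unclear Risk
Hu et al., 2023 [44] | Low Risk | Unclear Risk | High Risk | High Risk
"""

def tally_by_domain(table_text: str) -> dict[str, Counter]:
    """Count risk-of-bias judgements per domain from pipe-delimited rows."""
    tallies = {domain: Counter() for domain in DOMAINS}
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # cells[0] is the study label; the next four cells are the domains.
        for domain, judgement in zip(DOMAINS, cells[1:]):
            tallies[domain][judgement] += 1
    return tallies

tallies = tally_by_domain(rows)
for domain in DOMAINS:
    print(domain, dict(tallies[domain]))
```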
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Salaris, S.; Ocagli, H.; Casamento, A.; Lanera, C.; Gregori, D. Foodborne Event Detection Based on Social Media Mining: A Systematic Review. Foods 2025, 14, 239. https://doi.org/10.3390/foods14020239