HIV/AIDS continues to be a critical global health issue, with more than 32 million cumulative deaths worldwide [1]. Several studies have characterized the influence of alcohol consumption (i.e., drinking alcohol) on the transmission of HIV [2] and the progression to AIDS [2, 3], making it important to study the intersection between these two public health issues. For example, among men who have sex with men (MSM) in the United States, heavy alcohol consumption was associated with unprotected anal intercourse [4, 5]. Similarly, among both male clients of sex workers [6] and female sex workers [7, 8], drinking before intercourse was significantly associated with unprotected sex.

Analysis of “social big data” has become an increasingly common tool for researchers investigating chronic and infectious diseases [9,10,11]. Researchers do not agree on a single definition of social big data, and the definition will likely continue to change as new technologies and approaches are developed [12,13,14,15]. For the purpose of this manuscript, we define social big data as large, near real-time data generated by technologies as a result of social interactions, which can be used to provide insights into people’s attitudes, behaviors, and social interactions. This includes data sources such as social media, internet search tools (e.g., Google searches), wearable devices, and blogs. Social data are being explored as powerful new tools for disease surveillance and information dissemination [16,17,18,19,20,21]. For example, a review by Bernardo et al. found that in 65.5% of included studies, social media data were correlated with traditional surveillance data (e.g., case reporting) [22]. Other studies have found similar correlations between health data and wearable device, social media, and internet search data [23,24,25,26]. Using social big data as digital surveillance tools may improve the speed of surveillance and of estimates of new cases. This may allow public health agencies to deploy resources more efficiently to launch prevention campaigns, inform healthcare providers of new patients, and prevent morbidity and mortality.
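
As a concrete illustration of what it means for social data to be “correlated with traditional surveillance data,” the sketch below computes a Pearson correlation coefficient between two weekly series. It is a minimal example under stated assumptions: the counts are invented toy numbers, not data from any cited study, and `pearson_r` is a hypothetical helper written only for this illustration.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy weekly counts (invented for illustration, not study data):
# social media posts mentioning a condition vs. cases reported to a health agency.
weekly_posts = [120, 135, 160, 210, 190, 240, 300, 280]
weekly_cases = [14, 15, 19, 25, 22, 27, 35, 33]

r = pearson_r(weekly_posts, weekly_cases)
print(f"Pearson r = {r:.2f}")
```

A high coefficient on its own does not show that social data could replace case reporting; it only indicates that the two series move together over the period examined.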

There is a need for improved surveillance of alcohol consumption, both broadly and specific to HIV. Traditional HIV and alcohol-related surveillance data often suffer from reporting lags, because they are typically collected through surveys, clinical case visits (such as emergency department data), and other methods that require extensive time for collection and analysis. Digital surveillance, or methods that use social big data for surveillance, might help to address this problem by providing near real-time assessments of people’s alcohol consumption and/or alcohol consumption during sexual encounters. These approaches can be tailored to specific at-risk populations, especially during infectious disease outbreaks (Charles-Smith et al.). However, very few studies describe how these types of methods might be applied to the intersection between HIV and alcohol consumption, or to alcohol consumption more broadly.

In this manuscript, we describe three types of social data sources (i.e., social media data, internet search data, and wearable device data) that might be further studied and used in surveillance of alcohol and HIV, and then discuss the implications and potential of implementing them as additional tools for public health surveillance. Because limited research has been conducted on this topic, we borrow from research approaches used in other fields and offer examples of how they might be applied and studied in future research on alcohol and HIV.

Social Media

Investigators have already conducted studies on alcohol use and related content (e.g., images, text) on social media, especially among youth and young adults. Alcohol content on social media has predominantly depicted the positive aspects of alcohol use [27, 28] and has received positive responses and engagement from viewers [28]. This, in turn, can promote positive perceptions of alcohol consumption [29]. For example, Nesi and colleagues examined social media use and content among middle school-aged youth and found that exposure to peers’ alcohol posts on Facebook predicted drinking initiation and binge drinking in this population [30]. This peer normative influence of alcohol content on social media persists among young adults [29, 31, 32].

Alcohol-related content posted on social media not only influences viewers’ drinking behavior but might also predict the users’ own alcohol use. A review by Curtis et al. examined 19 studies on social media use and self-reported drinking and alcohol-related problems among adolescents and young adults. They found that users’ social media engagement was correlated with their self-reported drinking and alcohol-related problems, with higher engagement associated with greater consumption and more problems [33]. As an example, Litt et al. studied alcohol-related Twitter posts (tweets) among young adults to determine how these tweets corresponded to the users’ self-reported alcohol use and behaviors. They found that a higher proportion of alcohol-related tweets predicted willingness to drink, number of drinks per week, negative consequences of drinking, and problem drinking [34]. This effect is likely not unique to a particular social media site such as Twitter, as studies on Facebook found similar results [35, 36].
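
To illustrate the kind of per-user measure that Litt et al. related to drinking outcomes, here is a minimal keyword-based sketch that computes the proportion of each user’s posts flagged as alcohol-related. The keyword list, user IDs, and posts are hypothetical; Litt et al.’s actual coding procedure is not reproduced here, and keyword matching is only a crude stand-in for manual coding or trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical keyword list for illustration only; real studies typically rely on
# trained human coders, validated lexicons, or machine learning classifiers.
ALCOHOL_TERMS = {"beer", "wine", "vodka", "drunk", "shots", "hangover"}
TOKEN_PATTERN = re.compile(r"[a-z']+")


def is_alcohol_related(text: str) -> bool:
    """Flag a post if any keyword appears as a whole lowercase token."""
    return any(token in ALCOHOL_TERMS for token in TOKEN_PATTERN.findall(text.lower()))


def alcohol_post_proportion(posts: list[tuple[str, str]]) -> dict[str, float]:
    """Map each user ID to the share of that user's posts flagged as alcohol-related."""
    total: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for user_id, text in posts:
        total[user_id] += 1
        if is_alcohol_related(text):
            flagged[user_id] += 1
    return {user_id: flagged[user_id] / count for user_id, count in total.items()}


# Hypothetical posts (invented for illustration).
posts = [
    ("user_a", "Cheap beer night with friends"),
    ("user_a", "Studying for finals all week"),
    ("user_a", "Worst hangover ever"),
    ("user_b", "Went for a run this morning"),
    ("user_b", "Trying a new recipe tonight"),
]
print(alcohol_post_proportion(posts))
```

Whole-token matching avoids flagging words such as “winery,” but it still misses slang, misspellings, and context (e.g., posts about quitting drinking), which is why validated coding schemes matter.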

In addition to individual-level analyses, population-level studies of the relationship between social media content and alcohol use have also been conducted. Curtis and colleagues developed models to predict county-level alcohol consumption from tweets and found that tweet content was significantly associated with excessive alcohol consumption (B. Curtis et al., 2018). Similarly, at the state level, exposure to alcohol-related tweets has been associated with increased odds of recent alcohol use [37]. At a more local level, Hossain et al. compared alcohol consumption between a large urban area and a large suburban/rural area using geotagged tweets. They found a positive correlation between alcohol consumption and the density of alcohol outlets as depicted in the tweets, which varied by geographic location [38].
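
County-level modeling of the kind described above can be illustrated, in greatly simplified form, with a one-predictor ordinary least squares fit relating the share of alcohol-related tweets in each county to the prevalence of excessive drinking. This is a sketch under assumptions: the county values are invented, `fit_simple_ols` is a hypothetical helper, and the single-variable linear model is a stand-in rather than the models used by Curtis and colleagues.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy county data (not real): percent of geotagged tweets that are
# alcohol-related, and percent of adults reporting excessive drinking.
tweet_share = [1.2, 2.0, 2.8, 3.5, 4.1]
excess_drinking = [15.0, 16.5, 18.2, 19.0, 20.9]

intercept, slope = fit_simple_ols(tweet_share, excess_drinking)
# Predicted prevalence for a new (hypothetical) county with a 3.0% tweet share.
predicted = intercept + slope * 3.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.1f}%")
```

In practice, such models would also adjust for county demographics and account for the fact that Twitter users are not a representative sample of county residents.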

Because this field of study is relatively new, few studies have examined the intersection of HIV and alcohol risk behaviors on social media, making this an important area of need. Cornelius and colleagues examined three months of tweets from youth in Botswana. Alcohol-related tweets (i.e., wine, beer, spirits) were the eighth most frequently tweeted content, and the authors also identified alcohol consumption and unsafe sex practices (i.e., sex without a condom) as a trend in the tweets [39]. A small number of studies have also examined the relationship between social media data and other substances (e.g., opioids), as well as between social media data and HIV risk [40, 41]. For example, Cuomo et al. investigated tweets regarding intravenous drug use and HIV transmission risk before and after an HIV epidemic and found that tweets about opioid use were significantly associated with HIV and opioid burden [42].

Although several studies on alcohol-related content on social media and its association with alcohol consumption focus on youth and young adults, few studies have examined other age groups or high-risk populations, especially in the context of HIV risk behavior, making this an important area for future study. Given the relationship between alcohol and HIV, it would be valuable to study alcohol-related social media posts and their subsequent impact on HIV risk behaviors. Study participants have been shown to post about alcohol not only while drinking but also while intoxicated [43]. More research is needed to determine whether and how social media users post about sexual risk behaviors and related alcohol use.

Overall, there are a number of research (and, ultimately, implementation) areas in which social data could be studied as methods for surveillance of alcohol and HIV. Because social media data are generated in near real time, are often publicly available, and include personal health information shared by users, they might provide insights into people’s alcohol consumption, including recent experiences of consuming alcohol during sex. These data often include location information, which could help to inform public health efforts on how, where, and when people are using alcohol during sex. However, it is unknown how much data would be available on this topic, making this an area for future research.

Internet Search

Health researchers have used internet search trends (e.g., Google Trends data) to study a variety of topics, including infectious diseases, mental health, substance use, and chronic conditions [18, 41, 44,45,46]. Fewer studies have examined alcohol-related internet search trends than have examined the relationship between social media and alcohol use. Parker et al. examined the ability of internet search trends to forecast premature deaths and, using these data, predicted an increase in state-level alcohol-induced deaths [47]. Other investigators have examined the relationship between the relative search volume of alcohol-related words on Google and state-level alcohol use, suggesting that alcohol-related query terms were associated with current alcohol use and that search volume was affected by state alcohol policies [37]. Outside of health research, researchers have also studied the impact of economic conditions on alcohol-related searches, finding that unemployment was positively associated with alcohol-related search queries [48].

As with social media research, more studies using internet search data are needed at the intersection of alcohol and HIV risk. HIV studies have already shown promise in using internet search data to predict HIV diagnoses in the US and China [19, 41, 49]. Although it has limitations, modeling with internet search trend data may be a cost-effective HIV surveillance method [50]. These methods might be especially relevant, and immediately implementable, in low-resource areas that lack surveillance tools [18, 45], where the current alternative to digital surveillance is to collect no information at all. For example, in regions with no existing surveillance methods, internet search data might be used to provide initial estimates of cases and/or changes in case trends at little cost. However, it remains important to continually study these approaches and to gain the support of citizens to ensure ethical implementation.
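
One simple way to ask whether search data could give advance warning of diagnoses is to correlate search volume at time t with diagnoses at time t + k for several lags k and see where the association is strongest. The sketch below does this with invented toy series constructed so that diagnoses follow searches by exactly two periods; `pearson_r` and `lagged_correlations` are hypothetical helpers, and a strong in-sample lagged correlation is not evidence of real forecasting ability without out-of-sample validation.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlations(search: list[float], diagnoses: list[float], max_lag: int) -> dict[int, float]:
    """Correlate search[t] with diagnoses[t + lag] for lag = 0..max_lag."""
    if len(search) != len(diagnoses):
        raise ValueError("series must be aligned to the same time periods")
    results = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` search values and the first `lag` diagnosis values
        # so each search value is paired with diagnoses `lag` periods later.
        results[lag] = pearson_r(search[: len(search) - lag], diagnoses[lag:])
    return results


# Hypothetical toy monthly series (invented): relative search volume, and
# diagnoses constructed to follow search volume by exactly two months.
search_volume = [5, 9, 4, 12, 7, 15, 6, 11, 8, 14]
diagnoses = [10, 12] + [2 * s + 1 for s in search_volume[:-2]]

correlations = lagged_correlations(search_volume, diagnoses, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(correlations, "strongest at lag", best_lag)
```

Real search and diagnosis series share seasonal patterns and long-term trends, so published work typically detrends the data or uses time-series models before interpreting a lead time.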

It is important to note that while search data are collected and reported in aggregate (e.g., the total number of searches for “alcoholism” within a certain region), social media data can be collected at the individual level. There are therefore trade-offs in the utility of social media versus internet search data, suggesting that approaches integrating both data sources could be useful.
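
One practical step toward integrating the two sources is to aggregate individual-level social media posts to the same region and time units in which search data are reported. The sketch below counts relevant posts per region and ISO week. It assumes each post has already been flagged as relevant and assigned to a region (for example, from a geotag, which many posts lack); `weekly_region_counts` and the region names are hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_region_counts(posts: list[tuple[str, date]]) -> Counter:
    """Count relevant posts per (region, ISO year, ISO week).

    Each item is a (region, post_date) pair for a post that has already been
    flagged as relevant and assigned to a region.
    """
    counts: Counter = Counter()
    for region, post_date in posts:
        iso_year, iso_week, _ = post_date.isocalendar()
        counts[(region, iso_year, iso_week)] += 1
    return counts


# Hypothetical flagged posts with invented region labels and dates.
flagged_posts = [
    ("County A", date(2021, 1, 4)),
    ("County A", date(2021, 1, 6)),
    ("County B", date(2021, 1, 5)),
    ("County A", date(2021, 1, 3)),  # a Sunday, so it falls in ISO week 53 of 2020
]
for key, n in sorted(weekly_region_counts(flagged_posts).items()):
    print(key, n)
```

ISO weeks are used here so that weeks never straddle two counting units; whatever convention is chosen must match the one used by the search data source being joined.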

Wearable Devices, Apps and Sensors

Smartphone applications (apps) are increasingly used for monitoring alcohol consumption [51, 52] and for broader health promotion [52]. For example, smartphone apps are used as data collection tools for alcohol consumption [53] and for intervention delivery [54, 55]. A recent HIV study by Trang et al. tested the feasibility of an ecological momentary assessment (EMA) application and a wearable device among MSM to monitor HIV-related risk behaviors. They collected information about physical health (ambulatory heart rate and physical activity), mental health, risk behaviors, social environment, and geographic locations. Overall, study participants gave positive feedback about their experience and expressed a preference for a more tailored experience in the future (i.e., feedback on mood and behavior, MSM-friendly messaging). Additionally, participants identified telemedicine capabilities and the locations of nearby HIV service providers as desirable features for future apps [56]. However, a limitation of using data from EMAs and other apps is that individuals typically must download the app. If too few individuals download the app or use it frequently enough, there will not be sufficient data to provide meaningful results. Although issues affecting the amount of sensor data (e.g., users’ ability or willingness to download an app and/or share data) currently limit the utility of data science approaches, we expect that these types of data will become increasingly available for analysis, given trends in technology development and data sharing [57].

The use of technology to determine alcohol intoxication has evolved from breath analyzers to include electrochemical biosensors that can analyze biofluids such as sweat, interstitial fluid, tears, and saliva [58]. Kai-Chin et al. developed a device designed to measure ethyl glucuronide in sweat and successfully detected alcohol consumption after one, two, and three drinks within one hour. This non-invasive approach could allow researchers to study light to moderate drinking among study participants [59]. Other researchers have developed software systems that could allow collection of alcohol consumption data in a more naturalistic setting using a wearable device [60].

There are also studies on the feasibility of using mobile apps as intervention tools to reduce alcohol consumption [61] or minimize relapse [62]. Chih et al. developed a predictive model to provide targeted feedback to study participants via a smartphone app to prevent relapse. After data about a participant’s alcohol consumption were analyzed by the model, if the participant was predicted to relapse, the smartphone would send a text message to the participant and also alert the participant’s counselor [63]. These novel studies support incorporating such technologies into HIV and alcohol interventions and testing their efficacy.
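
The alert flow described above (score a check-in; if predicted risk is high, notify both the participant and the counselor) can be sketched as follows. Chih et al.’s model is not reproduced here: the sketch swaps in a plain logistic risk score with a fixed threshold, and all feature names, coefficients, the threshold, and the action labels are hypothetical.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class WeeklyCheckIn:
    """Hypothetical self-reported features from one weekly app check-in."""
    drinks_past_week: int
    craving_score: float      # hypothetical 0-10 self-rating
    days_since_support: int   # hypothetical: days since last support-group contact


# Hypothetical coefficients for illustration only; a real model would be fit to
# study data and validated before it was used to trigger any alert.
INTERCEPT = -4.0
COEFFICIENTS = {"drinks_past_week": 0.35, "craving_score": 0.30, "days_since_support": 0.05}
ALERT_THRESHOLD = 0.5


def relapse_probability(check_in: WeeklyCheckIn) -> float:
    """Logistic score: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(coef * getattr(check_in, name) for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + exp(-z))


def alert_actions(check_in: WeeklyCheckIn) -> list[str]:
    """Return the notifications to send when predicted risk crosses the threshold."""
    if relapse_probability(check_in) >= ALERT_THRESHOLD:
        return ["text_participant", "alert_counselor"]
    return []


print(alert_actions(WeeklyCheckIn(drinks_past_week=8, craving_score=8.0, days_since_support=20)))
```

The threshold trades missed relapses against false alarms; in a deployed intervention it would be chosen from validation data and reviewed with clinicians.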

What Is Needed in Future Research?

Both excessive alcohol consumption and HIV are significant public health concerns. They also share overlapping health risks, comorbidities, and stigma, making it important to study them both together and separately [64,65,66,67]. Early detection and immediate response, including the use of new technologies and data surveillance methods, are crucial for preventing morbidity and mortality. Social media, internet search, and wearable device/sensor data might serve as an early warning system. Even a warning a couple of weeks in advance might greatly assist public health preparation and response. To supplement, or even replace, manual coding, investigators could use machine learning [68] and deep neural network analysis [69] to comb through social media posts and analyze content for potential outbreaks and predictive risk behaviors in a timely manner [70]. Importantly, because the use and perceptions of technologies are changing, ongoing ethical studies are needed to ensure safe and ethical use [71,72,73,74,75,76].
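
As a small illustration of machine learning applied to social media posts, the sketch below trains a multinomial naive Bayes text classifier with add-one smoothing to separate posts mentioning alcohol-related sexual risk from other posts. Naive Bayes is a deliberately simple stand-in for the machine learning and deep neural network approaches cited above; the class, labels, and training posts are hypothetical, and a real system would require a large, carefully annotated dataset, validation, and ethical review.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Log prior plus the log likelihood of each known token under class c.
            score = math.log(self.doc_counts[c] / n_docs)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][token]
                    score += math.log((count + 1) / (self.total_words[c] + vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical hand-labeled posts (invented for illustration).
train_texts = [
    "so drunk last night no condom oops",
    "too many shots then hooked up without protection",
    "wasted at the party hooked up",
    "great game tonight with friends",
    "studying for exams all weekend",
    "coffee and a long walk this morning",
]
train_labels = ["risk", "risk", "risk", "other", "other", "other"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("got drunk and hooked up no condom"))
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied.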

There are a number of potential ways to incorporate social data analyses at the intersection of alcohol-related behaviors and HIV risk among high-risk populations. For example, artificial intelligence models might combine current (traditional) case reporting data on alcohol and HIV risk behaviors with social data. Because every type of data source has limitations, models would likely combine traditional data with as many novel data sources as possible (e.g., social media, internet search, and/or sensor data rather than just one of these sources) to offset the limitations of each. When combined, such models might assist public health departments and epidemiology researchers in their surveillance efforts by providing more timely data that address lag times in reporting traditional data, and by providing greater insight into psychological and demographic predictors of alcohol and HIV risk behaviors. Importantly, these models and approaches should be collaborative efforts among scientific researchers, industry, and public health agencies to address public health needs as rapidly, safely, and effectively as possible.
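
The idea that combining several sources can offset the limitations of each can be illustrated with inverse-variance weighting, a classical technique swapped in here for the more complex artificial intelligence models the paragraph envisions. The sketch assumes each source yields an unbiased estimate with a known variance and independent errors, which real social data rarely satisfy; the source names and numbers are hypothetical.

```python
def inverse_variance_combine(estimates: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Combine independent estimates given as {source: (value, variance)}.

    Each source is weighted by 1 / variance, so more precise sources count more.
    Returns (combined_value, combined_variance); assumes unbiased, independent errors.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = {name: 1.0 / var for name, (_, var) in estimates.items()}
    total_weight = sum(weights.values())
    combined = sum(weights[name] * value for name, (value, _) in estimates.items()) / total_weight
    return combined, 1.0 / total_weight


# Hypothetical weekly estimates of a count of interest from three sources (invented).
sources = {
    "case_reports": (120.0, 100.0),
    "internet_search": (140.0, 400.0),
    "social_media": (130.0, 400.0),
}
value, variance = inverse_variance_combine(sources)
print(f"combined estimate = {value:.1f} (variance {variance:.1f})")
```

The combined variance is smaller than that of any single source, which is the formal version of the intuition above; if a source is systematically biased (for example, because only some groups post online), that bias must be modeled separately, since weighting alone cannot remove it.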

The approaches discussed in this manuscript are becoming increasingly relevant as a result of the COVID-19 pandemic. Due to COVID-19 policies and people’s increased time online, it is likely that an even larger amount of digital data is currently being created and can be mined to learn about people’s health behaviors. For example, our team has already conducted some of this work, including studying how mobility data (i.e., movement data acquired from smartphone devices) can be used to inform understanding of COVID-19 transmission, using Instagram/image data to assess adherence to COVID-19 social distancing orders, and, more broadly, monitoring the effects of internet and social media use on mental health, HIV, and substance use during the pandemic. Future research may focus on how the COVID-19 pandemic affects the use of these approaches for alcohol and HIV research and surveillance.