1 Introduction
Interpersonal interactions are conventionally analyzed through manual observations (e.g., manual annotation of research videos) in behavioral research and healthcare studies. This manual data extraction process is labor-intensive and is limited by the availability of annotation experts (e.g., human video coders). Therefore, accurate automatic analysis has great potential to remove this barrier in research and to enable rapid assessment and individualized treatment recommendations in clinical settings.
This work focuses on quantifying the positivity and negativity of an interpersonal interaction session in the context of conflict-resolution interactions in intimate couples before and after acute alcohol consumption. Here, positive interaction refers to prosocial or relationship-enhancing behaviors directed toward the partner, including acceptance, relationship-enhancing attributions, self-disclosure, and humor, while negative interaction refers to hostile or relationship-damaging behaviors directed toward the partner, such as psychological abuse, distress-maintaining attributions, withdrawal, and dysphoric affect [30]. Acute alcohol consumption may change human behaviors [57], have a significant impact on marital/relationship functioning [47], and play an important role in people’s health [46]. Traditionally, manual observations are applied to understand the acute effects of alcohol on dyadic behaviors [18, 57]. The overall level of positivity of an interaction session, named positive interaction intensity in this article for simplicity, can be quantified as the ratio between the number of positive behavior occurrences and the total number of behavior occurrences (positive, negative, neutral, and “other”). Similarly, negative interaction intensity can be quantified as the ratio between the number of negative behavior occurrences and the total number of occurrences.
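The two ratio definitions above can be illustrated with a minimal sketch. The function name, argument names, and the example counts are assumptions introduced here for illustration; they are not taken from the annotation protocol or the dataset.

```python
def interaction_intensities(n_pos, n_neg, n_neutral, n_other):
    """Return (I_pos, I_neg) as ratios of behavior-occurrence counts."""
    total = n_pos + n_neg + n_neutral + n_other
    if total == 0:
        raise ValueError("session contains no coded behavior occurrences")
    return n_pos / total, n_neg / total


# Hypothetical counts for one session (not taken from the dataset).
i_pos, i_neg = interaction_intensities(n_pos=12, n_neg=6, n_neutral=20, n_other=2)
print(i_pos, i_neg)
```

Because both intensities share the same denominator, each lies in [0, 1] and their sum cannot exceed 1.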
While automatically detecting all types of behavior occurrences to completely replicate human annotation using technology is still unrealistic, state-of-the-art
computer vision (CV) based human sensing technology enables autonomous and accurate detection and tracking of elementary non-verbal cues [
10]. Non-verbal behaviors (e.g., head shaking, facial expressions, and body leaning), the elements of interaction besides the spoken words [
11], account for a significant portion of interpersonal interactions [
15]. In all interpersonal communication, such as clinical communication between patients and healthcare providers, a great amount of information is conveyed through non-verbal behaviors, such as body gestures, facial movement, interpersonal distance, and so on [
11]. However, the feasibility of using these cues to estimate the intensity of interaction has not been investigated, especially under the impact of acute alcohol consumption within intimate couples. If feasible, such technologies have great potential for convenient behavioral health management, such as self-monitoring and assessing behavioral change over time from baseline functioning for physical and mental health evaluation.
The relations between non-verbal cues and positive and negative interactions are non-trivial. For example, smiling usually shows a positive attitude; however, it may also indicate disbelief or derision and thus plays a negative role in certain contexts. Non-verbal cues of interaction are dynamic and function together to convey information [
45]. Therefore, although a single cue may not clearly indicate if an interaction is positive or negative by itself, fusing multi-modal cues may improve the certainty [
33]. Similarly, a single cue’s level does not necessarily reflect the intensity of an interaction, i.e., how strong the interaction is. By combining multiple cues, we hope to improve the intensity estimation accuracy.
Machine learning has been widely used for data-driven human behavior analysis [
21]. Regression models are effective tools to investigate the association between elementary behavioral cues and high-level activity interpretations [
19]. Nevertheless, very little work has linked non-verbal cues to the intensity of positive and negative interactions. Therefore, as a starting point to fill this gap, we tested the feasibility of interaction intensity estimation through machine learning-based regression using CV-based elementary non-verbal behavioral cues. First, we designed new measurements based on elementary cues as features for regression. Then, regression models were trained using the
Couple Conflict Dataset (CCD) [
57], which contains conflict-resolution conversation videos specifically recorded to investigate the impact of acute alcohol consumption on intimate couples [
24]. We analyzed the feasibility and performance of different regression models. The results confirmed the potential of using non-verbal cues to evaluate the interaction intensities. Analyses also demonstrated that alcohol consumption altered non-verbal behavioral cues and increased estimation errors as a consequence. While the ideal solution is training new models using large after-alcohol datasets, data collection in alcohol consumption states is challenging in real life. Thus, we proposed a domain adaptation method to recycle knowledge discovered in no-alcohol states and emphasize new knowledge learned from a small amount of after-alcohol data to improve estimation accuracy. These results provided important references for both future machine-learning designs and real-world applications.
To the best of our knowledge, this work is the first investigation of using non-verbal behavioral cues to estimate the intensity of interaction within intimate couples and to study the impact of alcohol consumption in this context. In summary, this work makes four contributions:
—
We proposed a novel set of non-verbal behavioral features to estimate positive and negative interaction intensities in intimate couples’ interactions. More details are in
Section 3.2.
—
This study investigated the relationship between non-verbal behavioral cues and interaction intensity through regression modeling. Analysis showed that a neural network (NN) outperformed other common regression models, limiting the overall estimation error to 0.11. The results are presented in
Section 4.2.1.
—
We studied how acute alcohol consumption impacted the estimation. Regression errors of the no-alcohol state and those of the after-alcohol state were compared. Results showed that acute alcohol consumption could significantly increase estimation errors. More details are discussed in
Sections 4.2.2 and
4.2.3.
—
We designed a domain adaptation model to improve after-alcohol estimation. Collecting behavioral data in risky states (e.g., alcohol consumption) is difficult; this new method could significantly improve estimation performance without retraining models on a large after-alcohol dataset. This provides a reference for relieving data collection burdens in human behavior analysis in challenging states. Details can be found in
Section 4.3.
The rest of this article is organized as follows.
Section 2 reviews background work related to primary contributions.
Section 3 presents the research methods.
Section 4 details the analyses and results. Finally,
Section 5 summarizes the study and discusses the contributions, limitations, and future research directions.
4 Results
We first analyzed whether there was redundancy among the non-verbal measurements (features), i.e., whether multiple measurements presented the same or similar information. Then, the performance of different regression models in estimating Ipos and Ineg was compared. After that, using the best-performing model, we investigated how the model performed in different training and testing configurations: training and testing within and across states (no-alcohol and after-alcohol). These comparisons helped us understand how alcohol influenced the estimation, for instance, how well people’s no-alcohol behaviors can estimate their interaction after alcohol consumption. Finally, we evaluated the performance of the proposed SIDA framework.
4.1 Relations among Non-Verbal Behavioral Features
To estimate the level of redundancy among the non-verbal features, Pearson’s correlation coefficients were calculated.
Figure 5 shows the results for females in the no-alcohol state (F_NoAlc), males in the no-alcohol state (M_NoAlc), females in the after-alcohol state (F_Alc), and males in the after-alcohol state (M_Alc), respectively.
For all four groups, non-verbal behavioral features were minimally correlated. The only significant correlation was a moderate-to-strong negative correlation between PER and NER (\(r = -0.62\) to \(-0.73\), \(p \lt 0.01\)). If a person showed more positive emotions, she/he tended to show fewer negative emotions within a limited session time, and vice versa. However, since each person might show a different number of neutral emotions, \(\text{PER} + \text{NER} \neq 1\), and thus one value cannot simply be derived from the other. Therefore, both PER and NER were kept with all other measurements for regression.
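The redundancy check described above can be sketched as a pairwise Pearson correlation over the feature columns. This is a minimal standard-library sketch, not the authors' implementation; the helper names and the per-participant values are hypothetical and chosen only so that PER and NER are negatively related without summing to 1.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """Pairwise Pearson's r for a dict mapping feature name -> list of values."""
    names = list(features)
    return {(a, b): pearson_r(features[a], features[b]) for a in names for b in names}


# Hypothetical per-participant values; PER and NER need not sum to 1
# because neutral emotions take up the remainder.
features = {
    "PER": [0.50, 0.30, 0.20, 0.40, 0.10],
    "NER": [0.10, 0.30, 0.35, 0.15, 0.40],
}
matrix = correlation_matrix(features)
print(round(matrix[("PER", "NER")], 2))
```

A strongly negative off-diagonal entry flags a candidate redundant pair, but, as argued above, a high correlation alone does not make one feature derivable from the other.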
4.2 Regression Performance
Each non-verbal behavioral feature was normalized to [0, 1] using min-max normalization so that all features were on the same scale in the regression. Due to the limited sample size, five-fold cross-validation repeated three times (n = 3) was applied. Root mean square error (RMSE) was used as the outcome measurement.
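The evaluation protocol above (min-max normalization, repeated five-fold cross-validation, RMSE) can be sketched as follows. This is an illustrative standard-library sketch under stated assumptions: the function names, the seed, and the sample count of 20 are hypothetical, and the text does not state whether the minimum and maximum were computed on the whole dataset or per training fold.

```python
import math
import random


def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant feature maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


splits = list(repeated_kfold_indices(n_samples=20))
print(len(splits))  # 3 repeats x 5 folds
```

Each sample appears in exactly one test fold per repeat, so every repeat yields a full set of out-of-sample predictions from which an RMSE can be computed.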
4.2.1 Estimation Errors in the No-Alcohol State and the After-Alcohol State.
Figure 6(a) and (b) show the RMSE of each algorithm in the no-alcohol and after-alcohol categories, respectively. NN achieved the lowest RMSEs in all cases except F_NoAlc_Pos in the no-alcohol state, while all other algorithms performed similarly to one another. To simplify the presentation, LR was selected as a comparison baseline to demonstrate NN’s advantages. As shown in
Figure 7, the paired two-sided
t-test showed that NN’s RMSEs were significantly lower (
p \({ \lt }\) 0.05) than LR’s in all comparisons except the positive intensity of females (F_NoAlc_Pos) and males (M_NoAlc_Pos) in the no-alcohol state.
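The paired comparison between NN and LR can be sketched by computing the paired t statistic from matched RMSE values. This is a minimal illustration, not the authors' analysis code: the function name is an assumption and the RMSE values are made up. The two-sided p-value would then follow from Student's t-distribution with the returned degrees of freedom (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on sequences a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean_diff / math.sqrt(var_diff / n), n - 1


# Hypothetical per-fold RMSEs (illustrative values, not the paper's results).
rmse_nn = [0.10, 0.09, 0.11, 0.10, 0.08]
rmse_lr = [0.12, 0.11, 0.12, 0.13, 0.11]
t_stat, dof = paired_t_statistic(rmse_nn, rmse_lr)
print(t_stat, dof)
```

A negative t statistic here means the first sequence (NN) has the lower mean error across the matched folds.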
In the no-alcohol state, the RMSEs of Ipos were significantly lower than those of Ineg for both females (LR: \(p \lt 0.01\); NN: \(p \lt 0.01\)) and males (LR: \(p \lt 0.01\); NN: \(p \lt 0.01\)), suggesting that the association between the measurements and Ipos was less complicated. This pattern did not appear for NN in the after-alcohol state. These results indicate that alcohol intake could have changed how non-verbal behaviors were associated with positive and negative interaction intensities. Therefore, in the next section, we compared the no-alcohol state and the after-alcohol state in terms of interaction intensity estimation.
4.2.2 Comparison between the No-Alcohol State and the After-Alcohol State.
Figure 8 shows the comparisons between the no-alcohol state and the after-alcohol state. For both NN and LR, RMSEs in the no-alcohol state were mostly significantly lower than those in the after-alcohol state. Therefore, in general, the results indicate that alcohol intake complicates the association between non-verbal behavioral features and interaction intensities. In other words, the data distributions of the dependent variables (Ipos and Ineg) and/or the independent variables (non-verbal behavioral features) might have changed, and the connections between the dependent and independent variables might have become more complex after alcohol intake. While explaining the biological and psychological causes and mechanisms underlying this phenomenon is beyond the scope of the current work, we can quantify the impact of alcohol through the change in estimation error across different training and test configurations.
4.2.3 Evaluating the Impact of Alcohol in Terms of Estimation Error.
The estimation models have the potential to be applied to real-life solutions, such as helping alcohol consumers self-monitor how alcohol impacts their interactions. Developing a robust and mature model requires a much larger dataset than the CCD. This reflects data scarcity, one of the major challenges of using machine learning for human behavior analysis. While collecting data in a general and normal state (i.e., when people are not impacted by alcohol) to train models may be realistic, it is much harder to collect data in a risky state (i.e., right after consuming alcohol).
To overcome the data scarcity, within the scope of the current study, we initially explored training a regression model in the no-alcohol state and applying the trained model in the after-alcohol state. This may decrease the estimation accuracy due to data distribution disparities between the training and test samples; however, if the extra error introduced is acceptable in practice, this method is worthwhile, as it eliminates the need for difficult and time-consuming data collection in alcohol consumption states.
Therefore, we quantified how much extra error would be introduced when using models trained by no-alcohol data to estimate the after-alcohol state. This investigation was done through the comparison among three different training and testing configurations: Configuration 1: both training and testing on no-alcohol data (reference). Configuration 2: both training and testing on after-alcohol data (ideal configuration). Configuration 3: training on no-alcohol data and testing on after-alcohol data (the alternative).
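The three configurations can be sketched as a small evaluation harness. Everything below is an assumption made for illustration: the data are synthetic, a single-feature least-squares line stands in for the LR and NN models, and the shift between states is invented. The sketch only shows how the three train/test pairings are formed; it does not reproduce the reported RMSEs.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse_of(model, xs, ys):
    """RMSE of a (slope, intercept) model on the given samples."""
    slope, intercept = model
    return math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def make_state(rng, slope, intercept, n=40):
    """Synthetic (feature, intensity) pairs for one state; purely illustrative."""
    xs = [rng.random() for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, 0.02) for x in xs]
    return xs, ys


def split(state, n_train=30):
    """Split one state's samples into (train, test) parts."""
    xs, ys = state
    return (xs[:n_train], ys[:n_train]), (xs[n_train:], ys[n_train:])


rng = random.Random(0)
no_alc = make_state(rng, slope=0.5, intercept=0.2)
alc = make_state(rng, slope=0.1, intercept=0.6)  # invented shift between states

(no_train, no_test), (alc_train, alc_test) = split(no_alc), split(alc)
results = {
    "config1": rmse_of(fit_line(*no_train), *no_test),    # train/test: no-alcohol
    "config2": rmse_of(fit_line(*alc_train), *alc_test),  # train/test: after-alcohol
    "config3": rmse_of(fit_line(*no_train), *alc_test),   # train no-alcohol, test after-alcohol
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The gap between `config3` and `config2` is the extra error attributable to training in a different state from the one being tested.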
We anticipated that the estimation errors in configurations 1 and 2 would be lower than those in configuration 3, due to the change in the association between non-verbal behavioral cues and the intensities.
Figure 9 shows the comparison results. Consistent with the earlier results, NN generally performed better than LR (lower RMSEs). In all cases for both LR and NN, RMSEs were lowest in configuration 1. In two of the four cases for LR (F_Neg_LR, M_Neg_LR) and all cases for NN, RMSEs were highest in configuration 3. The two exceptions for LR were quite close to the highest. For LR, the average RMSE across both genders and interaction intensities was 0.105 in configuration 1 and 0.153 in configuration 3. For NN, the average RMSE across both genders and interaction intensities was 0.095 in configuration 1 and 0.147 in configuration 3.
All RMSEs in configuration 2 were higher than those in configuration 1. In general, this observation aligned with the result in
Section 4.2.2 that alcohol intake increased the estimation error. For LR, the RMSEs in configuration 2 were mostly comparable (except M_Pos_LR) to those in configuration 3, while for NN, in three out of four cases (except M_Neg_NN), RMSEs in configuration 2 were significantly lower than those in configuration 3. For LR, the average RMSE across both genders and interaction intensities was 0.145 in configuration 2 and 0.153 in configuration 3. For NN, the average RMSE across both genders and interaction intensities was 0.123 in configuration 2 and 0.147 in configuration 3.
So far, these analyses yielded three major findings: (1) NN outperformed other regression models; (2) alcohol intake significantly increased estimation errors; and (3) intensity estimation was best achieved by a model trained on data collected in the same state. These findings laid a foundation for the domain adaptation work described in the next section.
4.3 Domain Adaptation from the No-Alcohol State to the After-Alcohol State
4.3.1 Distribution Change of Interaction Intensities and Non-Verbal Behavioral Features.
Testa et al. [
57] showed that the immediate effects of alcohol consumption on couple interaction behaviors appeared more positive than negative. Therefore, we anticipated that the distribution of
Ipos and
Ineg could have been changed accordingly. The Kolmogorov–Smirnov test was applied to evaluate the distribution change. As shown by the histograms in
Figure 10, we found significant changes in
Ipos for females (
p \({ \lt }\) 0.01) and males (
p \({ \lt }\) 0.01), while the changes in
Ineg were non-significant.
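The distribution comparison above relies on the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic, using hypothetical Ipos values; the significance level would additionally require the KS distribution (available, for example, through `scipy.stats.ks_2samp`).

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every observation equal to x so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


# Hypothetical Ipos values per session (illustrative, not the CCD data).
ipos_no_alcohol = [0.20, 0.25, 0.30, 0.35, 0.40]
ipos_after_alcohol = [0.35, 0.40, 0.45, 0.50, 0.55]
d = ks_statistic(ipos_no_alcohol, ipos_after_alcohol)
print(d)
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), so larger values indicate a stronger distribution shift between states.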
As discussed in
Section 3.4, we anticipated that some non-verbal behavioral features would change significantly due to alcohol, while others would stay stable. The Kolmogorov–Smirnov test showed that the magnitude of moderate head shaking (
p \({ \lt }\) 0.01) and visual attention (
p \({ \lt }\) 0.05) changed significantly in females, and the magnitude of strong head shaking (
p \({ \lt }\) 0.01) changed significantly in males. The distribution changes of other features were not significant.
Figure 11 illustrates these significant distribution differences, which might be the primary roots of the significant
Ipos distribution changes and were thus used as the inputs to the Sta-Ind model in
Figure 4.
4.3.2 Performances of Domain Adaptation.
Since
Ipos was changed significantly after alcohol consumption, this section targets the evaluation of the proposed SIDA framework on
Ipos. To validate the effectiveness and robustness of the SIDA framework, we compared its performance with a baseline, i.e., the NN that achieved the lowest RMSEs in
Section 4.2.1. In domain adaptation, for both males and females, the SIDA framework and the NN were trained using all available no-alcohol data, then updated with a portion of the after-alcohol data. The remaining after-alcohol data were used to test the estimation performance. To further understand how the amount of after-alcohol training data impacts the performance of the NN and the SIDA framework, the after-alcohol dataset was split in four ways: the 0:10 ratio (0% for SIDA phase II training, so SIDA did not apply; 100% for testing), the 2:8 ratio (20% for SIDA phase II training and 80% for testing), the 5:5 ratio (50% for SIDA phase II training and 50% for testing), and the 8:2 ratio (80% for SIDA phase II training and 20% for testing). For each ratio, the after-alcohol dataset was split randomly 15 times for statistical analysis.
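The splitting scheme above can be sketched as repeated random partitions of the after-alcohol sample indices. The function name, the seed, and the sample count of 50 are assumptions for illustration only; the sketch covers the data partitioning, not the SIDA model or the NN update step.

```python
import random


def split_after_alcohol(n_samples, train_fraction, n_repeats=15, seed=0):
    """Return n_repeats random (update_idx, test_idx) splits of sample indices."""
    rng = random.Random(seed)
    n_update = round(n_samples * train_fraction)
    splits = []
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        splits.append((order[:n_update], order[n_update:]))
    return splits


ratios = {"0:10": 0.0, "2:8": 0.2, "5:5": 0.5, "8:2": 0.8}
n_after_alcohol = 50  # hypothetical sample count, not the size of the CCD
plan = {name: split_after_alcohol(n_after_alcohol, frac) for name, frac in ratios.items()}
for name, splits in plan.items():
    update_idx, test_idx = splits[0]
    print(name, len(update_idx), len(test_idx))
```

For each ratio, the model under test would be updated on `update_idx` and scored on `test_idx`, and the 15 resulting RMSEs would feed the statistical comparison.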
Figure 12 illustrates performance comparisons of Ipos between the NN and the SIDA framework on different ratio sets. The SIDA framework achieved lower RMSEs than the NN for both females (a) and males (b), and the differences were significant in males. In addition, RMSEs showed a decreasing trend for both the NN and the SIDA framework as the proportion of after-alcohol training data increased. This was expected: the more after-alcohol data were used to update the model, the better the model could capture alcohol-induced information.
5 Conclusion and Discussion
This study explored the feasibility of estimating interaction intensities using CV-based non-verbal behavioral features in the context of acute alcohol consumption in intimate couples. Results demonstrated that by fusing multiple cues, common regression models estimated interaction intensities (from 0 to 1) with an average error below 0.16. NN could achieve an average error of around 0.11, which might be sufficient for real-world applications that do not require very high precision, such as tracking daily changes in interaction intensity. Moreover, results showed that the errors in the after-alcohol state were higher than those in the no-alcohol state, indicating that alcohol consumption might have increased data complexity. Furthermore, as expected, the cross-state estimation results confirmed that models trained on data collected in a particular state performed best when estimating within the same state. Using an NN model trained on no-alcohol data to estimate intensities in the after-alcohol state increased the error significantly. Finally, the SIDA framework demonstrated the feasibility of using domain adaptation to improve estimation performance in the after-alcohol state without the need to collect large datasets. This method helps relieve data collection burdens in high-risk states by leveraging knowledge learned from low-risk data.
Traditional methods to investigate the impact of alcohol on human behaviors, such as interviews and questionnaires, are time- and labor-consuming. The current work was a step toward machine learning-assisted automatic analysis. Note that the machine learning models do not replicate any existing human annotation protocols, and thus, we do not claim the proposed method as a replacement for existing well-validated human annotation protocols/systems (e.g., RMICS [
30]). Instead, our method provides a complementary component that can be used in coordination with the traditional/classic methods to pursue a more comprehensive understanding of the acute behavioral effects of alcohol consumption.
The proposed method leverages state-of-the-art CV technologies to help eliminate the need for time-consuming manual data annotation/analysis, which is expensive and limited by the availability of qualified video coders. Thus, the proposed method helps resolve a barrier that has limited the use of experimental procedures to understand the relationship between acute alcohol consumption and behavior change. CV-based methods can extract human behaviors at a much higher resolution and precision than traditional human annotation, offering an angle that may not be captured by human observations.
In summary, this study has four major contributions: (1) the design and quantification of new CV-based non-verbal behavioral features; (2) the exploration of machine learning-based regression models to reveal the association between non-verbal behavioral features and interaction positivity and negativity; (3) the quantification of how alcohol consumption impacts regression performance; (4) the SIDA framework to enable improved estimation in the after-alcohol state using limited new training data.
Meanwhile, the current study has limitations that suggest directions for future work. First, as a starting point to demonstrate the feasibility of interaction intensity estimation using regression, this work tested mainly classic regression models. In the future, more advanced regression/deep learning models can be designed to target particular characteristics of alcohol consumption data, such as non-normal feature distributions and automatic feature selection. Second, advanced CV technologies have the potential to detect more complicated and higher-level interaction cues, such as body language, than the currently proposed basic measurements. These may serve as more deterministic markers of positive and negative interaction intensities and thus offer better estimation accuracy. Third, the proposed SIDA framework presented an idea of fusing Sta-Ind features with Gen-Beh features that are independent of the state. This article shows a basic implementation of the idea, but the SIDA framework is not bound to this particular implementation or to the alcohol consumption context. Therefore, it is worthwhile to explore improvements and extend the application of SIDA in the future. Finally, all interpretations in this study were based on the CCD dataset. While it is one of the largest video datasets for studying the effect of alcohol on intimate couples, it has limitations, such as low video resolution and angled camera positioning that caused minor face occlusions. In the future, estimation models should be trained and validated on more high-quality data samples, if available. Results showed that the estimation was more accurate when participants were sober than after drinking. This indicates the importance of investigating the corresponding biological and psychological mechanisms, which may guide more effective design of behavioral measurements and estimation algorithms.
Nevertheless, to the best of our knowledge, this study was one of the first works that leveraged machine learning models to estimate positive and negative interaction intensities using CV-based non-verbal behavioral cues. This work provided important technical and practical references for future automatic alcohol consumption behavior analysis. Relationship discord and conflict exacerbate behavioral and physical health conditions, particularly among those already at high risk due to problematic alcohol use [
16]. Therefore, CV- and machine learning-aided behavior analysis, as piloted by the current research, points to future ways to inform individualized treatment recommendations and reduce client burden in first-line cognitive-behavioral interventions that target abstinent or alcohol-involved individuals or couples to mitigate the effects of conflict on health [
17].