research-article
Open access

Health Status Prediction with Local-Global Heterogeneous Behavior Graph

Published: 12 November 2021 Publication History

Abstract

Health management is receiving increasing attention all over the world. However, existing health management mainly relies on hospital examination and treatment, which are complicated and untimely. The emergence of mobile devices makes it possible to manage people’s health status in a convenient and instant way. Health status can be estimated from various kinds of data streams continuously collected from wearable sensors. However, these data streams are multi-source and heterogeneous, containing complex temporal structures with local contextual and global temporal aspects, which makes feature learning and joint utilization of the data challenging. We propose to model the behavior-related multi-source data streams with a local-global graph, which contains multiple local context sub-graphs to learn short-term local context information with heterogeneous graph neural networks and a global temporal sub-graph to learn long-term dependency with self-attention networks. Health status is then predicted based on the structure-aware representation learned from the local-global behavior graph. We conduct experiments on the StudentLife dataset, and extensive results demonstrate the effectiveness of our proposed model.

1 Introduction

Health is not only the basic guarantee of human happiness and well-being but also the foundation of economic progress. Keeping in good health needs reasonable health management, which has attracted increasing attention from governments and companies all over the world. Existing health management mainly relies on medical examination and specialized patient treatment in hospitals. However, many citizens usually do not consider going to the hospital for a check-up until they have abnormal physical symptoms. Moreover, regular or irregular medical examination with the professional medical equipment in hospitals can only get some discrete measurements of individuals’ health status at a specific moment. Both reasons bring a lot of challenges for early detection and prevention of diseases, which are the key components of effective health management.
According to a study of the World Health Organization (WHO), personal behaviors and lifestyles account for 60% of factors affecting human health [1]. For example, non-smokers with defective genes are much less likely to suffer from lung disease than regular smokers [2]. A healthy diet and adherence to appropriate exercise can greatly reduce the incidence of diabetes and cardiovascular diseases [3]. Therefore, the real-time and continuous analysis and monitoring of personal behavior and health status are helpful for individuals to enhance self-active health awareness and learn disease prevention knowledge, thus improving health management capabilities [47].
The rapid development of smart portable and wearable devices has promoted the widespread use of various low-power sensors, and the advent of the 5G era has made it possible to collect individual health data streams of multiple sensors in real time. As long as people carry their devices, all their daily routines, diets, and activities are recorded automatically and instantly without extra effort. For example, the daily routine can be recorded by a GPS sensor, a picture showing people’s diet can be captured using a phone camera, and the activity information can be recognized using an accelerometer sensor. These real-time data streams can be transferred to a back-end system for behavior analysis and health status estimation and finally help people improve their health. Compared with the patient records in the hospital, these kinds of data streams not only provide long-term signals to fully describe the individual’s daily behavior and lifestyle but also support continuous data transmission and analysis without interfering with individuals’ daily lives and work.
The health data streams collected from various sensors are multi-source and heterogeneous. On the one hand, the sampling frequency of each sensor is different, which makes the co-processing difficult. On the other hand, the collected sensor data are multimodal (e.g., pictures or videos from cameras, motion signals from accelerometers and gyroscopes, and location coordinates from GPS sensors). Though multi-source sensors can provide complementary information, the feature learning and joint utilization of the multimodal data streams remain a challenge for health status estimation.
A number of health status prediction methods based on mobile devices have been proposed. Some rely on non-parametric methods (e.g., K-Means, Mean Shift) with single-source data as input, such as acceleration signals [51], camera data in smartphones [7], and sound from microphones [50]. Other methods take advantage of information from various sources [41, 46]. However, most existing methods do not make full use of the structure information of the multi-source data streams, which limits further improvement in performance. In practice, the multi-source and heterogeneous health data stream mainly reflects individual behaviors and lifestyles that contain complex temporal structures with local contextual and global temporal aspects. Local context refers to behavior in the short term, such as the activities and routines in 1 day. Detailed behavior information such as the activity sequence and location transfer should be considered in the local context to get the principal characteristics of the daily behavior. For the global temporal aspect, the temporal dependency among local contexts needs to be captured to form a long-term, comprehensive description of individual behaviors.
Recently, Graph Neural Networks (GNNs) [22, 69, 72] have drawn great attention in modeling interactions in structural data. Taking a graph as input, GNNs propagate messages between nodes according to the edges and thus learn representations for both nodes and edges. Most GNNs work on homogeneous cases where all nodes in a graph belong to one type [6, 26, 52]. The heterogeneous graph neural network [62], a special case of GNNs, addresses the other situation, where nodes are of different types. It has been successfully applied in [14, 18, 67], where highly competitive performance is obtained. Inspired by this development, it is promising to model the intra-modality structure and inter-modality interaction of the multi-source and heterogeneous health data with heterogeneous graph neural networks.
In this article, we propose to predict daily mental health status based on multi-source wearable sensor data. We build a local-global individual behavior graph (LGIBG) based on the heterogeneous data and then predict the daily health status with the help of heterogeneous graph neural networks. Specifically, we take three kinds of sensor data streams (accelerometer, audio, WiFi) as input and detect middle-level behavior-related concepts (e.g., walking, running, silence) with pretrained backbone models. These concepts are further used to build the local-global individual behavior graph, which consists of multiple local context sub-graphs and a global temporal sub-graph. The local context sub-graphs are created with the concepts detected from daily data streams as heterogeneous nodes that are connected with homogeneous and heterogeneous edges. Next, a densely connected global temporal sub-graph is created on top of the local context sub-graphs. Then we use the heterogeneous graph neural network to learn the features of the local context sub-graphs and obtain both semantic and structural representations. The representation of the global temporal sub-graph is learned with a self-attention network, and it is finally used to predict the health status.
In summary, the contributions of our work are threefold: (1) To effectively represent the behavior-related multi-source data collected from wearable sensors, we build a local-global graph that consists of multiple local context sub-graphs and a global temporal sub-graph. The local-global graph can well describe the short-term context information of individual behaviors and their long-term temporal dependencies. (2) We learn the short-term semantic and structural representations from local context sub-graphs with heterogeneous graph neural networks and the long-term representation from the global temporal sub-graph with self-attention networks. (3) We demonstrate the effectiveness of the proposed method in health prediction on the public StudentLife dataset.

2 Related Work

2.1 Health Status Prediction

Our task is to predict the health status based on personal behaviors with the multi-source sensor data collected in daily life. This task has important medical implications because it can support early prevention of diseases and complement clinical treatment in hospitals. To our knowledge, most existing health prediction methods can be divided into two categories: Electronic Health Record (EHR)-based methods [45, 48] and mobile sensor-data-based methods [9, 11]. The EHR-based methods are more relevant to medical studies since they use the record data collected during the hospital treatment process with professional medical equipment. However, people have little EHR data before they are diagnosed with a disease, and thus this kind of method, however good, can only provide a piece of the puzzle. More kinds of data about the early-life experiences of patients should also be considered in treatment, which is what the second kind of method does. Sensor-data-based methods focus on personal behavior in daily life and take advantage of mobile devices, which make it more convenient to monitor people’s health and also provide additional useful information for clinical treatment. Below we introduce these two categories in detail.
EHR data are collected during the hospital treatment process and contain nearly all the information of patients, such as diagnoses, medication prescriptions, and clinical notes. Early work adopted expert-defined rules to identify diseases from EHR data, such as type 2 diabetes [35] and cataracts [53]. Much subsequent work applies deep learning models to EHR data, most commonly for disease classification. Cheng et al. [15] and Acharya et al. [4] train a CNN model to classify normal, preictal, and seizure EEG signals. Che et al. [13] propose a multi-class classification task to predict the different stages of Parkinson’s disease with a Recurrent Neural Network (RNN). Kam and Kim [34] perform binary classification of sepsis by feeding the EHR data to a long short-term memory network (LSTM). In addition to disease classification, future event prediction is another task that has attracted much attention recently; it aims to predict future medical events according to the historical records. For example, Futoma et al. [20] and Rajkomar et al. [44] use the EHR data from the hospital to predict events such as mortality, readmission, length of stay, and discharge diagnoses with deep feed-forward networks and LSTMs. There has been much work on mental health prediction based on EHR data, among which T1-weighted imaging [45, 48] and functional magnetic resonance imaging (fMRI) [19, 28] are the most commonly used data to study brain structure, with other physiological signals such as electroencephalograms also playing an important role. Recently, machine learning has received more attention for its effect on improving the management of mental health. Costafreda et al. [16] perform depression classification with an SVM using smoothed gray matter voxel-based intensity values. Rosa et al. [58] propose a sparse L1-norm SVM to predict depression from region-based functional connectivity features. Cai et al. [10] collect the electroencephalogram (EEG) signals of participants and use four classification methods (SVM, KNN, DT, and ANN) to distinguish the depressed participants from normal controls.
Mobile devices provide another way for health status prediction, where diverse sensors can be used to catch various signals of people, thus making it easier to monitor daily behavior and predict health status. Machado et al. [43] calculate the signal magnitude area of the acceleration signal to recognize activities with several cluster algorithms (e.g., K-Means, Mean Shift). Koenig et al. [37] and Banhalmi et al. [7] use the camera in smartphones to monitor heart rate (HR) and heart rate variability (HRV), which are vital signs of cardiovascular health. Stafford et al. [61] and Goel et al. [27] detect the sound of breathing and cough by microphone in smartphones for assessing pulmonary health in a quick and efficient way. Besides the use of data from only one source, many researchers are devoted to taking advantage of information from various sources [31, 56, 71]. Asselbergs et al. [5] integrate accelerometer data, call history, and the short message service pattern to predict mood. Burns et al. [9] predict depression based on GPS, accelerometer, and light sensor data from smartphones. Nag et al. [49] estimate heart health status by combining sensor data from wearable devices and other factors, such as inherent genetic traits, circadian rhythm, and living environmental risks analyzed from cross-modal data, which provides better personalized health insight.
However, none of the above work fully explores the local and global temporal characteristics of daily behavior based on multi-source wearable sensors.

2.2 Graph Neural Network

Recently, the emergence of structural data, especially structured graphs, has promoted the development of GNNs [23, 69, 72]. As the early work of GNNs, recurrent graph neural networks (RecGNNs) [21, 59] apply recurrent architectures to learn the node representation, where message passing is repeated over nodes’ neighborhoods until the node representations are stable. Inspired by the success of Convolutional Neural Networks (CNNs), the convolution operation has also been introduced to graph data in both spectral [17, 29, 36] and spatial ways [6, 24, 25]. The spectral approaches adapt spectral graph theory to design a graph convolution. The spatial approaches inherit the message passing idea of RecGNNs but differ in obtaining node representations by stacking multiple convolutional layers. Besides RecGNNs and these convolutional GNNs (ConvGNNs), many other graph architectures have been developed to cope with different scenarios. For example, graph autoencoders (GAEs) [12, 65] are used to learn the graph embedding by reconstructing structural information such as the adjacency matrix of the graph. Spatial-temporal graph neural networks (STGNNs) [32, 42, 60] aim to model both the spatial and temporal dependency of data and learn the representation of the spatial-temporal graph, which is advantageous in related tasks such as human action recognition.
Most of the existing GNNs focus on homogeneous graphs where nodes are of the same type and can be processed in the same way. In comparison, heterogeneous graphs contain diverse types of nodes and edges, leading to a more complicated computation. On the one hand, different types of nodes may have different semantic meanings and different feature spaces. On the other hand, the heterogeneous graph represents both homogeneous and heterogeneous relations of data. Recently, some work has been done on heterogeneous graphs. Dong et al. [18] propose the metapath2vec method to learn heterogeneous graph embeddings with a meta-path-based random walk. Chen et al. [14] process different kinds of nodes with several projection matrices that embed all the nodes into the same space and then perform link prediction. Wang et al. [67] further introduce hierarchical attention to heterogeneous graphs to learn attentions for both nodes and meta-paths. Until now, the application of heterogeneous GNNs to individual behavior analysis and health status prediction has yet to be explored.

3 Methods

3.1 Framework Overview

The individual behavior refers to the way that a person lives. Our purpose is to model the individual behavior in a period based on multi-modal data streams collected by wearable devices and then learn effective representations to predict the health status.
As shown in Figure 1, we take multi-source data streams as input and detect behavior-related middle-level concept sequences with pre-trained backbone models. Then, the behavior-related middle-level concept sequences are used to build the behavior graph, which consists of multiple local context sub-graphs and a global temporal sub-graph. Specifically, the concepts are regarded as different types of nodes to build local context sub-graphs. Each local context graph is regarded as a node in the global temporal sub-graph to catch temporal dependency. The representations of the local context sub-graph and global temporal sub-graph are learned by local context modeling and global temporal relation modeling, based on which the final representation of the behavior graph is learned and used to predict the health status.
Fig. 1. Overview of our framework. We take multi-source data streams as input and detect concept sequences to build a behavior graph that consists of a local context sub-graph and global temporal sub-graph.

3.2 Behavior-related Concept Detection

We take three kinds of data streams (i.e., accelerometer, microphone, and WiFi) as input and detect behavior-related middle-level concept sequences with three pre-trained backbone models (i.e., activity, audio, location detectors), where each concept sequence and the denotes the specific concept class (e.g., walking detected by the activity detector). Meanwhile, we also obtain timestamp sequences , where and each timestamp is a 2-dimensional vector that represents the start and end time of the detected concept class in the corresponding data stream. More details of the pre-trained backbone models are introduced in Section 4.2, and the notations and their corresponding explanations are shown in Table 1.
Notation    Explanation
concept class for type k at the tth moment
a 2D vector of start and end time of
the attribute of ith node from type k.
the embedding of ith node from type k
the weight of edge between node pair i,j
the embedding of edge between node pair i,j
Table 1. Notations and Explanations

3.3 Behavior Graph Building

To capture the temporal structure of the individual behavior, we need to build a behavior graph that contains both the local context information and the long-term relationships from the multimodal data stream. However, a huge densely connected network may increase the computation complexity and impact the performance. For this reason, we decompose the whole graph into two kinds of sub-graphs: local context graphs to explore the local information of individual behaviors in the short term, and a global temporal graph to capture temporal dependency in the long term. The local context graphs are regarded as nodes of the global temporal graph.

3.3.1 Local Context Sub-Graph.

The local context graph is built based on the daily data streams, which could reflect individual behaviors (e.g., activities, audio, locations) from various aspects. Taking these individual behaviors into consideration, the local context graph is actually a heterogeneous graph. As illustrated in Section 3.2, we have detected three types of concept sequences from the data streams. In the following, we explicitly use activity, audio, and location to denote the type names. For the tth time step, we use a sliding time window of 1 day to crop out three concept sub-sequences from the original concept sequences. Each concept sub-sequence represents several consecutive concepts detected in 1 day. Here the denotes the size of the sliding time window that represents the number of timestamps contained in 1 day for the kth type of the concept sequence. Accordingly, we can obtain the timestamp sub-sequences , where . Next, we will introduce how to create the local context graph based on and .
For the tth time step, the local context graph can be formally defined as , where , representing different types of nodes. Each type corresponds to a specific aspect to describe individual behavior. is the set of edges containing both the homogeneous edges, which connect two nodes in the same type, and the heterogeneous edges, which connect two nodes in different types.
The nodes of the local context graph are composed of all the concept classes in . Since there are three types of concept classes, the local context graph has three different node types and thus is a heterogeneous graph. Each node has an attribute and an embedding representation, noted as and for the ith node of type k. For the attribute of the node with concept class , we use its corresponding timestamp to compute the time interval that represents the duration of the concept-related behavior. For the node embedding , we introduce external semantic knowledge to help its learning. Specifically, we extract GloVe embeddings [54] corresponding to the concept class name of each node, which are pre-trained on Wikipedia according to word-word co-occurrence. These embeddings have proved effective in capturing the semantic meanings of words in many NLP tasks. Therefore, they provide a reasonable initial representation for the nodes.
As for edges, homogeneous edges and heterogeneous edges are considered in different ways. Two nodes with the same type k are connected with homogeneous edges if they are temporal neighbors in the concept sequence . For example, a node with concept class from the type location is connected to another node with concept class from the same type if an individual moves to the library from the dormitory with continuous timestamps in reality. The weight of the homogeneous edge is the frequency with which the two nodes are neighbors in the concept sequence. With the edge weight, the homogeneous edge captures specific patterns of an individual’s behavior change information. For heterogeneous nodes (e.g., , , ), we connect them according to their co-occurrences in time, i.e., . For example, a node with concept class from the type audio and a node with concept class from the type location are connected when they are detected from the data streams at the same timestamp. The co-occurrences in time reflect the interactions of heterogeneous nodes, which could describe the individual behaviors from more aspects. We do not connect heterogeneous nodes that are temporal neighbors in the concept sequence, because connecting different types of nodes according to their temporal relations has no practical significance. The weight of the heterogeneous edge is the frequency of co-occurrences in time. Note that the weights of both homogeneous edges and heterogeneous edges are written as with as node indexes. Whether represents a homogeneous edge or a heterogeneous edge depends on the specific types of node i and node j.
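The edge rules described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `build_local_context_graph`, the `(concept, start, end)` tuple layout, the use of undirected homogeneous edges, the skipping of self-transitions, and the reading of "same timestamp" as overlapping time intervals are all assumptions introduced here.

```python
from collections import Counter

def build_local_context_graph(sequences):
    """Build one day's local context graph (hypothetical helper).

    sequences maps a node type (e.g. "activity") to a list of
    (concept, start, end) tuples sorted by start time.
    """
    node_duration = {}       # node attribute: total duration of the concept
    homo_edges = Counter()   # same-type edges, weight = transition frequency
    hetero_edges = Counter() # cross-type edges, weight = co-occurrence frequency

    for k, seq in sequences.items():
        for concept, start, end in seq:
            key = (k, concept)
            node_duration[key] = node_duration.get(key, 0) + (end - start)
        # Homogeneous edges: temporally adjacent concepts of the same type.
        # Assumption: edges are undirected and self-transitions are ignored.
        for (c1, _, _), (c2, _, _) in zip(seq, seq[1:]):
            if c1 != c2:
                edge = tuple(sorted([(k, c1), (k, c2)]))
                homo_edges[edge] += 1

    # Heterogeneous edges: concepts of different types that co-occur in time.
    # Assumption: "co-occur" means the two time intervals overlap.
    types = sorted(sequences)
    for idx, ka in enumerate(types):
        for kb in types[idx + 1:]:
            for ca, sa, ea in sequences[ka]:
                for cb, sb, eb in sequences[kb]:
                    if sa < eb and sb < ea:
                        hetero_edges[((ka, ca), (kb, cb))] += 1

    return node_duration, homo_edges, hetero_edges

# Toy day with made-up timestamps (minutes since the start of the window).
day = {
    "activity": [("walking", 0, 10), ("stationary", 10, 50), ("walking", 50, 60)],
    "location": [("dormitory", 0, 5), ("library", 5, 60)],
}
nodes, homo, hetero = build_local_context_graph(day)
```

In this toy day, the walking and stationary nodes are adjacent twice, so their homogeneous edge gets weight 2, while walking co-occurs with the library in two separate intervals, so that heterogeneous edge also gets weight 2.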

3.3.2 Global Temporal Sub-Graph.

The global temporal sub-graph models the long-term time dependency of the daily information and gets the global information for the whole period, which is used to predict the health status. Formally, the global temporal graph is denoted as , where nodes in refer to local context graphs introduced in Section 3.3.1, and represents interactions between any two local context graphs.

3.4 Local Context Graph Modeling

As introduced in Section 3.3.1, we have created a heterogeneous local context graph at each time step of multi-source data streams. Now we will introduce how to capture local context information of the short-term individual behavior with the heterogeneous graph neural network. The network contains m layers of node message passing modules and edge embedding learning modules. Here we only introduce these two kinds of modules for one layer. As shown in Figure 2, the node message passing module is used to learn the node embeddings and graph semantic representation, while the edge embedding learning module is used to learn the edge embeddings and graph structural representation. Then the final representation of the local context graph is obtained with the combination of the semantic and the structural representation. It is worth noting that all local context graphs share the same network parameters.
Fig. 2. Local context graph modeling.

3.4.1 Node Message Passing.

We consider the node message passing process in two ways: homogeneous message passing through homogeneous edges and heterogeneous message passing through heterogeneous edges. At first, we multiply each node embedding with its attribute , which reveals the node importance in time. For simplicity, the representation of each node is still denoted by .
Homogeneous message passing aims to learn information from the same type of nodes according to their edges. For the ith node of type k, its message passing process is done as below:
(1)
where and are learnable matrices. The i and j are node indexes, and is the normalized value of the homogeneous edge weight between the ith node and the jth node defined in Section 3.3.1. The calculations for other types of nodes are in the same way with different projection matrices. By this means, each node gets information from its homogeneous neighborhoods according to their connections.
Heterogeneous message passing manages to capture additional semantic meanings from other types of nodes and thus learns a comprehensive representation for individual behaviors. Since each node has more than one type of heterogeneous neighbor, we add embeddings learned from all types of heterogeneous neighbors to do the message passing, shown as follows:
(2)
where and are learnable matrices. The k and are node type indexes and . The i and j are node indexes. The is the normalized value of the heterogeneous edge weight defined in Section 3.3.1.
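One plausible form of the message passing step is sketched below in plain Python. Since the exact update rule is not recoverable from the text here, the sketch assumes a common pattern: each node combines a projection of its own embedding with a weighted sum of projected neighbor embeddings, followed by a ReLU. The names `matvec`, `normalize_weights`, `message_pass`, `W_self`, and `W_nbr`, the row-wise normalization, and the ReLU are all assumptions.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalize_weights(raw):
    """Scale each node's edge weights so they sum to 1 (assumed normalization)."""
    out = {}
    for i, nbrs in raw.items():
        total = sum(nbrs.values())
        out[i] = {j: w / total for j, w in nbrs.items()} if total else {}
    return out

def message_pass(h, weights, W_self, W_nbr):
    """One round of message passing over the given edges.

    h: node id -> embedding; weights: node id -> {neighbor id: normalized weight}.
    W_self and W_nbr stand in for the two learnable matrices.
    """
    new_h = {}
    for i, h_i in h.items():
        agg = matvec(W_self, h_i)
        for j, a_ij in weights.get(i, {}).items():
            msg = matvec(W_nbr, h[j])
            agg = [x + a_ij * m for x, m in zip(agg, msg)]
        new_h[i] = [max(0.0, x) for x in agg]  # ReLU, an assumed nonlinearity
    return new_h

identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"walking": [1.0, 0.0], "running": [0.0, 2.0]}
raw = {"walking": {"running": 3}, "running": {"walking": 3}}
new_h = message_pass(h, normalize_weights(raw), identity, identity)
```

Under the same assumptions, heterogeneous message passing would call `message_pass` once per neighbor type with type-pair-specific matrices and sum the results.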

3.4.2 Edge Embedding Learning.

Compared with the node embeddings, edge embeddings reveal much more structure information of the graph. For our heterogeneous graph, edges are also of different types since they connect different types of nodes. Considering that nodes have been embedded in a common semantic space by the node message passing module, we directly concatenate them and use a projection to extract the edge embedding:
(3)
where k and are node type indexes and means concatenation. The is a learnable matrix. If , is the embedding of the homogeneous edge, otherwise the heterogeneous edge.
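The concatenate-then-project step can be illustrated with a short Python sketch. The function name `edge_embedding` and the toy matrix `W` are hypothetical, and whether a bias or nonlinearity follows the projection is not stated in the text, so none is applied here.

```python
def edge_embedding(h_i, h_j, W):
    """Concatenate two node embeddings and project them with matrix W.

    W has one row per output dimension and len(h_i) + len(h_j) columns.
    In a full model, a separate W would be kept per node-type pair (assumption).
    """
    concat = list(h_i) + list(h_j)
    return [sum(w * x for w, x in zip(row, concat)) for row in W]

# Toy projection that keeps the first entry of h_i and the last entry of h_j.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
e_ij = edge_embedding([1.0, 2.0], [3.0, 4.0], W)
```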

3.4.3 Local Context Graph Representations.

For each local context graph, we learn two kinds of representations to capture the short-term behavior information: a semantic representation that reflects semantic meanings and a structural representation that catches information of the graph structure.
We obtain the semantic representation of the local context graph by combining the embeddings of all types of nodes learned with the node message passing module. Although the attribute of a node defined in Section 3.3.1 could reflect its importance, the semantic meaning of the node should also be considered. Because a concept appearing a few times in the concept sequence may contain important factors for the health status prediction, we take advantage of the soft-attention mechanism to determine the importance of different nodes and combine node embeddings to get the semantic representation:
(4)
(5)
where is the semantic representation for the local context graph, is the relative importance given to each node when blending all nodes together, and q is a trainable vector used as the query. There are two main reasons for using softmax in Equation (5): (1) softmax is differentiable and thus can be easily integrated into graph neural networks for end-to-end training, and (2) the output values of the softmax function lie in the range [0, 1] and sum to 1. With the softmax function, can be interpreted as the relative importance given to node when blending all nodes together.
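A minimal sketch of soft-attention pooling with a trainable query is given below. The exact scoring function is not recoverable from the text here, so the sketch assumes a plain dot product between the query and each node embedding; `softmax` and `attention_pool` are illustrative names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(embeddings, q):
    """Blend embeddings with softmax attention driven by query vector q.

    Assumption: the score of each embedding is its dot product with q.
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(a * b for a, b in zip(q, h)) for h in embeddings]
    alphas = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, embeddings)) for d in range(dim)]
    return pooled, alphas

node_embeddings = [[1.0, 0.0], [0.0, 1.0]]
pooled, alphas = attention_pool(node_embeddings, q=[0.0, 0.0])
sharp, sharp_alphas = attention_pool(node_embeddings, q=[5.0, 0.0])
```

With a zero query every node gets equal weight, while a query aligned with the first node shifts most of the weight to it. The same routine could serve the structural representation by pooling edge embeddings with a projected semantic representation as the query.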
For the structural representation, since different edges play various roles for the graph structure, we also use the attention mechanism to calculate their correlations and then combine them to get the graph structural representation. Since the semantic representation contains the global information of the local context graph with semantic meanings of all nodes considered, we treat it as a query vector to help learn more effective attentions for combining edge embeddings:
(6)
(7)
where is the embedding for either homogeneous edges or heterogeneous edges, is the relative importance given to each edge when blending all edges together, and is a projection matrix for semantic representation .
We get the final representation for each local context graph with the concatenation of its semantic and structural representations as .

3.5 Global Temporal Relation Modeling

The self-attention network (SAN) was first introduced in the Transformer [64], which has a sequence-to-sequence architecture and is widely used in neural machine translation. Taking a token sequence as input, SAN calculates the attention scores between each token and other tokens with multiple attention heads. Then the token embeddings are updated with other token embeddings according to their attention scores. From this perspective, SAN can be regarded as a graph neural network with a token sequence as fully connected nodes, while the multi-head attention mechanism is a special message passing method. Inspired by this, we implement the global temporal relation modeling with the self-attention network.
Specifically, we can get a sequence of all local context graph representations as illustrated in Section 3.4. For the temporal information among different local graphs, we adopt position embeddings in [64] to encode the relative position of each local context graph, noted as . Therefore, the representation for the ith graph is the sum of and . Then the correlations between any two local context graphs can be calculated with the attention scheme:
(8)
where and are learnable matrices.
The attention scores are scaled and normalized with a Softmax function, which are used to get an attended representation for each local context graph :
(9)
(10)
where is a learnable projection matrix, is the dimension of , T is the number of local context graphs, and is the attended representation for the ith local context graph. Finally, we can get the structure-aware representation of the global temporal graph by adding the attended representations of all local context graphs.
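The attention steps of this section can be sketched as single-head scaled dot-product attention followed by a sum over all attended representations. This is a simplified illustration, not the authors' code: it uses one head, assumes position embeddings have already been added to the inputs, and the names `W_q`, `W_k`, `W_v`, and `global_temporal_representation` are assumptions.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_temporal_representation(xs, W_q, W_k, W_v):
    """Single-head self-attention over T local graph vectors, then sum pooling."""
    queries = [matvec(W_q, x) for x in xs]
    keys = [matvec(W_k, x) for x in xs]
    values = [matvec(W_v, x) for x in xs]
    d = len(keys[0])  # dimension used to scale the scores
    out = [0.0] * len(values[0])
    for q in queries:
        attn = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        attended = [sum(a * v[c] for a, v in zip(attn, values)) for c in range(len(out))]
        out = [o + z for o, z in zip(out, attended)]  # add attended vectors
    return out

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
days = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_rep = global_temporal_representation(days, zeros, identity, identity)
learned_rep = global_temporal_representation(days, identity, identity, identity)
```

With a zero query projection every day attends uniformly, so the summed output equals the plain sum of the value vectors; non-trivial projections reweight the days before summing.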

3.6 Objective Function

The final loss function is written as the sum of a classification loss and a node variance constraint: $\mathcal{L} = \mathcal{L}_{cls} + \lambda \mathcal{L}_{var}$, where $\lambda$ is a trade-off parameter.
Classification Loss: With the representation of the global temporal graph, we predict the health status label y by a fully connected layer with Softmax activation. Then we calculate the cross-entropy loss:
(11)
where N is the number of instances used in the training process and is the ground-truth label of the health status.
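As a small worked illustration of the classification loss, the sketch below applies a softmax to the output of the final layer and computes cross-entropy against integer class labels. Averaging over the N training instances (rather than summing) and the function names are assumptions made for this example.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over a batch; labels are integer class indexes."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits)
        total -= math.log(probs[y])
    return total / len(labels)

# Four health status classes; uniform logits give a loss of log(4).
uniform_loss = cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
confident_loss = cross_entropy([[0.0, 0.0, 10.0, 0.0]], [2])
```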
Node Variance Loss: It is worth noting that many GNNs face the problem of node homogenization after several rounds of node message passing, since all nodes exchange information with their neighbors. In the local context graph modeling illustrated in Section 3.4, the message passing is applied not only on homogeneous nodes but also on heterogeneous nodes, which may make the representations of nodes similar. To alleviate this problem, we add a constraint to the node representations to control the variance of all nodes. Specifically, we concatenate all the node embeddings into a matrix noted as , where N is the number of nodes of all types and d is the dimension of the node embedding. Then we calculate the variance and get a vector , where each element represents the variance of the corresponding dimension in E. Finally, the node variance loss is defined as the average of the elements in the variance vector followed by a sigmoid function:
(12)
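As an illustration, the objective described in this section can be sketched as follows. The sketch follows the textual description literally; averaging the cross-entropy over the N instances, the argument name `trade_off`, and the function names are assumptions and may differ from the exact form of Equations (11) and (12).

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy, averaged over instances (the averaging is an assumption).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def node_variance_term(E):
    # E: (N, d) matrix of all node embeddings.
    v = E.var(axis=0)                   # variance of each embedding dimension
    return float(1.0 / (1.0 + np.exp(-v.mean())))   # sigmoid of the average variance

def total_loss(logits, labels, E, trade_off):
    # Sum of the classification loss and the weighted node variance term.
    return cross_entropy(logits, labels) + trade_off * node_variance_term(E)
```

With identical node embeddings the variance is zero, so the variance term evaluates to sigmoid(0) = 0.5, the smallest value it can take.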

4 Experiments

4.1 Dataset

We evaluate the performance of our method on the StudentLife dataset [66], which was collected from 48 Dartmouth students over a 10-week term. It contains sensor data, Ecological Momentary Assessment (EMA) data (e.g., stress), pre- and post-survey responses (e.g., the PHQ-9 depression scale), and educational data (GPA). During the 10 weeks, students carried their phones throughout the day. Data streams from multiple sensors, including accelerometer, microphone, GPS, Bluetooth, light sensor, phone charge, phone lock, and WiFi, were collected in real time by the mobile phone. In addition, students were asked to respond to various EMA questions and surveys, which were provided by psychologists to measure their mental health status. Educational performance data, such as grades, were also collected.
In our experiment, we use data streams collected by three representative sensors (i.e., accelerometer, microphone, WiFi) as the input, since these three kinds of sensor data contain the most useful information about individual behaviors. We collect location information from WiFi instead of GPS because most students’ activities take place indoors, where the college’s WiFi AP deployment allows location to be inferred more accurately than GPS.
For the ground-truth annotations of the multi-source data streams, we use the photographic affect meter (PAM) [55] values in the EMA data. The PAM value is a score between 1 and 16 that corresponds to the Positive and Negative Affect Schedule (PANAS) [68] and reflects the instantaneous mental health status of users. The annotations are collected by a mobile application that captures users’ feelings according to their preference for specific photos. In keeping with the conceptualization of PANAS, which ranges from low pleasure and low arousal to high pleasure and high arousal, the PAM score is further divided into four quadrants: negative valence and low arousal (scores 1 to 4), negative valence and high arousal (scores 5 to 8), positive valence and low arousal (scores 9 to 12), and positive valence and high arousal (scores 13 to 16). Following [55], we map the PAM value into these four classes.
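The quadrant mapping can be written as a short function. This is a minimal sketch; the class indices 0 to 3 and the name `pam_to_class` are illustrative choices, not part of the dataset.

```python
PAM_CLASS_NAMES = (
    "negative valence, low arousal",    # scores 1 to 4
    "negative valence, high arousal",   # scores 5 to 8
    "positive valence, low arousal",    # scores 9 to 12
    "positive valence, high arousal",   # scores 13 to 16
)

def pam_to_class(score: int) -> int:
    """Map a PAM score in [1, 16] to one of four quadrant classes (0 to 3)."""
    if not 1 <= score <= 16:
        raise ValueError(f"PAM score must be between 1 and 16, got {score}")
    return (score - 1) // 4
```

For example, `PAM_CLASS_NAMES[pam_to_class(6)]` gives the negative valence, high arousal quadrant.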
Finally, we use the data streams of 30 students who have valid PAM annotations. Specifically, we obtain 912 samples in total; each sample consists of 3-day multi-modal data streams collected by the accelerometer, microphone, and WiFi sensors. For each sample, the instantaneous PAM label of the last day is regarded as the ground-truth label of the whole data stream. The sample number for each student and the sample distribution over the four classes are shown in Figure 3. The mental health status prediction task in our experiment is thus a four-class classification problem. For training and testing, we divide the dataset into 10 splits and build 10 tasks on them. Each task takes nine splits for training and the remaining one for testing. We also report the average results over all 10 tasks.
Fig. 3. Dataset information. (a) The number of samples for each student. (b) The distribution over four classes.
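The sample construction and the 10-task protocol described above can be sketched as follows. The sliding-window construction and the random, sample-level assignment to splits are assumptions; the text does not state whether windows overlap or whether splits are formed per student.

```python
import numpy as np

def three_day_samples(days, labels, window=3):
    """days: per-day records of one student, in chronological order.
    labels: per-day PAM class, or None when no valid annotation exists.
    Each sample is labeled with the PAM class of its last day."""
    samples = []
    for end in range(window - 1, len(days)):
        if labels[end] is None:
            continue                    # skip windows without a valid label
        samples.append((days[end - window + 1:end + 1], labels[end]))
    return samples

def ten_fold_tasks(n_samples, n_splits=10, seed=0):
    """Yield (train_indices, test_indices) for each of the n_splits tasks."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train, folds[k]

tasks = list(ten_fold_tasks(912))       # 912 samples, as in the experiment
```

Each task trains on nine splits and tests on the remaining one, so every sample is used for testing exactly once across the 10 tasks.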
Discussion on data selection. In this article, our aim is to model personal behavior and predict health status based on the multi-source sensor data collected from people’s daily lives. To the best of our knowledge, StudentLife is the only public dataset that satisfies our requirements. On the one hand, the data should be collected from both healthy and unhealthy people. On the other hand, the sensor data we use should be continuous and long term; i.e., the data are recorded as long as people carry their wearable devices, such as mobile phones and smart wristbands. Some work has been done on this task. For example, [11] uses GPS data to predict depression with an SVM classifier, and [9] predicts depression based on GPS, accelerometer, and light sensor data from smartphones. However, none of these works release their data. Although some medical datasets, such as MIMIC-III [33] and ADNI [57], have been widely used to predict diseases [13] or other medical events [20], they do not satisfy the needs of our task. First, these medical datasets focus on disease analysis, where the people studied are mainly patients. By contrast, our work focuses on personal health, where daily behaviors are considered for both healthy and unhealthy people. Second, these medical datasets are collected at discrete times from patients during hospital treatment with professional medical equipment, such as medical imaging, while the long-term sensor data we use are continuously collected from daily life.

4.2 Implementation Details

As introduced in Section 3, to create the behavior graph that reflects the structure information contained in the multi-source data streams, we first need to detect the behavior-related concepts contained in the data streams. Here, three backbone models proposed in [39] are adopted to obtain the middle-level semantic concepts from raw sensor data. For the accelerometer, a decision tree model for activity classification is applied to features extracted from the accelerometer stream to infer the concept class (i.e., stationary, walking, running, and unknown). For the microphone, the audio data are classified into concept classes (i.e., silence, voice, noise, and other) with an HMM. As for the WiFi, students’ WiFi scan logs are first recorded, and then the location concept classes, such as in[dana-library], are inferred according to the WiFi AP deployment information, which results in 9,037 classes. The location classes follow a long-tail distribution in which many classes appear only a few times, and hence we keep the 100 most frequent classes, which cover 93% of the location data.
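The selection of the most frequent location classes can be sketched as below. The function name `keep_top_k` and the representation of the location stream as a flat list of class labels are assumptions made for illustration.

```python
from collections import Counter

def keep_top_k(observations, k=100):
    """Return the k most frequent classes and the fraction of observations they cover."""
    if not observations:
        return set(), 0.0
    counts = Counter(observations)
    kept = {label for label, _ in counts.most_common(k)}
    coverage = sum(counts[label] for label in kept) / len(observations)
    return kept, coverage
```

Applied to the inferred location labels with `k=100`, the second return value corresponds to the coverage figure reported above.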
After obtaining the three kinds of detected concepts, we cut the sequences into days and build the local context graph and the global temporal graph to predict mental health status with the method illustrated in Section 3. We use the metrics of accuracy, precision, recall, and F1-score to evaluate our model. Accuracy is the ratio of correctly predicted samples to total predicted samples. Precision, recall, and F1-score are first calculated in each class and then weighted by the sample number of each class.
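The support-weighted metrics described above can be computed as in the following minimal sketch. Setting a per-class score to zero when its denominator is zero is an assumption of this sketch.

```python
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes=4):
    """Accuracy plus per-class precision, recall, and F1 weighted by class support."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float((y_true == y_pred).mean())
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)     # samples predicted as class c
        support = np.sum(y_true == c)       # samples that truly belong to class c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        w = support / len(y_true)           # weight by the sample number of the class
        precision += w * p
        recall += w * r
        f1 += w * f
    return accuracy, float(precision), float(recall), float(f1)
```

Note that with this weighting the recall always equals the accuracy, so the two columns carry the same information.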
Discussion on data synchronization. In existing health-related systems and methods for analyzing wearable sensor data, such as the risk situation recognition system [70], synchronization of different sensors is a very important issue. Specifically, such a system usually contains several devices that collect different kinds of sensor data as well as a smartphone that receives the data from the sensors. A data synchronization algorithm is necessary because the sensors are on different devices and there are time differences between the moment the data are generated by the sensors and the moment they are received by the smartphone. In our work, the sensors are all embedded in the same smartphone [66]. They therefore share a common reference time and introduce no timing error when sending and receiving data, so synchronization is not essential.

4.3 Compared Methods

Since there are no previous works on PAM prediction on the same dataset, we compare our method with three widely used conventional machine learning algorithms (i.e., RF [63], KNN [38], and SVM [8]) and two deep learning algorithms (i.e., DNN [40] and LSTM [30]). To apply these baselines to multi-source data streams collected from wearable devices, we compute the behavior feature as a 108-dimensional vector that represents the duration of the three kinds of concept classes in 1 day. Specifically, the activity concept takes four elements of the vector, which represent the duration of stationary, walking, running, and unknown in 1 day. The audio concept takes four elements, which represent the duration of silence, voice, noise, and other in 1 day. The location concept takes 100 elements, which represent the time the student stays at each of the 100 locations in 1 day. The behavior feature contains the principal information about the individual’s life, such as where the person went that day and how long he or she communicated with other people. We also compare our method with a recent GNN-based method (i.e., HAN [67]) that can capture the structural information of a graph with different kinds of nodes. Details of the compared baselines and the variants of our method are given as follows.
RF [63]: This method uses Random Forest to do the classification. Specifically, we concatenate the behavior features of 3 days to get a 324-dimensional feature. Then we input it to the Random Forest.
KNN [38]: This method uses K-nearest neighbor. We first obtain the 324-dimensional feature of 3 consecutive days as in RF. Then we train a K-nearest neighbor classifier on the feature.
SVM [8]: This method trains SVM to do the classification. Features are obtained as in RF, and then an SVM is trained to predict the health status.
DNN [40]: This method uses a two-layer deep neural network with the input feature computed as in RF. Each layer is fully connected and the hidden size is 50, which is determined by cross-validation.
LSTM [30]: This method uses an LSTM with the hidden size of 100 to capture the temporal information of sequences. Specifically, we transform the 3-day data into a sequence with the length of 3. Each element in the sequence is a 108-dimensional behavior feature. Then the sequence is input into LSTM, and the hidden state at the last step is used to predict the health status.
HAN [67]: This method uses the heterogeneous attention network [67] instead of our local context graph modeling method to learn the node embedding and graph representation. Specifically, we obtain the meta-path-based neighbors for each kind of node according to our local context graph. Then the node-level attention and semantic-level attention are performed as in HAN [67] to obtain the local context graph representation, which is finally input into the self-attention network to predict the health status.
Ours\he: This variant of the proposed method omits the heterogeneous message passing in Section 3.4.1 while keeping the homogeneous message passing in the heterogeneous graph neural network.
Ours\ho: This variant omits the homogeneous message passing in Section 3.4.1 while keeping the heterogeneous message passing in the heterogeneous graph neural network. It is used to compare with Ours\he to illustrate the impact of the homogeneous and heterogeneous message passing.
Ours\s: In this variant, we omit the semantic representation learned by Equation (4) and only use the structural representation of the local context graph. The structural representation is then input into the self-attention network to get the global temporal graph representation.
Ours\t: In this variant, we omit the structural representation learned by Equation (6) and only use the semantic representation of the local context graph. The semantic representation is then input into the self-attention network to get the global temporal graph representation.
Ours\g: In this variant, representations of all local context graphs are directly added to get the final global temporal graph representation without using the self-attention network.
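The hand-crafted behavior feature used by the conventional baselines above can be sketched as follows. The dictionary-based input format, the function names, and the unit of duration are assumptions; only the layout (4 activity, 4 audio, and 100 location elements per day, concatenated over 3 days) follows the description in this section.

```python
import numpy as np

ACTIVITY = ["stationary", "walking", "running", "unknown"]
AUDIO = ["silence", "voice", "noise", "other"]

def daily_behavior_feature(activity_dur, audio_dur, location_dur, locations):
    """Build the 108-dimensional feature for one day.
    Each *_dur argument maps a concept class to its duration in that day;
    `locations` is the ordered list of the 100 retained location classes."""
    feat = [activity_dur.get(c, 0.0) for c in ACTIVITY]
    feat += [audio_dur.get(c, 0.0) for c in AUDIO]
    feat += [location_dur.get(c, 0.0) for c in locations]
    return np.asarray(feat, dtype=float)

def three_day_feature(daily_features):
    """Concatenate three daily features into the 324-dimensional input of RF, KNN, SVM, and DNN."""
    return np.concatenate(daily_features)
```

For the LSTM baseline, the three daily vectors would instead be stacked into a sequence of length 3 rather than concatenated.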

4.4 Result Analysis

4.4.1 Performance Comparison.

Here we show both the results on 10 tasks in Table 2 and Table 3 and the average results in Figure 4. It can be seen that our model performs better than baselines on all metrics, and most variants of the proposed method also have good results.
Fig. 4. The average results on the 10 tasks.
Table 2. PAM Prediction Results
Table 3. Ablation Study Results
Compared with the traditional machine learning methods, all the deep-learning-based methods perform better. The LSTM baseline has better results than the DNN baseline since it takes temporal information into consideration. HAN updates the node embedding and graph representation in a meta-path manner with several kinds of adjacency matrices. However, this method does not consider the direct connection between heterogeneous nodes, which ignores the semantic interaction between different kinds of nodes, and it thus performs worse than our method.
As for the ablation study, the results indicate that each module in our framework contributes to the performance improvement. The full model performs better than both Ours\he and Ours\ho, which supports the assumption that the message passing module has positive effects on node embedding learning and that the homogeneous and heterogeneous edges succeed in building the homogeneous and heterogeneous node structures. When comparing the variant that keeps only homogeneous message passing (Ours\he) with the variant that keeps only heterogeneous message passing (Ours\ho), heterogeneous message passing performs better, which may be because the heterogeneous edges help learn more comprehensive embeddings by bringing in additional information from other types of nodes.
As for the comparison between Ours\t and Ours\s, which use only the semantic representation and only the structural representation, respectively, both kinds of representation benefit the prediction, meaning that they reveal different aspects of the graph information. The semantic representation performs better than the structural representation, which may be because it provides more global information about the graph. When we omit the global temporal graph and use only the representations of the local context graphs to predict PAM in Ours\g, the performance drops slightly, meaning that not only the local context information but also the long-term temporal structure of behaviors is useful for predicting individuals’ health status.
In Table 2, Table 3, and Figure 4, it is worth noting that the results of all methods are below 50%, which shows that predicting mental health status from personal behaviors in daily life is an extremely challenging task, especially with limited samples. Nevertheless, the average accuracy of our method is about 5% higher, in relative terms, than that of the second-best method, HAN, as shown in Figure 4(a), which still shows the advantage of the proposed method. We expect that our model would achieve better performance on larger-scale datasets.

4.4.2 Parameter Analysis.

Here we investigate the influence of two important hyper-parameters, m and dp, which represent the number of layers in the local context graph and the dimension of the final local context graph representation, respectively. We vary m from 1 to 5 and keep other settings fixed; the results on PAM prediction are shown in Figure 5. The performance first improves as m increases. However, when the number of layers keeps increasing, the performance drops, because too many layers can make the node embeddings less discriminative. As for dp, we vary it from 16 to 256 while keeping other settings fixed. The results are also shown in Figure 5. The best performance is achieved at dp = 64, since a low dimension makes it hard to capture useful information, while a high dimension is difficult to train with limited instances.
Fig. 5. Parameter analysis of m, dp.

4.4.3 Visualization.

Here we analyze the attention in learning representations from local context graphs and the global temporal graph to figure out the factors that influence mental health. Figure 6 shows the example of an individual’s 3-day data streams. The PAM is predicted with the global representation, which is learned based on representations of three local context graphs.
Fig. 6. Visualization of local context graphs and the global temporal graph.
The attention score between two different local context graphs is computed with Equation (9). We show the attention score between any two local graphs by the darkness of the corresponding line. The attention of each graph is represented by the darkness of the text box. We can see that for each local context graph, the attention to itself tends to play a major role, though each one pays attention to all other ones. Besides, it can be seen that the third local context graph has much influence on all three.
We further visualize the concept sequences of the third day, which are used to build the local context graph for that day. In the concept sequences, each text box represents a concept, and different concept sequences are shown in different colors. The importance of each concept is calculated with the attention mechanism in Equation (5) and represented by the darkness of the text box. The concepts of stationary, silence, and classroom play a major role in learning the representation of the local context graph. Homogeneous edges, which represent behavior transitions, are shown as arrows, while heterogeneous edges, which represent behavior co-occurrences, are shown as dotted lines. The importance of each edge is also indicated by its darkness. We notice that, in general, heterogeneous edges receive more attention than homogeneous edges even though they occur less often, which indicates the importance of combining different concepts and suggests that our attention mechanism finds useful patterns in the multi-source data streams. Besides, edges that involve location nodes tend to receive more attention than edges that involve activity and audio nodes, because locations reveal much information about daily lives.

4.5 Application on Grade Prediction

To better illustrate the effectiveness of our model for learning representations from behavior-related data streams, we apply our model to the grade prediction task. The grade annotation is the GPA, which indicates a student’s overall long-term academic performance in a range of 0 to 4.
First, we take our model, trained on the health status prediction task, to extract the global representation for each student’s data stream and use KNN to predict the grade. Then we compare our model with a baseline that takes the 108-dimensional feature introduced in Section 4.3 as the representation for each day and sums these daily features to obtain the final feature. The same KNN is used to predict the grade.
We use the mean absolute error (MAE), the coefficient of determination (R2), and the Pearson correlation as evaluation metrics for the grade prediction. We adopt leave-one-out evaluation. The average results are shown in Table 4.
Method | MAE | R2 | Pearson
Hand-crafted feature + KNN | 0.296 | 0.10 | 0.32
Graph representation + KNN | 0.195 | 0.21 | 0.51
Table 4. Grade Prediction Results
As shown, our model outperforms the baseline on all three metrics, which indicates that our model is effective in learning the structure-aware representation of the individual’s long-term behavior. Moreover, the result suggests that our model generalizes well and can be used to extract global behavior-related features for different tasks without model fine-tuning.
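The evaluation protocol for grade prediction can be sketched as follows. Treating KNN as a regressor that averages the grades of the k nearest students, the Euclidean distance, and the default `k=3` are assumptions; the text does not specify these details.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    # Average the targets of the k nearest training points (Euclidean distance).
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return train_y[nearest].mean()

def leave_one_out_predictions(X, y, k=3):
    """Predict each student's grade from all other students."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i       # hold out student i
        preds[i] = knn_predict(X[mask], y[mask], X[i], k)
    return preds

def mae(y, p):
    return float(np.mean(np.abs(y - p)))

def r2(y, p):
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson(y, p):
    return float(np.corrcoef(y, p)[0, 1])
```

Here `X` would hold one feature vector per student (either the graph representation or the hand-crafted feature) and `y` the corresponding GPA values.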

5 Conclusion

In this article, we propose a local-global graph to model personal behavior and predict daily mental health status based on multi-source wearable sensor data. The graph contains multiple local context sub-graphs and a global temporal sub-graph to capture the short-term context information and the long-term temporal dependencies of individual behaviors, respectively. We learn the semantic representation and the structural representation of the local context graph with a heterogeneous graph neural network. A self-attention network is designed to learn the representation of the global temporal graph, which is finally used to predict the health status. We perform experiments on the public StudentLife dataset and compare our method with widely used machine learning and deep learning methods. Our method outperforms all compared methods, which validates its effectiveness. In future work, we will integrate more kinds of data streams to improve the local-global individual graph and apply our method to larger-scale multi-source sensor datasets for health prediction.

References

[1]
World Health Organization. 2009. https://www.who.int/mediacentre/multimedia/podcasts/2009/lifestyle-interventions-20090109/en/.
[2]
2017. http://www.sdwsnews.com.cn/a/jiankangtuku/2017/1108/15306.html.
[3]
World Health Organization. 2020. https://www.who.int/.
[4]
U. Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Hojjat Adeli. 2018. Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Computers in Biology and Medicine 100 (2018), 270–278.
[5]
Joost Asselbergs, Jeroen Ruwaard, Michal Ejdys, Niels Schrader, Marit Sijbrandij, and Heleen Riper. 2016. Mobile phone-based unobtrusive ecological momentary assessment of day-to-day mood: An explorative study. Journal of Medical Internet Research 18, 3 (2016), e72.
[6]
James Atwood and Don Towsley. 2016. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems. 1993–2001.
[7]
András Bánhalmi, János Borbás, Márta Fidrich, Vilmos Bilicki, Zoltán Gingl, and László Rudas. 2018. Analysis of a pulse rate variability measurement using a smartphone camera. Journal of Healthcare Engineering 2018 (2018), 1–15.
[8]
Christopher J. C. Burges. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 2 (1998), 121–167.
[9]
Michelle Nicole Burns, Mark Begale, Jennifer Duffecy, Darren Gergle, Chris J. Karr, Emily Giangrande, and David C. Mohr. 2011. Harnessing context sensing to develop a mobile intervention for depression. Journal of Medical Internet Research 13, 3 (2011), e55.
[10]
Hanshu Cai, Jiashuo Han, Yunfei Chen, Xiaocong Sha, Ziyang Wang, Bin Hu, Jing Yang, Lei Feng, Zhijie Ding, Yiqiang Chen, et al. 2018. A pervasive approach to EEG-based depression detection. Complexity 2018 (2018), 1–13.
[11]
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: Unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 1293–1304.
[12]
Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep neural networks for learning graph representations. In 30th AAAI Conference on Artificial Intelligence.
[13]
Chao Che, Cao Xiao, Jian Liang, Bo Jin, Jiayu Zhou, and Fei Wang. 2017. An RNN architecture with dynamic temporal matching for personalized predictions of Parkinson’s disease. In Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 198–206.
[14]
Hongxu Chen, Hongzhi Yin, Weiqing Wang, Hao Wang, Quoc Viet Hung Nguyen, and Xue Li. 2018. PME: Projected metric embedding on heterogeneous networks for link prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1177–1186.
[15]
Yu Cheng, Fei Wang, Ping Zhang, and Jianying Hu. 2016. Risk prediction with electronic health records: A deep learning approach. In Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 432–440.
[16]
Sergi G. Costafreda, Carlton Chu, John Ashburner, and Cynthia H. Y. Fu. 2009. Prognostic and diagnostic potential of the structural neuroanatomy of depression. PloS One 4, 7 (2009), e6353.
[17]
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
[18]
Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 135–144.
[19]
Cynthia H. Y. Fu, Janaina Mourao-Miranda, Sergi G. Costafreda, Akash Khanna, Andre F. Marquand, Steve C.R. Williams, and Michael J. Brammer. 2008. Pattern classification of sad facial processing: Toward the development of neurobiological markers in depression. Biological Psychiatry 63, 7 (2008), 656–662.
[20]
Joseph Futoma, Jonathan Morris, and Joseph Lucas. 2015. A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics 56 (2015), 229–238.
[21]
Claudio Gallicchio and Alessio Micheli. 2010. Graph echo state networks. In The 2010 International Joint Conference on Neural Networks (IJCNN’10). IEEE, 1–8.
[22]
Junyu Gao, Tianzhu Zhang, and Changsheng Xu. 2018. Watch, think and attend: End-to-end video classification via dynamic knowledge evolution modeling. In Proceedings of the 26th ACM International Conference on Multimedia. ACM, 690–699.
[23]
Junyu Gao, Tianzhu Zhang, and Changsheng Xu. 2019. Graph convolutional tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4649–4659.
[24]
Junyu Gao, Tianzhu Zhang, and Changsheng Xu. 2019. I know the relationships: Zero-shot action recognition via two-stream graph convolutional networks and knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019), 8303–8311.
[25]
Junyu Gao, Tianzhu Zhang, and Changsheng Xu. 2020. Learning to model relationships for zero-shot video classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 99 (2020), 1–1.
[26]
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 1263–1272.
[27]
Mayank Goel, Elliot Saba, Maia Stiber, Eric Whitmire, Josh Fromm, Eric C. Larson, Gaetano Borriello, and Shwetak N. Patel. 2016. Spirocall: Measuring lung function over a phone call. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 5675–5685.
[28]
Tim Hahn, Andre F. Marquand, Ann-Christine Ehlis, Thomas Dresler, Sarah Kittel-Schneider, Tomasz A. Jarczok, Klaus-Peter Lesch, Peter M. Jakob, Janaina Mourao-Miranda, Michael J. Brammer, et al. 2011. Integrating neurobiological markers of depression. Archives of General Psychiatry 68, 4 (2011), 361–368.
[29]
Mikael Henaff, Joan Bruna, and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163 (2015).
[30]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[31]
Yi Huang, Xiaoshan Yang, Junyu Gao, Jitao Sang, and Changsheng Xu. 2020. Knowledge-driven egocentric multimodal activity recognition. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16, 4 (2020), 1–133.
[32]
Ashesh Jain, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena. 2016. Structural-RNN: Deep learning on spatio-temporal graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5308–5317.
[33]
Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3, 1 (2016), 1–9.
[34]
Hye Jin Kam and Ha Young Kim. 2017. Learning representations for the early detection of sepsis with deep neural networks. Computers in Biology and Medicine 89 (2017), 248–255.
[35]
Abel N. Kho, M. Geoffrey Hayes, Laura Rasmussen-Torvik, Jennifer A. Pacheco, William K. Thompson, Loren L. Armstrong, Joshua C. Denny, Peggy L. Peissig, Aaron W. Miller, Wei-Qi Wei, et al. 2012. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. Journal of the American Medical Informatics Association 19, 2 (2012), 212–218.
[36]
Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[37]
Nicole Koenig, Andrea Seeck, Jens Eckstein, Andreas Mainka, Thomas Huebner, Andreas Voss, and Stefan Weber. 2016. Validation of a new heart rate measurement algorithm for fingertip recording of video signals with smartphones. Telemedicine and e-Health 22, 8 (2016), 631–636.
[38]
Jorma Laaksonen and Erkki Oja. 2002. Classification with learning k-nearest neighbors. In Proceedings of International Conference on Neural Networks (ICNN’96).
[39]
Nicholas D. Lane, Mashfiqui Mohammod, Mu Lin, Xiaochao Yang, Hong Lu, Shahid Ali, Afsaneh Doryab, Ethan Berke, Tanzeem Choudhury, and Andrew Campbell. 2011. Bewell: A smartphone application to monitor, model and promote wellbeing. In 5th International ICST Conference on Pervasive Computing Technologies for Healthcare. 23–26.
[40]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[41]
Honggui Li and Maria Trocan. 2019. Deep learning of smartphone sensor data for personal health assistance. Microelectronics Journal 88 (2019), 164–172.
[42]
Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017).
[43]
Inês P. Machado, A. Luisa Gomes, Hugo Gamboa, Vítor Paixão, and Rui M. Costa. 2015. Human activity data discovery from triaxial accelerometer sensor: Non-supervised learning sensitivity to feature extraction parametrization. Information Processing & Management 51, 2 (2015), 204–214.
[44]
A. Rajkomar, E. Oren, K. Chen, et al. 2018. Scalable and accurate deep learning for electronic health records. npj Digital Medicine 1, 1 (2018), 18.
[45]
Callie L. McGrath, Mary E. Kelley, Paul E. Holtzheimer, Boadie W. Dunlop, W. Edward Craighead, Alexandre R. Franco, R. Cameron Craddock, and Helen S. Mayberg. 2013. Toward a neuroimaging treatment selection biomarker for major depressive disorder. JAMA Psychiatry 70, 8 (2013), 821–829.
[46]
Jun-Ki Min, Afsaneh Doryab, Jason Wiese, Shahriyar Amini, John Zimmerman, and Jason I. Hong. 2014. Toss’n’turn: Smartphone as sleep and sleep quality detector. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 477–486.
[47]
Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. 2017. You are what you eat: Exploring rich recipe information for cross-region food analysis. IEEE Transactions on Multimedia 20, 4 (2017), 950–964.
[48]
Benson Mwangi, Keith Matthews, and J. Douglas Steele. 2012. Prediction of illness severity in patients with major depression using structural MR brain scans. Journal of Magnetic Resonance Imaging 35, 1 (2012), 64–71.
[49]
Nitish Nag, Vaibhav Pandey, Preston J. Putzel, Hari Bhimaraju, Srikanth Krishnan, and Ramesh Jain. 2018. Cross-modal health state estimation. In Proceedings of the 26th ACM International Conference on Multimedia. 1993–2002.
[50]
Rajalakshmi Nandakumar, Shyamnath Gollakota, and Nathaniel Watson. 2015. Contactless sleep apnea detection on smartphones. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services. 45–57.
[51]
Vincenzo Natale, Maciek Drejak, Alex Erbacci, Lorenzo Tonetti, Marco Fabbri, and Monica Martoni. 2012. Monitoring sleep with a smartphone accelerometer. Sleep and Biological Rhythms 10, 4 (2012), 287–292.
[52]
Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International Conference on Machine Learning. 2014–2023.
[53]
Peggy L. Peissig, Luke V. Rasmussen, Richard L. Berg, James G. Linneman, Catherine A. McCarty, Carol Waudby, Lin Chen, Joshua C. Denny, Russell A. Wilke, Jyotishman Pathak, et al. 2012. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. Journal of the American Medical Informatics Association 19, 2 (2012), 225–234.
[54]
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532–1543.
[55]
John P. Pollak, Phil Adams, and Geri Gay. 2011. PAM: A photographic affect meter for frequent, in situ measurement of affect. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 725–734.
[56]
Fan Qi, Xiaoshan Yang, and Changsheng Xu. 2020. Emotion knowledge driven video highlight detection. IEEE Transactions on Multimedia (2020), 1–1.
[57]
Perry G. Ridge, Mark E. Wadsworth, Justin B. Miller, Andrew J. Saykin, Robert C. Green, John S. K. Kauwe, Alzheimer’s Disease Neuroimaging Initiative, et al. 2018. Assembly of 809 whole mitochondrial genomes with clinical, imaging, and fluid biomarker phenotyping. Alzheimer’s & Dementia 14, 4 (2018), 514–519.
[58]
Maria J. Rosa, Liana Portugal, Tim Hahn, Andreas J. Fallgatter, Marta I. Garrido, John Shawe-Taylor, and Janaina Mourao-Miranda. 2015. Sparse network-based models for patient classification using fMRI. Neuroimage 105 (2015), 493–506.
[59]
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2008. The graph neural network model. IEEE Transactions on Neural Networks 20, 1 (2008), 61–80.
[60]
Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson. 2018. Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing. Springer, 362–373.
[61]
Matthew Stafford, Feng Lin, and Wenyao Xu. 2016. Flappy breath: A smartphone-based breath exergame. In 2016 IEEE 1st International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE’16). IEEE, 332–333.
[62]
Yizhou Sun and Jiawei Han. 2013. Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explorations Newsletter 14, 2 (2013), 20–28.
[63]
Tin Kam Ho. 1995. Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition, Vol. 1. 278–282.
[64]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
[65]
Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1225–1234.
[66]
Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. 2014. StudentLife: Assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 3–14.
[67]
Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. 2019. Heterogeneous graph attention network. In The World Wide Web Conference. 2022–2032.
[68]
David Watson, Lee Anna Clark, and Auke Tellegen. 1988. Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 54, 6 (1988), 1063.
[69]
Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 1 (2020), 4–24.
[70]
Thinhinane Yebda, Jenny Benois-Pineau, Helene Amieva, and Benjamin Frolicher. 2019. Multi-sensing of fragile persons for risk situation detection: Devices, methods, challenges. In 2019 International Conference on Content-Based Multimedia Indexing (CBMI’19). IEEE, 1–6.
[71]
Weiming Zhang, Yi Huang, Wanting Yu, Xiaoshan Yang, Wei Wang, and Jitao Sang. 2019. Multimodal attribute and feature embedding for activity recognition. In Proceedings of the ACM Multimedia Asia. 1–7.
[72]
Ziwei Zhang, Peng Cui, and Wenwu Zhu. 2020. Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering (2020), 1–1. DOI:https://doi.org/10.1109/TKDE.2020.2981333

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 4
    November 2021
    529 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3492437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 November 2021
    Accepted: 01 March 2021
    Revised: 01 January 2021
    Received: 01 August 2020
    Published in TOMM Volume 17, Issue 4

    Author Tags

    1. Health status prediction
    2. graph neural networks
    3. individual behavior

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China

