1 Introduction
Health is not only a basic guarantee of human happiness and well-being but also a foundation of economic progress. Maintaining good health requires sound health management, which has attracted increasing attention from governments and companies worldwide. Existing health management relies mainly on medical examinations and specialized patient treatment in hospitals. However, many citizens do not consider going to the hospital for a check-up until they notice abnormal physical symptoms. Moreover, medical examinations with professional hospital equipment, whether regular or not, yield only discrete measurements of an individual's health status at specific moments. Both factors pose significant challenges for the early detection and prevention of diseases, which are key components of effective health management.
According to a study by the World Health Organization (WHO), personal behaviors and lifestyles account for 60% of the factors affecting human health [1]. For example, non-smokers with defective genes are much less likely to suffer from lung disease than regular smokers [2]. A healthy diet and adherence to appropriate exercise can greatly reduce the incidence of diabetes and cardiovascular diseases [3]. Therefore, real-time and continuous analysis and monitoring of personal behavior and health status help individuals enhance self-directed health awareness and learn disease prevention knowledge, thus improving their health management capabilities [47].
The rapid development of smart portable and wearable devices has promoted the widespread use of various low-power sensors, and the advent of the 5G era has made it possible to collect individual health data streams from multiple sensors in real time. As long as people carry their devices, their daily routines, diets, and activities are recorded automatically and instantly without extra effort. For example, the daily routine can be recorded with a GPS sensor, a picture showing a person's diet can be captured with the phone camera, and activity information can be recognized from an accelerometer. These real-time data streams can be transferred to a back-end system for behavior analysis and health status estimation, ultimately helping people improve their health. Compared with patient records in the hospital, such data streams not only provide long-term signals that fully describe an individual's daily behavior and lifestyle but also support continuous data transmission and analysis without interfering with daily life and work.
The health data streams collected from various sensors are multi-source and heterogeneous. On the one hand, each sensor has a different sampling frequency, which makes joint processing difficult. On the other hand, the collected sensor data are multimodal (e.g., pictures or videos from cameras, motion signals from accelerometers and gyroscopes, and location coordinates from GPS sensors). Although multi-source sensors provide complementary information, feature learning and joint utilization of the multimodal data streams remain challenging for health status estimation.
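To make the sampling-frequency mismatch concrete, the following is a minimal sketch that aligns multi-rate sensor streams onto a common time grid by zero-order hold (carrying the last observed sample forward). The function name, input layout, and hold strategy are illustrative assumptions, not the preprocessing used in our method; real pipelines may interpolate or window instead.

```python
import numpy as np

def align_streams(streams, step):
    """Align multi-rate sensor streams onto one common time grid
    by zero-order hold (carry the last seen sample forward).

    streams: dict name -> (timestamps, values), timestamps ascending.
    step: grid spacing in seconds.
    """
    # restrict the grid to the interval covered by every stream
    start = max(t[0] for t, _ in streams.values())
    end = min(t[-1] for t, _ in streams.values())
    grid = np.arange(start, end + 1e-9, step)
    aligned = {}
    for name, (t, v) in streams.items():
        # index of the last sample at or before each grid point
        idx = np.searchsorted(t, grid, side="right") - 1
        aligned[name] = np.asarray(v)[idx]
    return grid, aligned
```

For example, a 1 Hz accelerometer stream and a 0.5 Hz GPS stream are resampled onto the same 1-second grid, so per-timestep features from both modalities can be processed jointly.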
A number of health status prediction methods based on mobile devices have been proposed. A few rely on non-parametric methods (e.g., K-Means, Mean Shift) with single-source data as input, such as acceleration signals [51], camera data from smartphones [7], and sound from microphones [50]. Other methods are devoted to exploiting information from various sources [41, 46]. However, most existing methods do not make full use of the structural information in the multi-source data streams, which limits further performance improvement. In practice, multi-source and heterogeneous health data streams mainly reflect individual behaviors and lifestyles, which exhibit a complex temporal structure with both local contextual and global temporal aspects. Local context refers to short-term behavior, such as the activities and routines within one day; detailed behavior information such as the activity sequence and location transitions should be considered in the local context to capture the principal characteristics of daily behavior. For the global temporal aspect, the temporal dependencies among local contexts need to be captured to represent a long-term, comprehensive description of individual behaviors.
Recently, Graph Neural Networks (GNNs) [22, 69, 72] have drawn great attention for modeling interactions in structured data. Taking a graph as input, GNNs propagate messages between nodes along the edges and thus learn representations for both nodes and edges. Most GNNs address the homogeneous case, where all nodes in a graph belong to a single type [6, 26, 52]. The heterogeneous graph neural network [62], a special case of GNN, addresses the situation where nodes are of different types. It has been successfully applied in [14, 18, 67], where highly competitive performance was obtained. Inspired by this progress, it is promising to model the intra-modality structure and inter-modality interactions of multi-source, heterogeneous health data with heterogeneous graph neural networks.
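To illustrate the message-passing idea in isolation, the following sketch performs one round of mean-aggregation message passing with a ReLU update. The aggregation scheme and weight matrices are generic textbook choices, not the specific GNN variants cited above.

```python
import numpy as np

def message_passing_step(node_feats, edges, W_self, W_neigh):
    """One round of GNN message passing: each node averages its
    in-neighbors' features and combines them with its own.

    node_feats: (n, d) feature matrix; edges: list of (src, dst);
    W_self, W_neigh: (d, d') weight matrices.
    """
    n = node_feats.shape[0]
    agg = np.zeros_like(node_feats)
    deg = np.zeros(n)
    for src, dst in edges:          # propagate messages along edges
        agg[dst] += node_feats[src]
        deg[dst] += 1
    deg[deg == 0] = 1               # avoid division by zero at isolated nodes
    agg = agg / deg[:, None]        # mean aggregation of neighbor messages
    # combine self features with aggregated messages, then apply ReLU
    return np.maximum(node_feats @ W_self + agg @ W_neigh, 0.0)
```

Stacking several such rounds lets information flow along multi-hop paths; a heterogeneous GNN additionally keeps separate weight matrices per node or edge type.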
In this article, we propose to predict daily mental health status based on multi-source wearable sensor data. We build a local-global individual behavior graph (LGIBG) from the heterogeneous data and then predict the daily health status with the help of heterogeneous graph neural networks. Specifically, we take three kinds of sensor data streams (accelerometer, audio, WiFi) as input and detect mid-level behavior-related concepts (e.g., walking, running, silence) with pretrained backbone models. These concepts are used to build the local-global individual behavior graph, which consists of multiple local context sub-graphs and a global temporal sub-graph. The local context sub-graphs are created with the concepts detected from daily data streams as heterogeneous nodes, connected by homogeneous and heterogeneous edges. Next, a densely connected global temporal sub-graph is created on top of the local context sub-graphs. We then apply a heterogeneous graph neural network to learn features of the local context sub-graphs, obtaining both semantic and structural representations. The representation of the global temporal sub-graph is learned with a self-attention network and is finally used to predict the health status.
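The graph construction described above can be sketched as follows. The node naming, edge kinds, and day-level summary nodes are simplifying assumptions for illustration rather than the exact construction used in our method.

```python
import itertools

def build_lgibg(daily_concepts):
    """Sketch of a local-global individual behavior graph (LGIBG).

    daily_concepts: list of days, each a list of (concept, modality)
    tuples, e.g. ("walking", "accelerometer").
    Returns (nodes, edges): a name -> attribute dict and a list of
    (u, v, kind) triples.
    """
    nodes, edges = {}, []
    day_nodes = []
    for d, concepts in enumerate(daily_concepts):
        # local context sub-graph: heterogeneous concept nodes of one day
        names = [f"d{d}:{c}" for c, _ in concepts]
        for (c, m), name in zip(concepts, names):
            nodes[name] = {"concept": c, "modality": m, "day": d}
        # connect consecutive concepts to preserve the behavior sequence;
        # an edge is homogeneous or heterogeneous by its endpoints' modality
        for i in range(len(concepts) - 1):
            kind = ("homo" if concepts[i][1] == concepts[i + 1][1]
                    else "hetero")
            edges.append((names[i], names[i + 1], kind))
        # a day-level node summarizes the local sub-graph
        day = f"day{d}"
        nodes[day] = {"type": "day"}
        edges.extend((day, n, "readout") for n in names)
        day_nodes.append(day)
    # densely connected global temporal sub-graph over the day nodes
    edges.extend((u, v, "temporal")
                 for u, v in itertools.combinations(day_nodes, 2))
    return nodes, edges
```

In this sketch, the day-level nodes and the dense temporal edges among them form the global temporal sub-graph over which the self-attention network operates.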
In summary, the contributions of our work are threefold: (1) To effectively represent the behavior-related multi-source data collected from wearable sensors, we build a local-global graph that consists of multiple local context sub-graphs and a global temporal sub-graph. The local-global graph describes both the short-term context of individual behaviors and their long-term temporal dependencies. (2) We learn short-term semantic and structural representations from the local context sub-graphs with heterogeneous graph neural networks, and the long-term representation from the global temporal sub-graph with self-attention networks. (3) We demonstrate the effectiveness of the proposed method for health prediction on the public StudentLife dataset.