
A Framework with Elaborate Feature Engineering for Matching Face Trajectory and Mobile Phone Trajectory

Ziqi Dong ¹,†, Furong Tian ¹,†, Hua Yang ¹,*, Tao Sun ², Wenchuan Zhang ¹ and Dan Ruan

¹ School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550025, China
² School of Electronic Information, Wuhan University, Wuhan 430072, China
³ Department of Radiation Oncology, University of California, Los Angeles, CA 90095, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Electronics 2023, 12(6), 1372; https://doi.org/10.3390/electronics12061372

Submission received: 8 February 2023 / Revised: 5 March 2023 / Accepted: 9 March 2023 / Published: 13 March 2023

Figures

Figure 1. The framework of heterogeneous face–phone trajectory matching.
Figure 2. The face–phone trajectory matching problem (the ’**’ hides the true values for legal considerations).
Figure 3. Example of a noise point in a trajectory.
Figure 4. Example of how the MGSTWS algorithm works (change of Cddt_Set at 3 time points (a–c)).
Figure 5. Visualization of affinity score of face–phone trajectory point pair.
Figure 6. Visualization of affinity function of face–phone trajectory point pair.
Figure 7. Silhouette coefficient with different values of k.
Figure 8. ROC curves of different models.
Figure 9. Confusion matrix of LightGBM.

Abstract

The advances in positioning techniques have generated massive trajectory data that represent the mobility of objects, e.g., pedestrians and mobile phones. It is important to integrate information from various modalities for subject tracking or trajectory prediction. Our work attempts to match a face with a corresponding mobile phone based on the heterogeneous trajectories. We propose a framework which associates face trajectories with their corresponding mobile phone trajectories using elaborate and explainable features. Our solution includes two stages: an initial selection of phone trajectories for a given face trajectory and a subsequent identification of which phone trajectory provides an exact match to the given face trajectory. In the first stage, we propose a Multi-Granularity SpatioTemporal Window Searching (MGSTWS) algorithm to select candidate mobile phones that are spatiotemporally close to a given face. In the second stage, we first build an affinity function to score face–phone trajectory point pairs selected by MGSTWS, and construct a feature set for building a face–phone trajectory matching determinator which determines whether a phone trajectory matches a given face trajectory. Our well-designed features guarantee high model simplicity and interpretability. Within the feature set, the BGST feature exploits the dissociation between a face and a mobile phone even if there exists some co-occurrence for a non-matching face–phone pair. Based on the feature set, we represent the face–phone matching task as a binary classification problem and train various models, among which LightGBM achieves the best performance with 92.6% accuracy, 96.9% precision, 88.5% recall, and 92.5% F1. Our framework is acceptable in most application scenarios and may benefit some downstream tasks. The preselection-refining architecture of our framework guarantees the applicability and efficiency of the face–phone trajectory pair matching framework.

Keywords:

trajectory reconstruction; trajectory matching; trajectory feature engineering; pedestrian tracking; suspect tracking

1. Introduction

Devices such as GPS receivers, CCTV cameras, and telecom base stations act as sensors and generate massive movement trajectory data for objects such as pedestrians, mobile phones, and vehicles. When used legally and responsibly, these trajectory data can provide valuable information to support many applications. For example, gathering patterns can be used to manage traffic flow [1]. Trajectory prediction can be used to locate missing individuals or track suspects. Other typical applications based on trajectory data include popular route recommendation [2], travel time estimation [3], frequent path finding [4], fraud detection [5], urban functional area discovery [6], diagnosis of urban noise [7], estimation of the popularity of real estate [8], human trajectory forecasting in crowds [9], vehicle trajectory prediction [10], and animal behavioural modeling [11].

However, each kind of trajectory data collected by these sensors, e.g., GPS, CCTV, or telecom base stations, can only approximately, discretely, and incompletely represent the underlying continuous trajectory of the corresponding object. Even GPS signals can be interrupted by obstruction or poor reception. It is natural to expect that combining different sensors would yield a more accurate and complete approximation of the subject’s continuous trajectory. We call a trajectory built from multiple types of sensors a heterogeneous (or hybrid) trajectory; it consists of multi-modal observations of the same moving object’s continuous trajectory, in contrast to trajectories based on a single sensor type.

However, a major challenge in utilizing hybrid sensor observations for trajectory reconstruction is the identification of correspondence across different sensor modalities. Hybrid trajectory data collected by CCTV and by mobile phones are a typical example. Even if every face image is assigned an identification (faceID) using a face image recognition technique and a clustering algorithm, and each mobile phone is assigned a phoneID, it is still difficult to determine the correspondence between faceID and phoneID. Once this prerequisite correspondence problem is solved, downstream tasks can benefit from the richer information available across modalities.

This paper aims to match a pedestrian’s faceID to their phoneID. We have received legal clearance and authorizations to perform this study with all the faceIDs and phoneIDs anonymized for privacy protection. Each trajectory is composed of a sequence of trajectory points consisting of a timestamp, geographic position, and unique faceID/phoneID.

In this paper, we propose a framework to associate faceIDs and phoneIDs based on trajectory samples. Our framework consists of two phases: preselecting and identifying. Firstly, we proposed a Multi-Granularity SpatioTemporal Window Searching (MGSTWS) algorithm to preselect the candidate phoneIDs for a given faceID. Secondly, we constructed a feature set and built models to decide whether a candidate phoneID matches a specified faceID. The two modelling stages were actually implemented as a framework specifically consisting of four cascaded parts, as shown in Figure 1. An affinity function (AF) was adopted to quantify the affinity between a trajectory point on a phone trajectory (“phone point”) and a given face trajectory point (“face point”). Another interesting novelty is that we propose to use the Big GeoDis and Small TimeDiff (BGST) metric to indicate the geographical separation within a short timeframe between a faceID and phoneID, and use it as negative evidence of their association.

Our contributions are summarized as follows:

A complete, explainable and applicable framework for face–phone trajectory matching, based on the idea of preselecting–identifying;
A Multi-Granularity SpatioTemporal Window Searching (MGSTWS) algorithm for effectively preselecting candidate phones for a given faceID, with minimal requirement on trajectory length alignment;
Explainable and effective feature engineering for differentiating real matching face–phone pairs and non-matching pairs. Six meaningful features were designed, and classifiers were trained based on these features to decide whether a phone matches to a given face or not;
An affinity function (AF) was constructed to quantify the correspondence possibility for a face–phone trajectory point pair, hence lowering the computation cost and simplifying implementation;
BGST is a strong negative but not absolute evidence that a phone does not match a given face. A Big Geographic position distance but Small Time difference Searching (BGSTS) algorithm was developed to rapidly search BGST events and count the times that BGST happens.

2. Related Work

Prior works mainly focus on unisource (single-source) trajectory matching using similarity measurements, such as matrix matching [12]. By unisource trajectory, we mean that the pairs of trajectories to be judged as matching or non-matching are collected by the same type of sensor and therefore tend to share similar statistical properties. In this paper, we study trajectory pair matching based on heterogeneous trajectory data, specifically, face image trajectories and phone trajectories. This is a new and challenging task, and we were entrusted to conduct the related research. Very few works are close to the application background we face; however, some works do provide inspiration and useful reference points.

2.1. Similarity-Measure-Based Method

To measure the similarity between trajectories, earlier works mainly depended on designing similarity measures between a pair of trajectories [13]. Similarity measures can be roughly categorized into two types: spatial similarity and spatiotemporal similarity [14]. Spatial similarity measures view trajectories with similar geometric shapes as similar and ignore the time information contained in the trajectory data; examples include Euclidean distance (ED) [15], closest pair distance (CPD) [16], ordered weighted distance (OWD) [17], and the angle measurement of Hausdorff distance (HD) and shape similarity [18]. Spatiotemporal similarity measures take into account both the spatial and temporal dimensions of trajectories; examples include dynamic time warping (DTW) [19], edit distance on real sequence (EDR) [20], and the longest common subsequence (LCSS) [21].

These measures depend heavily on expert knowledge so that the similarity has a physical meaning. However, because they were designed mainly for unisource trajectory data and do not consider multi-modality trajectory data such as face and phone trajectories, they are not well suited to heterogeneous trajectory matching. Moreover, most similarity measures, such as LCSS, DTW and EDR, have a high computation cost when operating on millions of trajectories [14].

2.2. Heterogeneous Trajectory Pair Matching

To the best of our knowledge, very few works study heterogeneous trajectory matching, and ours is the first attempt to study face–phone trajectory pair matching. References [22,23] are the most closely related works in heterogeneous trajectory matching, with a focus on vehicle–phone trajectory matching. Reference [22] proposes a framework to perform vehicle–phone trajectory matching based on Automatic License Plate Recognition (ALPR) and cellular signalling data. The framework in [22] is based on some strong assumptions or requirements, e.g., only one mobile phone is in a vehicle, or trajectories with low frequency are pruned. Reference [23] has achieved high accuracy in vehicle–phone trajectory matching by partitioning trajectory segments into three-dimensional space-time cells for parallel processing.

Unlike pedestrians’ random and low-speed movements, vehicles move along existing roads at high speed. Therefore, the time intervals between successive trajectory points are small enough to effectively represent real vehicle trajectories. The high-speed movement of a vehicle also provides a large number of different geographical locations in a vehicle trajectory. These major differences between face–phone trajectory data and vehicle–phone trajectory data deter direct application of methods developed for vehicle–phone matching to face–phone matching. Face–phone trajectory matching suffers from sample sparseness, since pedestrians may stay in one place or move within a very small area for a long time and thus do not generate informative samples.

2.3. Data Augmentation and Rebalancing

Data augmentation is a way to generate sufficient and diverse training samples so as to improve model performance in machine learning. Data augmentation aims to produce realistically distributed data samples based on reasonable principles, including transformation-based methods [24,25], decomposition methods [24], noise-adding methods [26], etc. These methods are widely used in the field of computer vision. Unlike the above-mentioned methods, we use an oversampling technique to address sample insufficiency and sample imbalance in heterogeneous trajectory data, since the above methods are not easily transferred to trajectory samples and network-based methods often lack interpretability. In this paper, the trajectory data suffer from an insufficient number of effective samples and severe imbalance. Oversampling techniques such as SMOTE [27], Borderline-SMOTE [28] and ADASYN [29] may change the minority sample distribution, which is exactly what should be preserved. We propose an oversampling method called Clustering-based Oversampling for Minority class (COM), which first clusters minority samples and then conducts oversampling based on the centers of the resulting clusters so as to reduce the risk of changing the sample distribution.

3. Notation and Problem Description

Table 1 shows the notations used in this work.

A trajectory point is a 4-tuple in the form of $(FID, tsmp, lgtd, lttd)$ or $(PID, tsmp, lgtd, lttd)$. A trajectory is a $tsmp$-ordered sequence of trajectory points with the same $PID$ or $FID$. A spatiotemporal window 〈timeDiff, geoDis〉 is a 2-tuple where timeDiff and geoDis respectively represent the time difference and the geographic position distance between two trajectory points.
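The notation above maps naturally onto small record types. The following is a minimal sketch in Python; the names `TrajPoint`, `Window`, and `build_trajectories`, as well as the choice of seconds and meters as units, are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, Iterable, List, NamedTuple


class TrajPoint(NamedTuple):
    """One trajectory point: (FID or PID, tsmp, lgtd, lttd)."""
    obj_id: str   # FID for a face point, PID for a phone point
    tsmp: float   # timestamp (assumed here to be seconds)
    lgtd: float   # longitude in degrees
    lttd: float   # latitude in degrees


class Window(NamedTuple):
    """Spatiotemporal window <timeDiff, geoDis>."""
    time_diff: float  # assumed unit: seconds
    geo_dis: float    # assumed unit: meters


def build_trajectories(points: Iterable[TrajPoint]) -> Dict[str, List[TrajPoint]]:
    """Group points by ID and order each group by timestamp."""
    trajectories: Dict[str, List[TrajPoint]] = {}
    for p in points:
        trajectories.setdefault(p.obj_id, []).append(p)
    for traj in trajectories.values():
        traj.sort(key=lambda p: p.tsmp)
    return trajectories
```

Keeping face and phone points in the same record type makes it straightforward to apply one window test to any face–phone point pair.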

As shown in Figure 2, when a pedestrian walks, his/her face image is taken by a CCTV camera. An $FID$ is assigned to the face image using a face recognition technique and a certain clustering algorithm. A trajectory point is recorded as the longitude and latitude of the camera, tagged with the picture-taking time, providing the approximate location of the subject with a resolution of 20 m. The phone trajectory is formed similarly based on cell reception towers to form the $PTD$ data set. This work aims to solve the face–phone trajectory matching problem based on the above background.

4. Method

Figure 1 shows our framework for face–phone trajectory matching. The input of the framework is the trajectory records collected from a real scene, and the output is a classification model which determines whether a phone matches a face or not. After the classifier is built, the whole framework is also used in real applications to find the truly matching phoneID for a given faceID. Our framework consists of four modules: (i) Data preprocessing. This module mainly handles the noisy trajectory points. (ii) Preliminary selection. In this module, the MGSTWS algorithm (shown in orange) is proposed to initially select candidate phones for a given face; the resulting phones are included in a candidate set of phones with a high possibility of truly matching the given face. (iii) Sample construction. This module prepares training and testing samples for module 4. We build six features (shown in red) based on statistical measures which jointly reflect the association between a face trajectory and a phone trajectory. Among the six features, the number of BGST events is a highly negative feature designed to capture the occasions on which a face and a phone are located far apart within a short duration. The details of feature engineering are given in Section 4.3.1 and all the features are listed in Table 2. Module 3 finally labels the samples for module 4. (iv) Classification model building. Based on the samples obtained in module 3, multiple classification models are trained to determine whether a phone matches a face or not, and the classifiers are compared to show which model performs best. We also call the classification model the “face–phone trajectory matching determinator (FPTMD)”. The details of the four modules are as follows.

4.1. Noise Filtering for Data Preprocessing

Spatial trajectories may not be absolutely precise due to all kinds of causes, such as sensor noise. Figure 3 is an example of one of these cases, where the error of a noise point such as $p_{5}$ is too high (e.g., several hundred meters away from its actual location) to derive relatively accurate information such as travel speed. It is necessary to deal with such noise points in trajectories to guarantee the performance of our entire framework.

We adopt a mean filter [14] to filter the noise. This method first identifies highly noisy points and then corrects them. Since the trajectory data are all generated by human movement, we set the upper bound of moving speed at 10 m/s. Once a trajectory point, compared with its predecessor point, yields a speed greater than this upper bound, the point is considered noisy, and its value $z$ is estimated as the mean of the values of its $n$ predecessor trajectory points. In the example shown in Figure 3, $p_{5}.z = \sum_{i=0}^{4} p_{i}.z / 5$ if we use a mean filter with a sliding window size of 5, and the noisy point $p_{5}$ is replaced by the smoothed point $p_{5}^{'}$. The mean filter is practical for handling individual noise points such as $p_{5}$ if the points around $p_{5}$ are dense. It is adopted when real-time operation is required, because it only needs the points preceding the noisy one; when the trajectory points following a noisy point are available, the filter can take them into account as well.
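The speed check and the predecessor mean described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are taken as `(t, x, y)` tuples in seconds and planar meters (the paper works with longitude and latitude), each new point is compared with the already cleaned predecessor, and the function name `mean_filter` is hypothetical.

```python
import math


def mean_filter(points, max_speed=10.0, window=5):
    """Replace points whose implied speed exceeds max_speed (m/s).

    points: list of (t_seconds, x_m, y_m), sorted by time.
    A noisy point is replaced by the mean position of up to `window`
    preceding (already cleaned) points; its timestamp is kept.
    """
    cleaned = []
    for t, x, y in points:
        if cleaned:
            pt, px, py = cleaned[-1]
            dt = t - pt
            dist = math.hypot(x - px, y - py)
            if dt > 0:
                speed = dist / dt
            else:
                # Same timestamp: any displacement is physically impossible.
                speed = math.inf if dist > 0 else 0.0
            if speed > max_speed:
                recent = cleaned[-window:]
                x = sum(p[1] for p in recent) / len(recent)
                y = sum(p[2] for p in recent) / len(recent)
        cleaned.append((t, x, y))
    return cleaned
```

Because only preceding points are used, the filter can run on a live stream; the trade-off is that a corrected point lags behind the true position when the subject is moving steadily.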

4.2. MGSTWS Algorithm for Preliminary Selecting Candidate Phones for a Given Face

For a given face trajectory, the Multi-Granularity SpatioTemporal Window Searching algorithm (abbreviated as MGSTWS) obtains a phone candidate set by sliding spatiotemporal windows along the face trajectory points. Algorithm 1 presents the pseudocode of MGSTWS. We believe that a face–phone trajectory pair originating from the same pedestrian’s movement is likely to include face–phone trajectory point pairs with both low geoDis and low timeDiff. Different from prior work [22], we apply multiple spatiotemporal windows to roughly capture the association in hybrid face–phone trajectories and obtain a candidate phone set for a given faceID. In MGSTWS, we introduce a matching confidence c, which counts the mobile phone trajectory points appearing in each corresponding window; the confidence of a phone is accumulated along the time sequence of face trajectory points. The larger the value of c, the greater the possibility that the phone trajectory truly matches the given faceID. We therefore order candidate mobile phones by descending confidence, and the top k (set as 100 in our experiment) mobile phones are included in the candidate set Cddt_Set.

Figure 4 exemplifies how MGSTWS works. In Figure 4a, the window is a 3-dimensional search scope centered on each face trajectory point in a face trajectory $trj\_f^{1}$. As shown in Figure 4b, for $trj\_f_{1}^{1}$ of face trajectory $trj\_f^{1}$, two phone trajectory points $trj\_p_{1}^{1}$ and $trj\_p_{1}^{2}$ are contained in the search scope and may match trajectory $trj\_f^{1}$. $trj\_p^{1}$ and $trj\_p^{2}$ are then viewed as the candidate phones for $trj\_f^{1}$. At this time, the confidence of both $trj\_p^{1}$ and $trj\_p^{2}$ is 1. When the window slides to $trj\_f_{2}^{1}$, $trj\_p^{1}$ and $trj\_p^{2}$ are still in the candidate set with $trj\_p_{2}^{1}$ and $trj\_p_{2}^{2}$ in the window, and their confidence increases by 1. As shown in Figure 4c, only $trj\_p_{3}^{2}$ and $trj\_p_{4}^{2}$ are in the window of $trj\_f_{3}^{1}$ and $trj\_f_{4}^{1}$. As a result, the matching confidences of the two phoneIDs are 2 and 4, respectively. Additionally, $trj\_p^{2}$ is more likely to match the face trajectory $trj\_f^{1}$ than $trj\_p^{1}$.

Here are some details for implementing MGSTWS. (1) The 3-dimensional window is actually implemented as a 2-dimensional spatiotemporal window whose two sides, $θ$ and $β$, respectively represent the timeDiff scope and the geoDis scope, consistent with Algorithm 1. (2) Binary search is used to obtain candidate phones based on the spatiotemporal windows to avoid brute-force searching, so the complexity of MGSTWS is $O(n \times \log(m))$, where $n$ is the length of $FTD$ and $m$ is the length of $PTD$. (3) To improve the efficiency, we downsample face trajectory points if they are too dense, possibly due to the related device’s unnecessary over-sampling.

Algorithm 1 Multi-Granularity SpatioTemporal Window Searching (MGSTWS)

Require: $FID$ $fid$, $PTD$, timeDiff scope $θ$, geoDis scope $β$
Ensure: Candidate phone set $Cddt\_Set$ for $fid$
$Cddt\_Set \leftarrow \emptyset$
$Cddt\_Set\_temp \leftarrow \emptyset$ // temporary set storing candidate $PID$s and their confidence for the given $fid$, with elements of the form 〈$PID$, confidence〉
Get the trajectory $trj\_f$ of $fid$
for each face trajectory point $trj\_f_{i} \in trj\_f$ do
    $t = trj\_f_{i}.tsmp$
    $x = trj\_f_{i}.lgtd$
    $y = trj\_f_{i}.lttd$
    $traj\_points = \{trj\_p_{j} \in PTD \mid (t - θ) \leq trj\_p_{j}.tsmp \leq (t + θ)\}$ // phone trajectory points around $trj\_f_{i}$ within the timeDiff scope $θ$
    $traj\_points = \{trj\_p_{j} \in traj\_points \mid dist_{g}(trj\_p_{j}.lgtd, trj\_p_{j}.lttd, x, y) \leq β\}$ // phone trajectory points around $trj\_f_{i}$ within the geoDis scope $β$, where $dist_{g} = 2R \times \arcsin\left(\sqrt{g\left(\frac{trj\_p_{j}.lttd - y}{2}\right) + \cos(y) \cdot \cos(trj\_p_{j}.lttd) \cdot g\left(\frac{trj\_p_{j}.lgtd - x}{2}\right)}\right)$ is the distance between $trj\_p_{j}$ and $trj\_f_{i}$, $R = 6371$ km is the radius of the earth, and $g(u) = \sin^{2}(u)$
    for each $trj\_p_{j} \in traj\_points$ do
        if $trj\_p_{j}.PID$ not in $Cddt\_Set\_temp$ then
            $Cddt\_Set\_temp$.append(〈$trj\_p_{j}.PID$, 1〉)
        else
            update 〈$trj\_p_{j}.PID$, $confidence$〉 in $Cddt\_Set\_temp$ as 〈$trj\_p_{j}.PID$, $confidence + 1$〉
        end if
    end for
end for
Sort elements in $Cddt\_Set\_temp$ by confidence descending
for each $e$ in $Cddt\_Set\_temp$ do
    $Cddt\_Set$.append($e.PID$)
end for
Return $Cddt\_Set$
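As an illustration of Algorithm 1, the following Python sketch applies one spatiotemporal window and uses binary search over time-sorted phone points. It is a sketch under stated assumptions rather than the authors' code: the tuple layouts, the function names `haversine_m` and `mgstws`, and the units (seconds and meters) are hypothetical, only a single window 〈θ, β〉 is applied, and the `top_k` truncation follows the prose description rather than the pseudocode.

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # R = 6371 km, expressed in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def mgstws(face_traj, phone_points, theta, beta, top_k=100):
    """Return candidate PIDs ordered by descending confidence.

    face_traj:    list of (tsmp, lgtd, lttd) for one faceID
    phone_points: list of (pid, tsmp, lgtd, lttd) for all phones
    theta:        timeDiff scope in seconds (assumed unit)
    beta:         geoDis scope in meters (assumed unit)
    """
    # Sort once by timestamp so each time window is found by binary search.
    phone_points = sorted(phone_points, key=lambda p: p[1])
    times = [p[1] for p in phone_points]
    confidence = Counter()
    for t, x, y in face_traj:
        lo = bisect_left(times, t - theta)
        hi = bisect_right(times, t + theta)
        for pid, _, px, py in phone_points[lo:hi]:
            if haversine_m(px, py, x, y) <= beta:
                confidence[pid] += 1  # one count per phone point in the window
    return [pid for pid, _ in confidence.most_common(top_k)]
```

Phones that never fall inside a window receive no confidence and are therefore absent from the returned list, which matches the preselection role of the algorithm.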

4.3. Sample Construction

This module focuses on constructing samples for the following module 4 to build a classification model, i.e., the face–phone Trajectory Matching Determinator (FPTMD). The constructed samples will be used for training and testing in module 4.

4.3.1. Feature Engineering

Feature engineering is a key technique in machine learning, especially confronted with small data size. Highly discriminative features can help improve classification performance even using simple models [30]. We propose six features (see Table 2) which are statistical measures computed depending on spatiotemporal relation between the phones and a given face, trying to effectively capture the association in face–phone trajectory pairs.

Before building the five score-based features (all features except Times_BGST), we built an affinity function to assign an affinity score to a face–phone trajectory point pair so as to quantitatively describe the spatiotemporal association between a face trajectory point and a phone trajectory point. Then, the five features were computed based on these affinity scores. The affinity function is designed in the shape of a radial basis function with time difference (noted as timeDiff) and geographic position distance (noted as geoDis) as its two inputs, since it aims to model the heuristic knowledge that timeDiff and geoDis both being small implies a high possibility that the two heterogeneous trajectory points originate from the same moving pedestrian. For a face trajectory and a phone trajectory, if timeDiff and geoDis are consistently both small for the trajectory point pairs in this face–phone trajectory pair, it is more strongly confirmed that the two trajectories originate from the same pedestrian’s movement. Since it is difficult to analytically derive a radial basis function which fully and precisely describes the above heuristic knowledge, we adopted a function approximation technique to build such a function. We first manually built an affinity score matrix (Table 3) which roughly captures the above heuristic knowledge. Figure 5 shows the visualized version of the affinity score matrix. The size of the largest spatiotemporal window is set as 〈10 min, 1 km〉, since nothing can be conjectured about a face–phone trajectory point pair beyond this window, and the range of the affinity score is set as [0, 100] for convenient observation during the experiment.

Secondly, we used polynomial fitting, chosen for its strong ability to model various functions, to obtain the polynomial shown in (3), which quantitatively models the above heuristic knowledge with a continuous function even though we only labeled the corresponding discrete points in the affinity score matrix. This trick also reduces the time cost, since querying the matrix is no longer needed. Once obtained, the affinity function can be used to score any given pair of face–phone trajectory points, even if their timeDiff and geoDis are not contained in the affinity score matrix.
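The fitting step can be illustrated with an ordinary least-squares fit of a bivariate polynomial to the labeled grid cells. The sketch below is only an illustration under assumptions: the polynomial degree (2 here), the function names `poly_terms`, `fit_affinity`, and `affinity`, and the use of the normal equations are choices made for this example, and the paper's actual polynomial is the one given in its Equation (3).

```python
def poly_terms(t, d):
    """Terms of an (assumed) degree-2 polynomial in timeDiff t and geoDis d."""
    return [1.0, t, d, t * t, t * d, d * d]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affinity(samples):
    """Least-squares fit via the normal equations (A^T A) c = A^T s.

    samples: list of (timeDiff, geoDis, score) triples, e.g. the labeled
    cells of an affinity score matrix.
    """
    rows = [poly_terms(t, d) for t, d, _ in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * s for r, (_, _, s) in zip(rows, samples)) for i in range(k)]
    return solve(ata, atb)


def affinity(coeffs, t, d):
    """Evaluate the fitted polynomial for one face-phone point pair."""
    return sum(c * term for c, term in zip(coeffs, poly_terms(t, d)))
```

To use it, pass each labeled cell of the score matrix as one `(timeDiff, geoDis, score)` triple to `fit_affinity`, then call `affinity` on arbitrary point pairs inside the largest window. A fitted polynomial is not guaranteed to stay inside [0, 100], so a real implementation may need to clip its output.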

Table 2. Features design for face–phone trajectory matching.

Features	Description
Sum_Af_score	Sum of scores of trajectory point pairs
Mean_Af_score	Mean of scores of trajectory point pairs
Num_Af_CoOcur	Times that a phone and a face are caught by spatiotemporal windows
Std_Af_score	Standard deviation of scores of trajectory point pairs
Median_Af_score	Median of scores of trajectory point pairs
Times_BGST	Times that BGST happens

The features for deciding whether a phone trajectory and a face trajectory are real matching and their description are listed in Table 2, including Num_Af_CoOcur, Sum_Af_score, Mean_Af_score, Median_Af_score, Std_Af_score, Times_BGST. Each of these 6 features individually has some significance to imply the degree to which a phoneID matches a faceID. However, more worth mentioning is that some combinations of these features have meaningful explanation as for the possibility that a phone trajectory and a face trajectory matches. Some important explanations are as follows:

“Num_Af_CoOcur” is the number of times that a phone and a face are caught in a spatiotemporal window; it is also the number of face–phone trajectory point pairs scored by the affinity function. Num_Af_CoOcur roughly reflects the frequency of co-occurrence of the corresponding faceID and phoneID. “Sum_Af_score” is a refined version of Num_Af_CoOcur, since every face–phone trajectory point pair caught in the windows is quantitatively scored by the affinity function.
Mean and median are two metrics for measuring the central tendency of a set of values of a random variable, specifically the location of the middle or center of the score distribution [31]. Intuitively, if a face–phone trajectory pair originates from the same movement, Mean_Af_score and Median_Af_score should be higher than those computed from non-matching face–phone trajectory pairs. Furthermore, if a face–phone trajectory pair originates from the same movement, Mean_Af_score and Median_Af_score should be consistent, i.e., closer to each other than those computed from non-matching face–phone trajectory pairs. If Mean_Af_score is much larger or much smaller than Median_Af_score, the corresponding trajectory pair is not likely to be a matching pair even if Mean_Af_score or Median_Af_score is high, since this is quite possibly caused by accidental but frequent co-occurrence of a face and a phone which do not truly match.
Standard deviation is a measure of the degree of dispersion of the values of a random variable. As a feature for describing the association between a face trajectory and a phone trajectory, a low Std_Af_score implies a more stable association in terms of timeDiff and geoDis than a high Std_Af_score does.
A faceID and a phoneID are unlikely to be a real match if they often appear at two geographic locations far from each other within a short time window. We take 1.6 km/min as the upper limit of the speed at which a pedestrian can move. For two trajectory points, if the ratio of the geoDis between them to the timeDiff between them exceeds 1.6 km/min, it is reasonable to infer that these two trajectory points are not generated by the same pedestrian. If one of these points is a face trajectory point and the other is a phone trajectory point, one can claim the corresponding faceID and phoneID to be a negative match even if they sometimes cross within a spatiotemporal window where timeDiff and geoDis are both very small. When this ratio exceeds 1.6 km/min for a face–phone trajectory point pair, we say that the faceID and the phoneID appear with a big geoDis but a small timeDiff, abbreviated as a BGST (Big geoDis and Small timeDiff) event.
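The six features and the BGST speed test can be sketched as below. This is a minimal illustration under assumptions: the function names are hypothetical, the population standard deviation is used because the text does not say whether the population or the sample formula applies, and features of an empty score list are set to 0.

```python
import statistics


def affinity_features(scores, times_bgst):
    """Build the six features of Table 2 from the affinity scores of one
    face-phone trajectory pair and its BGST event count."""
    n = len(scores)
    return {
        "Num_Af_CoOcur": n,
        "Sum_Af_score": sum(scores),
        "Mean_Af_score": statistics.fmean(scores) if n else 0.0,
        "Median_Af_score": statistics.median(scores) if n else 0.0,
        # Assumption: population standard deviation.
        "Std_Af_score": statistics.pstdev(scores) if n else 0.0,
        "Times_BGST": times_bgst,
    }


def is_bgst(geo_dis_km, time_diff_min, max_speed_km_per_min=1.6):
    """True if geoDis / timeDiff exceeds the pedestrian speed limit.

    Written as a product so that timeDiff = 0 needs no special case.
    """
    return geo_dis_km > max_speed_km_per_min * time_diff_min
```

Comparing `Mean_Af_score` with `Median_Af_score` on the returned dictionary gives the consistency signal discussed above, although the classifier can learn that interaction on its own from the two raw features.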

We propose an algorithm called Big GeoDis and Small TimeDiff Searching (BGSTS) to count the number of BGST events (Times_BGST), which serves as a negative feature and a high-confidence indicator that a face–phone trajectory pair is non-matching. Algorithm 2 gives the pseudocode of BGSTS, which rapidly searches for BGST cases through the specified spatiotemporal windows {〈$ε$, $τ$〉}. A face point and a phone point that lie within a very small timeDiff scope $ε$ of each other but farther apart than a large geoDis scope $τ$ appear far away from each other within a short time duration, i.e., a BGST event happens for the corresponding face and phone. While implementing BGSTS, binary search is used to raise the searching speed so as to support real-time applications, and the time complexity is $O(n \times \log(m))$, where $n$ is the length of $FTD$ and $m$ is the length of $PTD$.

There are special cases in which a face and a mobile phone appear far apart within a short duration although they belong to the same owner, e.g., a pedestrian forgets to take his/her phone. So, we do not consider BGST as absolute evidence for deciding that a face–phone pair is non-matching; instead, we count the times BGST happens, i.e., Times_BGST, and leave it to the FPTMDs to decide how to use Times_BGST when determining whether a face and a phone match.

Algorithm 2 Big GeoDis and Small TimeDiff Event Searching (BGSTS)

Require: $FID$ $fid$, $Cddt\_Set$ for $fid$ which is obtained from the MGSTWS algorithm, timeDiff scope $ε$, geoDis scope $τ$
Ensure: $BGST\_Set$ with elements of the form 〈$PID$, $times\_BGST$〉, where $PID$ is in $Cddt\_Set$ and $times\_BGST$ is the number of times that the BGST event happens for the $PID$ and the $fid$
$BGST\_Set$ ← ⌀
Get the face trajectory $trj\_f$ of $fid$
Sort face trajectory points in $trj\_f$ by timestamp ascending
for each $pid$ ∈ $Cddt\_Set$ do
    Get the phone trajectory point set $ph\_Set$ for $pid$
    Sort phone trajectory points in $ph\_Set$ by timestamp ascending
    for each face trajectory point $trj\_f_{i} \in trj\_f$ do
        $t = trj\_f_{i}.tsmp$
        $x = trj\_f_{i}.lgtd$
        $y = trj\_f_{i}.lttd$
        $traj\_points = \{$ phone trajectory point $trj\_p_{j} \in ph\_Set \mid dist_{g}(x, y, trj\_p_{j}.lgtd, trj\_p_{j}.lttd) > τ$ and $(t - ε) \leq trj\_p_{j}.tsmp \leq (t + ε) \}$ // get the phone trajectory points that are within the timeDiff scope $ε$ and outside the geoDis scope $τ$
        for each phone trajectory point $p_{j} \in traj\_points$ do
            if no element with $PID$ = $p_{j}.PID$ is in $BGST\_Set$ then
                $BGST\_Set$.append(〈$p_{j}.PID$, 1〉)
            else
                update 〈$p_{j}.PID$, $times\_BGST$〉 ∈ $BGST\_Set$ as 〈$p_{j}.PID$, $times\_BGST$ + 1〉
            end if
        end for
    end for
end for
Return $BGST\_Set$
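The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tuple layout `(timestamp_s, lon, lat)`, the function names, the use of the haversine formula for $dist_{g}$, and the units (seconds and metres) are all assumptions made for the example. Binary search (`bisect`) locates the phone points inside the time window of each face point.

```python
import math
from bisect import bisect_left, bisect_right

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumption for the distance helper


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bgsts(face_points, phone_points_by_pid, eps_s, tau_m):
    """Count BGST events per candidate phone.

    face_points: list of (timestamp_s, lon, lat) for one face trajectory.
    phone_points_by_pid: dict mapping pid -> list of (timestamp_s, lon, lat).
    Returns a dict pid -> times_BGST holding only phones with at least one event.
    """
    face_sorted = sorted(face_points)  # sort by timestamp
    bgst = {}
    for pid, pts in phone_points_by_pid.items():
        pts_sorted = sorted(pts)
        stamps = [p[0] for p in pts_sorted]
        count = 0
        for t, x, y in face_sorted:
            # Binary search for the phone points with t - eps <= tsmp <= t + eps.
            lo = bisect_left(stamps, t - eps_s)
            hi = bisect_right(stamps, t + eps_s)
            for _, px, py in pts_sorted[lo:hi]:
                if haversine_m(x, y, px, py) > tau_m:  # close in time, far in space
                    count += 1
        if count:
            bgst[pid] = count
    return bgst
```

With hypothetical inputs, a phone seen about 10 km away within one minute of two face points yields `times_BGST = 2`, while a phone seen a few metres away yields no entry.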

4.3.2. Sample Labeling

After MGSTWS was applied to a given face trajectory, we obtained a candidate set of potentially matching phones with high confidence for that face. We set the size of the candidate set to 100. Each phone trajectory in the candidate set was paired with the given face trajectory, and the values of the six features were computed for each face–phone trajectory pair. We created a sample in the form of 〈Num_Af_CoOcur, Sum_Af_score, Mean_Af_score, Median_Af_score, Std_Af_score, Times_BGST, label〉. The label was set to 1 for the real matching phone trajectory, provided that it was actually included in the candidate set, and to 0 for all the non-matching phone trajectories in the candidate set. A typical positive sample is 〈5, 406, 81.2, 82, 80, 13.08, 0, 1〉.
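As an illustration of how one sample could be assembled, the sketch below derives the five affinity statistics from a list of point-pair affinity scores and appends Times_BGST and the label. The function name and input layout are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics


def build_sample(affinity_scores, times_bgst, label):
    """Return (Num_Af_CoOcur, Sum_Af_score, Mean_Af_score, Median_Af_score,
    Std_Af_score, Times_BGST, label) for one face-phone trajectory pair."""
    if not affinity_scores:
        raise ValueError("at least one co-occurring point pair is required")
    num = len(affinity_scores)
    total = sum(affinity_scores)
    mean = total / num
    median = statistics.median(affinity_scores)
    std = statistics.pstdev(affinity_scores)  # population std; an assumption
    return (num, total, mean, median, std, times_bgst, label)
```

For example, `build_sample([70, 80, 90], times_bgst=0, label=1)` produces a positive sample with three co-occurrences, a sum of 240, and a mean and median of 80.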

Up to this point, we can construct a basic sample set for training and testing a model by performing the above-mentioned operations on a face trajectory set and a phone trajectory set, using the real matching relations between faceIDs and phoneIDs given by our trajectory provider as ground truth.

4.3.3. Theory behind the Sample Construction

The main idea of the above sample construction is as follows.

Face–phone matching is a problem of pairing face trajectories with phone trajectories across massive face trajectory and phone trajectory data sets. We solved this pairing problem in an equivalent form by taking each face trajectory as a fixed reference and computing features that capture the spatiotemporal relation between the candidate phone trajectories and the given face trajectory. This simplifies the reasoning and fits real-world applications in which face trajectory points and phone trajectory points are collected sequentially in real time.

Our features and their combinations are inherently interpretable and could be extended to many related or similar tasks, such as vehicle–phone matching. Interpretable features allow interpretable models to be adopted, which meets the real-world requirement for interpretability that complex models such as deep networks often lack. Well-designed features also mean that far fewer samples are needed, which is desirable when a large volume of valuable real-world data is not available for reasons such as device limitations or the expense of labeling samples.

For face–phone trajectory matching, positive pairs are naturally obtained from ground truth, but there is no natural standard for labeling negative pair samples. In our framework, the negative samples are generated from the preliminarily selected phone trajectory candidate set for the given face trajectory, so the non-matching phone trajectories in the candidate set have spatiotemporal relations to the face trajectory that are relatively similar to those of the real matching phone trajectory. In the language of machine learning, the negative samples we labeled are hard negatives that closely resemble the positive samples, and they are also the ones that a real application needs to rule out for a given face trajectory. Note that, although it is easy to build negative pair samples by pairing a face trajectory with a randomly selected phone trajectory, such samples are of lower significance since no application requires them.

4.4. Classification Model Building for Face–Phone Trajectory Pair Matching

Even though we labeled high-quality samples as described above, a sample imbalance can still be observed: there were many more negative samples than positive samples, even though the face and phone trajectory data sets are considerably large. This conflicts with the goal of identifying the real matching face–phone trajectory pairs. An effective classification model relies on sufficient training data to generalize well to unseen data and achieve high accuracy [32]. Data augmentation is a common technique in machine learning that is mainly used to increase the size of the training data and make the training samples as diverse as possible, thereby improving the generalization ability of the classification model [33].

Based on the samples generated by using module 3, we propose the Clustering-based Oversampling for Minority class (COM) algorithm, which generates accurate and diverse positive samples to overcome sample imbalance and positive sample deficiency. After the augmentation, we trained various classifiers which determine whether a phone trajectory matches with a face trajectory.

COM is shown in Algorithm 3. All existing positive samples are clustered into $η$ clusters. A newly constructed sample $p_{i}$ is then obtained by multiplying the $i^{th}$ cluster center by a random weight $w_{i} = rand(0, 1)$ bounded by [0, 1]. In this way, we produce $ζ$ diverse and accurate samples. The complexity of COM is $Ω(\max(ζ, η))$.

Algorithm 3 Clustering-based Oversampling for Minority class

Require: P: the existing positive samples, $ζ$: the number of augmented samples, $η$: the number of clusters
Ensure: R: the set of augmented samples
R = ⌀
$clu$ = KMeansCluster(P, $η$) // KMeansCluster groups the samples P into $η$ clusters and returns $clu$, which stores the cluster centers
i = 0
while sizeof(R) < $ζ$ do
    Randomly generate a number $w_{i} \in [0, 1]$
    $p_{i}$ = $w_{i}$ × $clu$[i % $η$]
    R.append($p_{i}$)
    i++
end while
Return R
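Algorithm 3 can be sketched in plain Python as below. The small Lloyd-style `kmeans_centers` helper is a stand-in written for this example (the paper does not state which K-means implementation was used), and all function names, the fixed seeds, and the iteration cap are assumptions. The loop stops once exactly $ζ$ samples have been generated.

```python
import random


def kmeans_centers(points, k, iters=50, seed=0):
    """Tiny Lloyd-style K-means returning k cluster centers (illustrative stand-in)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        new_centers = []
        for j, g in enumerate(groups):
            if g:
                new_centers.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def com(positives, zeta, eta, seed=0):
    """Generate zeta augmented samples by scaling cluster centers by random weights."""
    rng = random.Random(seed)
    centers = kmeans_centers(positives, eta, seed=seed)
    augmented = []
    i = 0
    while len(augmented) < zeta:
        w = rng.random()             # random weight in [0, 1)
        center = centers[i % eta]    # cycle through the cluster centers
        augmented.append([w * v for v in center])
        i += 1
    return augmented
```

Because each new sample is a cluster center scaled by a single weight, every generated sample lies on the line segment between the origin and one of the cluster centers.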

We trained different classification models, i.e., the FPTMDs, on the augmented samples, including LightGBM [34] (Light Gradient Boosting Machine), classic Decision Tree [35] (DT), Random Forest [36], Multi-Layer Perceptron [37] (MLP), K-Nearest Neighbor [38] (KNN), and Support Vector Machine [39] (SVM). For each of these models, we used k-fold cross validation for model selection so as to avoid over-fitting. Among the models, LightGBM achieves the best accuracy. LightGBM [34] is an improved implementation of GBDT (Gradient Boosting Decision Tree) that is based on Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling (GOSS). Moreover, LightGBM meets the efficiency and scalability requirements of our application even with high-dimensional features and large amounts of data.

5. Experiment

5.1. Data Description

We implemented our face–phone matching framework based on two real-world trajectory data sets: the face trajectory data set ($FTD$) and the phone trajectory data set ($PTD$). In total, 139 face image collectors and 189 mobile phone collectors were installed in the data-collection geographic area. The maximum east–west span of the area was about 22 km, and the maximum north–south span was about 13 km.

$FTD$ and $PTD$ were collected from September 2019 to September 2020. There were 1971 face image trajectories in $FTD$, comprising 833,087 face image trajectory points, and 1995 mobile phone trajectories in $PTD$, comprising 3,091,659 mobile phone trajectory points.

We hereby clarify that all the data sets involved in this paper have been anonymized, do not violate any privacy, and were used purely for scientific research.

Table 4 shows the statistical properties of the trajectory lengths (the number of trajectory points in a trajectory) in $FTD$ and $PTD$. It can be seen that the medians of both the face trajectory length and the phone trajectory length are 2, implying that many trajectories are too short to derive useful information from. Using the language of data mining, we call this property length sparseness. Comparing the average and median trajectory lengths of $FTD$ and $PTD$ shows that the lengths of face trajectories and phone trajectories differ markedly. This property means that most traditional similarity-based methods widely used for unisource trajectory matching are not suitable for heterogeneous trajectories such as those in this paper. We call this property the length inconsistency between face trajectories and phone trajectories. Table 5 shows the statistical properties of the number of geographical locations in each trajectory in $FTD$ and $PTD$. It can be observed that, for most trajectories, the trajectory points are collected from very few geographical locations. We found that over 60% of face trajectories are composed of fewer than two different geographical locations and 50% of the phone trajectories have only two different geographical locations. We call this property geographical sparseness.

The three above-mentioned properties, i.e., length sparseness, length inconsistency, and geographical sparseness, pose a major challenge for heterogeneous face–phone trajectory matching. Very few works on trajectory matching can be found, and they mostly concern unisource trajectory matching, which does not face these properties or the resulting challenge of the face–phone matching task.

5.2. Noise Filtering

For the face trajectory data set, we smoothed 1036 noise points by using the mean-filter method. To evaluate the influence of the noise and the necessity of noise filtering, Table 6 compares the accuracy of each model with and without noise filtering. It can be seen that noise filtering consistently improves the accuracy of all the models we tried. Therefore, we suggest noise filtering as a necessary preparation step for our method if high data quality of $FTD$ and $PTD$ cannot be guaranteed, which is a common situation.

5.3. Deciding Optimal Window Sizes for MGSTWS

As an approach to preselecting the candidate phoneIDs for a given faceID, the basic requirement for the MGSTWS algorithm is to efficiently include the real matching phone for further decision while not including too many non-matching phoneIDs in the candidate phoneID set. It is therefore crucial to decide the optimal window sizes. Unfortunately, this is a combinatorial optimization problem for which there is no well-developed solution. We conducted the following optimization process, based on Sequential and Greedy Searching (abbreviated as SGS), which obtains relatively optimal window sizes for MGSTWS.

In SGS, a single window is desirable if it includes the real matching phone point for a given face point and is as small as possible. We started from the biggest spatiotemporal window size 〈1000 m, 10 min〉 and alternately decreased timeDiff by 1 min and geoDis by 100 m to generate the next smaller candidate window. For each candidate window in the sequentially generated windows, if the probability that the candidate window includes the real matching phone trajectory point (denoted as p) decreases remarkably compared with its prior window (we set the threshold on the difference in p between two successive windows to 20%), then this candidate window is adopted into MGSTWS. We randomly selected 1000 real matching face–phone trajectory pairs and repeated this window size optimization process, i.e., SGS. Finally, we obtained an optimal window sequence of {〈2 min, 200 m〉, 〈1 min, 100 m〉}.
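The SGS procedure can be sketched as follows, under several assumptions that the text leaves open: each real matching pair is summarized by a hypothetical `(geoDis_m, timeDiff_min)` tuple, p is estimated as the fraction of pairs falling inside a window, the 20% threshold is read as a relative drop in p, and the window at which the drop occurs is the one adopted (a literal reading of the description). The function names are illustrative only.

```python
def candidate_windows(max_geo_m=1000, max_time_min=10, geo_step_m=100, time_step_min=1):
    """Generate windows (geoDis_m, timeDiff_min), alternately shrinking time and distance."""
    geo, t = max_geo_m, max_time_min
    windows = [(geo, t)]
    shrink_time = True
    while True:
        if shrink_time:
            t -= time_step_min
        else:
            geo -= geo_step_m
        if geo <= 0 or t <= 0:
            break
        windows.append((geo, t))
        shrink_time = not shrink_time
    return windows


def hit_rate(window, pairs):
    """Fraction of matching pairs (geoDis_m, timeDiff_min) that fall inside the window."""
    geo_m, time_min = window
    hits = sum(1 for d, dt in pairs if d <= geo_m and dt <= time_min)
    return hits / len(pairs)


def select_windows(pairs, drop=0.20):
    """Adopt each window whose hit rate falls by at least `drop` relative to its prior window."""
    windows = candidate_windows()
    selected = []
    prev = hit_rate(windows[0], pairs)
    for w in windows[1:]:
        p = hit_rate(w, pairs)
        if prev > 0 and (prev - p) / prev >= drop:
            selected.append(w)
        prev = p
    return selected
```

With the default steps, `candidate_windows()` runs from (1000 m, 10 min) down to (100 m, 1 min). The description could also be read as adopting the prior window rather than the one where p drops; switching between the two readings only changes which element is appended in `select_windows`.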

It is worth mentioning that the optimal window sequence should be found in this way for each deployment. The optimal windows for one place may not be optimal for another, since the related devices are installed with different geographical distributions and work in different modes, and these differences may result in different statistical properties of the trajectory data sets.

5.4. Polynomial Fitting of Affinity Function

To quantitatively describe the spatiotemporal association between a face trajectory point and a phone trajectory point that is included into the candidate phoneID set by MGSTWS, we first manually assigned a reasonable affinity score with respect to timeDiff and geoDis as shown in Table 3 and tried to use a polynomial to approximate the labeled data points.

While training the face–phone point pair affinity function, two loss functions were employed. One was the matrix Frobenius norm (shown in (1)) of the error matrix A (the matrix generated after each iteration minus the labeled affinity score matrix in Table 3), which guides the affinity function toward a low overall error level. The other was the matrix infinity norm (shown in (2)), computed as the maximum absolute entry of the error matrix, which guides the affinity function toward a low error level at each labeled data point. The training iteration was stopped when both the Frobenius norm and the infinity norm were smaller than 10. For model selection, we tried polynomials from low order to high order and stopped trying higher-order polynomials as soon as the affinity function satisfied the stopping rule.

The resulting polynomial function is shown in (3), and Figure 6 shows the visualization of the polynomial function and the fitting error. Although the affinity function is a third-order polynomial, its surface is quite smooth over the range in which it is actually used, i.e., the timeDiff interval t being [0 min, 10 min] and the geoDis interval s being [0 km, 1 km]. That is, within this range, the affinity function has a generally low error level, does not look like a complex third-order polynomial, and does not fluctuate violently.

$$\|A\|_{F} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^{2}} \tag{1}$$

$$\|A\|_{\infty} = \max_{1 \leq i \leq m,\ 1 \leq j \leq n} |a_{ij}| \tag{2}$$

$$\begin{aligned} \mathrm{Affinity\ score} ={} & 4.858 \times t - 27.590 \times s - 1.809 \times t^{2} \\ & + 1.109 \times t \times s + 1.972 \times s^{2} + 0.083 \times t^{3} + 0.033 \times t^{2} \times s \\ & - 0.080 \times t \times s^{2} - 0.037 \times s^{3} + 102.8. \end{aligned} \tag{3}$$
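For illustration, the fitted polynomial (3) and the stopping rule based on the two norms can be written as the short Python sketch below. The function names are hypothetical, the inputs t and s are passed through in whatever units Table 3 uses, and the second norm follows the description in the text (the maximum absolute entry of the error matrix) rather than any library definition.

```python
import math


def affinity_score(t, s):
    """Evaluate the fitted third-order affinity polynomial (3) at timeDiff t and geoDis s."""
    return (4.858 * t - 27.590 * s - 1.809 * t ** 2
            + 1.109 * t * s + 1.972 * s ** 2 + 0.083 * t ** 3 + 0.033 * t ** 2 * s
            - 0.080 * t * s ** 2 - 0.037 * s ** 3 + 102.8)


def frobenius_norm(err):
    """Square root of the sum of squared entries of the error matrix, as in (1)."""
    return math.sqrt(sum(a * a for row in err for a in row))


def max_abs_entry(err):
    """Largest absolute entry of the error matrix, as described for (2)."""
    return max(abs(a) for row in err for a in row)


def fit_is_acceptable(predicted, labeled, threshold=10.0):
    """Stopping rule: both norms of (predicted - labeled) must be below the threshold."""
    err = [[p - q for p, q in zip(p_row, q_row)]
           for p_row, q_row in zip(predicted, labeled)]
    return frobenius_norm(err) < threshold and max_abs_entry(err) < threshold
```

At t = 0 and s = 0 the polynomial reduces to its constant term, 102.8, which is the highest affinity a perfectly co-located, simultaneous point pair receives under this fit.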

5.5. Clustering-Based Oversampling for Minority Class Algorithm and Its Necessity

To address the sample imbalance, we used the Clustering-based Oversampling for Minority class (COM) algorithm to produce new and accurate samples for training the classifiers listed in Table 7.

Table 7 shows the accuracy difference between using COM and without using COM. It can be seen that, for all the classification models listed in Table 7, the accuracy is remarkably improved by using COM. Although COM is simple, a method such as COM is necessary because the sample imbalance is an obstacle which must be confronted in our framework. Note that the results in Table 7 are based on experiments without fine adjustment.

The K-means clustering algorithm was used to cluster the existing samples in COM. In K-means, the number of clusters k generally needs to be specified based on experience. We chose a good k as follows. The silhouette coefficient [40] was used to evaluate the quality of clustering, and we looked for the k value leading to a high silhouette coefficient. The silhouette coefficient $SC$ was computed as (4), where $a(i) = \frac{1}{|C_{i}| - 1} \sum_{m, n \in C_{i}, m \neq n} d(m, n)$ is the average of the Euclidean distances of all sample pairs inside cluster $C_{i}$, which reflects the compactness of $C_{i}$; $b(i) = \min_{j \neq i} \frac{1}{|C_{i}|} \sum_{m \in C_{i}, n \in C_{j}} d(m, n)$ represents the average distance of sample pairs in which one sample comes from cluster $C_{i}$ and the other from one of the other $k - 1$ clusters, and reflects the degree of separation of $C_{i}$ from the other clusters. A larger silhouette coefficient implies higher compactness within clusters and higher separation among clusters, and hence better clustering quality. Figure 7 shows the value of the silhouette coefficient for k from 2 to 10. As shown in Figure 7, the silhouette coefficient is largest when k is 2.

$$SC = \max_{0 < i \leq k} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}. \tag{4}$$
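To make the idea of scoring a clustering concrete, the sketch below computes the widely used per-sample silhouette and averages it over all samples. Note that this is the standard textbook form, which is an assumption here and is not necessarily identical to the cluster-level aggregation written in (4); the function name and input layout are also illustrative.

```python
import math


def silhouette(points, labels):
    """Mean per-sample silhouette for points (tuples) with integer cluster labels.

    Requires at least two clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

Running this for each k in a range such as 2 to 10 and keeping the k with the highest score mirrors the selection process described above.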

5.6. Training Set and Testing Set Construction

Before building classifiers, we constructed positive and negative sample sets. The positive sample set consists of two parts: (i) 1784 real matching face–phone trajectory pairs preselected by MGSTWS, and (ii) 8000 augmented samples obtained by COM. The 9784 negative samples were obtained by randomly sampling from the candidate sets obtained by MGSTWS, excluding the real matching face–phone trajectory pairs. Twenty percent of the real matching samples and twenty percent of the negative samples were used as the test set, and these test samples were never used during training and validation.

5.7. Face–Phone Trajectory Matching Determinator

To comprehensively evaluate the performance of different Face–Phone Trajectory Matching Determinators (FPTMDs), i.e., the corresponding classification models, four measures (accuracy (A), precision (P), recall (R), and F1-score (F1)) were used to evaluate the classification results. A, P, R, and F1 were calculated as (5), where TP is the number of true positive samples, i.e., the positive samples classified as positive by a classifier; FP is the number of false positive samples; FN is the number of false negative samples; and TN is the number of true negative samples.

$$\begin{aligned} A &= \frac{TP + TN}{TP + TN + FP + FN}, \\ P &= \frac{TP}{TP + FP}, \\ R &= \frac{TP}{TP + FN}, \\ F1 &= \frac{2 \times P \times R}{P + R}. \end{aligned} \tag{5}$$
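The four measures in (5) translate directly into code. The sketch below uses hypothetical function names and made-up confusion counts in the usage note; only the formulas themselves come from the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts, as in (5)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

As a consistency check, `f1_from(0.969, 0.885)` rounds to 0.925, which agrees with the 96.9% precision, 88.5% recall, and 92.5% F1 reported for LightGBM in the abstract. With hypothetical counts `tp=8, fp=2, fn=2, tn=8`, all four measures equal 0.8.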

To find out suitable models for the data and our proposed features, we compared the performance of different FPTMDs, including (1) Random Forest model; (2) LightGBM model; (3) Classic Decision Tree; (4) K-nearest neighbor model (KNN); (5) Support Vector Machine (SVM) with a Gaussian kernel; (6) Multi-Layer Perceptron (MLP). The evaluation of these classifiers is shown in Table 8. It can be seen that LightGBM achieves the best performance with accuracy of 92.6% and precision of 96.9%. Receiver Operating Characteristic (ROC) curve (Figure 8) also shows that LightGBM achieves the best generalization performance, with the area under its own ROC curve being 0.98, which is the greatest among all the classification models. To more specifically understand the performance of LightGBM on the testing set, a confusion matrix is shown in Figure 9.

Table 8 shows that the tree-based models (LightGBM, Random Forest, and Decision Tree) and KNN perform better than SVM and MLP. The poorer performance of SVM (with Gaussian kernel) and MLP can be attributed to the small size of the training set, which is a main obstacle that our framework aims to overcome. This suggests that complex models are not satisfactory when a large number of high-quality samples is not available.

In contrast, simple models (the tree-based models and KNN) perform better than complex models even with small training sets because they can take advantage of the feature engineering conducted in Section 4.3.1. Our elaborate feature engineering combined with LightGBM contributes to the satisfactory performance. The better performance of KNN compared with SVM and MLP also benefits from the balance between positive and negative samples achieved by COM, since KNN refers to the samples near a given sample to decide whether it is positive or negative.

Some necessary details of training the classification models are summarized here. (1) Each model is fine-tuned separately using techniques appropriate for that model; (2) 5-fold cross validation is adopted to avoid over-fitting; (3) an early stopping mechanism is adopted: if the error on the training set decreases while the error on the validation set increases, the training process is stopped immediately.
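Items (2) and (3) can be illustrated with the minimal sketch below. The interleaved fold assignment, the function names, and the use of the last two recorded errors to detect the trend are assumptions for the example; the text does not specify how the folds were drawn or over how many iterations the errors were compared.

```python
def k_fold_indices(n, k=5):
    """Yield (train_indices, validation_indices) for k interleaved folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val


def should_stop(train_errors, val_errors):
    """Stop when the training error decreased but the validation error increased."""
    if len(train_errors) < 2 or len(val_errors) < 2:
        return False
    return train_errors[-1] < train_errors[-2] and val_errors[-1] > val_errors[-2]
```

In a training loop, the two error histories would be appended after every iteration and `should_stop` checked before continuing; each of the five splits from `k_fold_indices` takes one turn as the validation set.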

6. Conclusions and Discussion

We proposed a framework for face–phone trajectory matching. Our framework mainly focuses on designing features that capture the association within real matching face–phone trajectory pairs and differentiate them from non-matching pairs.

The preselection-refining architecture guarantees the applicability and efficiency of the face–phone trajectory pair matching framework. First, for a given faceID, the framework includes only candidate phoneIDs with a high probability of matching the faceID. Second, we developed discriminative and explainable features for face–phone trajectory matching. A discriminator takes these features and determines the (dis)association of faceID–phoneID pairs, which amounts to excluding non-matching phoneIDs and then determining which phoneID best matches a given faceID within the small candidate phoneID set. This including-excluding line of thought also motivated us to develop the feature BGST as strong evidence of disassociation. The well-designed features guarantee the simplicity, interpretability, and practicability of the face–phone trajectory matching model.

Experiments show that our framework is effective for identifying the real matching phone for a given face. Among all the classifiers, LightGBM achieves the highest accuracy of 92.6%, which we attribute to its ability to overcome data sparseness through Exclusive Feature Bundling [34], combined with our well-designed features.

As mentioned above, our contribution mainly lies in the design of meaningful features for the classifiers to decide whether a phoneID matches a given faceID. Among the six features, BGST is strong negative, but not absolute, evidence that a faceID–phoneID pair is non-matching. Leveraging BGST helps discriminate non-matching faceID–phoneID pairs even if these pairs have some co-occurrence. This requires a highly efficient algorithm such as BGSTS, which rapidly searches for BGST events and counts the number of times BGST happens. For the features other than BGST, an affinity function (AF) based on polynomial fitting was used to quantify the possibility that a face–phone trajectory point pair is generated by the same pedestrian. This implementation with AF lowers the computation cost and simplifies on-site practice. As a by-product of this work, we would like to draw a potentially useful conclusion: a complex model can be used to capture the flexibility of data, but its complexity can be reduced within the range where the model is actually used.

Our work provides a feasible framework for similar or related projects. However, in our work, some parameters (such as the timeDiff scope $θ$ and the geoDis scope $β$ inside the MGSTWS algorithm) were determined by common sense and intuition, and we performed many experiments to select good values. It would be more desirable to determine these parameters in a more intelligent way. These subtasks may be modeled as combinatorial optimization problems, for which a genetic algorithm may be a good direction to explore.

Another interesting problem that may be worth deeper research is data augmentation. In this work, a Clustering-based Oversampling for Minority class (COM) algorithm is proposed to overcome sample imbalance and insufficient samples, which quite possibly arise in the face–phone pairing task. Unlike scenarios where data augmentation is widely used, such as image recognition, data augmentation for trajectory data has not received much research attention, and no widely used method exists. Therefore, more data augmentation methods for trajectory data are worth exploring.

Author Contributions

Z.D. and F.T. contributed equally to this work. Writing—original draft, Z.D. and F.T.; Writing—review and editing, Z.D. and F.T.; Conceptualization, methodology, supervision, H.Y.; Supervision, T.S. and D.R.; Software, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Natural Science Foundation Project (61070243), the Open Project Program Foundation of the Key Laboratory of Opto-Electronics Information Processing, Chinese Academy of Sciences (OEIP-O-202009), Guizhou High-level Talent Research Project (TZJF-2010-048), National Key R&D Plan Project. No.2020YFF0304903 and 2020YFF0304902 (Development and application demonstration system with interactive holographic guidance method under emergence situations at Olympic and Paralympic Winter Games sites).

Data Availability Statement

Access to data used in this study requires security clearance and may be available on request from the corresponding author with authorization.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zheng, K.; Zheng, Y.; Yuan, N.J.; Shang, S.; Zhou, X. Online discovery of gathering patterns over trajectories. IEEE Trans. Knowl. Data Eng. 2013, 26, 1974–1988.
2. Liu, H.; Jin, C.; Zhou, A. Popular route planning with travel cost estimation from trajectories. Front. Comput. Sci. 2020, 14, 191–207.
3. Jenelius, E.; Koutsopoulos, H.N. Travel time estimation for urban road networks using low frequency probe vehicle data. Transp. Res. Part B Methodol. 2013, 53, 64–81.
4. Luo, W.; Tan, H.; Chen, L.; Ni, L.M. Finding time period-based most frequent path in big trajectory data. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; pp. 713–724.
5. Zhang, D.; Li, N.; Zhou, Z.H.; Chen, C.; Sun, L.; Li, S. iBAT: Detecting anomalous taxi trajectories from GPS traces. In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; pp. 99–108.
6. Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 86–194.
7. Zheng, Y.; Liu, T.; Wang, Y.; Zhu, Y.; Liu, Y.; Chang, E. Diagnosing New York city’s noises with ubiquitous data. In Proceedings of the 2014 ACM International joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13–17 September 2014; pp. 715–725.
8. Fu, Y.; Xiong, H.; Ge, Y.; Yao, Z.; Zheng, Y.; Zhou, Z.H. Exploiting geographic dependencies for real estate appraisal: A mutual perspective of ranking and clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 1047–1056.
9. Kothari, P.; Kreiss, S.; Alahi, A. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7386–7400.
10. Bahari, M.; Nejjar, I.; Alahi, A. Injecting knowledge in data-driven vehicle trajectory predictors. Transp. Res. Part C Emerg. Technol. 2021, 128, 103010.
11. Vilk, O.; Aghion, E.; Nathan, R.; Toledo, S.; Metzler, R.; Assaf, M. Classification of anomalous diffusion in animal movement data using power spectral analysis. J. Phys. A Math. Theor. 2022, 55, 334004.
12. Quddus, M.A.; Ochieng, W.Y.; Noland, R.B. Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transp. Res. Part C Emerg. Technol. 2007, 15, 312–328.
13. Su, H.; Liu, S.; Zheng, B.; Zhou, X.; Zheng, K. A survey of trajectory distance measures and performance evaluation. VLDB J. 2020, 29, 3–32.
14. Zheng, Y. Trajectory data mining: An overview. ACM Trans. Intell. Syst. Technol. (TIST) 2015, 6, 1–41.
15. Sanderson, A.C.; Wong, A.K. Pattern trajectory analysis of nonstationary multivariate data. IEEE Trans. Syst. Man Cybern. 1980, 10, 384–392.
16. Papadias, D.; Zhang, J.; Mamoulis, N.; Tao, Y. Query processing in spatial network databases. In Proceedings of the 2003 VLDB Conference, Berlin, Germany, 9–12 September 2003; Elsevier: Amsterdam, The Netherlands, 2003; pp. 802–813.
17. Pelekis, N.; Theodoridis, Y. Preparing for Mobility Data Exploration. In Mobility Data Management and Exploration; Springer: New York, NY, USA, 2014; pp. 121–141.
18. Veltkamp, R.C.; Latecki, L.J. Properties and performance of shape similarity measures. In Data Science and Classification; Springer: Berlin/Heidelberg, Germany, 2006; pp. 47–56.
19. Yi, B.K.; Jagadish, H.V.; Faloutsos, C. Efficient retrieval of similar time sequences under time warping. In Proceedings of the IEEE 14th International Conference on Data Engineering, Orlando, FL, USA, 23–27 February 1998; pp. 201–208.
20. Chen, L.; Özsu, M.T.; Oria, V. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA, 14–16 June 2005; pp. 491–502.
21. Vlachos, M.; Kollios, G.; Gunopulos, D. Discovering similar multidimensional trajectories. In Proceedings of the IEEE 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002; pp. 673–684.
22. Wan, W.; Cai, M. Phone-vehicle trajectory matching framework based on ALPR and cellular signalling data. IET Intell. Transp. Syst. 2021, 15, 107–118.
23. Gong, X.; Huang, Z.; Wang, Y.; Wu, L.; Liu, Y. High-performance spatiotemporal trajectory matching across heterogeneous data sources. Future Gener. Comput. Syst. 2020, 105, 148–161.
24. Iwana, B.K.; Uchida, S. An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE 2021, 16, e0254841.
25. Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time series data augmentation for deep learning: A survey. arXiv 2020, arXiv:2002.12478.
26. Zhang, R.; Wu, J.; Shao, M.; Li, B.; Lu, Y. Transient stability prediction of power systems based on deep belief networks. In Proceedings of the 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 20–22 October 2018; pp. 1–6.
27. Blagus, R.; Lusa, L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. 2013, 14, 106.
28. Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Advances in Intelligent Computing, Proceedings of the International Conference on Intelligent Computing, ICIC 2005, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887.
29. He, H.; Bai, Y.; Garcia, E.; Li, S.A. Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE World Congress On Computational Intelligence, Hong Kong, China, 1–6 June 2008.
30. Yuan, N.J.; Zheng, Y.; Xie, X.; Wang, Y.; Zheng, K.; Xiong, H. Discovering urban functional zones using latent activity trajectories. IEEE Trans. Knowl. Data Eng. 2014, 27, 712–725.
31. Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2022.
32. Weiss, G.M.; Provost, F. Learning when training data are costly: The effect of class distribution on tree induction. J. Artif. Intell. Res. 2003, 19, 315–354.
33. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
34. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
35. Quinlan, J.R. Simplifying decision trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234.
36. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
37. Gardner, M.W.; Dorling, S. Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atmos. Environ. 1998, 32, 2627–2636.
38. Denoeux, T. A k-nearest neighbor classification rule based on Dempster-Shafer theory. In Classic Works of the Dempster-Shafer Theory of Belief Functions; Springer: Berlin/Heidelberg, Germany, 2008; pp. 737–760.
39. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
40. Aranganayagi, S.; Thangavel, K. Clustering categorical data using silhouette coefficient as a relocating measure. In Proceedings of the IEEE International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India, 13–15 December 2007; Volume 2, pp. 13–17.

Figure 1. The framework of heterogeneous face–phone trajectory matching.

Figure 2. The face–phone trajectory matching problem (the ’**’ hides the true values for legal considerations).

Figure 3. Example of a noise point in a trajectory.

Figure 4. Example of how the MGSTWS algorithm works (change of $Cddt\_Set$ at 3 time points (a–c)).

Figure 5. Visualization of affinity score of face–phone trajectory point pair.

Figure 6. Visualization of affinity function of face–phone trajectory point pair.

Figure 7. Silhouette coefficient with different values of k.

Figure 8. ROC curves of different models.

Figure 9. Confusion matrix of LightGBM.

Table 1. Notation and description.

Notation	Description
$FTD$	Face trajectory data set
$PTD$	Mobile phone trajectory data set
$PID$	Unique phoneID of a mobile phone
$FID$	Unique faceID of a face
$trj\_f^{j}$	A trajectory in $FTD$, sometimes with identification number $j$
$trj\_p^{j}$	A trajectory in $PTD$, sometimes with identification number $j$
$trj\_p_{i}^{j}$	$i^{th}$ trajectory point in $trj\_p^{j}$
$trj\_f_{i}^{j}$	$i^{th}$ trajectory point in $trj\_f^{j}$
$lgtd$	Longitude of trajectory point $i$ (WGS 84)
$lttd$	Latitude of trajectory point $i$ (WGS 84)
$tsmp$	Timestamp of trajectory point $i$
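
The notation in Table 1 can be read as a small data model. The following is a minimal Python sketch of that reading, not the authors' implementation: the class names, the `owner_id` field, the use of Unix seconds for $tsmp$, and the choice of the haversine formula for geographical distance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List

# Mean Earth radius in metres; an assumption of this sketch.
EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class TrajectoryPoint:
    """One trajectory point, using the field names from Table 1."""
    lgtd: float  # longitude in degrees (WGS 84)
    lttd: float  # latitude in degrees (WGS 84)
    tsmp: float  # timestamp; Unix seconds are assumed here


@dataclass
class Trajectory:
    """A face or phone trajectory: an identifier plus ordered points."""
    owner_id: str  # hypothetical field: a FID for faces, a PID for phones
    points: List[TrajectoryPoint]


def geo_distance_m(a: TrajectoryPoint, b: TrajectoryPoint) -> float:
    """Haversine great-circle distance in metres (assumed distance measure)."""
    phi1, phi2 = radians(a.lttd), radians(b.lttd)
    dphi = phi2 - phi1
    dlmb = radians(b.lgtd - a.lgtd)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def time_difference_min(a: TrajectoryPoint, b: TrajectoryPoint) -> float:
    """Absolute time difference between two points, in minutes."""
    return abs(a.tsmp - b.tsmp) / 60.0
```

With these two helpers, any face–phone point pair can be reduced to a (distance, time difference) pair, which is the form in which Table 3 indexes its affinity scores.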

Table 3. Labeled matrix of affinity score for a face–phone trajectory point pair.

GeoDis	0 min	1 min	2 min	3 min	4 min	5 min	6 min	7 min	8 min	9 min	10 min
0 m	100	100	90	85	90	75	70	65	60	55	10
100 m	100	100	90	85	90	75	70	65	60	55	50
200 m	80	80	80	70	60	55	50	45	40	50	50
300 m	10	10	70	60	55	50	40	40	30	40	40
400 m	0	0	50	40	50	30	30	30	20	20	30
500 m	0	0	10	35	40	10	20	20	10	0	10
600 m	0	0	0	20	20	0	10	8	5	0	0
700 m	0	0	0	10	10	0	0	5	0	0	0
800 m	0	0	0	0	0	0	0	0	0	0	0
900 m	0	0	0	0	0	0	0	0	0	0	0
1000 m	0	0	0	0	0	0	0	0	0	0	0
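
To illustrate how the labeled matrix in Table 3 can be used, the sketch below stores it as a grid and looks up a score for a given geographical distance and time difference. This is only a table lookup under stated assumptions, not the affinity function built in the paper: values are floored to the lower grid cell, and pairs outside the grid (more than 1000 m or 10 min apart) are assumed to score 0. The names `AFFINITY` and `affinity_score` are hypothetical.

```python
# Rows: GeoDis 0 m .. 1000 m in 100 m steps.
# Columns: time difference 0 min .. 10 min in 1 min steps.
# Values are copied from Table 3 as printed.
AFFINITY = [
    [100, 100, 90, 85, 90, 75, 70, 65, 60, 55, 10],  # 0 m
    [100, 100, 90, 85, 90, 75, 70, 65, 60, 55, 50],  # 100 m
    [80, 80, 80, 70, 60, 55, 50, 45, 40, 50, 50],    # 200 m
    [10, 10, 70, 60, 55, 50, 40, 40, 30, 40, 40],    # 300 m
    [0, 0, 50, 40, 50, 30, 30, 30, 20, 20, 30],      # 400 m
    [0, 0, 10, 35, 40, 10, 20, 20, 10, 0, 10],       # 500 m
    [0, 0, 0, 20, 20, 0, 10, 8, 5, 0, 0],            # 600 m
    [0, 0, 0, 10, 10, 0, 0, 5, 0, 0, 0],             # 700 m
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],               # 800 m
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],               # 900 m
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],               # 1000 m
]

GEO_STEP_M = 100    # grid spacing along the distance axis
TIME_STEP_MIN = 1   # grid spacing along the time axis


def affinity_score(geo_dis_m: float, time_diff_min: float) -> int:
    """Look up the labeled score for a face-phone point pair.

    Assumptions of this sketch: inputs are floored to the lower grid
    cell, and anything beyond the grid scores 0.
    """
    if geo_dis_m < 0 or time_diff_min < 0:
        raise ValueError("distance and time difference must be non-negative")
    row = int(geo_dis_m // GEO_STEP_M)
    col = int(time_diff_min // TIME_STEP_MIN)
    if row >= len(AFFINITY) or col >= len(AFFINITY[0]):
        return 0
    return AFFINITY[row][col]
```

Flooring keeps the lookup simple; interpolating between neighbouring cells would be an equally plausible design if smoother scores are wanted.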

Table 4. Statistical properties of the lengths in trajectory data sets.

Trajectory Dataset	Number of Trajectories	Avg	Max	Min	Median
$F T D$	4040	8	165	2	2
$P T D$	748,646	27	4372	1	2

Table 5. Statistical properties of the number of geographical locations.

Dataset	Total	Avg	Max	Min	Median
$F T D$	22	2.5	9	1	2
$P T D$	22	3	22	1	5

Table 6. The influence of noise filtering on the accuracy of face–phone trajectory matching.

Model	Accuracy without Noise Filtering (%)	Accuracy with Noise Filtering (%)
Random Forest	71.2	77.4
LightGBM	80.5	84.8
SVM	71.3	75.0
Decision Tree	72.9	73.0
MLP	66.5	69.0
KNN	77.3	81.1

Table 7. Accuracy improvement of the classification models when using COM.

Model	Accuracy without Augmentation (%)	Accuracy with Augmentation (%)
Random Forest	75.0	83.5
LightGBM	81.3	90.5
SVM	72.7	74.1
Decision Tree	72.4	86.7
KNN	66.0	88.5
MLP	65.9	73.6

Table 8. Evaluation of face–phone trajectory matching determinators.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
LightGBM	92.6	96.9	88.5	92.5
Random Forest	84.4	93.1	75.9	83.6
SVM	74.9	96.1	54.3	69.4
Decision Tree	87.3	88.6	87.1	87.8
KNN	89.6	93.3	86.3	89.6
MLP	73.6	71.4	81.5	76.1
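
As a quick consistency check on Table 8, the F1 column can be recomputed from the precision and recall columns. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall, $F1 = 2PR/(P+R)$; the names `REPORTED`, `f1_from_pr`, and `max_f1_gap` are hypothetical. Because the table values are rounded to one decimal place, small gaps between recomputed and reported F1 are expected.

```python
# (precision %, recall %, reported F1 %) per model, copied from Table 8.
REPORTED = {
    "LightGBM": (96.9, 88.5, 92.5),
    "Random Forest": (93.1, 75.9, 83.6),
    "SVM": (96.1, 54.3, 69.4),
    "Decision Tree": (88.6, 87.1, 87.8),
    "KNN": (93.3, 86.3, 89.6),
    "MLP": (71.4, 81.5, 76.1),
}


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(rows: dict) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_from_pr(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for model, (p, r, f1) in REPORTED.items():
        print(f"{model}: recomputed {f1_from_pr(p, r):.2f}, reported {f1}")
```

Under this definition every recomputed value lies within 0.1 percentage points of the reported F1, which is consistent with rounding of the inputs.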

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
