5.1 Hybrid AE Models Training and Anomaly Detection
To train our hybrid models, we first represent the time series data set of the selected features as a sequence of time windows (\(T_w\)), where the selected features of all traps at time step \(t\) of each window are denoted \(X_{1,t}, X_{2,t}, \dots, X_{K,t}\), with \(K\) the total number of selected features across all traps.
We explored various "look-back periods", i.e., time window sizes (10, 50, 100, 200, and 500), to train our AE models. Given the abrupt variations over short time steps in our feature dataset, and the 200 groups of sequences with 902 time steps each, larger windows proved inefficient and unwieldy. Hence, we settled on a time window size of \(T_w = 20\) to train our model.
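As a minimal illustration (not our exact pipeline), the following Python sketch slices one group's multivariate sequence into overlapping windows of size \(T_w = 20\); the feature count, stride, and random data are placeholders.

```python
import numpy as np

def make_windows(sequence: np.ndarray, window: int = 20, stride: int = 1) -> np.ndarray:
    """Return windows of shape (n_windows, window, n_features)."""
    starts = range(0, sequence.shape[0] - window + 1, stride)
    return np.stack([sequence[s:s + window] for s in starts])

K = 8                                  # placeholder feature count
group = np.random.rand(902, K)         # one group: 902 time steps x K features
train_windows = make_windows(group)    # T_w = 20 by default
print(train_windows.shape)             # (883, 20, 8)
```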
We split the dataset from Section 3.4 into a training set (77 normal samples: 55 NAIVE, 22 GSPAT) and a test set containing both normal (20 NAIVE, 10 GSPAT) and abnormal (15 NAIVE, 78 GSPAT) samples. Given the differing scales of the selected features, we apply the linear Min-Max scaling method, transforming the feature values as
\[
X' = \frac{X - \min_{\text{normal}}}{\max_{\text{normal}} - \min_{\text{normal}}}.
\]
Here, \(X'\) ranges in \([0, 1]\), with \(\min_{\text{normal}}\) and \(\max_{\text{normal}}\) denoting the training dataset's minimum and maximum values. Utilizing 5-fold cross-validation [40], we train the hybrid models on the sequential data \(X\), encoding with an LSTM, GRU, or VAE and reconstructing to \(\hat{X}\). The training objective is to minimize the mean squared reconstruction error \(R_L = \text{MSE}(X, \hat{X})\).
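A minimal sketch of this step is given below, assuming Keras and the `train_windows` array from the windowing sketch above; the layer sizes, epochs, and batch size are illustrative choices, not our tuned hyperparameters.

```python
from tensorflow import keras
from tensorflow.keras import layers

def minmax_scale(X, lo, hi):
    return (X - lo) / (hi - lo)

# Fit min/max on the normal training windows only, per feature.
min_normal = train_windows.min(axis=(0, 1))
max_normal = train_windows.max(axis=(0, 1))
X_train = minmax_scale(train_windows, min_normal, max_normal)

T_w, K = X_train.shape[1], X_train.shape[2]
model = keras.Sequential([
    keras.Input(shape=(T_w, K)),
    layers.LSTM(32),                          # encoder: window -> latent vector
    layers.RepeatVector(T_w),                 # repeat latent vector T_w times
    layers.LSTM(32, return_sequences=True),   # decoder: latent -> sequence
    layers.TimeDistributed(layers.Dense(K)),  # per-step reconstruction X_hat
])
model.compile(optimizer="adam", loss="mse")   # R_L = MSE(X, X_hat)
model.fit(X_train, X_train, epochs=50, batch_size=64, validation_split=0.2)
```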
Using the trained models, we predict on the test dataset, leveraging a reconstruction error threshold \(\eta\) to distinguish normal from abnormal sequences. To select a proper threshold, we use the F-score as the metric of choice [41]. For our detection purpose, we care about both Precision (ensuring that flagged anomalies are genuine) and Recall (minimizing missed anomalies); the F-score balances the two and remains informative when the class distribution is imbalanced, so we adopt it to select the threshold separating normal from abnormal sequences. Within the cross-validation, we assessed the variability and uncertainty of each fold's predictions, displaying performance as error bars on the F-score values. Our analyses, shown in Figure 6, reveal consistent performance across a 90%-99% threshold range, with mean F-scores approximately between 0.80 and 0.90.
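The threshold sweep can be sketched as below, assuming a labeled validation fold (`X_val`, `y_val` are hypothetical names) and scikit-learn's `f1_score`; the percentile grid mirrors the 90%-99% range above.

```python
import numpy as np
from sklearn.metrics import f1_score

def window_errors(model, X):
    """Mean squared reconstruction error per window."""
    X_hat = model.predict(X, verbose=0)
    return np.mean((X - X_hat) ** 2, axis=(1, 2))

train_err = window_errors(model, X_train)   # errors on normal data only
val_err = window_errors(model, X_val)       # labeled held-out fold

best_p, best_f = max(
    ((p, f1_score(y_val, val_err > np.percentile(train_err, p)))
     for p in range(90, 100)),
    key=lambda t: t[1],
)
eta = np.percentile(train_err, best_p)
print(f"eta at {best_p}th percentile, F-score = {best_f:.2f}")
```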
We opt for the LSTM AE hybrid model to enhance stability and improve dynamic levitation. With a 92% threshold, the model achieves a mean F-score of 0.90 (Precision: 86%, Recall: 95%) on the test dataset, correctly classifying 104 of the 123 test groups (about 85% accuracy). Most of the actual abnormal groups (88 groups) are correctly predicted as abnormal (true positives), while a few (5 groups) are false negatives; 14 actual normal groups are predicted as abnormal (false positives), and 16 are correctly predicted as normal (true negatives). Among the true-positive groups, we present a few examples where feature anomalies precede actual particle-drop events (e.g., large position displacements captured by the camera) in Figure 7. Notably, a single anomalous step does not necessarily lead to one (or more) particle drops; we often observe an accumulation of anomalies before drop events, as indicated by the red dashed lines in Figure 7.
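Since drops tend to follow an accumulation of anomalies rather than a single flagged window, a natural post-processing step is to merge consecutive flags into contiguous anomaly regions. The sketch below is one hypothetical way to do so, reusing `window_errors`, `model`, and `eta` from the previous sketches; `X_test_group` is an assumed name for one test group's windows.

```python
def anomaly_regions(flags):
    """Merge consecutive anomalous windows into (start, end) index pairs."""
    regions, start = [], None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(flags) - 1))
    return regions

flags = window_errors(model, X_test_group) > eta   # per-window decisions
print(anomaly_regions(flags))                      # e.g. [(120, 131), (402, 455)]
```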
5.2 Stability Enhancement
Here, we present a few anomalous groups (46, 61, 62, and 67 in Figures 8 and 9) as examples and report the stability enhancement achieved through the amplitude-amendment approach proposed in Section 4.3.
First, we used the processed tracking trajectories in our dataset and compared them to the target trajectories, identifying which particles dropped (e.g., particle 2 in Figure 8). Then, within the predicted anomaly time regions, when setting the phase-retrieval solver parameters, we raise the unstable trap's target amplitude above that of the previously stable traps. Note that we increase the target amplitude gradually to find a proper increment. Here, with a 30% amplitude enhancement in the predicted anomalous regions, we ran the same trajectory three consecutive times, as in Section 3, and particle 2 in both groups 46 and 61 reached the endpoint of the trajectory without dropping.
In our examination of amplitude modifications for groups 62 and 67, we noted that exclusively increasing the target amplitude of the unstable trap does not reliably enhance stability. Across three repeated tests of a few groups, we found it uncertain whether a particular particle would consistently drop, complicating the identification of the problematic trap. However, during the anomalous period we can reduce the amplitude of the stronger trap, increase the amplitude of the relatively weaker one, or combine both strategies to address the instability comprehensively. In group 62 (see Figure 9), after lowering the trap amplitude of particle 4 by 20%, no drop occurred. Likewise, by lowering the trap amplitude of particle 1 and raising the trap amplitude of particle 2 by 20% within the suggested anomaly time region, we prevented the drop in group 67.
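To make the amendment step concrete, the sketch below scales selected traps' target amplitudes inside the predicted anomaly regions before they are handed to the phase-retrieval solver. The amplitude-array layout and all variable names are assumptions; only the scaling factors (e.g., +30%, ±20%) come from the experiments above.

```python
import numpy as np

def amend_amplitudes(target_amps, regions, trap_scales):
    """Scale selected traps' target amplitudes inside anomaly regions.

    target_amps: (time_steps, n_traps) array of target amplitudes (assumed layout).
    regions:     list of (start, end) anomalous time-step ranges.
    trap_scales: {trap_index: factor}, e.g. {2: 1.3} raises trap 2 by 30%,
                 {4: 0.8} lowers trap 4 by 20%.
    """
    amended = target_amps.copy()
    for start, end in regions:
        for trap, factor in trap_scales.items():
            amended[start:end + 1, trap] *= factor
    return amended

# Hypothetical usage mirroring group 67: lower trap 1 and raise trap 2
# by 20% within a predicted anomaly region, then pass the amended
# amplitudes to the phase-retrieval solver.
target_amps = np.ones((902, 5))    # placeholder amplitudes, 5 traps
amended = amend_amplitudes(target_amps, [(400, 455)], {1: 0.8, 2: 1.2})
```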