Introduction

Eye tracking has a wide variety of application and research areas in today’s world. In human–machine interaction it is used as an additional input signal [1, 2]; in computer graphics as a constraint on the area to be rendered [3, 4]; in medicine, eye features serve as data for self-diagnosis systems [5,6,7] or to measure the eyes [8, 9]; in psychology it is used to detect neurological diseases such as Alzheimer’s [10, 11]; in behavioral research it is used to assess expertise and to train students [12, 13]; and there are many more fields, such as driver monitoring in autonomous driving [14, 15]. This wide range of research and application areas requires that new eye features as well as more robust algorithms be easily and quickly deployable by everyone. Since eye tracking itself is an active subject of research, with more robust or accurate algorithms constantly being published and new features, properties, or signals, such as pupil dilation, being added, quick access to these advances is necessary for everyone who wants to improve their products or integrate them into their research.

This poses a problem for the industry, since it must be determined long before the final product which functionalities and features will be integrated into the software [16]. If new features are added, this is usually postponed until the next generation of eye trackers. This is due to the fact that the industry’s software must be very reliable, everything has to be tested multiple times, and automated test cases for the new parts of the software must be integrated [17]. Further testing must be done with other integrated components, and the software must remain compatible with external software components used or developed by industry partners [18]. Another important point for the industry is the software architecture as well as the quality of the source code. New parts must adhere to the software architecture and be written in a clean and understandable way. This also delays the integration of new components, since the source code must be checked as well.

Research groups themselves do not have these problems, since the software architecture is usually created and extended dynamically, which however also leads to poorer source code quality [19]. Likewise, research groups can advance the software through student theses, which incurs no additional costs. If a research group develops such software, many new algorithms are already available in working form, which makes integration much easier and faster. In addition, research groups do not have contractual partners, so they do not have to maintain the output formats and integrated interfaces of the software in a prescribed format for many years.

The Pistol tool extracts a variety of eye features, such as the pupil ellipse, iris ellipse, eyelids, eyeball, vision vectors, and eye-opening degree. These are needed in many applications, such as fatigue or attention estimation, and can serve as new features in research; however, they are not provided by every eye tracker manufacturer.

We collected eye tracking recordings from different tasks (sports, playing a computer game, hiking, and card games), two eye trackers (Pupil Invisible and Look!), and four subjects (two male and two female). All recordings were annotated using the semi-supervised approach MAM [20]. We evaluated Pistol [21] on the collected data, since it supports multiple eye trackers, to showcase the applicability of the tool.

Related Work

Fig. 1

Workflow of Pistol. In gray are the data sources, and the single processing steps are in color. Each arrow corresponding to a data dependency is colored in the same color as the processing step

In the field of eye tracking and feature extraction, there is a wide range of related work. This is due, on the one hand, to the industry, which provides a wide range of eye trackers at different prices and with different features. On the other hand, it is due to the ever-growing research community and the growing application fields for eye tracking.

From the industry there are for example the manufacturers Tobii [22], ASL [23], EyeTech [24], SR Research [25], Eyecomtec [26], Ergoneers [27], Oculus [28], Pupil Labs [29], iMotions [30], VIVE [31] and many more. The eye trackers differ in frame rate and in the features used for eye tracking, each having its advantages and disadvantages. For a good overview of more details, please refer to the manufacturer pages as well as survey papers that deal with this overview [32,33,34,35].

Science itself has of course produced some eye trackers and systems for gaze estimation. The first one mentioned here is the EyeRec [36] software, which can be used for online eye tracking with head-worn systems. It has several built-in algorithms and uses a polynomial fit to the optical vector of the pupil. Another software dealing with pupillometry is PupilEXT [37]. It extracts the pupil shape with high accuracy, but only under severe restrictions and with high-resolution cameras. The software OpenEyes [38] offers a combined hardware and software solution. The algorithm used for pupil detection is Starburst. For the Tobii Glasses there is also a tool [39], which allows analyzing the images offline and applying algorithms from research to the data. For Matlab there is a freely available interface to use Tobii eye trackers directly [40]. For Tobii there also exists custom software to improve the accuracy of the eye tracker [41]. To distribute the data of a Gazepoint eye tracker over a network, there is also a tool from research [42]. For studies with slide shows, there is the software OGAMA [43]. It can be used to record mouse movements and the gaze signal, analyze them, and create visualizations. GazeCode [44] is a software that allows mapping eye movements to the stimulus image. The software was developed to speed up the processing time for mapping and to improve the usability compared to commercial software, such as Tobii Pro Lab.

What distinguishes our software from existing ones is that it outputs a variety of additional features, such as the pupil, iris, eyelids, eye opening, pupil vector, iris vector, eye movements, and different methods for gaze estimation. In addition, we determine the 3D gaze point position and support a variety of eye trackers. Another important property of our tool is that each feature is exported together with a video, which showcases the quality of the result.

Description of Pistol

Pistol is executed by calling the program with a path to the Pupil Invisible project. Then the recording to be processed is specified (psX, or just X after use of the Pupil Player), and Pistol starts the detections and calculations. In addition, the range of marker detections to be used for calibration can be specified (based on the scene video frames); if no range is specified, all detected markers are used. Pistol also supports specifying a polynomial (or exponential function) and neural network architectures for the gaze estimation. For the eyeball computation, the window size can be specified, and two different approaches can be used (directly on one ellipse, or the eyeball computation over multiple ellipses). If Pistol is not used with Pupil Invisible eye tracker recordings, the paths to the videos have to be selected (eye and scene videos).

Figure 1 shows the processing flow of Pistol and the data dependencies of each step. The gray boxes represent data that either exists in the project or was generated by a calculation step. Each calculation step in Fig. 1 has a unique color, with which the data dependencies are also marked. At the end of each calculation step, the data are saved to a CSV file and a debugging video is generated to check the result.

In the following, we describe each processing step of our tool in detail. Some sections, such as the pupil, iris, and eyelid detection, are combined since the underlying algorithms are similar.

Pupil, Iris, and Eyelid Detection

Fig. 2

Exemplary detections of the pupil, iris, and eyelid for one subject. The first four images are from the left eye and the last four images are from the right eye

In Fig. 2, we show some results of pupil, iris, and eyelid detection. For detection, we use small DNNs with maximum instead of residual connections [45] as well as tensor normalization and full distribution training [46] to detect landmarks. Using the maximum connections allows us to use smaller DNNs with the same accuracy. The full distribution training makes our DNNs more robust, and we also need less annotated data to train them. Tensor normalization increases accuracy and is more stable than batch normalization. In addition, we use landmark validation as well as batch balancing from [47] to evaluate the accuracy of the landmarks at the pixel level and discard detections if the inaccuracy is too high. To obtain the ellipses for the pupil and iris from the landmarks, we use the OpenCV [48] ellipse fit. The shape of the upper and lower eyelid is approximated using cubic splines. The architecture of our models as well as the training details are in the supplementary material.
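The geometric post-processing of the landmarks can be illustrated with a short sketch. The following is a minimal example, not Pistol's actual code, assuming the landmark coordinates are already available from the detector; it uses the OpenCV ellipse fit mentioned above and a cubic spline (here via SciPy) for one eyelid.

```python
import numpy as np
import cv2
from scipy.interpolate import CubicSpline

# Hypothetical pupil landmarks (x, y) produced by the landmark detector.
pupil_landmarks = np.array(
    [[112, 90], [120, 84], [130, 86], [136, 95],
     [134, 106], [124, 112], [114, 108], [110, 99]], dtype=np.float32)

# OpenCV ellipse fit: returns ((cx, cy), (major_axis, minor_axis), angle).
(cx, cy), (major, minor), angle = cv2.fitEllipse(pupil_landmarks)

# Hypothetical upper eyelid landmarks, ordered from the nasal to the temporal corner.
eyelid_landmarks = np.array(
    [[60, 100], [90, 80], [125, 70], [160, 78], [190, 95]], dtype=np.float32)

# Cubic spline through the eyelid landmarks, parameterized by the x coordinate.
spline = CubicSpline(eyelid_landmarks[:, 0], eyelid_landmarks[:, 1])
xs = np.linspace(eyelid_landmarks[0, 0], eyelid_landmarks[-1, 0], 100)
eyelid_curve = np.stack([xs, spline(xs)], axis=1)  # densely sampled eyelid shape
```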

Eye Opening Estimation

Fig. 3

Exemplary images for the left (first 4) and right (last 4) eye of one subject. We compute the maximal distance orthogonal to the vector between the eye corners to compensate for the off-axial camera placement

Figure 3 shows some results of our eye-opening degree calculation. The eye-opening is calculated using an optimization procedure and is always oriented orthogonally to the vector between the eye corners. Without the constraint of this orthogonality (simple selection of the maximum over all minimum distances of the points on the eyelid curves), the results are very erratic, because due to the perspective of the camera, parts of the eyelids are not completely visible. In addition, the orthogonality to the vector between the eye corners is what is expected in a frontal view of the eye.

Let \(P_\text {up}\) be the set of points of the upper eyelid, \(P_\text {dw}\) be the set of points of the lower eyelid, and \(\overrightarrow{C}\) be the vector between the corners of the eye:

$$\begin{aligned} {\text {Opening}}(P_\text {up},P_\text {dw}, \overrightarrow{C}) = \mathop {\mathrm {arg\,max}}\limits _{u \in P_\text {up}} \; \mathop {\mathrm {arg\,min}}\limits _{d \in P_\text {dw}} \; |\overrightarrow{ud}|, \quad \overrightarrow{ud} \perp \overrightarrow{C}, \end{aligned}$$
(1)

Equation 1 shows our optimization to compute the eyelid opening, where u and d are selected elements of \(P_\text {up}\) and \(P_\text {dw}\) with the side condition that the vector between the two is orthogonal to the vector between the corners of the eye, i.e., \(\overrightarrow{ud} \perp \overrightarrow{C}\). For a frontal image this additional side condition is usually not necessary, but for the images from the Pupil Invisible the results without it are unstable and not always oriented correctly with respect to the eye.
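The optimization in Eq. 1 can be implemented directly over the sampled eyelid curves. The following is a minimal sketch, not Pistol's actual implementation, assuming the eyelid curves are given as dense point sets (e.g., sampled from the cubic splines above); since the point sets are discrete, the orthogonality constraint is enforced up to a small tolerance.

```python
import numpy as np

def eye_opening(p_up, p_dw, c_vec, cos_tol=0.05):
    """Maximal eyelid distance along directions orthogonal to the corner vector.

    p_up, p_dw : (N, 2) arrays of upper/lower eyelid points.
    c_vec      : vector between the two eye corners.
    cos_tol    : tolerance on |cos(angle)| for the orthogonality constraint.
    """
    c_unit = c_vec / np.linalg.norm(c_vec)
    opening = 0.0
    for u in p_up:
        d_vecs = p_dw - u                       # vectors u -> d for all lower points
        norms = np.linalg.norm(d_vecs, axis=1)
        valid = norms > 1e-9
        # keep only pairs whose connecting vector is (nearly) orthogonal to C
        cosines = np.abs(d_vecs[valid] @ c_unit) / norms[valid]
        dists = norms[valid][cosines < cos_tol]
        if dists.size:                          # max over u of the min over d
            opening = max(opening, dists.min())
    return opening

# Usage with hypothetical eyelid samples and corner points:
# opening_px = eye_opening(upper_points, lower_points, corner_right - corner_left)
```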

Eye Ball, Pupil & Iris Vector Estimation

Fig. 4

Exemplary images for the eyeball as well as iris and pupil vector. The eyeball is drawn in red, the iris vector in blue, and the pupil vector in green

To calculate the eyeball and the optical vectors, we use the neural networks of [49]. Here, several pupil ellipses are given to the neural network, from which the eyeball radius and the eyeball center are calculated. As neural network, we used a network with one hidden layer consisting of 100 neurons. To calculate the optical vectors, we computed the vector from the eyeball center to the pupil center and to the iris center, respectively, and converted each to a unit vector (Fig. 4).

Basically, the eyeball can be computed continuously in a window with this method, where we sample only once over all ellipses and select the 100 most different ones. The user of the software can specify the window size used to compute the eyeball. We also integrated the second approach from [49], which computes an orthogonal vector from a single ellipse using a neural network; this vector can then be used to continuously adjust the eyeball model. The method to be used is selected via a parameter when calling Pistol.
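For illustration, the optical vectors can be expressed as unit vectors from the estimated eyeball center to the respective feature centers. This is only a sketch with hypothetical coordinates, not Pistol's code; the actual centers come from the eyeball-fitting network and the ellipse fits.

```python
import numpy as np

def optical_vector(eyeball_center, feature_center):
    """Unit vector from the eyeball center to a feature center (pupil or iris)."""
    v = np.asarray(feature_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical positions; real values come from Pistol's eyeball and ellipse estimation.
eyeball_center = [96.0, 98.0, 0.0]
pupil_vector = optical_vector(eyeball_center, [112.0, 95.0, 35.0])
iris_vector = optical_vector(eyeball_center, [111.0, 96.0, 34.0])
```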

Eye Movement Detection

Fig. 5

Exemplary detections for all eye movement types: fixation, saccade, blink, and smooth pursuit. All images are from the left eye of the subject. In addition, if no feature is detected, or only the eyelids are valid but the eye is still open, the eye movement type is marked as error. The four smooth pursuit (SM) images are from four consecutive frames with a distance of 25 frames (the eye cameras run at 200 FPS). The two saccade images are from two consecutive frames with a distance of 10 frames (the eye cameras run at 200 FPS)

For the detection of eye movements, we use the angles between consecutive pupil and iris vectors as well as the difference of the eye-opening distance (Fig. 5). For classification, we use a neural network with one hidden layer of 100 neurons and the softmax loss. The network was trained on the annotated data of 12 subjects. In addition, we use the validity of the extracted eye features to classify errors.

Pistol also supports the fully convolutional approach from [50] to segment the motion trajectories. This model is pretrained on TEyeD [51] and can be used to compute the eye movements for other eye trackers, such as the Dikablis, Emke GmbH, Look, Pupil, and many more, which are included in the TEyeD data set. This is necessary because there are significant differences between the eye trackers due to different camera placements and perspective distortions. We decided to provide a small separate neural network because with the Pupil Invisible eye tracker the classification is not a linear function: eye movements near the nose cover significantly smaller distances in the image compared to regions of the eye that are much closer to the camera. This is due to the different depth and, of course, the perspective distortion of the camera lens. A sketch of the feature construction for this classifier is shown below.
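The following sketch illustrates, under our assumptions, how per-frame features from consecutive differences could be assembled for such a classifier; it is not Pistol's actual code, and the scikit-learn classifier at the end is only a hypothetical stand-in for the small network with one hidden layer of 100 neurons.

```python
import numpy as np

def angle_between(v1, v2):
    """Angle in radians between two unit vectors."""
    return np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))

def movement_features(pupil_vecs, iris_vecs, openings):
    """Per-frame features from consecutive differences (frame t vs. t-1)."""
    feats = []
    for t in range(1, len(openings)):
        feats.append([
            angle_between(pupil_vecs[t - 1], pupil_vecs[t]),  # pupil vector change
            angle_between(iris_vecs[t - 1], iris_vecs[t]),    # iris vector change
            openings[t] - openings[t - 1],                    # eye-opening change
        ])
    return np.asarray(feats)

# A classifier with one hidden layer of 100 neurons could then be trained on
# these features, e.g. with scikit-learn (hypothetical, not Pistol's own code):
# from sklearn.neural_network import MLPClassifier
# clf = MLPClassifier(hidden_layer_sizes=(100,)).fit(movement_features(...), labels)
```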

Marker Detection

Fig. 6

Exemplary images of the marker detection. The yellow dot is the estimated center. Zoom in to see more details

Figure 6 shows results of our marker detection, where the center is shown as a yellow dot. Our marker detection must be able to detect the marker over different distances and should calculate the center as accurately as possible. To accomplish this in a reasonable time, we decided to use two DNNs. The first DNN receives the whole image scaled to \(400 \times 400\) pixels. From this, the DNN generates a heatmap with a resolution of \(50 \times 50\) pixels. The maxima in this heatmap are then used as starting positions for the second DNN, which extracts a \(120 \times 120\) pixel area from the original image and performs landmark detection with validation from [47]. With the validation signal, we again reject marker positions that are too inaccurate. The architecture of our models as well as the training details are in the supplementary material.
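The coarse-to-fine handover between the two stages can be sketched as follows. This is a simplified illustration under our assumptions, not Pistol's actual pipeline: `coarse_net` and `fine_net` are hypothetical callables standing in for the two DNNs, and candidate selection via a simple threshold replaces proper maximum extraction.

```python
import numpy as np
import cv2

def detect_markers(scene_img, coarse_net, fine_net, heat_thresh=0.5, valid_thresh=0.5):
    h, w = scene_img.shape[:2]
    small = cv2.resize(scene_img, (400, 400))           # stage 1 input
    heatmap = coarse_net(small)                         # assumed (50, 50) marker likelihoods
    centers = []
    for y, x in zip(*np.where(heatmap > heat_thresh)):  # candidate heatmap cells
        # map the 50x50 heatmap cell back to original image coordinates
        cx, cy = int(x / 50 * w), int(y / 50 * h)
        x0, y0 = max(cx - 60, 0), max(cy - 60, 0)
        crop = scene_img[y0:y0 + 120, x0:x0 + 120]      # stage 2: 120x120 patch
        landmark, validity = fine_net(crop)             # refined center + validity signal
        if validity > valid_thresh:                     # reject inaccurate detections
            centers.append((x0 + landmark[0], y0 + landmark[1]))
    return centers
```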

2D & 3D Gaze Estimation

Fig. 7

Exemplary images of the 2D gaze estimation for the left (first 2) and right (second 2) eye, and exemplary images of the 3D gaze estimation for both eyes (last 4). The gaze points estimated from the iris center, pupil center, iris vector, and pupil vector with neural networks and Levenberg–Marquardt polynomial fitting are drawn in different colors. In addition, each scene frame has \(\approx 6\) gaze points based on the frame rate of the eye cameras. Zoom in to see more details

For the determination of the calibration function, we use the Levenberg–Marquardt optimization and neural networks with two hidden layers (50 and 20 neurons). The polynomial and the neural network architecture can be changed with additional parameters of Pistol. It is also possible to use exponential functions instead of the polynomial. This is due to the fact that Pistol supports different eye trackers, and both methods are able to learn more complex functions than simple polynomials. An example of this can already be found in the depth estimation, where one parameter is in the exponent of the function and thus cannot be determined via a direct computation method.

For the 2D eye tracking, we fit a neural network and a polynomial with the Levenberg–Marquardt method to the pupil center, iris center, pupil vector, and iris vector. This is done separately for each feature and independently for each eye. This means that people with a misaligned eye position or only one functioning eye can also be measured. In the case of 3D eye tracking, we use the data from both eyes simultaneously. This means that we fit a neural network and a polynomial with the Levenberg–Marquardt method to both pupil centers, iris centers, pupil vectors, and iris vectors. Overall, the program therefore computes 8 gaze point estimators per eye (16 in total) and an additional 8 for both eyes in combination. All of these estimated gaze points can be used separately and are written to the CSV files.

The selection of the calibration data is done in three steps. First, each scene image with a marker detection is assigned only one eye image per eye. This assignment is done by the minimum distance of the time stamps. In the second step, we only use scene images for which there are five valid marker detections among the preceding and following scene images. In the third step, we use the Levenberg–Marquardt method together with a polynomial. Based on the error of the gaze positions with respect to the marker position, the best 90% are selected and the remaining 10% are discarded.

The training parameters for the neural networks are an initial learning rate of 0.1, the SGD optimizer, momentum 0.9, and weight decay 0.0005. Each network is trained 4 times for 2000 epochs, with a batch size equal to all input data. After 2000 epochs, the learning rate is reduced by a factor of 10. For the Levenberg–Marquardt method, we use the delta stop strategy with a threshold of \(10^{-10}\) and a search radius of 10.0. For the neural networks and the Levenberg–Marquardt method, all data are normalized. This means that the pupil and iris centers are divided by the resolution in x and y direction, as are the eyeball centers used for the pupil and iris vectors. The vectors themselves are already unit vectors (Fig. 7).
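As an illustration of the polynomial branch of the calibration, the sketch below fits a second-order polynomial mapping from one normalized eye feature (e.g., the pupil center) to the marker position using a Levenberg–Marquardt least-squares solver from SciPy. The polynomial degree and the single-feature input are assumptions for illustration; Pistol exposes the polynomial and the network architecture as parameters.

```python
import numpy as np
from scipy.optimize import least_squares

def poly2(params, f):
    """Second-order polynomial in the two feature coordinates f = (fx, fy)."""
    a = params.reshape(2, 6)                       # one row of coefficients per gaze coordinate
    fx, fy = f[:, 0], f[:, 1]
    basis = np.stack([np.ones_like(fx), fx, fy, fx * fy, fx**2, fy**2], axis=1)
    return basis @ a.T                             # (N, 2) predicted gaze points

def residuals(params, f, gaze):
    return (poly2(params, f) - gaze).ravel()

def calibrate(features, gaze_targets):
    """features, gaze_targets: (N, 2) arrays, both normalized; needs N >= 6 samples."""
    x0 = np.zeros(12)
    fit = least_squares(residuals, x0, args=(features, gaze_targets), method='lm')
    return fit.x

# Usage: params = calibrate(pupil_centers_norm, marker_positions_norm)
#        gaze = poly2(params, new_pupil_centers_norm)
```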

Depth Estimation

Fig. 8

Image from the recording used for the marker-area-to-depth estimation (rotated by 90°). The brown marks on the floor are 50 cm apart, and the subject had to stand in front of each line for one measurement sample

For the determination of the depth, we considered several methods. One is the fitting of polynomials and complex functions to the vectors of the pupil and the iris, as well as to the angle between both vectors. In addition, we tried to determine the depth geometrically. In all cases there were significant errors, because the vectors are not linear to each other due to the perspective distortion of the camera, the non-linear depth distribution in the eye image, and also the head rotation. In the case of head rotation, the eye additionally rotates around the z axis, and the viewing angle to different eye positions changes, neither of which is linear with respect to a straight-ahead view. Since neither neural networks nor complex functions worked for us, we decided to use the k-nearest-neighbor method (KNN) with \(k=2\). This method gave the best results in our experiments and a good accuracy (errors of only a few centimeters).

In order to determine the marker depth and thus obtain our calibration data for the KNN, we performed trial measurements with the marker and two people. The setup can be seen in Fig. 8. The brown marks on the floor are placed at 50 cm intervals, and both subjects took measurements at each mark. We then used the Matlab curve fitting toolbox to determine the best-fitting function and its parameters.

$$\begin{aligned} {\text {d}}({\text {A}}) = a \cdot {\text {A}}^{b} + c \end{aligned}$$
(2)

The best function can be seen in Eq. 2, where A is the area of the marker and \({\text {d}}(A)\) is the depth in centimeters. The first thing to notice is that the parameter b is in the exponent, which means the parameters cannot easily be determined directly. The parameters we determined for the function are \(a=13550.0\), \(b=-0.4656\), and \(c=-18.02\). We use this function with these parameters to determine the depth of the marker from the marker area computed by the fine detector.
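With the reported parameters, Eq. 2 reduces to a one-line computation; the marker area used below is a hypothetical value in square pixels for illustration.

```python
def marker_depth_cm(area_px, a=13550.0, b=-0.4656, c=-18.02):
    """Depth in centimeters as a function of the detected marker area (Eq. 2)."""
    return a * area_px ** b + c

# Example: a marker covering roughly 900 square pixels (hypothetical value).
print(marker_depth_cm(900.0))
```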

Evaluation

In this section, we first describe the eye trackers used and the collected data. Afterwards, we evaluate Pistol on the recordings together with other state-of-the-art algorithms from the eye tracking domain.

Eye Trackers

Since Pistol supports multiple eye trackers, we used the ones available to us to collect the data. We had to omit the Tobii eye tracker, since it does not allow us to access the image data that is required to run Pistol. In total, we could therefore use only two eye trackers, the Pupil Invisible eye tracker and the Look! eye tracker.

Fig. 9

Image of the Look! eye tracker on the left and the Pupil Invisible eye tracker on the right. The images are taken from https://eyetracking-research.de/ and https://www.hannesgeipel.com/pupil-invisible

The two eye trackers are shown in Fig. 9. The Pupil Invisible is a lightweight eye tracker which is especially useful in mobile studies. The glasses of the eye tracker can be replaced by lenses with different visual magnifications. This is a very good solution for people with different visual acuity, but still not perfect, since those people are usually used to their own glasses. The Look! eye tracker, in contrast, allows users to wear their own glasses, which is a more comfortable solution. The disadvantage of the Look! eye tracker for people with glasses is that the eye cameras are placed outside and therefore have to record the eye through the glasses. This raises some challenges, since glasses are made as non-reflective as possible for visible light, but not for the near-infrared spectrum. Therefore, the images will be full of reflections, since sunlight also contains the near-infrared spectrum. In our view, these eye trackers are among the state of the art in mobile head-mounted eye tracking, together with the Tobii head-mounted eye trackers. As mentioned before, we could not use the Tobii eye tracker, since it does not allow us to access the eye images; therefore, Pistol is not applicable there, and it is also impossible to validate the results of its software without access to the underlying data, namely the eye images.

Fig. 10

Eye images of the Look! eye tracker in the top row and the Pupil Invisible eye images in the bottom row

Figure 10 shows some images from the Look! and the Pupil Invisible eye tracker. As can be seen, both eye trackers have problems with images where the optical vector comes close to the nasal area. In this region, the gaze estimation has to rely on one eye image alone, since in the other eye image the pupil is barely visible. In addition, the Pupil Invisible eye tracker has a stronger near-infrared illumination, which is visible as brighter skin in the near-infrared images. The Look! eye tracker cameras are also placed less off-axially, which is good in terms of image processing, but they slightly obstruct the visual field of the participant.

Data in the Wild

We collected eye tracking data from four participants in real-world settings. The tasks performed by all four participants with both eye trackers are playing the UNO card game, hiking, and sports. The task of playing a PC game was performed by only two of the participants. We selected these four tasks because playing card or computer games requires some sort of expertise in the field. In eye tracking, it is an interesting scientific topic to reveal the expertise of an expert and transfer it to a novice. Therefore, these two tasks were selected and will serve in further research regarding expertise classification. The two challenging tasks are sports and hiking. For sports, the major challenge is the movement of the eye tracker and the motion blur induced by this movement. Three of the four participants selected jogging as their sport and one selected strength and power endurance training with weights. In hiking, the main challenge is the changing light conditions due to the sunlight passing through the leaves of the trees. This leads to different lighting conditions everywhere in the eye images. For the Pupil Invisible, we removed the glasses so that the conditions between the Look! and Pupil Invisible eye trackers are equal. Even more challenging are persons wearing glasses during hiking. This is due to the patterns of the sunlight passing through the leaves of trees and reflecting on the person's glasses. Since the Pupil Invisible does not allow the user to wear their own glasses, we made an additional recording with the Look! eye tracker where each subject wore glasses.

For the evaluation in the next subsection, we decided to group the recordings based on difficulty. The UNO card game and the PC game are in the normal category (top part of Tables 1, 2, 3), sports together with hiking is in the difficult category (second part of Tables 1, 2, 3), and the most difficult category is hiking with glasses (last part of Table 3 in the Look! evaluation).

For the recording of the UNO card game, each subject wore each eye tracker four times. Each recording had a length of approximately 30 min, which sums up to a total of 16 h of recordings. In the sports task, each subject used each eye tracker two times for approximately 45 min. This leads to a total recording time of 12 h for the sports task. The PC game task, which was performed by two persons only, has a total recording time of 20 h. Each person in the PC game task wore each eye tracker two times, but the recordings had varying lengths. The hiking task was performed by each subject two times with each eye tracker without glasses and two times with the Look! eye tracker and glasses. Since the hiking tracks changed for each recording, the lengths of the recordings vary. For the hiking task without glasses we recorded a total of 29 h, and for the hiking task with glasses we recorded 13 h. In total, we had a recording length of 90 h.

The annotation of the data was conducted as described in the TEyeDS paper [51] using a semi-supervised approach and the MAM algorithm [20]. We annotated the pupil, iris, eyelids, and the eye movements for our evaluation. Since Pistol so far only provides eye movement detection for Pupil Invisible recordings, we omitted the evaluation of the eye movements for the Look! eye tracker.

Training

The detectors for Pistol were used as is, and no training or fine-tuning was performed. For the state-of-the-art algorithms, such as Tiny, BORE, 1000 pupils, and CBF, we performed a twofold cross-validation where the training and test split was based on the subjects. With this approach, we can concatenate the results of both folds and evaluate all approaches on the entire data set, as sketched below. The other approaches, such as ElSe, ExCuSe, PuRe, etc., do not require any training and can be used on the entire data set without any restrictions.
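A minimal sketch of such a subject-based twofold split, assuming a per-sample list of subject IDs and a hypothetical `train_and_predict` callback; scikit-learn's GroupKFold keeps all samples of a subject in the same fold so that the two folds' predictions can be concatenated.

```python
from sklearn.model_selection import GroupKFold

def subject_twofold(samples, labels, subjects, train_and_predict):
    """train_and_predict(train_idx, test_idx) -> predictions for the test indices."""
    all_preds = {}
    for train_idx, test_idx in GroupKFold(n_splits=2).split(samples, labels, groups=subjects):
        preds = train_and_predict(train_idx, test_idx)   # fit on one fold, predict the other
        all_preds.update(zip(test_idx, preds))
    # concatenated predictions covering the entire data set
    return [all_preds[i] for i in range(len(samples))]
```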

Evaluation for Pupil Invisible Recordings

Table 1 We evaluated the feature extraction for our Pupil Invisible recordings in terms of mean absolute error (MAE: Lower is better) of the pupil center and the iris center as well as the mean intersection over union (mIoU: Higher is better) for the pupil, iris, and eyelid area. As competitors, we used several state-of-the-art algorithms

Table 1 shows the pupil, iris, and eyelid detection results on the Pupil Invisible data. As can be seen, the sports and hiking tasks are more challenging for all algorithms, especially for tracking approaches such as PuReST. The algorithm cannot handle the eye tracker shifts during sports and is easily fooled by dark spots created by illumination changes during hiking. Overall, Pistol achieves the best accuracy, but also has the highest runtime of all algorithms. We omitted the comparison to EllSeg [62] and DeepVOG [63] since they are trained on different data and perform a semantic segmentation which, without retraining, fails especially for lighting condition changes and reflections. It would therefore be an unfair comparison to Pistol, which is trained on large amounts of data with heavy data augmentation. The most promising competitors are CBF and 1000 pupil segmentations in a second, since they are very fast (runtime approximately 1 ms) and only require the CPU, but those approaches require large amounts of RAM. We think they could be integrated as a preprocessing step to find the coarse position of the pupil, upon which the DNN from Pistol could be applied.

Table 2 We evaluated the eye movement detection on the Pupil Invisible recordings based on the consecutive pupil center, iris center, and eyelid opening differences as feature input. We report the results as the percentage of correctly detected eye movement labels per eye movement type. For comparison, we also evaluated several state-of-the-art algorithms
Table 3 We evaluated the feature extraction for our Look! recordings in terms of mean absolute error (MAE: Lower is better) of the pupil center and the iris center as well as the mean intersection over union (mIoU: Higher is better) for the pupil, iris, and eyelid area. As competitors, we used several state-of-the-art algorithms

Table 2 shows the results of our eye movement evaluation on the Pupil Invisible data set. As can be seen, the HOV feature performs very poorly for blink detection, which is due to the fact that the feature itself is not capable of capturing the eyelid movements in the histogram of oriented velocities. We therefore added a histogram with the consecutive eyelid velocities to make it applicable. We also omitted the classical approaches, which are based on velocity and acceleration thresholds, from our evaluation, since they cannot be trained and adapted to our selected annotation scheme, which would lead to poor performance and possibly give the reader a wrong impression. If we compare the UNO card game and PC game recordings with the sports and hiking recordings, we see a negative impact, which is due to the eye tracker movements. It is also interesting to see that the fully convolutional approach performs worse compared to Pistol, although this approach, in a simplified form, is integrated into Pistol. Therefore, the smaller amount of training data and data augmentation is the likely reason for the worse performance. Interestingly, the tiny eye movement transformer outperforms the fully convolutional approach and is therefore a candidate for future integration into Pistol. Overall, all approaches perform very well on the data set, but Pistol performs best due to the larger amount of training data and data augmentation.

Evaluation for Look! Recordings

Table 3 shows the evaluation on the Look! recordings. Those recordings are more frontal, so the overall performance of the approaches is in general better. This can be seen for the sports and hiking tasks as well as for the UNO card game and PC game recordings. The most interesting results in this evaluation are those with reflections on glasses, which can be seen in the bottom part of the table (hiking with glasses). Here, all approaches suffer a considerable drop in accuracy, both in the mean absolute error (MAE) and in the segmentation quality measured by the mean intersection over union (mIoU). The worst approach in the evaluation is PuReST, which is due to the tracking it uses. This tracking cannot handle the reflections and gets stuck on wrongly classified pupils. Overall, Pistol outperforms the competitors, but there is still work ahead for the authors to further improve Pistol for images with reflections, since approximately 2 billion people have visual impairments and possibly need glasses [68].

Conclusion

In this paper, we evaluated Pistol on additional recordings with two eye trackers. We showcased the applicability of Pistol and its limitations in different scenarios, and we argue that Pistol will help many researchers in their eye tracking experiments by providing a useful set of features which are so far not provided by the industry or commercial eye trackers. The disadvantage of Pistol is its longer runtime compared to state-of-the-art approaches, which makes it not applicable in online settings. In return, it delivers the highest accuracy in all scenarios, even under challenging conditions, such as the flickering reflections during hiking with glasses. Pistol can also be used out of the box and does not require retraining or fine-tuning, which makes it usable by anybody, since it is provided for Linux and Windows. It can also be used without a GPU at the cost of higher runtimes of the feature extraction process.