1 Introduction

1.1 Making the Home More Accessible

The deep-rooted desire to remotely control a wide variety of electrical devices in the living environment, in a familiar and simple manner, stems both from user convenience and from urgent necessity. This holds especially for elderly people, who are often affected by (temporary or permanent) mobility impairments that limit their movement at home and their ability to maintain an independent life. However, only very few homes are equipped with interconnected devices and a convenient central control unit. Moreover, accessible, almost intuitive procedures for everyday interaction are generally lacking.

Instead of “analog” physical buttons on conventional remote controls, “digital” buttons on smartphones and tablets are often proposed to control light, blinds or temperature, with commands wirelessly sent to the devices. However, touch-based interaction can be arduous, straining and exhausting. The same is true for exact finger motion above a smartphone or tablet surface.

Moreover, the use of smartphones or tablets is still not prevalent among the elderly. The devices are not very robust, and the prevailing fear of dropping and damaging them further reduces their use. Many elderly people perceive modern smartphones as cumbersome, costly, bulky and heavy. As everyday communication devices, they require care, above all frequent recharging.

Among other modalities, mid-air gesture design and gestural interaction have been widely investigated in the context of smart homes (see [1, 2] for reviews). Elicitation studies are a user-centered approach to gesture design: they extract appropriate gestures that meet important criteria such as accessibility, memorability and reliability, and hence ease of use. However, the effective use of gestures can be challenging and frustrating. In the following, experiments with a group of 29 elderly volunteers using a newly developed handheld, gesture-based infrared remote control are reported.

1.2 Motivation for Device-Based Mid-Air Gestures

The use of mid-air gestures as a means of natural human-computer interaction has been popularized by entertainment applications, e.g. interactive sports games played at distances of up to several meters from the sensor or television screen. Popular commercial solutions include Nintendo’s Wii™ game platform with the hand-held controller WiiMote™ (introduced in 2006), Microsoft’s Kinect™, which tracks the player’s body movement (introduced in 2010, updated in 2013 and discontinued in 2017), and LeapMotion™, which supports real-time motion tracking of palms and fingers and can be integrated with VR (Virtual Reality) head-mounted displays to enable controller-free, mid-air interactions in VR [3]. Orbbec’s Astra™ and Intel’s RealSense™ are further examples.

Mid-air gesture-based interaction often focuses on free-hand gestures, which have the benefit that they can be used straight away, without a dedicated device. Elicitation studies of free-hand mid-air gestures for home appliances typically investigate user-defined gestures for interactions with particular devices, especially the TV (e.g. [4, 5]), which has evolved into a standard interactive multimedia device.

Besides the computational challenges of analyzing free-hand gestures from video footage obtained from a set of cameras, which places high demands on the technical system and also raises serious concerns about the protection of personality rights, a comparison of the results of the studies (e.g. in [2]) reveals that it is difficult to define a comprehensive universal gesture vocabulary. In many interviews and elicitation studies, device-free mid-air gestures receive negative assessments. Participants explicitly state that they would not choose free-hand gestures because these are too complicated to execute, too easy to forget and altogether not intuitive enough, since free-hand gestures, unlike voice or touch control, are only rarely used as an interface to everyday devices. Handheld control devices, in contrast, facilitate the execution of gestures compared with hands-free control [6, 7]. Therefore, a flashlight-like universal gesture-based remote control, which we call “SmartPointer”, is currently being developed and optimized.

1.3 User-Centered Design

Acceptance of new technologies is generally influenced by multiple factors, the most important being (1) expected personal benefits (like perceived usefulness or increased safety) to meet experienced needs and desires, (2) available alternatives and (3) persuasions and social influence (family, friends or professional caregivers) [8].

In a preliminary study carried out in the Usability Lab of the Geriatrics Research Group at Charité – Universitätsmedizin Berlin, a group of 20 elderly volunteers (aged 74.7 ± 4.9 years) was examined to find out how they would respond to gesture-based interaction with home appliances [9]. The purpose of that study was to create a 2-D hand gesture set for specific tasks, with the final selection based on user ratings and consideration of system capabilities. To this end, a mixed-method design was applied, involving assessments, a guideline-based interview, a task-based application and a questionnaire using a gesture catalog. Finally, the test persons were given a number of imaginary tasks in which several household devices were to be controlled by mid-air gestures. For each task, they were asked to perform the gesture they thought most suitable, using a commercial laser pointer (as the SmartPointer system had yet to be developed).

All test persons were very interested in, and open-minded towards, the tasks given to them. In the process, qualitative requirements were collected regarding the design and haptics of the hand-held device, desired feedback and safety concerns. Above all, a simple, almost intuitive use was of utmost importance for the test persons. “Intuitive” in this context is a synonym of “familiar”: a user interface is “intuitive” insofar as it resembles something the user has already learned or exploits existing skills [10].

The subsequent process of designing the system followed the guidelines of participatory design and was closely based on 1) the expectations and preferences of the potential users, 2) considerations regarding haptics and age-related limitations like fatigue and cognitive load and 3) the experience of the design experts among the project partners. As a result, the buttonless, lightweight (~70 g), small-sized (~10 cm) haptic interface, i.e. the handheld SmartPointer, was built in two different shapes we call ‘cross’ and ‘pipe’, see Fig. 1. (Weight and size are about half those of usual remote controls.)

Fig. 1. Two alternative shapes of the battery-powered, handheld SmartPointer – ‘cross’ (left) and ‘pipe’ (right) along with the corresponding charging cradles – resulting from a participatory design process for the haptic interaction device. Assembly and functionality are identical.

2 Technical Implementation

The novel gesture-based infrared remote control consists of (1) the handheld SmartPointer, which emits a) a visible light spot to select a particular device and “switch it on” and b) an invisible, spatially structured infrared (IR) light pattern to operate the device; and (2) a small receiver box near the device, which detects the trajectory of motion of the projected pattern, identifies the intended gesture and converts it into the device-specific command (see Fig. 2). The emitted IR light is spatially modulated by a diffractive optical element (DOE) [11], whose specially designed micro-structured surface generates a binary pseudo-random array of light beams. Because the IR-optical signal serves as both the information and the transmission channel, this concept yields a handheld device that contains only a few inexpensive mass-market components and is very easy to use. The technical system and design guidelines to optimize its overall performance are described in more detail in [12].

Fig. 2. The optical remote control comprises a hand-held device (left), which emits a visible beam and a large infrared pattern and is automatically activated by an acceleration sensor, and a receiver box (right), placed in, on or near the device to be remotely controlled, with an array of photodiodes and a processing unit for gesture recognition and the generation of device-specific commands.

The SmartPointer is used similarly to a laser pointer. Pointing the visible light beam at a device for a short time (~1–2 s) selects it and switches it on. (The precision required for targeting is greatly reduced compared with a laser pointer, owing to the large size of the projected visible beam pattern.) A gesture is carried out by describing “a trajectory in the air” with the handheld remote. This moves the invisible wide-angle beam pattern (with a large diameter of more than 2 m at working distances of several meters) across the receiver box, changing the light intensities at its IR-sensitive receivers (photodiodes) in a specific way. Based on the stepwise cross-correlation of these intensity changes at all pairs of photodiodes, the trajectory of the pattern is gradually reconstructed. Only the relevant, intended part is evaluated for recognition; the preparatory (delivery) and final movements are discarded. By assigning the trajectory features to one of the predefined gesture classes, the gesture is automatically recognized.
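The idea behind the pairwise cross-correlation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `best_lag` and `pattern_speed`, the photodiode spacing, the sampling interval and the synthetic binary signal are all assumptions made for illustration. When the pattern sweeps across two photodiodes, the second one sees a delayed copy of the first signal; the delay that maximizes the cross-correlation gives the speed of the pattern along the axis joining the two diodes.

```python
import random


def best_lag(a, b, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation of a and b.

    A positive result means that b is a delayed copy of a.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += a0[i] * b0[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def pattern_speed(lag, spacing_m, dt_s):
    """Speed of the pattern along the axis joining two photodiodes (None if lag is 0)."""
    if lag == 0:
        return None
    return spacing_m / (lag * dt_s)


# Synthetic demo (assumed values): a pseudo-random binary intensity sequence
# passes diode A and reaches diode B five samples later.
rng = random.Random(0)
pattern = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(200)]
diode_a = pattern[10:110]
diode_b = pattern[5:105]  # diode_b[i + 5] == diode_a[i]

lag = best_lag(diode_a, diode_b, max_lag=20)
speed = pattern_speed(lag, spacing_m=0.01, dt_s=0.001)  # hypothetical 1 cm, 1 kHz
print(lag, speed)
```

Repeating this estimate over short, successive time windows and over diode pairs with different orientations would yield a sequence of velocity vectors that can be summed into a trajectory, which is one plausible reading of "stepwise" reconstruction.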

In short, the techniques required to reconstruct the user’s hand trajectory from the recorded electrical signals and to recognize the intended gesture can be categorized into the following stages: data acquisition, pre-processing, segmentation, feature extraction and multi-class classification.
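As an illustration of the segmentation stage, the following sketch keeps only the longest run of samples in which the signal changes quickly. This is a deliberately simplified, assumed rule: the function name `segment_stroke`, the threshold and the minimum run length are hypothetical, and a real system would likely combine several criteria.

```python
def segment_stroke(signal, threshold, min_len=3):
    """Return (start, end) sample indices of the longest active run, or None.

    A sample-to-sample step counts as active when its absolute amplitude
    change exceeds `threshold`. The run covers samples start..end inclusive.
    """
    active = [abs(signal[i + 1] - signal[i]) > threshold
              for i in range(len(signal) - 1)]
    best = None
    start = None
    for i, flag in enumerate(active + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


# Assumed toy recording: rest, a rapidly changing stroke, rest.
signal = [0.0] * 5 + [0.0, 1.0, 0.0, 1.0, 0.0, 1.0] + [1.0] * 5
print(segment_stroke(signal, threshold=0.5))
```

Only the samples inside the returned frame would then be passed on to feature extraction and classification.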

3 Gesture Vocabulary

During preliminary investigations with uninstructed persons in the Usability Lab at Charité – Universitätsmedizin Berlin, it turned out that the majority of participants used the same gestures for the same functions across different devices. For example, for volume control of TV and radio or for dimming light, the gesture would often be the same [9]. This helps to unify and simplify the operation of a large variety of household devices and hence can eliminate a great concern (especially among senior users) of forgetting the control gestures. During those tests, horizontal, vertical, circular and targeting gestures were most frequently used.

All gestures used in the recent experiments (2 targeting gestures, 4 one-directional strokes and 2 circle movements) are displayed and described in Table 1. This choice was guided by those preliminary investigations and practical experience (e.g. respecting “symmetry”, which refers to reverse gestures for reversed action [13]). The selected gestures are quite distinctive, but large spatio-temporal variabilities in shape and duration have to be considered for successful pattern recognition.

Table 1. Gesture lexicon used in the experiments; associated functions.

4 Study Procedure

Twenty-nine healthy elderly persons (14 female/15 male) aged 74.9 ± 4.7 years volunteered for our laboratory study; 23 were right-handed, 4 left-handed and 2 ambidextrous. After completing a cognitive test, each test person picked one of the two differently shaped SmartPointers (‘cross’ or ‘pipe’, see Fig. 1), sat on a chair at a distance of about 3 m from the test devices (see Fig. 3) and was asked to operate the four devices one after another using the associated gestures (cf. Table 1): (1) lamp: switch on/off; (2) blind: move up/down; (3) heater: draw a circle clockwise/counterclockwise; (4) door: move left/right. None of the participants had any prior experience with gesture-based devices or game consoles. All, however, immediately grasped how to “draw” a stroke or circle “in the air” with the flashlight-like SmartPointer without being primed. The test persons were only told that they had up to 6 s after the starting signal to “draw” the gesture.

Fig. 3. (Left) Test devices (light, blind, heater, door – from left to right) as seen from the user’s perspective when carrying out the control gestures. (Right) Detector box with the array of IR-sensitive photodiodes behind the optical filter (black square) to block ambient light.

While a test person was carrying out a gesture, the electrical signals from all photodiodes of the receiver unit were recorded synchronously. After the preset time, the recording was stopped, the trajectory was instantly reconstructed and displayed, and the gesture recognition result was given. The test person was then asked to repeat the same gesture up to two more times before proceeding to the next command.

Overall, 642 gestures from 29 participants were recorded and analyzed. Participants made an average of 2.70 gestures per gesture class (referent) (SD = 0.51).

5 Results

In Fig. 4, three exemplary recorded electrical signals, each 4 s long, from one photodiode are displayed (the signals from the other photodiodes are very similar, time-shifted copies). They are shown together with the reconstructed trajectories, calculated from the signal fragments within frames that mark the start and end of the intended “stroke”. The frames are set by rules based on amplitude gradients, velocities and accelerations. From the reconstructed trajectories, the gestures are recognized using the goodness of match with linear or elliptical least-squares (LS) fits as a quality factor.

Fig. 4. (Above) Examples of recorded signals from one photodiode, with the start and end of the meaningful part found by the rule-based segmentation. (Below) Corresponding trajectories reconstructed from the signals within the frames: line top-down, line right-to-left, circle counterclockwise (from left to right). The beginning and end of each trajectory are marked by green and black dots, respectively. (Color figure online)
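Recognition by fit quality can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: a circle fit (an algebraic least-squares fit on centered coordinates) stands in for the elliptical fit mentioned in the text, the tolerance `tol` is arbitrary, the y axis is taken to point upwards, and all function names are hypothetical. A trajectory is first tested against a straight line, then against a circle; the stroke direction or the sense of rotation selects the gesture class.

```python
import math


def line_fit(points):
    """Total-least-squares line fit: returns (rms_residual, spread_along_line)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix are half_trace -/+ root.
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    return math.sqrt(max(half_trace - root, 0.0)), math.sqrt(half_trace + root)


def circle_fit(points):
    """Algebraic least-squares circle fit: returns (rms_residual, radius) or None."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    u = [x - mx for x, _ in points]
    v = [y - my for _, y in points]
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    det = suu * svv - suv * suv
    if det <= 1e-12 * (suu + svv) ** 2:
        return None  # (nearly) collinear points: no finite circle
    bu = 0.5 * sum(a * (a * a + b * b) for a, b in zip(u, v))
    bv = 0.5 * sum(b * (a * a + b * b) for a, b in zip(u, v))
    uc = (bu * svv - bv * suv) / det  # circle center in centered coordinates
    vc = (bv * suu - bu * suv) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    res = [math.hypot(a - uc, b - vc) - radius for a, b in zip(u, v)]
    return math.sqrt(sum(e * e for e in res) / n), radius


def signed_area(points):
    """Shoelace formula; positive for counterclockwise paths (y axis up)."""
    return 0.5 * sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]))


def classify(points, tol=0.15):
    """Assign a trajectory to a stroke, a circle or 'unknown' by relative fit error."""
    line_rms, spread = line_fit(points)
    if spread == 0.0:
        return "unknown"
    if line_rms / spread < tol:
        dx = points[-1][0] - points[0][0]
        dy = points[-1][1] - points[0][1]
        if abs(dx) >= abs(dy):
            return "right" if dx > 0 else "left"
        return "up" if dy > 0 else "down"
    fit = circle_fit(points)
    if fit is not None and fit[0] / fit[1] < tol:
        return "counterclockwise" if signed_area(points) > 0 else "clockwise"
    return "unknown"


# Synthetic trajectories mirroring the three examples in Fig. 4.
top_down = [(0.02 * math.sin(k), 1.0 - 0.1 * k) for k in range(11)]
right_to_left = [(1.0 - 0.1 * k, 0.0) for k in range(11)]
circle_ccw = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
              for k in range(24)]
for trajectory in (top_down, right_to_left, circle_ccw):
    print(classify(trajectory))
```

The line test comes first because a circle of very large radius fits a straight stroke just as well as a line does, so comparing the two residuals alone would not separate the classes.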

In Fig. 5, the recognition rates for all gestures from all 29 test persons are given in two bars for each test person. The left bar (in grey) represents the recognition rate with all strokes considered. The right bar (in black) represents the true positive rate (TPR), when only those strokes are taken into account where fragments of intended “strokes” could be recognized by the algorithm as well as by human observers.

Fig. 5. Results of gesture recognition. (Left) Results for every test person, for all attempts (‘all’) and meaningful signals (‘TPR’). (Right) Results for each of the 8 gesture classes, with TPR for all meaningful signals (Ntr counts of all N attempts for one class).
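The difference between the two rates shown per test person can be made concrete with a small sketch. The function name `recognition_rates` and the example counts are hypothetical and are not data from the study; the sketch also assumes that an attempt without a meaningful stroke counts as not recognized in the ‘all’ rate.

```python
def recognition_rates(attempts):
    """Compute (rate over all attempts, TPR over meaningful attempts).

    `attempts` is a list of (meaningful, recognized) boolean pairs,
    one pair per recorded attempt.
    """
    total = len(attempts)
    meaningful = [recognized for ok, recognized in attempts if ok]
    hits = sum(meaningful)
    rate_all = hits / total if total else 0.0
    tpr = hits / len(meaningful) if meaningful else 0.0
    return rate_all, tpr


# Hypothetical example: 10 attempts, 8 of them meaningful, 7 of those recognized.
attempts = [(True, True)] * 7 + [(True, False)] + [(False, False)] * 2
rate_all, tpr = recognition_rates(attempts)
print(rate_all, tpr)
```

With these assumed counts, the rate over all attempts is 7/10, whereas the TPR is 7/8, which illustrates why the TPR bar can never be lower than the ‘all’ bar.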

As can be seen from the results, almost every test person carried out some gestures awkwardly or not within the recording time, so that the intended gesture could not be identified from the recordings. These failed attempts were often among the first gestures carried out by a test person and are most likely due to a lack of practice. In order to evaluate the pattern recognition algorithm, only meaningful gestures were taken into consideration (see right column of the table in Fig. 5). Note, however, that the pointing gestures 1 and 2 (switch on/off) were always correctly executed and recognized.

The presented device-based gestural remote control offers a novel, yet familiar (and intuitive), user-friendly alternative to existing interfaces. First experiments with a group of uninstructed elderly volunteers have already shown the great potential of a universal lightweight, buttonless, flashlight-like remote control for operating a variety of devices. Further steps will be taken to improve the recognition results in everyday practice and to include more gesture classes.