
Support Strategies for Remote Guides in Assisting People with Visual Impairments for Effective Indoor Navigation

Rie Kamikubo, University of Tokyo, USA, rkamikub@iis.u-tokyo.ac.jp
Naoya Kato, University of Tokyo, Japan, nkato@iis.u-tokyo.ac.jp
Keita Higuchi, University of Tokyo, Japan, khiguchi@acm.org
Ryo Yonetani, University of Tokyo, Japan, yonetani@iis.u-tokyo.ac.jp
Yoichi Sato, University of Tokyo, Japan, ysato@iis.u-tokyo.ac.jp

People with visual impairments often require mobility assistance from sighted guides, but such guides are not always available. Recent technological strides have opened up new directions for sighted guidance services, assigning guides from a network of remote workers to provide real-time assistance via audio/video communication. However, little is known regarding desirable support characteristics of remote guides or challenges experienced in guide practices without the requisite expertise. To recommend support strategies that contribute to facilitating a successful platform for remote sighted guidance, this paper presents a comparative study of the performance of trained and untrained sighted guides who are recruited for a remote scenario in assisting people with visual impairments in indoor navigation. As an outcome of this research, we provide a deeper understanding of design opportunities for HCI to scaffold requirements of remote guides, such that their collaborative efforts and environmental knowledge influence the user experience. Based on our empirical insights, we suggest developing the expertise of remote guides through: a) preliminary guidance cooperation awareness, b) guidelines for verbal description methods, and c) approaches to compensate for the lack of environmental knowledge.

CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social computing; • Human-centered computing → Accessibility; • Social and professional topics → People with disabilities;

Keywords: Remote Assistance, Visual Impairment, Indoor Navigation, Collaboration

ACM Reference Format:
Rie Kamikubo, Naoya Kato, Keita Higuchi, Ryo Yonetani, and Yoichi Sato. 2020. Support Strategies for Remote Guides in Assisting People with Visual Impairments for Effective Indoor Navigation. In CHI Conference on Human Factors in Computing Systems (CHI '20), April 25–30, 2020, Honolulu, HI, USA. ACM, New York, NY, USA 14 Pages. https://doi.org/10.1145/3313831.3376823

1 INTRODUCTION

People with visual impairments can face significant challenges that deprive them of safe and independent mobility [39]. To compensate for vision loss, they rely on mobility aids such as the traditional white cane and guide dog [35, 41]. In recent years, navigation tools have become more high-tech, for example by using collision avoidance systems [27, 28, 43] or providing turn-by-turn directions [26, 33]. While assistive technology suggests solutions to enhance visually impaired people's mobility, there are various situations where technology fails to ensure reliable and consistent support (e.g., crowded spaces, indoors, construction areas) [41]. In such scenarios, having an in-person guide is still arguably the most efficient method [16, 17, 41], but the guide may not always be available.

To facilitate access to human-powered support, a number of commercially available applications provide sighted guidance from remote workers [7, 21, 22, 23]. Be My Eyes is a well-recognized example that establishes a video connection to pair visually impaired users with crowdsourced volunteers who serve as conversational question-answering assistants [22]. The crowd volunteers interpret the video feed from the users’ smartphone camera and deliver visual information in nearly real time. While discussing the benefits of the service, Avila et al. have also reported individual variability among the remote workers that inhibits quality assurance [1]. As one of the few pioneers, a related collaborative platform called Aira has introduced human agents who are trained in the terminology and etiquette of communication [29].

Our eventual goal is to make such remote guidance cooperation a promising platform for people with visual impairments, especially to match them with effective remote workers for mobility support. Toward this goal, we have noticed that there is an absence of known strategies to promote effective remote guide practices. Co-located sighted guide techniques have been widely acknowledged to offer proper preparation to act as effective guides for navigation assistance [18, 31]. By contrast, we lack knowledge about the desirable ways of support and suggestions to improve remote guide practices, which may have both consistencies and inconsistencies with existing strategies for in-person guides.

Figure 1: A visually impaired pedestrian travels a planned route while receiving mobility assistance from a remote guide through a video conferencing system, as in the user-agent interaction in Aira [29]. We conducted an analysis of 16 remote guide performances in giving verbal instructions and extracted the remote guide characteristics that were reviewed for the overall user experience.

In this work, we sought to better understand the requirements of effective remote guides and contribute to the development of support strategies for the guides to adopt in providing remote mobility assistance. As illustrated in Figure 1, we conducted a study to assess the performance of 16 sighted participants as remote guides in collaborative navigation tasks. To investigate how remote guides should give support to visually impaired users in this study context, we observed communication and assistance behaviors of the guides with the following types of requirements: trained and untrained for traditional sighted guidance, each expected to influence the experience of guided participants differently. In addition to the analysis of the verbal interaction during the task, we exploited the performance assessment by the guided participants and the self-assessment of the guides’ performance to derive desirable support characteristics and challenges of providing remote guidance.

To our knowledge, we present the first efforts to provide a set of suggestions that can improve the performance and effectiveness of remote guides who provide real-time assistance to people with visual impairments in indoor navigation. Based on the analysis of performances of trained and untrained participants in sighted guidance, we identified that trained guides demonstrated desirable verbal behaviors that untrained guides were lacking, as in-person guide techniques prioritize methods to convey details of the environment efficiently. Nevertheless, trained guides were not always well matched for the current video-based navigation platform due to their lack of environmental familiarity and awareness for remote guidance cooperation. To discuss social and technical recommendations to facilitate the design of remote mobility assistance, this paper concludes with the following implications: a) preliminary guidance cooperation awareness, b) guidelines for verbal description methods, and c) design approaches to address the limitations of video-mediated collaboration associated with environmental unfamiliarity of the guides.

2 RELATED WORK

Our work is significantly informed by prior research studying collaborative navigation and making efforts towards detailing suggestions for sighted people to better interact with and assist those with visual impairments. Also, several prior studies have examined the feasibility of remote sighted guidance as a new promising direction. The question remains as to whether and how the strategies for traditional sighted guidance can be applied to enable more people to be effective remote guides.

Sighted Guidance for Blind or Low Vision People

Researchers have studied the experiences of people with visual impairments during navigation and found that they often rely on sighted guides, especially when navigating unfamiliar indoor spaces [40, 41]. In such collaborative navigation scenarios, a visually impaired person holds the elbow/arm of a sighted person and walks under the assumption that the sighted person will know the way around and avoid any obstacles in the way. Importantly, sighted guides can describe the environment and provide visual information as they negotiate their travel situations together. While location-aware pedestrian navigation systems (e.g., [17, 27, 33]) have expanded opportunities to receive navigation and wayfinding information, navigation environments are ever changing [4]. Construction sites appear, bus stops change locations, and businesses close or are replaced, making sighted guidance the most reliable method for people with visual impairments to ensure their ability to travel [41].

Despite the advantages of having sighted guides, several works have observed unreliable guidance by sighted people due to their lack of knowledge about how visually impaired people navigate or about verbal description methods to give environmental cues [15, 36, 37, 40]. Navigators have reported receiving irrelevant information, ambiguous phrases like “there”, or inaccurate measurement estimations [36, 40]. Therefore, they have reported a preference for trained sighted guides to receive more efficient navigation help [2, 9].

Practice of Trained Sighted Guides

Trained sighted guides are aware of specific techniques to convey visual information and navigation instructions using verbal and physical cues [18, 31]. Guides, especially those who provide services as Orientation & Mobility (O&M) specialists, are well experienced in describing environmental features that enable visually impaired people to learn spatial relationships of their surroundings [35, 38]. Also, they give navigation cues with regard to individual mobility needs and strategies, such as guiding blind pedestrians using a cane to benefit from walls or areas with boundaries, which can be misunderstood as obstacles by those without blindness awareness [41]. Trained guides also make sure that their walking pace and stance are appropriate for the companion's travel behaviors [13, 31].

To train professional guides, many efforts have been devoted to the development of verbal description methods and courtesy expressions to avoid misunderstandings and confusion [10, 32, 34, 42]. For example, as a tip to describe the geometric space in the immediate vicinity, people with visual impairments may take in information easier when objects are identified according to a clock orientation (e.g., “There is a table at 2 o'clock position”) [13, 32]. It is also important to avoid language that centers around visual cues when giving navigation instructions (e.g., Instead of “Go to your right when you reach the office supply room”, “Walk forward to the end of this aisle and make a full right” is strongly preferred) [2, 13].
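As a purely illustrative aside, the clock-orientation convention described above can be sketched in a few lines of Python. The function names `clock_position` and `describe` are hypothetical, and the sketch assumes the bearing is measured in degrees clockwise from the listener's straight-ahead direction, with each clock hour covering 30 degrees. It is not part of any guide training material cited here.

```python
def clock_position(bearing_deg: float) -> int:
    """Map a relative bearing to a clock hour.

    Assumption: 0 degrees is straight ahead (12 o'clock) and angles
    grow clockwise, so 90 degrees is directly to the right (3 o'clock).
    """
    hour = round((bearing_deg % 360) / 30) % 12
    return 12 if hour == 0 else hour


def describe(obj: str, bearing_deg: float) -> str:
    """Build a clock-oriented description in the style quoted above."""
    return f"There is a {obj} at {clock_position(bearing_deg)} o'clock position."


print(describe("table", 60))
```

A negative bearing (an object to the left) wraps around, so `clock_position(-90)` yields 9.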

Emphasis on Remote Sighted Guidance

The drawbacks of sighted guidance are that its professional service may not always be accessible and may not be ideal for promoting independence. To remove these barriers, pervasive mobile technologies, such as smartphones and wearable devices, have now opened up opportunities for visually impaired users to access human-powered support. Be My Eyes [22], VizWiz [7], TapTapSee [20], and BeSpecular [23] have demonstrated the idea of crowd workers responding to photos or videos captured by the users’ camera so that they can flexibly query for real-time assistance in a variety of situations. There is also a body of research devoted to investigating the feasibility of having remote assistants to enhance the mobility experience of visually impaired pedestrians [5, 6, 12, 14, 30].

To inspire suitable matching criteria in sighted and visually impaired collaboration, the remote service delivery of the Aira platform has showcased qualified “agents” who are considered trained in the user-friendly terminology and communication courtesy [29]. Along with the commercial development, one case study has ascertained the advantage to seek information about features of the environment from O&M specialists via Facetime [19]. Other related feasibility studies have also assigned individuals who are trained in O&M concepts as remote operators, leveraging their recognized qualifications in the design and adoption of remote mobility assistance  [6, 14, 24].

Despite the availability of remote mobility assistance, we are not aware of evaluations of remote guide performance that identify desirable communication/assistance approaches. As discussed in prior work [4], trained sighted guides or O&M specialists are qualified, particularly given their preliminary awareness of what information is useful for people navigating with visual impairments. However, to pave the way for successful remote guidance, it is important to challenge the taken-for-granted assumptions about qualifications that have been based on techniques to mediate co-located interactions with visually impaired people [9, 18, 31, 34].

3 USER STUDY

We performed a study to examine collaborative navigation tasks involving remote guides and pedestrians with visual impairments. Our goal in this study was to learn about desirable communication/assistance strategies and derive a set of suggestions for those performing remote mobility assistance. We assessed the performance of trained and untrained sighted guides and performed a comparative review to identify: 1) desirable characteristics of support, based on how visually impaired participants perceived and reacted to the performance of remote guides, and 2) desirable expertise factors, based on the analysis of interviews with trained sighted guides reporting their needs and challenges in providing remote guidance.

Participants

After a pilot study, 24 participants (16 sighted and 8 visually impaired) were recruited for the study from the local mailing lists and word of mouth. We asked the participants with visual impairments to navigate unfamiliar planned routes in an office building located on a school campus and use a video conference system to receive remote help from trained and untrained sighted guides. The participants were compensated for their time and effort to take part in our study.

Pedestrians Navigating with Visual Impairments (PVI). Table 1 lists participant demographics; vision ranged from blindness to severe and mild forms of low vision. We did not categorize participants by specific visual impairments, considering the nature of user variability in real-world applications such as [22, 29]. Types of visual impairment experienced by the participants included advanced glaucoma, retinitis pigmentosa (RP), and complete blindness. They acquired their vision impairments adventitiously, and they all had sufficient hearing abilities. The blind subject (P4) lacked any visual information for navigation. Low vision subjects were able to obtain some visual cues, such as with remaining visual fields (P5, P7) or light perception (P3, P6, P8), but none had sufficient acuity to read building signage. P3, P4, and P6 use a white cane on a regular basis and are experienced in traveling alone in environments where they have received O&M training. P1, P2, and P8, also with severe forms of vision loss, own a white cane but use it only when they cannot rely on sighted guide services. P5 and P7 access maps and directions from their personal devices to navigate independently.

Table 1: Pedestrians with visual impairments (PVI) including their age, gender, vision, and travel behaviors (use of sighted guide, white cane).
ID | Age/Gender | Impairment | Guide Use | Travel Aids
P1 | 58/F | Glaucoma | Often | Travel with a regular cane
P2 | 46/F | RP | Often | Prefer not to use white cane
P3 | 54/F | RP | Sometimes | Always use white cane; Travel alone in everyday locations
P4 | 35/M | Blind | Sometimes | Always use white cane; Travel alone in everyday locations
P5 | 50/M | Glaucoma | No | Prefer not to use cane; Travel alone & search online for maps and directions
P6 | 46/M | RP | No | Always use white cane; Travel alone in everyday locations
P7 | 20/M | RP | No | No cane; Travel alone with smartphone for maps and directions
P8 | 67/F | RP | Often | Prefer not to use white cane

Remote Guides. Each sighted participant was given the role of remotely guiding the paired PVI via live video footage of the captured environment, supported by a physical map of the travel paths (shown in Figure 2). Although none of the participants had prior training or experience in remote sighted guidance, there were two key expertise factors influencing the performance of remote mobility assistance:

  • Trained Guides (TG). Table 2 lists the 8 sighted guide specialists (four female), who are trained guides belonging to either a public disability services office or a private assistance service that helps customers with disabilities. Their main job is to escort people, including those with visual impairments, to their desired destination. We assigned participants to pairs at random and tried to match individuals who had not met or traveled together. Nevertheless, 3 pairs already knew each other because local guidance offices are limited and many users know the same guides and services. In this paper, paired participants share the same ID number (e.g., TG5 and P5 worked as a pair). Participants had not visited the test environment prior to the study.
  • Untrained Guides (UG). We recruited 8 other sighted participants on-site who had not previously been trained in sighted guidance. Our recruitment of UG participants followed conditions intended to reduce the effects of gender, educational, or age/social-class differences, mainly to control variability in verbal interaction approaches and map reading skills. As a result, the participants, aged between 23 and 32, were male graduate students or staff randomly selected from the information science department of the institute where the study took place. These conditions also led to member homogeneity in tech-savviness and environmental familiarity.

Table 2: Trained sighted guides: participant information including their age, gender, guide experience, and relationship with PVI.
ID | Age/Gender | Guide Experience | Knows PVI Partner
TG1 | 43/M | 2 years | Yes
TG2 | 53/F | 7 years | Yes
TG3 | 66/M | 3 years | No
TG4 | 55/M | 2 years | Yes
TG5 | 54/M | 6 years | No
TG6 | 43/F | 1 year | No
TG7 | 46/F | 0.5 years | No
TG8 | 66/F | 4 years | No

Environment

The study took place inside a 7-story building on the university's campus, where the data collection environment was not controlled. PVI traversed planned routes that were also used by other pedestrians such as graduate students or staff. The planned routes, which spanned an area of 1080 m², consisted of 2 m wide corridors, where pedestrians would find doors and panels on the walls. The session involved reaching multiple landmarks defined in advance by the researchers, namely kitchen and copier rooms with open entryways on a single floor. The kitchen room had 2 vending machines and a counter-top sink, and the copier room had 1 large table with chairs and multiple copiers and scanners located around the table. Every pair went through at least two straight hallways (41.2 m long), three decision points (intersections with an area of 3.6 m²), and three confidence points (different copier rooms as origin and destination, with a stop at the kitchen room in between). Figure 2b shows a floor map of the building, with added labels of the confidence points.

Apparatus

To allow video communication between PVI and remote guides, we used a video conferencing service, Whereby¹. Similar to prior evaluation studies of remote navigation systems [5, 16], PVI wore a special belt that positioned the smartphone camera on their chest and pointed it forward toward the environment (shown in Figure 1). They were able to detach the camera from the belt when needed². The captured video was sent over the mobile data network to the browser-based interface on the remote guide's computer screen. The user wore a single-ear headset to hear the voice of the remote guide. We provided a floor plan to the remote guides, reflecting a real-world assistance scenario in which sighted guides commonly rely on such map information when navigating an unfamiliar indoor space. Also, for remote guides using existing low-cost video link systems that do not incorporate user localization inputs [19, 22], the floor plan is the only available reference for relating the view from the user's camera to the building layout.

Figure 2: Sighted participants given the captured view of the pedestrian's environment and the floor map with landmark annotations (C: Copier room, K: Kitchen room), and starting positions and directions of the pedestrian.

Procedure

We obtained informed consent (approved by the university's IRB) from all participants. Three participants were invited for each experimental session (1 PVI as a user, and one TG and one UG as remote guides). They first received an introduction to our research and its goals, and the task procedure was explained to them individually for their assigned role. We walked the remote guides through how the video communication interface works and warned them about possible technical difficulties such as poor image quality or connection. We stressed that the remote guides should provide verbal feedback in whatever way they thought necessary to guide the user and should focus on safety as a primary concern throughout the task. PVI were briefed about 2 collaborative navigation tasks, each connected with either TG or UG, but did not receive any information about the paired remote guide's background and experience unless they belonged to one of the pairs already acquainted with their TG. To ensure the PVI's safety during the navigation tasks, we informed them that one of the researchers would always be nearby as an observer and would intervene when necessary.

For each collaborative navigation task, the pairs were given a scenario for a journey (approx. 100 m long) that started by first going to the kitchen room and then ended by moving to the assigned copier room. They were instructed to move to the next landmark only after they confirmed that they had arrived at the first one. By keeping the remote connection open throughout the journey, instead of restricting it strictly to wayfinding, we gave them freedom to explore the landmark area for natural communication/interaction behaviors. For the next task, PVI were paired with a different remote guide and assigned a similar journey on a different floor, with slightly different interior settings and kitchen/copier rooms to pass. The order of the remote guide conditions (TG or UG) was counterbalanced.
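To make the counterbalancing step concrete, the following minimal Python sketch alternates the two possible condition orders across the eight PVI. The paper does not state which participant received which order, so the simple alternation used here, and the helper name `counterbalanced_orders`, are assumptions for illustration only.

```python
def counterbalanced_orders(participant_ids, conditions=("TG", "UG")):
    """Assign each participant one of the two condition orders.

    Assumption: orders simply alternate down the participant list, so
    half of the participants start with each condition.
    """
    first, second = conditions
    orders = {}
    for index, pid in enumerate(participant_ids):
        orders[pid] = (first, second) if index % 2 == 0 else (second, first)
    return orders


# Hypothetical assignment for P1..P8; not the assignment used in the study.
orders = counterbalanced_orders([f"P{i}" for i in range(1, 9)])
for pid, order in orders.items():
    print(pid, "->", " then ".join(order))
```

With an even number of participants, this scheme balances any order effect (e.g., growing familiarity with the task) equally across the two guide conditions.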

After completing the experimental session involving two journeys, PVI were asked a set of questions to share their ratings and perspectives on the user experience with different remote guides. In addition to the reported scores, PVI described how the navigation instructions received were effective or ineffective. They also provided their impressions of remote mobility assistance including its limitations and potential applications. They were asked to give information about their daily travel scenarios and visual conditions as well. This semi-structured interview session lasted approximately 30 minutes.

All sighted participants completed a questionnaire to report their self-reviews of the performance. We held an additional semi-structured interview session with TG to learn how giving instructions and providing assistance via remote connection differed from in-person co-navigation, including their perceived challenges in being effective remote guides. Specifically, they were asked to indicate navigation scenarios in which they felt they provided guidance effectively or with difficulty. The session lasted about 20 to 30 minutes.

Data Collection and Analysis

Effectiveness Ratings. PVI reported subjective scores for the perceived effectiveness of assistance from TG and UG. The rating was on a scale from 1 to 7, where 7 indicated the highest effectiveness. We conducted a paired Wilcoxon signed-rank test to determine whether the difference between the two conditions was significant.
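For readers unfamiliar with the test, the sketch below shows how a paired Wilcoxon signed-rank statistic and an exact two-sided p-value can be computed for a small sample using only the Python standard library. The ratings in `tg_scores` and `ug_scores` are invented for illustration and are not the study's data; the handling of ties (average ranks) and of zero differences (dropped) follows common textbook practice and is an assumption about how such an analysis might be run.

```python
from itertools import product


def signed_rank_statistic(x, y):
    """Return (W_plus, ranks) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(ordered) and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, ranks


def exact_two_sided_p(w_plus, ranks):
    """Exact p-value by enumerating every sign assignment under the null."""
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical 1-7 ratings from eight raters; NOT the study's data.
tg_scores = [3, 2, 4, 3, 5, 2, 4, 6]
ug_scores = [6, 5, 7, 4, 6, 6, 7, 5]
w_plus, ranks = signed_rank_statistic(tg_scores, ug_scores)
p_value = exact_two_sided_p(w_plus, ranks)
print(w_plus, p_value)
```

Exhaustive enumeration is feasible here because eight pairs give only 256 sign assignments; larger samples would normally use a normal approximation or a statistics library instead.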

Interviews and Questionnaires. Thorough notes were taken during the interviews, and audio was recorded so that any content missing from the notes could be recovered. Raw qualitative data from questionnaire results and interview notes were separated into text segments containing single ideas or incidents.

Interview responses of PVI were first open coded by the primary researcher, who identified the features of the remote guide's performance reflecting the concerns and perspectives of PVI. Three members of the research team then reviewed these initial codes and grouped them further, from which certain characteristics of support by TG and UG emerged, along with how PVI found them effective or ineffective. Such characteristics were later linked with quantitative measures of verbal behaviors of TG and UG (analysis explained in the following section). Through further discussion of the meaning of PVI responses, the pros and cons of the ways of support gradually illuminated the same themes seen in TG interviews regarding the need to develop the expertise of remote guides to facilitate desirable remote mobility guidance cooperation.

Questionnaire responses of sighted participants (TG and UG) followed the same analytic process as above, which started by identifying the initial codes reflecting the concerns and perceived effectiveness of their own performance. Given the interview topics for TG (sighted guidance techniques and challenges in current remote mobility assistance), the research team also compared and discussed the codes, which were refined into the main analytic points on the expertise factors needed to be effective remote guides.

Annotation of Verbal Behaviors of Remote Guides. To assess the characteristics of navigation instructions derived from the above qualitative data, we observed the performance of navigation tasks based on measuring what kind of spoken language and instructions were used by TG and UG. Using the screen recordings of the video conferencing interface which included audio of pedestrians and assistants, we counted each time a participant mentioned a word/phrase relating to the emerged properties of navigation instructions in remote interaction. We followed the previous analysis techniques for the subjective categorization of different types of information for route navigation [8, 9].

Specifically, two members of the research team individually analyzed the video recordings based on the pre-determined rules to code the categories of expressions used for navigation instructions: directional (e.g., left/right), numerical (e.g., distance in steps or meters), and descriptive with static environmental features (e.g., long corridor, narrow entryway, wall). Expressions that didn't fit the above categories were coded as others, and the coding results were collectively reviewed and matched. In addition to categorizing navigation instructions, we surveyed and classified all other speech occurrences.
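The coding above was carried out manually by two researchers. Purely to illustrate the category scheme, the following Python sketch tallies utterances with simple keyword rules. The patterns in `RULES`, the rule precedence (numerical before directional before descriptive), and the function names are all assumptions; the study's actual pre-determined coding rules are not given in the text.

```python
import re
from collections import Counter

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("numerical", re.compile(r"\b\d+\s*(steps?|meters?|degrees?)\b")),
    ("directional", re.compile(r"\b(left|right|straight|forward|back)\b")),
    ("descriptive", re.compile(r"\b(corridor|entryway|wall|door|room)\b")),
]


def categorize(utterance: str) -> str:
    """Return the first matching category, or 'others' if none match."""
    text = utterance.lower()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "others"


def tally(utterances):
    """Count how many utterances fall into each category."""
    return Counter(categorize(u) for u in utterances)


# Made-up utterances in the style of the quotes reported in this paper.
sample = [
    "Turn 90 degrees",
    "Keep going straight",
    "There is a narrow entryway",
    "Move 3 steps to the left",
    "Are you okay?",
]
counts = tally(sample)
print(dict(counts))
```

Because the first matching rule wins, an utterance such as "Move 3 steps to the left" is counted once as numerical even though it also contains a directional word; a human coder could of course assign multiple codes to the same utterance.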

4 EVALUATION OF REMOTE GUIDE PERFORMANCE

We describe representative characteristics of support that emerged from the interviews with PVI, with a set of positive and negative perspectives towards the performance of TG and UG. The effectiveness scores rated by PVI also shed light on the need to develop the expertise of remote guides. To gain a deeper understanding of their subjective feedback, we analyze quantitative measures of verbal behaviors of the sighted participants to examine their performance.

Effectiveness Scores

When PVI were asked to rate the perceived effectiveness of overall assistance from TG and UG, the average score for TG was 3.25 (SD = 1.16), whereas the average score for UG was 5.75 (SD = 1.39). A paired Wilcoxon signed-rank test showed a significant difference between the two conditions (Z = 0.49, p < 0.03).

Characteristics of Support

We present 3 themes describing desirable support characteristics in remote mobility assistance. While PVI participants rated the support from UG as “effective” to complete the navigation tasks, we incorporate their qualitative feedback to understand their perspectives, as the support from TG was “preferred” for remote guidance cooperation despite it being ineffective for the tasks.

Key characteristics extracted from trained guides in comparison to untrained guides were associated with the quality of verbal description and the descriptive terminology. Six PVI participants (P1, P2, P4, P5, P7, P8) reported a positive opinion about easy-to-comprehend messages from TG, saying that they were able to respond quickly to the instructions. Moreover, intuitive wordings used by TG for navigation cues helped PVI participants maintain autonomy in their walking pace, as stated by:

I was able to intuitively understand and respond quickly to the direction of travel when I received ‘at 3 o'clock position.’ (P1)

I was able to walk at my own pace. I was able to move intuitively with ‘Yes, keep going straight.’ (P2)

Reflecting the guides' lack of prior guide experience, 5 PVI participants (P1, P2, P3, P7, P8) reacted negatively to the quality of navigation instructions by UG. For instance, P3, an active white cane user, had a stressful experience when she received instructions that did not follow her mobility standards using a cane. Low vision participants, with diverse visual abilities and needs, also explained their concern about unfavorable verbal description methods by UG that did not bridge information gaps. There were many miscues, as described by:

I can't believe he [UG3] told me to ‘Shift XX steps to the left.’ It is really dangerous for a white cane user to move sideways because I can detect what is in front of me but it is a lot harder to tell when there is something on my side. I was so nervous and felt stressed. (P3)

He [UG7] could have told me to leave the kitchen room, rather than ‘Turn 180 degrees.’ I knew where the exit was. I wanted to know which direction to go, not which direction to face. (P7)

His [UG8] explanation was short and simple but not enough in detail to help me. I am afraid when navigating surface-level changes. (P8)

To emphasize areas of improvement related to navigation-specific language skills of UG, 3 PVI participants (P1, P2, P7) referred to specific examples of non-intuitive expressions, such as numerical measurements to describe distance and orientation, or expressions with no contextual details about the environment, such as a short command like ‘Turn 180 degrees’ as mentioned above. They reported that such expressions took them time to process:

There were lots of instructions using degrees. I couldn't intuitively respond to ‘Turn 90 degrees.’ (P1)

I had to think hard for ‘Move XX steps’ instructions. I couldn't naturally grasp ‘XX meters more.’ Mobility was too slow and cautious. (P2)

PVI participants showed personal preferences regarding the level of detail expected in information delivery. Two PVI participants (P6, P8) mentioned the potential of building sufficient knowledge to explore the environment by receiving navigation hints from TG. Specifically, P6 favored the ambient description to gain situational awareness:

Even if the information about surroundings is too detailed, it always helps me gain situational awareness and I have more hints for what to expect. (P6)

On the other hand, there were more instances in which the information from TG was considered overly descriptive and often unnecessary. Three PVI participants (P5, P7, P8) referred to the lack of task-oriented supplementary information. They wanted to know how such information related to completing the navigation task, as described by:

“I don't need to know ’There is a big room on the left/right side’ to complete the task.” (P5)

She [TG8] was trying to be careful with the description but I can't make use of all of these visual cues she was mentioning like there are ‘four rooms’ or ‘panels hanging on the wall.’ I don't know whether they were hazardous and she wanted me to be cautious. I need concrete explanation. (P8)

Another characteristic of support concerned the lack of environmental familiarity, which led to a sense of insecurity and uneasiness for both parties involved in the collaborative navigation task. Overall, despite being satisfied with the verbal description methods of TG, all 8 PVI participants described the guidance as unreliable for reaching their destination effectively. They perceived a lack of confidence in the way TG gave feedback, which heightened their sense of uneasiness while traveling. We observed negative points of view towards directional guidance by TG, as reported by:

I can tell he [TG1] is not used to the building. I almost fell. (P1)

I felt more worried than accomplished after reaching the destination. He [TG3] didn't know how much (distance) is left so he told me to go by small steps. (P3)

She [TG6] was so confused and I even heard her asking the experimenter ‘Am I going the right direction?’ I got worried with her, and even though I kind of knew I missed the turn, I kept going straight. (P6)

I needed strong confirmation from her [TG7]. I kind of sensed I have arrived at the destination but I was not completely sure. (P7)

Two PVI participants, therefore, explicitly suggested that remote workers be familiar with the environment to ensure the quality of their directions:

If the operator is new to the building, you won't receive reasonable instructions. (P1)

Quality of instructions would be a lot higher if the operator knows the route. (P2)

From a different perspective, P8 referred to the possibility of leveraging information technology to compensate for the lack of environmental knowledge of remote workers:

If there is a lot more information available and systems can collect important decision points or cautionary spots in the environment, I feel like remote workers will make fewer mistakes and perform even better. (P8)

Lastly, we observed how mobility assistance through remote collaboration reinforced the idea of building shared experiences, and the development of mutual efforts was an important factor in the effectiveness of remote mobility assistance. PVI participants spoke positively of their shared experiences with the remote collaborator and explained the benefits of having a remote guide over an in-person guide. Five of them (P1, P2, P4, P5, P8) reported that they usually follow their in-person guide with complete reliance and have no decision power during navigation; in comparison, they referred to the value of gaining a sense of accomplishment in remote sighted guidance, as described by:

I felt more accomplished than usual sighted guidance. I was able to reach a destination with her [TG2], rather than following her when we walk together. (P2)

Unlike traditional sighted guidance, I liked how I had the power to check my directions but I also had a companion who was with me. I feel more accomplished and satisfied. (P8)

Even though information delivery by the paired TG received low effectiveness scores, 4 PVI participants (P1, P4, P5, P8) described their sense of ease and shared feelings. While no psychological comfort was reported towards the interaction with UG, we observed their positive attitude towards remote collaboration with TG, such as:

I felt the sense of ease. I couldn't reach the destination without him [TG1]. My worries were much less while I could hear his voice. (P1)

I had a sense of security because I have a companion to share the experiences. Whether I am lost or he [TG4] is lost, we can share our troubles. It is not the end of the world for one mistake. (P4)

Interestingly, PVI participants initiated a mutual contribution to aim the camera correctly because the camera work impacts what the remote guides can do. P8 constantly asked whether her walking pace was appropriate for the guides to follow the video. For effective remote collaboration, camera work was not considered a one-way street, as described by:

When I notice we were approaching the intersection, I tried to walk slowly and not to stop in the middle of it so the camera can capture the whole view of the paths. I cared about the information that I was sending. (P5)

I built an understanding that I was not simply walking alone. I moved my camera around to send information about the surroundings when she [TG8] lost her track. (P8)

Analysis of Verbal Behaviors

We analyzed the verbal behaviors of remote guides during the collaborative navigation tasks, in order to associate them with the characteristics of support perceived by PVI. Overall, the average numbers of speech occurrences by sighted participants were 65 for TG (SD = 12.3) and 39 for UG (SD = 13.2), within the average task completion times of 526.9 seconds (SD = 199.4) and 316.4 seconds (SD = 137.8) respectively. Out of these speech occurrences, giving navigation instructions described in Table 3 took 50 for TG and 34 for UG on average. The rest involved describing the environment by mentioning landmarks, points of interest, and obstacles (9 for TG, 3 for UG) and delivering caring messages such as “Are you okay? Be careful” (6 for TG, 2 for UG). Describing the environment was notably different between TG and UG.

Table 3: Frequency of Speech Occurrences by Type with Examples. Percentages (SD) are the mean across participants of each participant's count for a category divided by that participant's total number of speech occurrences.

| Category | Examples | TG % (SD) | UG % (SD) |
|---|---|---|---|
| Giving navigation instructions | Turn left/right, Move forward, Make U-turn | 76.6 (5.4) | 85.5 (4.7) |
| Describing the environment | Landmarks, Points of interest, Obstacles | 15.6 (5.8) | 8.1 (5.3) |
| Signaling care | Are you okay? Be careful. | 7.8 (5.6) | 6.4 (4.4) |

Table 4: Expressions for Giving Navigation Instructions, Avg. % (SD).

| Category | Examples | TG % (SD) | UG % (SD) |
|---|---|---|---|
| Directional | Left/Right, Proceed, Stop | 47.0 (14.9) | 60.8 (9.8) |
| Numerical | Proceed 2m/3steps, Rotate 90 deg. to the left | 18.2 (14.6) | 19.4 (12.2) |
| Descriptive | Follow the wall on the left. You will be going the straight corridor. | 27.6 (20.3) | 17.3 (10.9) |
| Status check | Can you look around? Move up the camera. | 7.2 (5.9) | 2.6 (2.7) |
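
The percentages in Table 3 are described as a mean of per-participant ratios, not a ratio of the group's mean counts, which is why 50/65 ≈ 76.9% need not equal the 76.6% reported for TG. A minimal sketch of that calculation follows. The counts, the category keys, and the function name `category_shares` are hypothetical and are not the study's data, and the use of the sample standard deviation is an assumption, since the paper does not state which form was used.

```python
from statistics import mean, stdev

# Hypothetical per-guide speech counts by category (NOT the study's data).
counts = [
    {"instructions": 40, "environment": 8, "care": 2},
    {"instructions": 30, "environment": 6, "care": 4},
    {"instructions": 45, "environment": 10, "care": 5},
]

def category_shares(per_guide_counts):
    """Return {category: (mean %, SD %)}.

    Each guide's count is first divided by that guide's own total,
    and the resulting percentages are then averaged across guides.
    """
    shares = {}
    for cat in per_guide_counts[0]:
        pcts = [100 * g[cat] / sum(g.values()) for g in per_guide_counts]
        # Sample SD is an assumption; the paper does not specify.
        shares[cat] = (mean(pcts), stdev(pcts))
    return shares

summary = category_shares(counts)
for cat, (m, sd) in summary.items():
    print(f"{cat}: {m:.1f} ({sd:.1f})")
```

Because every guide's percentages sum to 100, the category means also sum to 100, which matches the column totals in Table 3.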

Table 4 shows the average frequency of the expressions for navigation instructions mentioned by TG and UG.

  • Although PVI participants reported how they could not react intuitively to the directional instructions by UG, such as with numerical terms, TG relied on numerical expressions as frequently as UG. The quality of verbal description was reported to be higher for TG than UG but we observed numerical expressions being used by TG for instructing turns and steps to shift the user position.
  • TG participants were more likely to add contextual descriptions to directional cues using environmental features, such as “There is an open entrance to the kitchen room. It is approaching on your right. It is narrow.” This accords with the subjective feedback of PVI regarding the emphasis of descriptive terminology used by TG.

Comparing the frequency of directional and descriptive expressions shows that UG were biased towards short directional commands such as “stop and turn right” relative to the other categories of expressions. We also examined navigation instructions that did not belong to the extracted features and categorized them as Status Check. In Status Check, remote guides gave instructions to optimize the view angle of the camera or to capture scenes necessary to understand the current position of the PVI. As described previously, TG performed with uneasiness: because they got lost during the task, they frequently needed to re-establish where the pedestrian was located on the map. TG used Status Check especially at the start of the route and asked for “turning around in circle” to gain an overall spatial understanding and the current position in the local environment.

5 SELF-EVALUATION OF REMOTE GUIDE PERFORMANCE

Based on the self-reported experiences of sighted participants, we performed a thematic analysis that resulted in 3 main themes describing the need to develop expertise to be effective remote guides. TG reported significant difficulties in guiding users via remote interaction. We present the different opinions and concerns of the remote guides as influenced by their expertise.

Needs for Conversation Assistance Expertise: Awareness of Terminology and Communication Rules

We reinforced our understanding that participants from TG were not always a knowledgeable match for the current platform of remote assistance. Their method of in-person guidance does not constantly involve verbal communication, and their job is to provide detailed information about journey plans, not turn-by-turn directional instructions, which in co-located guidance are simply conveyed through body movement. All 8 TG participants reported frustration and extra effort in delivering verbal instructions by comparison. They expressed the need to develop conversational resources, including verbal and non-verbal cues, for efficient communication:

I usually don't need to tell direction of travel because we are right next to each other. The companion can simply follow me and I can physically correct her veering. (TG2)

If it is simply walking straight, we just walk together on the spot. In tele-assistance, I needed to constantly say something or the companion might get worried. (TG4)

I know I don't want to say by the number of steps but I had to (with the tele-assistance). We need to think of signals so it would be easier to communicate. (TG8)

Participants from UG were less concerned about information delivery methods via remote assistance. Only 2 UG participants (UG1, UG6) realized that their verbal description skills were too limited for an efficient flow of information between senders and receivers, as described by:

I was not sure if I was telling information in the right way. I gave a lot of instructions using degrees, like ‘90 degrees to the right’ but the navigator didn't seem to respond well to these instructions. (UG1)

I had difficulty with wording the instructions in a user-friendly manner. (UG6)

Overall, there were more instances where participants from UG were satisfied with their approaches to giving verbal instructions. 5 UG participants (UG1, UG2, UG5, UG7, UG8) emphasized how they succeeded in offering advance cues through distance measurements in steps or meters. They gave a positive self-evaluation of their performance and of the terminology used to deliver information, as reported by:

I was able to tell exact measurements such as by meters to describe distance. Though the terminology I used might not have been the best for the receiver, the expressions were effective for guidance. (UG1)

I constantly instructed her [P2] to move in steps little by little, like ‘make 5 steps ahead’ It was effective to avoid getting into obstacles. (UG2)

I was able to inform degrees for turns. (UG7)

Needs for Environmental Knowledge: Familiarity with Overall Structure and Spatial Relations

Participants from TG reported major concerns related to their familiarity and knowledge levels of the environment. 5 TG participants (TG2, TG4, TG5, TG6, TG7) mentioned that the style of instructions would be different if they had prior contextual understanding of the scene, such as the location of tactile pavings. They described the importance of preliminary spatial awareness and learning in order to be prepared for sighted guidance, as reported by:

I usually take time to research about my client's destination in regular sighted guidance, like how to safely navigate from the station and knowing its accessible areas. (TG5)

It sounds more appropriate to guide only the places I know. If I knew which direction that a door slides open, I can provide cautionary directions along the route. Right from the start, it was too difficult to navigate. I don't know which direction the user was facing. (TG6)

If I knew the travel paths, I would be able to give accurate instructions. From the screen, I can't tell how far things are or how many steps would get to certain distance. (TG7)

In addition, all TG participants emphasized that the biggest challenges in video-mediated interaction were sensing depth and completely losing the user's current location. They also reported that poor video quality and a limited field of view made it hard to observe ground levels and ramps, detect moving objects, or convey orientation cues during the interaction. To compensate for these challenges, 7 of them (all except TG2) suggested technical efforts and environmental familiarity, as described by:

If the camera angle can be adjusted, I might be able to capture the travel paths better and give accurate directions. (TG1)

If I have a reasonable map and understand the overall spatial relations with respect to the current location, I will have less overloads and can assist the person remotely. (TG6)

If I can get higher image contrast for noticing obstacles and know the place, I can smoothly instruct and guide the blind person. (TG8)

It is important to note that none of the participants from UG reported problems with estimating depth or tracking user position and orientation from the screen. Although they commented, similar to TG participants, that poor video quality and a limited field of view negatively influenced their performance, 5 UG participants (UG1, UG3, UG5, UG7, UG8) gave a positive self-evaluation that they were able to provide preparatory information at the right time (e.g., ‘after you turn this intersection, you will be approaching the kitchen area’). We expect that their prior knowledge of the environment gave UG participants hints for navigating under technical limitations, and one of them added a comment:

I don't think I will be able to navigate the pedestrian well in an unfamiliar environment. (TG6)

Needs for Co-Navigation Expertise: Negotiation Behaviors to Build Mutual Contribution

Co-navigation strategies were required to help both remote guides and users negotiate their tasks together and accomplish their roles. The interaction in the pair of TG8 and P8 involved a quick Q&A session before the navigation started, in which TG8 asked whether P8 preferred walking in the middle of the corridor or along the side wall. In response, P8 actively shared her visual conditions and daily navigation scenarios (e.g., P8 mentioned her remaining light perception and that she can follow the ceiling lights to walk along a straight hallway).

Rather than having leader-follower relationships, in which remote workers lead the way with full responsibility, TG participants referred to the need for mutual efforts:

Communication with the pedestrian is affected by how considerate the receiver can be. (TG2)

He [P5] moved steps back to capture the view of the crossroad. I was able to get the overall picture of the paths and go back to following the map. (TG5)

Unlike regular sighted guidance where I can rely on other sensory information and techniques on the spot, I only have the camera. The pedestrian needs to be considerate about capturing the right information. (TG6)

6 DISCUSSION

To discuss future research and design opportunities in assisting PVI through remote sighted guidance for effective indoor navigation, we developed our research probes from the performance assessment by PVI combined with the analysis of the guides' verbal interaction and their self-assessment feedback. We came across similar themes that revealed the requirements of remote guides for verbal description, environmental, and co-navigation awareness. Specifically, we suggest the importance of a) preliminary guidance cooperation awareness, b) guidelines for verbal description methods, and c) design efforts to compensate for the lack of environmental knowledge in video-mediated collaboration.

Strengthened Expertise through Preliminary Guidance Cooperation Awareness

The perspectives derived from our study with trained and untrained sighted guides, matched with the concerns of PVI, highlighted the importance of tailored guidelines for suitable guidance terminology and remote cooperation awareness to offer desirable mobility support. First, terminology guidelines for remote mobility assistance should be emphasized, as untrained sighted guides had misconceptions about desirable verbal description methods and trained guides also faced challenges in applying their prior training, especially in giving directions. PVI explicitly reported difficulty responding intuitively to navigation instructions from UG because the instructions lacked contextual descriptions for the directional cues or used unintuitive navigation-specific language such as distance measurements in steps or meters. While our analysis of verbal behaviors validated that TG provided contextual descriptions using environmental features, PVI were also negative towards the overall experience with TG, as reported by P5: “I don't need to know ‘There is a big room on the left/right side’ to complete the task.” They wished instead for concrete confirmation that they were going the right way. TG reported frustration and extra effort when they tried to follow already known strategies for interacting with visually impaired people, such as those in [18, 31]. In co-located guidance, they usually rely on nonverbal methods for direction and focus on providing detailed information about journey plans, as reported by TG4: “If it is simply walking straight, we just walk together on the spot. In tele-assistance, I needed to constantly say something or the companion might get worried.”

We extracted insights on cooperation awareness in remote navigation, which differs from the traditional co-navigation experience, as reported by P2: “I felt more accomplished than usual sighted guidance. I was able to reach a destination with her [TG2], rather than following her.” Sighted guides are typically responsible for covering all of the requirements in goal-oriented mobility, but to coordinate remote navigation effectively, we found value in a shared understanding on each side of the collaboration. For instance, the interaction in the pair of TG8 and P8 involved a quick Q&A before the navigation started. The guides also appreciated active contributions from PVI (i.e., P5 and P8) in deciding what information they should give. One feasible recommendation is for remote guides to hold a briefing session in which they introduce themselves and ask about the user's mobility and visual aid preferences. Although guides for a remote guidance service could collect such information beforehand if users register their mobility habits or visual conditions in advance, forming common ground by confirming the user information is important for reaching mutual understanding [42].

Guidelines for Verbal Description

Based on our analysis of desirable characteristics of support, we provide the following suggestions for the language and cues to better serve navigation tasks via remote interaction.

  • We suggest that remote guides offer contextual details in turn-by-turn navigation (e.g., “face the wall”, “to exit the room, the entryway is on your right” instead of “turn XX degrees”) to elicit quick responses from users. Short and systematic instructions such as “stop.. turn right.. proceed” without any contextual information do not allow PVI to anticipate possible future actions and maintain their preferred mobility. A study by [11] reported improved interaction between PVI users and remote guides when non-expert guides were given a preset of instruction commands (e.g., “obstacle on the left/right, keep left/right to walk around it”). These commands are precise and carry contextual details that inform the users.
  • Remote guides should avoid adding degrees to left/right turns (e.g., “rotate 180 degrees”) or using numerical scales for distance (e.g., “2 meters more”), because PVI have difficulty following these types of instructions. One useful form of directional instruction is an intuitive message that refers to the positions of the numbers on a clock (e.g., “at 3 o'clock position”) [32]. Describing distance with environmental features (e.g., “You go straight about four rooms”) is suggested in [34], but PVI cannot always make use of descriptive information about the geometric space, and the preferred level of detail depends on their individual needs (e.g., users with residual sight, the use of travel aids).
  • Audio information is the only channel that people navigating with visual impairments can rely on under remote mobility assistance. The tone of voice of the remote guides has a huge impact on the overall user experience. Showing insecurity and uneasiness will cause users to lose comfort and a sense of security during navigation. PVI constantly face stressful navigation experiences, and it is thus important to give them strong confirmation that their actions are leading towards the goal.
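
As an illustration of the clock-position guideline above, a guidance interface could translate a relative bearing into a clock hour so that guides are prompted with “3 o'clock” rather than “90 degrees”. The sketch below is only one possible mapping and is not part of the study: it assumes 12 o'clock means straight ahead, that each hour spans 30 degrees, and that positive bearings are clockwise (to the user's right). The function names `clock_position` and `clock_instruction` are hypothetical.

```python
def clock_position(relative_bearing_deg: float) -> int:
    """Map a bearing relative to the user's heading to the nearest clock hour.

    Assumptions: 0 degrees is straight ahead (12 o'clock) and positive
    values are clockwise, i.e. towards the user's right.
    """
    hour = round((relative_bearing_deg % 360) / 30) % 12
    return 12 if hour == 0 else hour


def clock_instruction(relative_bearing_deg: float, target: str) -> str:
    """Phrase a directional cue with a clock position instead of degrees."""
    return f"The {target} is at your {clock_position(relative_bearing_deg)} o'clock."


print(clock_instruction(90, "exit"))    # target directly to the right
print(clock_instruction(-90, "door"))   # target directly to the left
```

Pairing the clock hour with a named target keeps the contextual detail that the first guideline asks for.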

Effectiveness of Environmental Familiarity

We have found that the environmental familiarity of remote guides plays an important role in ensuring the quality of remote navigation, as observed in the lower effectiveness ratings given by PVI for completing the experimental navigation task with TG. Guides who did not have enough environmental knowledge (i.e., TG) constantly faced ambiguity in guiding the planned routes and reported challenges in sensing depth and detecting the current location of the PVI. In our tested set-up, remote mobility assistance often led to technical troubles, including poor video quality or connection. In such cases, individuals with environmental knowledge (i.e., UG) were still able to give navigation instructions to PVI because their cognitive maps compensated for the limited visual and spatial capabilities of video communication. These technical problems may also occur in real-world scenarios. As a potential solution, remote navigation services could prioritize connecting a visually impaired user with a remote guide who has environmental knowledge of the user's location.
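
The prioritization suggested above could be sketched as a simple ranking of available guides. This is a hypothetical illustration and not a description of any existing service: the `Guide` record, its fields, and the tie-breaking rule (familiarity with the site first, guide training second) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Guide:
    # Hypothetical record of an available remote guide.
    name: str
    trained: bool
    known_sites: set = field(default_factory=set)


def rank_guides(guides, site):
    """Order guides so that those who know the site come first.

    Assumption: familiarity with the site outranks formal training,
    and training only breaks ties between equally familiar guides.
    """
    return sorted(guides, key=lambda g: (site not in g.known_sites, not g.trained))


available = [
    Guide("A", trained=True),
    Guide("B", trained=False, known_sites={"building-1"}),
    Guide("C", trained=True, known_sites={"building-1"}),
]
print([g.name for g in rank_guides(available, "building-1")])
```

A real service would also need to weigh availability and user preferences, which this sketch leaves out.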

Approaches to Compensate for the Lack of Environmental Knowledge

Environmental knowledge, however, is not a qualification commonly sought by current remote assistance platforms marketed to visually impaired people. In real-world applications, it is highly likely that a PVI user is connected with a trained guide without environmental knowledge, as in our experiment. Advanced user interfaces for video communication offer a promising way to compensate for remote workers’ environmental unfamiliarity. Given how TG constantly lost track of overall spatial relations, visualizing accurate real-time user position and orientation on detailed maps could compensate for their challenges with environmental understanding. Reconstructed 3D maps could also help represent a remote indoor environment and support the spatial awareness of remote guides. Recent computer vision technologies allow us to obtain realistic reconstructed 3D maps from image and depth data [25].

Alternatively, collaborative behaviors of PVI contributed to providing better camera views for remote guides, and the camera angle had a huge impact on the guide's spatial understanding. P8 constantly asked whether her walking pace was appropriate for the sighted partner to follow the video. P5 also took care to turn their body sideways at the intersection to capture a better camera view. Again, suggestions for cooperation awareness would encourage active contribution from PVI, helping the guides understand the surrounding environment and deliver information effectively.

Limitations of the Study

Our work aims to maximize training opportunities of remote guides because most of the commercially-available applications (e.g., Be My Eyes) involve crowd workers/volunteers who are not always trained for remotely or physically assisting people with visual impairments. We expect other implications if professional remote guides such as Aira agents are recruited as TG. The study, however, has to consider the fact that current remote assistance platforms are limited to subscription areas (i.e., Aira is not available globally). Our analysis of the comparative study also needs to address other environmental settings (e.g., crowded travel paths) to further correlate with the benefits of sighted guidance in ensuring visually impaired people's ability to travel in complex navigation [41]. Additionally, we have followed certain homogeneous conditions to control the performance variability of untrained guides and allowed for various visual abilities and mobility preferences of visually impaired participants. Individual differences need to be accounted for in our analysis.

7 CONCLUSION

This work has presented the first-known investigation of remote guide performances in assisting people with visual impairments in indoor navigation. Based on our comparative analysis of trained and untrained guides as remote guides, we broaden our understanding of verbal description, environmental knowledge requirements, and co-navigation strategies to develop expertise of the remote guides and enhance their qualifications. We extracted social and technical recommendations to maximize training opportunities for untrained remote guides and improve the current design of remote mobility assistance. We hope that our implications will pave the way towards a promising remote guidance platform for people navigating with visual impairments.

ACKNOWLEDGMENTS

We thank all the study participants, and all publications support and staff. This research work was supported by Japan Science and Technology Agency (JST) CREST project.

REFERENCES

FOOTNOTE

1 https://whereby.com/

2 Our pilot study explored other ways to mount cameras, such as hanging the camera from the neck with a lanyard [3], wearing smart glasses such as in Aira, or holding the phone in hand [22]. We did not proceed with these setups because they often caused significant blur in videos, accidental drops of the camera, or accidental termination of the application.

CHI '20, April 25–30, 2020, Honolulu, HI, USA

© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6708-0/20/04…$15.00.
DOI: https://doi.org/10.1145/3313831.3376823