1 Introduction
Multi-level buildings such as shopping malls, airports, and museums have large and complex structures, so visitors often get lost and have trouble finding their way around them [3, 12]. When people learn a route, they recognize visual landmarks along the route and/or create a cognitive map of the environment [2, 24]. Previous works have therefore proposed route viewer systems that present users with both a map and visual cues consisting of photo streams [6, 10, 11, 14, 31] or a navigation movie captured along the route [4, 14, 34].
One type of route viewer is the movie map system, which allows users to watch navigation movies along any route they choose [5, 17, 18, 27, 28, 32, 35]. Movie map systems generate a navigation movie along the user-selected route by connecting and switching multiple movie sequences captured in the environment. These systems map movie sequences onto a two-dimensional (2D) map and estimate the positions of intersections, where movie sequences are switched, by using metadata (e.g., Global Positioning System (GPS) information [17]) or computer vision technologies (e.g., visual Simultaneous Localization and Mapping (SLAM) [27, 28] or feature matching [18]). Unlike 2D environments such as city areas, which were the target of existing movie map systems, multi-level buildings have overlapping floors. In such three-dimensional (3D) environments, a system that maps movie sequences onto a 2D map as existing systems do cannot detect intersections properly and fails to connect movie sequences. Thus, existing systems cannot be applied to multi-level buildings.
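To make the overlapping-floor problem concrete, the following minimal sketch contrasts a purely 2D crossing test with one restricted to segments on the same floor. It is an illustration under assumptions, not the detection procedure of any cited system: camera paths are assumed to be given as lists of `(x, y, floor)` samples, and the names `segments_cross` and `find_crossings` are hypothetical.

```python
from itertools import combinations

def _orient(p, q, r):
    """Sign of the 2D cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a, b, c, d):
    """True if 2D segments ab and cd properly cross (touching endpoints are ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def find_crossings(paths, use_floor=True):
    """Return (name1, seg1, name2, seg2) for every pair of crossing segments.

    paths maps a sequence name to a list of (x, y, floor) samples (assumed format).
    With use_floor=False the floor label is ignored, as on a flat 2D map.
    """
    segs = [(name, i, pts[i], pts[i + 1])
            for name, pts in paths.items()
            for i in range(len(pts) - 1)]
    found = []
    for (n1, i, p1, q1), (n2, j, p2, q2) in combinations(segs, 2):
        if n1 == n2:
            continue  # only compare different sequences
        if use_floor and not (p1[2] == q1[2] == p2[2] == q2[2]):
            continue  # segments are not on one common floor
        if segments_cross(p1[:2], q1[:2], p2[:2], q2[:2]):
            found.append((n1, i, n2, j))
    return found

# Hypothetical data: two paths on floor 1 and one on floor 3 directly above them.
paths = {
    "corridor_1f": [(0, 0, 1), (10, 0, 1)],
    "hall_1f": [(5, -5, 1), (5, 5, 1)],
    "corridor_3f": [(0, -2, 3), (10, 2, 3)],
}
flat = find_crossings(paths, use_floor=False)
per_floor = find_crossings(paths, use_floor=True)
print(len(flat), per_floor)
```

In this toy layout the flat test reports three crossings, two of which connect paths on different floors, while the per-floor test keeps only the real first-floor intersection.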
This work presents 3DMovieMap, an interactive route viewer for multi-level buildings. As a setup step, the system first estimates the camera positions and orientations from an equirectangular movie by using visual SLAM. The system then identifies the floor on which each frame was captured and generates a movie map by judging whether the camera paths intersect on each floor. Lastly, we manually align the generated movie map with the existing floor maps. When visitors use the system, they select the waypoints they want to visit on the movie map. The system then automatically generates the shortest path that visits all selected points by using Dijkstra’s algorithm. Finally, the system generates a navigation movie along the path by connecting multiple equirectangular sequences and extracting their perspective views. To generate smooth turning views, the system adjusts the orientations of the perspective views so that the camera orientations of the sequences match at the connection point.
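The path-generation step can be sketched as follows. This is a minimal illustration, assuming that the movie map is an undirected weighted graph whose nodes are intersections and that waypoints are visited in the order the user selected them; the text above does not state how the visiting order is chosen, and the node names, edge lengths, and function names are hypothetical.

```python
import heapq

def undirected(edges):
    """Build an adjacency dict from (node_a, node_b, length) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph

def dijkstra(graph, source, target):
    """Return (distance, node list) of a shortest path from source to target."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        raise ValueError(f"no path from {source} to {target}")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

def route_through(graph, waypoints):
    """Chain shortest legs between consecutive waypoints (order is an assumption)."""
    total, route = 0.0, [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        d, leg = dijkstra(graph, a, b)
        total += d
        route.extend(leg[1:])  # skip the repeated start node of each leg
    return total, route

# Hypothetical movie map: escalators and a walkway connect floors 1, 3, and 5.
graph = undirected([
    ("1F-entrance", "1F-escalator", 20), ("1F-escalator", "3F-escalator", 15),
    ("3F-escalator", "3F-exhibit", 30), ("3F-escalator", "3F-walkway", 10),
    ("3F-walkway", "5F-walkway", 60), ("3F-escalator", "5F-escalator", 15),
    ("5F-escalator", "5F-walkway", 10), ("5F-walkway", "5F-exhibit", 25),
])
total, route = route_through(graph, ["1F-entrance", "3F-exhibit", "5F-exhibit"])
print(total, route)
```

Because edges between floors are ordinary graph edges, the same search handles escalators and walkways without special cases. Choosing the visiting order itself would require an additional step that this sketch leaves out.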
In this work, we prepared four movie maps of public buildings: an international airport (terminals one and two), a science museum, and a university building. We asked two participants, a user and a staff member of the science museum, to use our system and collected their feedback. They generally agreed that 3DMovieMap allowed users to easily learn their path even in a multi-level building, and that the quality of the turning views was sufficient to grasp the path. They also provided suggestions for improving the system and introducing it to the science museum. In addition, we will release an open dataset of 8K equirectangular movies captured in the science museum.
4 Equirectangular Video Dataset of a Science Museum
This work captured equirectangular movies in four environments (Table 1). Among them, we will release the equirectangular movies taken at the science museum as an open dataset on the website of the science museum (footnote 1). This dataset is freely available to anyone for research purposes and consists of three types of equirectangular movies.
1) Exploring the whole museum: This movie was captured while walking through the whole area of the museum that is accessible to visitors (the first, third, fifth, and seventh floors). This movie was used for our 3DMovieMap (Table 1, Fig. 11–A).
2) Exploring each exhibition area: These movies were captured while walking through the exhibition areas on the third and fifth floors (Fig. 11–B).
3) Exploring each exhibition: These movies were captured while walking through the inside of each exhibition (Fig. 11–C). There are 16 exhibitions in total (7 on the third floor and 9 on the fifth floor).
For all equirectangular movies, we prepared four versions that combine with/without stabilization and two resolutions (4K and 8K).
This dataset has two unique points. First, the science museum has distinctive architecture as a part of its exhibitions. For example, a circular walkway (Fig. 6–B) goes around a globe-like display (Fig. 6–A), and escalators are in an open-air stairwell that goes from the first floor to the seventh floor (Fig. 6–D). Equirectangular movies that capture such distinctive architecture might be an interesting resource for computer vision research such as visual SLAM [29]. Second, the movies capture various objects such as a dome-shaped theater (Fig. 6–C), a rocket engine (Fig. 12–A), and a scale model of the ISS living quarters (Fig. 12–B). High-resolution movies that capture these unique objects can be used for computer vision research such as 3D reconstruction and view synthesis [19].
5 Preliminary Study
5.1 Procedure
As a preliminary study to collect feedback for improving our system, we asked a user (P1, female, 30 years old) to use it. We also asked a staff member of the science museum (P2, female, 31 years old) to provide feedback on our system from the perspective of a facility manager. P1 has not visited all of the buildings for which the 3DMovieMap was created. P2 is familiar with the science museum, but she has not visited the international airport buildings or the university building. We first provided an overview of the study and described the interface (10 minutes). Then, we asked the participants to watch navigation movies by using the system (about 20 minutes). We did not specify paths for viewing navigation movies, and the participants were free to operate the system and watch navigation movies. Finally, we conducted a semi-structured interview session of about 30 minutes to receive qualitative feedback. Specifically, we first asked the participants about the advantages and disadvantages of our system. Then, we asked for suggestions to improve it.
5.2 Results
Participants generally agreed that the system allowed users to learn their path in a multi-level building easily: A1: “Since I may lose my orientation in public buildings, I may not be able to find the direction to the destination even if I look at the map of the buildings. For example, when I reach the floor where my destination is located by escalator or stairs, I may not know in which direction my destination is located. The system allowed me to learn the path visually. So I can understand the correct direction easily. The quality of the turning views was enough to grasp the path.” (P1); and A2: “Our museum has introduced the Google Maps Street View service. Compared with that service, the system allowed us to generate a course to walk over most spots in the museum more easily and grasp the overview in a shorter time. In addition, it was easier to understand the 3D and vertical structures such as the connection between floors (the circular walkway).” (P2) In addition, the museum staff member agreed that the system could be helpful for museum visitors: A3: “The simulated walk is helpful to find new places that interest visitors.” (P2)
P1 (a user) suggested that the system highlight visual landmarks in the generated navigation movies: A4: “When I walk in public buildings, I look at some visual landmarks, such as signage, stores, and objects, to determine which direction to go. So, when the system shows a turning view, how about highlighting such visual landmarks in the video, rather than simply slowing the video down? For example, the system can zoom in on a landmark or draw a rectangle to emphasize it.” P2 (a museum staff member) provided the following suggestions for introducing the system in their museum: A5: “The current system generates the shortest path, but it would be good to prioritize routes we would like visitors to walk through. For example, in our museum, visitors can move between the third and fifth floors via the circular walkway or an escalator. Unless it is a very long way, we want them to walk the circular walkway. In addition, I think it would be good to avoid taking the route that users have already watched so that visitors can see more paths and contents of the museum.”; and A6: “It would be nice if the system slowed the video down when moving near an exhibit and generated a movie with the camera pointed in the direction of the exhibit (rather than in the walking direction).”
7 Conclusion
We proposed an interactive route viewer system, 3DMovieMap, that aims to provide users with navigation movies along any route they choose in multi-level buildings such as a science museum, airport, and university building. The system first estimates the camera positions and orientations of an equirectangular movie using visual SLAM and detects the floor of each frame based on the SLAM results. The system then generates a 3D structured movie map by detecting intersections on each floor. When users select the waypoints they want to visit on the interface, the system calculates the shortest path that visits all selected points. Then, the system generates a smooth navigation movie along the path by connecting multiple equirectangular movie sequences and extracting perspective views while controlling the camera angle to match the angles of the movie sequences at the connection point. We constructed four movie maps of public buildings (a science museum, two international airport terminals, and a university building) and asked two participants, a user and a science museum staff member, to use the system. They generally agreed that 3DMovieMap allowed users to easily learn their path even in a multi-level building, and that the quality of the turning views was sufficient to grasp the path. In the future, we will extend our system based on the feedback from the participants and conduct a user study to evaluate its effectiveness in learning routes in multi-level buildings. For example, we plan to highlight visual landmarks in the navigation movie and to recommend paths other than the shortest path based on the contents of the floor or the user’s past navigation movie viewing history. In addition, we will release an open dataset of 8K equirectangular movies captured in the science museum.
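The camera-angle control at a connection point can be illustrated with a minimal sketch. It assumes that headings are yaw angles in degrees in a shared world frame, that the horizontal center of an equirectangular frame corresponds to the camera's forward direction, and that yaw increases to the right; these conventions and the function names are assumptions for illustration, not details taken from the system.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def turning_yaws(yaw_from, yaw_to, steps):
    """World-frame view yaws that pan from yaw_from to yaw_to along the shorter arc."""
    delta = wrap_deg(yaw_to - yaw_from)
    return [wrap_deg(yaw_from + delta * k / steps) for k in range(steps + 1)]

def equirect_column(view_yaw, camera_yaw, width):
    """Pixel column at the center of a perspective view cut from an equirectangular frame.

    view_yaw and camera_yaw are world-frame yaws; width is the frame width in pixels.
    """
    relative = wrap_deg(view_yaw - camera_yaw)
    return int(round((relative + 180.0) / 360.0 * width)) % width

# Hypothetical turn: the outgoing sequence heads at 350 degrees, the incoming one at 80.
yaws = turning_yaws(350.0, 80.0, 3)
# Where each view is centered in frames of the incoming sequence (camera yaw 80).
columns = [equirect_column(y, 80.0, 3840) for y in yaws]
print(yaws, columns)
```

Since an equirectangular frame covers all directions, the view can pan through the turn by shifting the crop alone, and the final view is centered on the incoming sequence's forward direction.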