Introduction

Augmented reality (AR) and the Internet of Things (IoT) have received significant attention as key technologies for making our future living spaces smarter, more responsive, and more interactive, thereby changing our everyday lives [1, 2]. AR is an interactive medium that provides a view of the real world augmented by, and/or spatially registered with, useful computer-generated information. It helps users understand the world and amplifies their intelligence in solving problems and conducting various tasks [3, 4]. In other words, AR offers a convenient way for users to visualize and interact with physical objects and their associated data. Moreover, a spatially registered, visually augmented interface is direct and semi-tangible, and is thus easy to comprehend and highly useful, particularly for everyday and/or anywhere usage [5]. For example, Microsoft showcased a future AR service using mixed reality smart glasses that directly visualize data from physical objects and structures in the user’s environment and let the user interact with the objects’ functionality [6].

Meanwhile, the IoT, as an infrastructure for “everywhere” services, offers an efficient way of managing the large amounts of associated data (for example, individual product information) in a distributed and object-centric fashion [7]. The IoT refers to a network of everyday physical objects, such as electronic products, embedded with minimal computing elements for sensing, collecting, and/or sharing data, and even for controlling the objects themselves. Such an infrastructure has been touted as the basis for future smart environments, enabling intuitive control and context-based services [8].

Although AR and IoT may seem unrelated and have different objectives, they can complement each other, and integrating them promises notable advantages and synergies [9]. AR provides an intuitive method for users to visualize and interact with IoT objects and their associated data. In particular, context-aware AR services become possible by tapping into the more refined environmental information made available by the IoT infrastructure [10]. The combination can also provide a natural way to join the convenience of interactive digital information (e.g., AR content) with the more effective, humane, and tangible analog world of physical objects. Even as everything goes digital, analog is making a comeback in our daily lives, as seen in the recent popularity of printed books, vinyl records, and film-based photos.

Previous studies in the IoT and AR fields have proposed “everywhere” data management and intuitive visualization, for example, a server-based approach to ubiquitous AR services with everyday physical objects. However, because the object recognition process, which is equivalent to looking up a content directory, involves complicated feature matching against a vast number of objects, extending AR services to large everyday spaces has been difficult. Cloud services for AR have likewise struggled to scale to large numbers of IoT objects [1]. Thus, many researchers have focused on AR applications that carry and share useful information connected with physical objects, and on enhanced AR systems that allow users to connect to objects.

In line with this idea of a synergistic marriage of AR and IoT, this paper presents a new AR shopping framework and experience enabled by extending the IoT as a control and product-trial interface; we demonstrate our proposal using an actual prototype system and validate our claims in terms of improved usability and system performance. We illustrate possible future shopping scenarios that combine interactive, smart digital information with the analog real world, and present a proof-of-concept that enables an AR client to immediately obtain information about shopping items and correctly visualize that information at the exact location of each item. The proof-of-concept implementation is applied to such a “digital–analog” style of shopping, and its usability is assessed experimentally against a conventional control interface.

We argue that three key components are required to support such a seamless and scalable AR service and experience for IoT-ready products: (1) object-centric data management and visualization, (2) a mechanism for accessing, controlling, and interacting with the object, and (3) content exchange interoperability. Figure 1 shows a possible system architecture highlighting these three components (they are explained in detail in later sections). The AR client (a mobile or glass-type device) can instantly connect to an IoT product; receive relevant object-specific data, control information, and associated AR datasets for the targeted service (e.g., recognizing the object and visualizing product information); and thereby interact directly with the physical object to try it out in situ, which we call direct control and natural interaction [1]. With object-centric data management, data and/or content can also be uploaded to the objects to add and create new IoT services and applications, and AR provides an ideal and natural infrastructure for “everywhere” interaction with physical objects. In the context of shopping, services might include interactively visualizing usage instructions, negotiating price and delivery, and test-driving the product through the device control interface with seamless content interoperability. Note that the advertisement, product control, and AR tracking datasets shown in Fig. 1 are pre-built and pre-stored AR contents needed to visualize the object’s functionalities across various styles of AR interaction.

Fig. 1

Overall possible IoT + AR architecture for “digi-log” shopping experience

Object tracking is also a fundamental problem in AR, and the proposed IoT products can make recognition and tracking for AR easier. Besides generic data and service content, individual IoT products of interest in the vicinity of the AR client can communicate the information required to recognize and track themselves, including the features, algorithm type, and even the physical conditions (for example, lighting, distance, or other companion reference objects). That is, the AR client is “guided” by the target object itself to localize and track it [1]. Note that in this scheme the number of matching candidates is relatively small, because only the objects within the interaction space of the AR client or user are considered. This, in turn, makes it feasible to use a collective algorithmic method and to reduce the number of features, templates, and models in the matching process, further lowering the data requirements.
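
To make the scaling argument concrete, the following Python sketch matches descriptors extracted from the camera image only against the descriptor sets advertised by the few objects currently in range, rather than against a global database. It assumes binary feature descriptors (represented here as integers and compared by Hamming distance, as is common for ORB-style features); the names `NearbyObject` and `identify_object`, the toy 8-bit descriptors, and the distance threshold are hypothetical illustrations, not the prototype’s code.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NearbyObject:
    """Hypothetical record for an IoT product currently in range of the AR client."""
    object_id: str
    descriptors: list[int] = field(default_factory=list)  # binary descriptors as ints

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def identify_object(query: list[int], candidates: list[NearbyObject],
                    max_distance: int = 10) -> Optional[str]:
    """Return the id of the nearby object with the most good matches, or None.

    Only descriptors advertised by nearby candidates are searched, so the cost
    grows with the handful of objects in range, not with every product.
    """
    best_id, best_votes = None, 0
    for obj in candidates:
        votes = 0
        for q in query:
            # A query descriptor votes for this object if any of the object's
            # descriptors lies within the Hamming threshold.
            if obj.descriptors and min(hamming(q, d) for d in obj.descriptors) <= max_distance:
                votes += 1
        if votes > best_votes:
            best_id, best_votes = obj.object_id, votes
    return best_id

# Hypothetical example: two products in range, toy 8-bit descriptors.
lamp = NearbyObject("lamp-01", [0b1010_1010, 0b1111_0000])
clock = NearbyObject("clock-07", [0b0000_1111, 0b0101_0101])
print(identify_object([0b1010_1011, 0b1111_0001], [lamp, clock], max_distance=1))  # lamp-01
```

A real tracker would use hundreds of 256-bit descriptors and a geometric verification step, but the key property is the same: the search set is bounded by what the nearby objects advertise.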

Figure 2 shows a more detailed scheme of AR-based interaction that supports direct control of shopping items. AR-based interaction is expected to be much more intuitive, direct, and helpful than, e.g., GUI-based hand-held remote devices or conventional switchable interfaces (e.g., for turning one of the displayed lamps in Fig. 2 on or off). For example, a customer who wants to test products mounted on the ceiling will usually need help from a salesperson, who is often not readily available. Even if a GUI-based app exists for this purpose, downloading it and becoming familiar with its interface is a nuisance, and doing so would be prohibitive across millions of different products. Our proposed scheme eliminates many of these mental and physical obstacles through automatic discovery and connection and on-the-spot exchange of object-specific information within a unified and standardized IoT framework. For instance, the AR service client interacts directly with the IoT object of interest in the immediate shopping area and, upon connection, immediately receives context-relevant AR shopping datasets (for tracking or customized service content, among other uses) [1]. Depending on the context, appropriate and available services, such as a simple product information display, appliance control, or an instruction manual, are presented in a proper form (for example, through a mobile GUI, AR glasses, mobile AR, voice, spatially registered AR, or a simple overlay) and interacted with. We demonstrate this scheme using an actual prototype system for possible future “digital–analog” shopping scenarios and validate our claims in terms of improved usability and system performance with the proof-of-concept implementation.

Fig. 2

Future AR interaction scheme to support direct control and testing of shopping items

The rest of this paper is organized as follows. Section II reviews related research and the requirements of our proposed IoT + AR architecture. Sections III and IV discuss futuristic use-case scenarios and a detailed data flow of IoT + AR for shopping situations. Section V presents the actual implementation, and Section VI presents the usability validation experiments. Finally, Section VII summarizes our study and concludes the paper with a discussion and directions for future work.

Related work

The review of related research focuses on the three key components and requirements of our proposed IoT + AR architecture, namely, the current state of the art in AR/IoT data management, previous approaches to interacting with IoT objects (including the few that use AR), and standard content representation and system interoperability protocols.

AR data and content representation for physical objects

AR services commonly need to manage generic data and service content for their constituent objects, or augmentation targets, which are everyday physical objects. Herein, we review current approaches to representing such physical object data for AR use (for example, the architecture and data handling).

Early AR systems were implemented as single applications with all of the content and assets embedded in them, using programming libraries [11, 12]. As such, the augmented content of an object tended to be simple (for example, merely demonstrating the augmentation capability) and unorganized. GPS-equipped mobile phones and smartphones have enabled location-based geographical and AR services, for example, guides to commercial points of interest and tourism [13, 14].

Such services necessitated separating content (and its format specifications) from the underlying player to support the notion of “everywhere” content and service, as well as unified management of content on the server. HTML [15], KML [16], and ARML [17] are markup languages for this purpose. For example, Wikitude proposed the Augmented Reality Markup Language (ARML) for location-based services [14]. ARML allows defining geographical points or landmarks of interest and associating them with GPS coordinates and simple augmentation content (for example, text, logos, and images). Several other content representation methods exist for AR services, but they are either tied to a specific application or content type (for example, video-based [18], AR online manual [19], and AR guide [20]) or require complicated scripting without sufficient abstraction. A standard, interoperable content format for representing the full range of AR services has yet to be proposed.
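
As a point of reference for the markup-based approach, the sketch below uses Python’s standard `xml.etree.ElementTree` to emit a minimal KML 2.2 `Placemark` that pairs a GPS coordinate with simple text content, the kind of geo-referenced point of interest that location-based AR formats attach augmentation to. The helper name `make_placemark`, the store name, the description, and the coordinates are hypothetical; note that KML orders coordinates as longitude, latitude, and optional altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def make_placemark(name: str, description: str, lon: float, lat: float,
                   alt: float = 0.0) -> str:
    """Build a minimal KML 2.2 document containing a single Placemark."""
    ET.register_namespace("", KML_NS)  # serialize with a default xmlns, no ns0: prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude[,altitude].
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},{alt}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical point of interest.
print(make_placemark("Electronics store", "Spring sale on speakers", 126.978, 37.5665))
```

The content (name, description) and its anchor (the coordinate) live in a document that any compliant player can render, which is exactly the separation of content from player described above; it also shows the limitation the paper points out, since a geographic point cannot register content on an individual indoor product.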

In addition, billions of physical objects may soon communicate automatically with computers and interconnect with one another for collective intelligent services [21]. In this context, scalable naming and addressing of objects, together with standard content formats suited to AR-based services, are important. One promising direction is using the Web to support interactions with physical objects, as exemplified by Google’s Physical Web [22], in which objects possess URLs and can expose their own dynamic, cross-platform content represented in standard languages such as HTML and/or JavaScript. We can thus envision a future in which various AR services are available under a unified Web framework, that is, the webization of things. For example, Ahn et al. presented a content structure, as an extension to HTML5, for building webized mobile AR applications [23]. This allows a physical object to be referenced and associated with its virtual counterpart as the directly matched result.
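
The Physical Web relies on Bluetooth beacons that broadcast compressed URLs in Google’s Eddystone-URL frame format. The sketch below encodes a URL into such a frame payload to show how an object can advertise its own address in very few bytes. The scheme-prefix and expansion tables follow the published Eddystone-URL specification as we understand it; the function name, the example URL, and the TX power value are hypothetical, and the sketch covers only payload encoding, not the BLE advertising itself.

```python
# Eddystone-URL scheme prefixes (codes 0x00-0x03).
SCHEMES = ["http://www.", "https://www.", "http://", "https://"]
# Eddystone-URL expansion codes 0x00-0x0d; "/"-terminated forms come first
# so that ".com/" is preferred over ".com".
EXPANSIONS = [".com/", ".org/", ".edu/", ".net/", ".info/", ".biz/", ".gov/",
              ".com", ".org", ".edu", ".net", ".info", ".biz", ".gov"]

def encode_eddystone_url(url: str, tx_power: int) -> bytes:
    """Encode a frame: type 0x10, TX power at 0 m, scheme code, encoded URL."""
    if not -128 <= tx_power <= 127:
        raise ValueError("TX power must fit in a signed byte")
    for scheme_code, scheme in enumerate(SCHEMES):
        if url.startswith(scheme):
            break
    else:
        raise ValueError("URL must start with http:// or https://")
    rest, body, i = url[len(scheme):], bytearray(), 0
    while i < len(rest):
        for exp_code, exp in enumerate(EXPANSIONS):
            if rest.startswith(exp, i):
                body.append(exp_code)  # one byte replaces the whole expansion
                i += len(exp)
                break
        else:
            ch = ord(rest[i])
            if not 0x21 <= ch <= 0x7E:  # other byte values are reserved
                raise ValueError(f"unsupported character {rest[i]!r}")
            body.append(ch)
            i += 1
    if len(body) > 17:
        raise ValueError("encoded URL exceeds 17 bytes")
    return bytes([0x10, tx_power & 0xFF, scheme_code]) + bytes(body)

print(encode_eddystone_url("https://example.com/", -20))  # b'\x10\xec\x03example\x00'
```

The 17-byte limit is why Physical Web deployments typically advertise a short URL that then resolves to richer Web content, which is where an HTML5-based AR structure such as the one cited above would live.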

In our case, the AR client receives “feature datasets” and “contents” for each shopping item in a standard format (e.g., front images of shopping items, product origin, and price) from IoT devices discovered near the user, and uses them to recognize, visualize, and interact with the item [7]. In addition, the information exchanged between the AR client and the shopping item is stored in the IoT device rather than retrieved from an external server. We assume that future IoT objects will hold this information (feature datasets for AR tracking, generic contents, and UI control structure) in a standard format. Different IoT objects may contain different AR information depending on their characteristics, for example, functionalities, process interfaces, and operating manuals.
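
A minimal sketch of what such a self-describing payload could look like, assuming JSON as the exchange format. The paper does not fix a schema, so the class name `ProductDescriptor` and all field names below are hypothetical; they simply group the three kinds of information named above (feature datasets for tracking, generic contents, and UI control structure) together with the product facts mentioned as examples.

```python
from __future__ import annotations
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProductDescriptor:
    """Hypothetical self-description stored on an IoT product."""
    object_id: str
    name: str
    price: float
    origin: str
    # Reference to the AR tracking dataset (e.g. a target image database name).
    feature_dataset: str
    # Generic contents such as text shown in the AR overlay.
    contents: dict[str, str] = field(default_factory=dict)
    # UI control structure: button label -> command sent back to the device.
    controls: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize for transmission to an AR client."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ProductDescriptor":
        """Rebuild the descriptor on the client side."""
        return cls(**json.loads(text))

# Hypothetical lamp on display.
lamp = ProductDescriptor("lamp-01", "Desk lamp", 39.9, "Example origin", "lamp_targets",
                         {"summary": "LED desk lamp"}, {"Turn on": "ON", "Turn off": "OFF"})
assert ProductDescriptor.from_json(lamp.to_json()) == lamp
```

Keeping the descriptor flat and self-contained is what lets the object, rather than a server, answer the client’s lookup.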

AR data/content storage, management, and indexing for physical objects

The most common way of viewing and interacting with digital objects and products has been through hand-held remote controls, which smartphone- and GUI-based interfaces have recently been replacing rapidly [24]. AR provides tighter augmentation through target object recognition and identification [10]. Although a server-based approach to ubiquitous AR services with everyday physical objects is possible, providing AR services in large everyday spaces has been difficult owing to the object recognition process, complicated feature matching, and content look-up across many objects.

High-performance cloud computing services exist for fast object matching and for expediting the associated content retrieval in AR services [25]. Nevertheless, scaling them to the level of “everywhere” will remain difficult. An alternative is to connect to a single areal server that serves only a particular local area, such as a single home, and manages only a limited number of objects [26, 27]. This is similar to the concept of fog computing, which enables computing services at the edge of the network for effective data management. For example, Rathore and Park presented a fog-based framework for detecting attacks in IoT, motivated by the fact that centralized attack detection mechanisms cannot produce satisfactory results given the scalability, distribution, resource-limitation, and low-latency demands of IoT [28]. Sharma et al. proposed a fog node architecture to mitigate security attacks for real-time analytic services [29].

To reduce the search space, filtering approaches such as broadcasting messages to nearby clients have been proposed [30]. Iglesias et al. suggested a method for identifying and augmenting candidate target objects using contextual data such as user attributes, user-object proximity, relative orientation, and resource visibility, with the final selection made manually [30]. Ajanki et al. proposed a similar concept [31]. Unfortunately, there has been no notable work on scaling AR services and their efficient data management to large-scale everyday environments (for example, an AR service that operates at home, in the workplace, on the street, or in a shopping area). The approaches outlined above (including even a future “unified” Web-based solution) rely on a central network server architecture, as already indicated, and therefore incur a serious performance bottleneck. Consequently, a few studies have attempted to obtain datasets directly from objects close to the user in the same space [7].

As already mentioned, in our scheme, the AR-enabling IoT device itself stores standard “features” information that mobile AR clients use to recognize and track it, and it can communicate generic contents for various shopping purposes, including the augmented display. Thus, interactive control of the IoT device is possible on the spot. To select among millions of different objects, the AR client uses a filtering approach: it discovers nearby IoT objects (equipped with elementary processing, storage, and network modules), much as a device discovers Wi-Fi access points, and these objects communicate the necessary AR tracking information to the client. Because only relatively few target objects are likely to be nearby, the AR client can quickly identify (and even track) the objects and retrieve the associated content.

AR interaction for physical objects

AR can offer an excellent method for in situ object control (or even for controlling remote objects through a remote-controlled camera) [7]. AR can also visualize simulations of applied control for previewing or even training purposes [32, 33]. However, there have been only a few attempts at using AR (or even VR) as the control and simulation interface.

In early work, Rekimoto and Ayatsuka proposed a visual tagging system called CyberCode [34], which uses 2D barcodes to identify and detect objects and offers different methods for manipulating physical objects. For example, the user can metaphorically “drag and drop” one object onto another to invoke a certain functionality (for example, by dragging and dropping a projector object onto a computer, the user makes the computer retrieve the currently projected slide). Similarly, Heun et al. proposed an AR interface for creating new functionalities of smarter objects that have an embedded processor and communication capability [24].

In addition, Microsoft presented a HoloLens scenario that visualizes datasets associated with objects (e.g., motor temperature and door operation) to reduce the maintenance costs of a particular product (e.g., elevators) [6]. However, datasets kept in the cloud must be continuously updated, and when there are many similar objects, it can become unclear which data belong to which object. By contrast, if each object carries its own datasets, AR can visualize the information more intuitively, directly at the precise position of the object.

In addition, Muller et al. proposed an interactive AR-enabled appliance instruction manual [19]. In their prototype, an AR-capable device can interact with an appliance through a pre-established connection. Lifton and Paradiso presented a dual reality system that realizes an interplay between the physical world and a corresponding mirrored, simulated virtual world [35]; here, interactions in the real world were reflected onto the virtual world. Lu proposed a bi-directional mapping technique for enhanced information visualization. For example, when a user turns on an appliance (for example, a TV) in the real environment, the deployed sensors detect the user’s activity and transmit it to the simulated world. The virtual world can also generate counterpart representations of the real world. This system was developed to provide eco-feedback for energy saving [26]. More recently, Alce et al. compared three basic AR interaction models (floating icons, floating menu, and WIM) for managing IoT environments and found that the WIM model stood out as difficult and time-consuming [36, 37]. In our case, we evaluated AR interaction methods using a smartphone that connects to IoT products holding pre-configured information about themselves in memory.

Despite the potential advantages of such AR interfaces (e.g., over conventional remote controls), it is not clear how a consistent and coherent AR interaction framework should be established for “millions” of different objects. In the previous studies mentioned above, AR interfaces are mostly anecdotal and designed in an ad hoc manner.

Use-case scenarios: shopping with IoT + AR

We illustrate two use-case scenarios with AR-capable shopping objects that emphasize the effectiveness of our AR-enabling IoT approach and highlight the three key components identified earlier: (1) object-centric data management and visualization, (2) a mechanism for accessing, controlling, and interacting with the object, and (3) content exchange interoperability.

Test driving at the showroom

Sophie enjoys shopping. Wearing her AR glasses, she visits a nearby electronics store to buy a pair of speakers. Because many speakers are on display, she has some difficulty choosing a pair. She connects directly to these speakers, and her AR glasses present different types of product-related information (for example, the price, dynamic range, and availability) overlaid directly on the products. Still hesitant, she decides to listen to them to judge their sound quality. She designates a particular model with her finger (tracked by the camera mounted on the AR glasses), and the model lends its control interface to Sophie so that she can play her own MP3 file as a sound test. The glasses indicate the position of the speaker among those being tested by other customers and visualize the sound wave propagation, allowing Sophie to experience the surround sound effect and the resulting musical quality (see Fig. 3).

Fig. 3

Proposed use-case scenario 1: a direct and intuitive interface for in situ object control (as opposed to a traditional GUI-based button interface) [31]

Besides sensing, collecting, and exhibiting useful data, IoT objects are meant to be digitally “controlled” to realize related smart services [35, 36]. In many situations (including scenario 1), direct in situ control is needed, and AR is a suitable interface (compared with, for example, a simple GUI-based control button interface) because it provides the contextual information needed to make the task easier and the situation clearer [7]. For instance, IoT devices with connectivity and computing capability can be embedded in an object as sensor systems. Thus, objects themselves can communicate the necessary data, such as the current sensor readings of a physical device, to the client on a need-to-know basis (including the information required for recognition and tracking). That is, the data are now delegated and distributed to individual objects in the environment.

Our approach therefore provides an ideal and natural infrastructure for “everywhere” AR accessibility with physical shopping objects. Note that data and/or content can also be uploaded to the objects to add and create new shopping services and applications. Besides generic data and service content, individual objects of interest in the vicinity of the AR client can communicate the information the client needs to recognize, visualize, and interact with them, including the features and shopping contexts.

Step-by-step usage guide for a product

John wants to learn the steps for printing or copying a document on a printer/copier displayed in a computer store, but he cannot find the manual for the device, and even if he had one, it would probably take a long time to understand. He runs an IoT + AR service on his smart pad and aims it at the printer. The app finds and connects to the printer and starts tracking and augmenting it with a control interface. After a few clicks, the app augments the printer with step-by-step instructions on how to operate it, each graphically overlaid on the corresponding part for easier understanding. The app shows an AR-based control interface with which John, while standing, can easily and intuitively control the product without fiddling with the actual device or calling the front desk for help. The product can also be virtually emulated so that John can see how the printer prints out the paper (see Fig. 4).

Fig. 4

Proposed use-case scenario 2: AR emulation of workings of the physical objects (printer) [32]

This scenario illustrates how AR services for any object can be accessed at any time, “everywhere,” and can operate in a simulation mode. The client can connect to any object using the assumed standard protocols without communicating with a local or remote central server. The AR client detects the presence of objects (equipped with elementary processing and operation functionalities) in its vicinity, similar to identifying Wi-Fi access points [37, 38]. These objects communicate the necessary information to the client for intuitive AR visualization overlaid directly on them, and data describing each object’s own operation can be distributed, stored, and exchanged with the AR client.

Because only relatively few target objects are likely to be nearby, the client can quickly identify (and even track) the objects and retrieve the associated visualization content. It can be argued that the objects simply need to be organized geographically and managed through a hierarchical network of servers (as in the case of geographical services). However, setting aside the enormous number of objects to be handled (large even compared with the number of geographical objects), there is currently no common technology for accurately recognizing individual objects (which may be mobile) in indoor locations without pre-registering their tracking features.

Data flow in the IoT + AR shopping service

Figure 5 shows a possible data flow configuration and distributed data management scheme for our suggested IoT + AR approach, illustrating AR visualization of physical objects for “everywhere” shopping services. In a large-scale environment such as the IoT, the performance problem manifests as the time needed to look up and match the target object, and to handle and/or process the associated data and/or content, among millions of candidate objects over the network. With IoT objects having their own computational, networking, and storage capabilities, data and/or content can be distributed, stored, and exchanged (even without the Internet infrastructure). Instead of relying on a central server, the IoT objects distributed throughout the environment can themselves communicate the necessary data to the client, including the information required for recognition and tracking. For example, an AR client detects the presence of IoT objects (equipped with elementary processing, storage, and network modules) in its vicinity (similar to identifying Wi-Fi access points). These objects then communicate the necessary information to the client. Each IoT object may hold different types of information (e.g., object A contains X1 and X2, object B contains Y1 and Y2, and object C contains Z1 and Z2). To handle these differing datasets, each object configures its essential data (feature information for AR recognition and tracking, specific contents, and UI control structure) in a standard format in advance [7]. Alternatively, the AR client can implement algorithms that interpret each object’s configured information individually.
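
The two options described at the end of this paragraph, a shared standard format and per-object interpretation on the client, can be combined. The sketch below, assuming JSON payloads, first checks the fields that every object is assumed to provide and then dispatches any additional, object-specific entries (like X1 or Y1 in the example above) to registered client-side handlers, setting unknown entries aside instead of failing. The field names, the `interpret` function, and the handler registry are hypothetical.

```python
from __future__ import annotations
import json
from typing import Any, Callable

# Fields assumed (hypothetically) to be mandatory in the shared standard format.
REQUIRED = ("object_id", "feature_dataset", "contents", "controls")

# Hypothetical client-side interpreters for optional, object-specific extensions.
HANDLERS: dict[str, Callable[[Any], str]] = {
    "manual_steps": lambda steps: f"{len(steps)} instruction steps",
    "sensor": lambda reading: f"sensor reading {reading}",
}

def interpret(payload: str) -> dict[str, Any]:
    """Validate the standard part of a payload and interpret its extras."""
    data = json.loads(payload)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing standard fields: {missing}")
    result: dict[str, Any] = {key: data[key] for key in REQUIRED}
    result["extras"], result["unknown"] = {}, []
    for key, value in data.items():
        if key in REQUIRED:
            continue
        if key in HANDLERS:
            result["extras"][key] = HANDLERS[key](value)
        else:
            result["unknown"].append(key)  # kept aside, not treated as an error
    return result
```

For example, a printer payload carrying `"manual_steps": ["open tray", "insert paper", "press start"]` would yield `{"manual_steps": "3 instruction steps"}` under `extras`, while an unrecognized field would only be listed under `unknown`. Tolerating unknown fields lets vendors extend their objects without breaking older clients.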

Fig. 5

A possible data flow configuration for our proposed AR-enabling IoT approach

Prototype implementation

We developed a proof-of-concept prototype of the proposed AR framework and tested it on IoT digital clocks and lamps staged as products for sale in a shopping center. In our test environment, we integrated Raspberry Pi 3 boards and beacons into the digital clocks and lamps so that they would act as IoT-enabled products on display. The AR client was implemented on a smartphone that connects to nearby IoT products through the beacons. The Raspberry Pi 3 Model B board has modest storage capacity and wireless communication (BCM43438) and includes a quad-core 1.2-GHz 64-bit CPU, 1 GB of RAM, 10/100 Ethernet, four USB 2.0 ports, HDMI, and a microSD slot [39]. The Raspberry Pi-based IoT products contain pre-configured information about themselves (e.g., the product’s price) and feature sets for AR registration in their memory. Distributing data and/or content to the IoT objects attached to the clocks and lamps, as proposed in the previous section, makes it easy to scale recognition and tracking for the AR client. That is, in addition to generic data and service content, individual IoT objects of different types in the vicinity of the AR client can communicate the information required to recognize and track themselves, including the features, algorithm type, and even the physical conditions (for example, lighting, distance, or other companion reference objects). Note that the AR feature datasets extracted from images of shopping items are stored in a known target resource database in a widely used industry-standard format (e.g., an image feature data format for recognition [11]). TCP/IP was used for data exchange between the AR client and the IoT devices. Figure 6 shows the configuration of our prototype AR system, including its components, functions, execution flow, and control methods.
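
Because TCP delivers a byte stream without message boundaries, any TCP/IP exchange like the one above needs a message framing. The sketch below assumes, hypothetically, a 4-byte big-endian length prefix followed by a UTF-8 JSON body; the paper does not specify its wire format, and the `serve_once` and `request` names, the `get_descriptor` request type, and the loopback demo are illustrative only. On the Raspberry Pi side, the server would answer with the product’s stored descriptor.

```python
import json
import socket
import struct
import threading

def send_msg(sock: socket.socket, obj: dict) -> None:
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))

def serve_once(server: socket.socket, descriptor: dict) -> None:
    """IoT-device side (hypothetical): answer one request with the stored descriptor."""
    conn, _ = server.accept()
    with conn:
        request = recv_msg(conn)
        if request.get("type") == "get_descriptor":
            send_msg(conn, descriptor)
        else:
            send_msg(conn, {"error": "unsupported request"})

def request(host: str, port: int) -> dict:
    """AR-client side (hypothetical): fetch the descriptor from a discovered device."""
    with socket.create_connection((host, port), timeout=5) as sock:
        send_msg(sock, {"type": "get_descriptor"})
        return recv_msg(sock)

# Loopback demo standing in for a Raspberry Pi on the store network.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
worker = threading.Thread(target=serve_once,
                          args=(server, {"object_id": "lamp-01", "price": 39.9}))
worker.start()
reply = request("127.0.0.1", server.getsockname()[1])
worker.join()
server.close()
print(reply)  # {'object_id': 'lamp-01', 'price': 39.9}
```

A length prefix keeps the client simple: it never has to scan for delimiters, and large payloads such as feature datasets arrive intact regardless of how TCP segments them.
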
In the current implementation, when a participant enters the shopping environment, nearby IoT-capable objects are detected, the AR-capable objects among them are selected based on their distance from the client, and the AR client receives their AR tracking information and control interfaces (e.g., buttons).
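
For the distance-based selection step, beacon-based systems commonly estimate range from received signal strength with the log-distance path-loss model, d = 10^((P_1m - RSSI) / (10 n)), where P_1m is the calibrated RSSI at 1 m and n is the path-loss exponent. The paper does not state which estimator the prototype uses, so this is an assumed model; the calibration value of -59 dBm, the exponent of 2.0, the 3 m interaction radius, and the `BeaconReading` record are hypothetical and would need site-specific tuning, since indoor RSSI is noisy.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class BeaconReading:
    """Hypothetical scan result for one product beacon."""
    object_id: str
    rssi: float        # received signal strength in dBm
    ar_capable: bool   # whether the object advertises AR datasets

def estimate_distance(rssi: float, rssi_at_1m: float = -59.0,
                      path_loss_exponent: float = 2.0) -> float:
    """Log-distance path-loss estimate of range in metres."""
    return 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exponent))

def nearby_ar_objects(readings: list[BeaconReading],
                      radius_m: float = 3.0) -> list[str]:
    """Keep AR-capable objects within the radius, nearest first."""
    in_range = []
    for r in readings:
        if not r.ar_capable:
            continue
        d = estimate_distance(r.rssi)
        if d <= radius_m:
            in_range.append((d, r.object_id))
    return [object_id for _, object_id in sorted(in_range)]

readings = [
    BeaconReading("lamp-01", -65.0, True),     # about 2.0 m
    BeaconReading("clock-07", -59.0, True),    # about 1.0 m
    BeaconReading("lamp-02", -80.0, True),     # about 11.2 m, filtered out
    BeaconReading("kettle-03", -60.0, False),  # not AR-capable
]
print(nearby_ar_objects(readings))  # ['clock-07', 'lamp-01']
```

In practice, the RSSI values would be smoothed over several advertisements before filtering, because single readings can fluctuate by several dBm.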

Fig. 6

Our AR system configuration in the current implementation, which consists of the mobile AR client and physical objects with attached IoT computing resources such as storage and network modules

Figures 7 and 8 show a user pointing the AR client (smartphone) at a product on display, receiving the necessary information, and directly controlling the product, for example, turning it on or off. Based on the communicated information, the contents and control interface are spatially overlaid on the desired target product, and the interface is created dynamically. This makes choosing the target product among the many displayed on the shelf much easier while keeping the physical context intact. Additionally, the HDMI module shown in Fig. 7 was connected to a separate monitor to verify that the datasets were successfully transferred to the mobile AR client, the LAN module was used to insert AR datasets into the Raspberry Pi and to test data transmission, and the USB module was used to connect a mouse for input. Here, we can envision a future in which various IoT services, including AR, make information on object contents and control interfaces available, ranging from very basic, generic object information to vendor-supplied interactive AR services. This allows a physical object to be referenced and associated with its virtual counterpart. With interactive content augmented on the correctly tracked target IoT object, the AR client can directly access and immediately exchange context information with the object. We developed the control UI menus and AR contents using Unity3D with C# scripting and the Vuforia AR tracking engine.
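
The dynamic creation of the control interface can be sketched independently of any game engine: the client turns the received UI control structure into buttons and sends the bound command back when one is pressed. The Python below is a hypothetical, engine-agnostic illustration (the actual prototype used Unity3D C# scripts and Vuforia); `Button`, `build_controls`, the `ON`/`OFF` command strings, and the `FakeLamp` stand-in for the network connection are all assumptions.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Button:
    """Hypothetical UI element created from the received control structure."""
    label: str
    on_press: Callable[[], str]

def build_controls(controls: dict[str, str],
                   send: Callable[[str], str]) -> list[Button]:
    """Create one button per entry of the received control structure."""
    # Bind each command through a default argument so that every button
    # keeps its own command instead of the loop's last value.
    return [Button(label, lambda cmd=command: send(cmd))
            for label, command in controls.items()]

class FakeLamp:
    """Stand-in for the device end of the connection (hypothetical)."""
    def __init__(self) -> None:
        self.on = False

    def handle(self, command: str) -> str:
        if command in ("ON", "OFF"):
            self.on = command == "ON"
            return "ok"
        return "unsupported"

lamp = FakeLamp()
buttons = build_controls({"Turn on": "ON", "Turn off": "OFF"}, lamp.handle)
print([b.label for b in buttons])  # ['Turn on', 'Turn off']
buttons[0].on_press()
print(lamp.on)  # True
```

Because the buttons are generated from data rather than hard-coded, the same client can operate a lamp, a clock, or a product it has never seen before, which is the point of shipping the UI control structure with the object.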

Fig. 7

a Example of using the mobile AR client to try out IoT products and b the IoT module attached to a shopping spot

Fig. 8

AR interaction example of in situ/remote operation of the IoT lamp: the AR client provides an easy and intuitive way to turn the lamp on or off without using the physical button or calling the front desk for help (e.g., left: the red lamp with the light off; right: the same lamp, shown in blue, with the light on)

Usability experiments

So far, we have described the motivation, futuristic scenarios, and technical aspects of realizing the IoT + AR platform for offline interactive shopping. This rests on the assumption that our approach is useful, is well received, and creates an effective shopping experience. In this section, we therefore experimentally assess user satisfaction and usability.

The first experiment analyzed user satisfaction by presenting two types of interfaces on a hand-held device: (a) a conventional web-based interface and (b) an AR-based interface. Overall satisfaction (10 participants, average age 36) was evaluated with a survey question on a 7-point Likert scale. A Wilcoxon signed-rank test for paired samples showed that satisfaction scores were significantly higher (Z = 2.871, P < 0.05) with the AR interface (mean 6.2) than with the conventional interface (mean 3.5).
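
The paired comparison above can be illustrated with a minimal Python sketch of the Wilcoxon signed-rank statistic and its normal-approximation Z score, using average ranks for ties and a tie-corrected variance. This is an illustrative sketch, not the analysis code behind the reported results: the function name and the example ratings are hypothetical, zero differences are dropped following Wilcoxon's original convention, and no continuity correction is applied (statistical packages offer other options for both).

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Return (W+, Z) for paired samples x (e.g., web) and y (e.g., AR).

    W+ is the sum of ranks of positive differences y - x. Z uses the normal
    approximation with a tie-corrected variance; zero differences are dropped.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2  # 1-based ranks i+1 .. j+1, averaged
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return w_plus, (w_plus - mean) / math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical 7-point ratings from five participants (not the study's data).
    web = [3, 4, 3, 2, 4]
    ar = [6, 7, 6, 6, 5]
    w_plus, z = wilcoxon_signed_rank_z(web, ar)
    print(w_plus, round(z, 3))
```

A positive Z indicates higher ratings in the second sample, matching the sign convention of the Z = 2.871 reported above; swapping the arguments flips the sign.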

The second experiment investigated usability. The user was asked to turn the IoT lamps on and off using either (1) a conventional switchable GUI on a hand-held device (Fig. 9b) or (2) the AR-based interface spatially registered to the target IoT device (Fig. 9c). After repeated task trials, the subjects answered a usability survey, adapted from the NASA TLX [40], with questions on ease of use, naturalness, fatigue, speed, and simple preference (Table 1). The responses were measured on a 7-point Likert scale. In addition, we measured the error rate in terms of accuracy, i.e., the number of incorrectly selected and operated cases. Sixteen paid subjects (12 men and 4 women; mean age 37 years) participated and were divided into two groups, each experiencing one condition (GUI vs. IoT + AR), for a between-subject comparison.
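
Because each condition in the second experiment was experienced by a separate group, the two groups' ratings form independent samples. A common nonparametric comparison for two independent samples is the Mann-Whitney U (Wilcoxon rank-sum) test. The following minimal Python sketch computes U and a normal-approximation Z with tie correction; it is illustrative only, does not reproduce the analysis reported in this paper, and the function name is hypothetical.

```python
import math


def mann_whitney_u_z(group_a, group_b):
    """Return (U_a, Z) for two independent samples.

    U_a counts how often a value from group_a exceeds one from group_b
    (ties count 1/2). Z uses the normal approximation with tie correction,
    without a continuity correction.
    """
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # Pool the observations, remembering each one's group (0 = a, 1 = b).
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b],
                    key=lambda p: p[0])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # 1-based ranks i+1 .. j+1, averaged
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        raise ValueError("all observations are tied; Z is undefined")
    return u_a, (u_a - mean) / math.sqrt(var)
```

Unlike the signed-rank test, this statistic does not pair observations, so it can be applied when the two groups contain different subjects or different numbers of subjects.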

Fig. 9

Various forms of interactions with IoT objects: a experimental test environment to control lights on the ceiling, b conventional GUI-based switchable interface to turn lights on or off, and c our proposed interface to operate directly through AR visualization

Table 1 Subjective usability survey assessing the ease of use, naturalness, fatigue, speed, and simple preference

Figure 10 shows the results of the subjective, self-reported usability assessment for GUI and IoT + AR operation. In all usability categories, the IoT + AR interface received higher scores, and the users significantly preferred IoT + AR, demonstrating the expected advantage of in situ augmentation. In the fatigue category in particular, the IoT + AR group also showed higher scores, which we attribute to the subjects still preferring the more familiar GUI method. By contrast, placing the interactive AR interface near the object helped the subjects operate the menu quickly (see the Speed category in Fig. 10). As for operational errors, the GUI-based group made 32 errors (out of 80), whereas the IoT + AR group made only three. The subjects relied heavily on the spatial and visual context that only the IoT + AR interaction provides, which made object control much more intuitive and direct. Furthermore, with the AR-enabled IoT interaction method, information obtained directly from the object can be used to adaptively tailor the interface to that object and to the client platform, whereas a GUI or menu is sufficient for simple option selection.
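
The reported error counts translate into the following rates, assuming (as the text implies but does not state explicitly for the IoT + AR group) that both groups performed 80 operations:

```latex
e_{\mathrm{GUI}} = \frac{32}{80} = 0.40 = 40\%, \qquad
e_{\mathrm{AR}} = \frac{3}{80} = 0.0375 = 3.75\%, \qquad
\frac{e_{\mathrm{GUI}} - e_{\mathrm{AR}}}{e_{\mathrm{GUI}}} = \frac{0.40 - 0.0375}{0.40} \approx 0.906
```

Under that assumption, the IoT + AR interface reduced the error rate by roughly 90.6% relative to the GUI.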

Fig. 10

Results of the usability evaluation survey. The Wilcoxon test for paired samples showed a statistically significant difference between the GUI and IoT + AR interfaces in all categories (Z = −3.2014, −3.5752, −3.5399, −3.5706, and −3.5502 from left to right; P < 0.05 for all comparisons)

Conclusions and future work

In this paper, we described how the current AR infrastructure can be extended to support smarter and more effective user interaction with physical objects in real, analog shopping situations. Individual physical IoT objects, or groups of them, can be imbued with data and content in a distributed manner, which the AR client can then use efficiently. This distribution makes it possible to scale and customize interaction techniques such as AR. Our approach combines the IoT control interface for physical objects with intuitive and natural AR interaction in a complementary way, thereby also combining the digital and analog worlds. Integrating AR into the IoT framework as a control interface in this way can enable the AR service to significantly reduce latency. The pilot experiments partly validated our claims about the synergy and advantages of this integration. An outstanding issue is that the content and data exchange protocol must be standardized for true scalability. In future work, we will deploy our approach in a large-scale shopping center and investigate how to place AR information effectively in such a large space for a real shopping application. In particular, we plan to validate our approach in terms of AR object recognition for improving the shopping experience. In addition, we will develop an AR interface that can be adaptively tailored to individual objects and to the client platform.