CN117999534A - Apparatus, method and graphical user interface for interacting with a three-dimensional environment

Info

Publication number: CN117999534A
Application number: CN202280064296.6A
Authority: CN
Inventors: I. Pastrana Vicente; J. R. Dascola; C. D. McKenzie; J. Chand; S. O. Lemay; K. E. S. Bauerly; D. D. Dargan; Z. C. Taylor
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2021-09-24
Filing date: 2022-09-22
Publication date: 2024-05-07
Also published as: CN118672393A

Abstract

A computer system displays a first user interface object and a second user interface object in a three-dimensional environment. The first user interface object and the second user interface object have a first spatial relationship and a second spatial relationship, respectively, with a first anchor location and a second anchor location corresponding to a position of a user's hand in a physical environment. When the first user interface object and the second user interface object are displayed in the three-dimensional environment, the computer system detects movements of the user's hand in the physical environment corresponding to translational and rotational movements of the user's hand relative to a point of view, and in response, translates the first user interface object and the second user interface object relative to the point of view according to the translational movements of the user's hand, and rotates the first user interface object relative to the point of view without rotating the second user interface object according to the rotational movements of the user's hand.

Description

Apparatus, method and graphical user interface for interacting with a three-dimensional environment

Related patent application

The present application is a continuation of U.S. patent application Ser. No. 17/948,117, filed in September 2022, which claims priority to U.S. provisional application Ser. No. 63/248,370, filed on September 24, 2021, which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates generally to computer systems having a display generating component and one or more input devices that provide a computer-generated extended reality (XR) experience, including but not limited to electronic devices that provide virtual reality and mixed reality experiences via the display generating component.

Background

In recent years, the development of computer systems for virtual reality, augmented reality, and extended reality has increased significantly. Exemplary augmented reality and extended reality environments include at least some virtual elements that replace or augment the physical world. Input devices (such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch screen displays) for computer systems and other electronic computing devices are used to interact with virtual/augmented reality environments. Exemplary virtual elements include virtual objects such as digital images, videos, text, icons, control elements (such as buttons), and other graphics.

Methods and interfaces for interacting with environments (e.g., applications, augmented reality environments, mixed reality environments, virtual reality environments, and extended reality environments) that include at least some virtual elements are cumbersome, inefficient, and limited. For example, systems that provide insufficient feedback for actions associated with virtual objects, systems that require a series of inputs to achieve a desired result in a virtual/augmented reality environment, and systems in which virtual objects are complex, tedious, and error-prone to manipulate create a significant cognitive burden on the user and detract from the experience of the virtual/augmented reality environment. In addition, these methods take longer than necessary, thereby wasting energy. This latter consideration is particularly important in battery-powered devices.

Disclosure of Invention

Accordingly, there is a need for a computer system with improved methods and interfaces for providing a user with a computer-generated experience, thereby making user interactions with the computer system more efficient and intuitive for the user. The disclosed systems, methods, and user interfaces reduce or eliminate the above-described drawbacks and other problems associated with user interfaces for computer systems having a display generating component and one or more input devices. Such systems, methods, and interfaces optionally supplement or replace conventional systems, methods, and user interfaces for providing an extended reality experience to a user. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user by helping the user understand the association between the inputs provided and the response of the device to those inputs, thereby forming a more efficient human-machine interface.

According to some embodiments, a method is performed at a computer system in communication with a first display generating component and one or more input devices. The method includes displaying, via the first display generating component, a first user interface object and a second user interface object in a first view of a three-dimensional environment, wherein respective characteristic locations of the first user interface object in the three-dimensional environment have a first spatial relationship with a first anchor location in the three-dimensional environment corresponding to a position of a first hand of a user in a physical environment, and respective characteristic locations of the second user interface object in the three-dimensional environment have a second spatial relationship with the first anchor location in the three-dimensional environment corresponding to the position of the first hand of the user in the physical environment, and wherein the first user interface object comprises one or more user interface objects in a predetermined layout. The method further includes detecting, via the one or more input devices, a first movement of the first hand in the physical environment when the first user interface object and the second user interface object are displayed in the first view of the three-dimensional environment, the first movement of the first hand corresponding to translational and rotational movement relative to a viewpoint corresponding to the first view of the three-dimensional environment. 
The method further includes, in response to detecting the first movement of the first hand in the physical environment: translating the first user interface object and the second user interface object in the three-dimensional environment relative to the viewpoint according to the translational movement of the first hand in the physical environment; and rotating the first user interface object in the three-dimensional environment relative to the viewpoint in accordance with the rotational movement of the first hand in the physical environment without rotating the second user interface object in the three-dimensional environment.

In some embodiments, a method is performed at a computer system in communication with a first display generating component and one or more input devices. The method includes displaying, via the first display generating component, a view of a communication session between a first user of the first display generating component and a second user of a second display generating component different from the first display generating component, wherein the view of the communication session includes a view of a three-dimensional environment including at least some virtual content shared between the first user and the second user, wherein displaying the view of the three-dimensional environment of the communication session includes displaying a respective representation of the second user in the view of the three-dimensional environment, and wherein the respective representation of the second user is determined based on a virtual spatial relationship between the first user and the second user in the three-dimensional environment. The method further includes displaying, via the first display generating component, a user interface for controlling the communication session when the view of the communication session is displayed, wherein the user interface for controlling the communication session includes a first control object that, when activated by the first user, causes the first computer system to perform a respective operation that modifies an appearance of a three-dimensional region of the three-dimensional environment. The method further includes detecting a first user input activating the first control object when the view of the three-dimensional environment is displayed. 
The method further includes, in response to detecting the first user input activating the first control object: modifying the appearance of the three-dimensional region of the three-dimensional environment for the first user of the first display generating component; and initiating a process for modifying the appearance of the three-dimensional region of the three-dimensional environment displayed at the second display generating component for the second user of the second display generating component.
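
The two responses to activating the first control object can be illustrated with a minimal sketch. It is an illustration only, not the claimed implementation: the names `SessionView`, `activate_control`, and `apply_pending`, and the use of a pending list to model "initiating a process" for the second display generating component, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SessionView:
    """One participant's view of the shared three-dimensional environment (hypothetical model)."""
    user: str
    region_appearance: str = "default"
    # Changes that have been initiated for this view but not yet applied (assumed mechanism).
    pending: list = field(default_factory=list)


def activate_control(local: SessionView, remote: SessionView, appearance: str) -> None:
    # The appearance changes immediately for the user who activated the control.
    local.region_appearance = appearance
    # For the other participant, the change is only initiated here.
    remote.pending.append(appearance)


def apply_pending(view: SessionView) -> None:
    # Complete any initiated changes, in the order they were requested.
    for appearance in view.pending:
        view.region_appearance = appearance
    view.pending.clear()


first_view = SessionView("first user")
second_view = SessionView("second user")
activate_control(first_view, second_view, "dimmed")
state_before_sync = (first_view.region_appearance, second_view.region_appearance)
apply_pending(second_view)
```

In this sketch the first user's region changes as soon as the control is activated, while the second user's region changes only once the initiated process completes.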

In some embodiments, a method is performed at a computer system in communication with a first display generating component and one or more input devices. The method includes displaying a first three-dimensional computer-generated experience in a view of a three-dimensional environment. The method further includes detecting a first event when the first three-dimensional computer-generated experience is displayed in the view of the three-dimensional environment. The method further includes, in response to detecting the first event, displaying a first user interface object in the view of the three-dimensional environment, wherein the first user interface object includes one or more user interface objects that, when activated, cause the computer system to perform a respective operation that modifies at least one aspect of the display of the first computer-generated experience in the three-dimensional environment. The method further includes detecting that the user's attention is no longer directed to the first user interface object when the first user interface object is displayed in the view of the three-dimensional environment. The method further includes, in response to detecting that the attention of the user is no longer directed to the first user interface object: stopping displaying at least a portion of the first user interface object in the view of the three-dimensional environment in accordance with determining that the first user interface object is a first object type having a first spatial relationship with respect to a viewpoint of the view of the three-dimensional environment; and in accordance with a determination that the first user interface object is a second object type having a second spatial relationship different from the first spatial relationship with respect to the three-dimensional environment, maintaining a display of the first user interface object in the three-dimensional environment.

According to some embodiments, the computer system includes or communicates with: a display generation component (e.g., a display, projector, and/or head mounted display), one or more input devices (e.g., one or more cameras, a touch-sensitive surface, optionally one or more sensors for detecting intensity of contact with the touch-sensitive surface), optionally one or more audio output components, optionally one or more tactile output generators, one or more processors, and a memory storing one or more programs; the one or more programs are configured to be executed by the one or more processors, and the one or more programs include instructions for performing or causing performance of the operations of any of the methods described herein. According to some embodiments, a non-transitory computer-readable storage medium has stored therein instructions that, when executed by a computer system having a display generating component, one or more input devices (e.g., one or more cameras, a touch-sensitive surface, optionally one or more sensors for detecting intensity of contact with the touch-sensitive surface), optionally one or more audio output components, and optionally one or more tactile output generators, cause the computer system to perform any of the methods described herein or cause the operations of any of the methods described herein to be performed. 
According to some embodiments, a graphical user interface on a computer system having a display generating component, one or more input devices (e.g., one or more cameras, a touch-sensitive surface, optionally one or more sensors for detecting intensity of contact with the touch-sensitive surface), optionally one or more audio output components, optionally one or more tactile output generators, a memory, and one or more processors for executing one or more programs stored in the memory, includes one or more of the elements displayed in any of the methods described herein, which are updated in response to inputs, as described in any of the methods described herein. According to some embodiments, a computer system includes: a display generating component, one or more input devices (e.g., one or more cameras, a touch-sensitive surface, optionally one or more sensors for detecting intensity of contact with the touch-sensitive surface), optionally one or more audio output components, and optionally one or more tactile output generators; and means for performing or causing performance of the operations of any of the methods described herein. According to some embodiments, an information processing apparatus for use in a computer system having a display generating component, one or more input devices (e.g., one or more cameras, a touch-sensitive surface, optionally one or more sensors for detecting intensity of contact with the touch-sensitive surface), optionally one or more audio output components, and optionally one or more tactile output generators, comprises means for performing or causing performance of the operations of any of the methods described herein.

Accordingly, computer systems having display generating components are provided with improved methods and interfaces for interacting with a three-dimensional environment and for facilitating the user's use of the computer system when interacting with the three-dimensional environment, thereby improving the effectiveness, efficiency, and user safety and satisfaction of such computer systems. Such methods and interfaces may supplement or replace conventional methods for interacting with a three-dimensional environment and for facilitating the user's use of a computer system when interacting with the three-dimensional environment.

It is noted that the various embodiments described above may be combined with any of the other embodiments described herein. The features and advantages described in this specification are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

Drawings

For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.

FIG. 1 is a block diagram illustrating an operating environment of a computer system for providing an extended reality (XR) experience, according to some embodiments.

FIG. 2 is a block diagram illustrating a controller of a computer system configured to manage and coordinate a user's XR experience, according to some embodiments.

FIG. 3 is a block diagram illustrating a display generation component of a computer system configured to provide a visual component of an XR experience to a user, according to some embodiments.

FIG. 4 is a block diagram illustrating a hand tracking unit of a computer system configured to capture gesture inputs of a user, according to some embodiments.

FIG. 5 is a block diagram illustrating an eye tracking unit of a computer system configured to capture gaze input of a user, according to some embodiments.

FIG. 6 is a flow diagram illustrating a glint-assisted gaze tracking pipeline, according to some embodiments.

FIGS. 7A-7D are block diagrams illustrating the display of a user interface object at a first respective location in a three-dimensional environment corresponding to a location at or near a user's hand in a physical environment, according to some embodiments.

FIGS. 7E-7F are block diagrams illustrating controls displayed for a user when the user is engaged in a shared communication session, according to some embodiments.

FIGS. 7G-7J are block diagrams illustrating user interface objects in a modal user interface object displayed to a user while the user is focusing on the modal user interface object, according to some embodiments.

FIG. 8 is a flow chart of a method for displaying a first user interface object and a second user interface object in respective spatial relationships relative to an anchor location corresponding to a user's hand, according to some embodiments.

FIG. 9 is a flow chart of a method for modifying the appearance of a shared view of a communication session between multiple users, according to some embodiments.

FIG. 10 is a flowchart of a method for determining whether to stop displaying at least a portion of a first user interface object in a view of a three-dimensional environment or to maintain display of the first user interface object in the three-dimensional environment when a user's attention is no longer directed to the user interface object, according to some embodiments.

Detailed Description

According to some embodiments, the present disclosure relates to a user interface for providing a computer-generated extended reality (XR) experience to a user.

The systems, methods, and GUIs described herein improve user interface interactions with virtual/augmented reality environments in a variety of ways.

In some embodiments, a computer system displays a first user interface object and a second user interface object in respective spatial relationships relative to an anchor location corresponding to a user's hand, and in response to detecting movement of the user's hand, the computer system translates both the first user interface object and the second user interface object in accordance with translational movement of the user's hand. According to some embodiments, the computer system rotates the first user interface object but not the second user interface object according to a rotational movement of the user's hand.
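
The behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the names `HandPose`, `AnchoredObject`, and `apply_hand_movement` are hypothetical, each object's spatial relationship to the anchor location is modeled as a fixed offset in environment coordinates, and hand rotation is reduced to a single yaw angle.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class HandPose:
    position: Vec3  # anchor location derived from the tracked hand (assumed representation)
    yaw_deg: float  # hand rotation about a single axis, in degrees (simplification)


@dataclass
class AnchoredObject:
    offset: Vec3            # spatial relationship to the anchor location
    follows_rotation: bool  # True for the first object, False for the second
    position: Vec3 = (0.0, 0.0, 0.0)
    yaw_deg: float = 0.0


def apply_hand_movement(obj: AnchoredObject, hand: HandPose) -> None:
    # Translation: every anchored object keeps its offset from the anchor location.
    obj.position = tuple(p + o for p, o in zip(hand.position, obj.offset))
    # Rotation: only objects flagged to follow rotation adopt the hand's yaw.
    if obj.follows_rotation:
        obj.yaw_deg = hand.yaw_deg


first = AnchoredObject(offset=(0.0, 0.25, 0.0), follows_rotation=True)
second = AnchoredObject(offset=(0.0, 0.5, 0.0), follows_rotation=False)
hand = HandPose(position=(1.0, 1.0, -0.5), yaw_deg=30.0)
for obj in (first, second):
    apply_hand_movement(obj, hand)
```

After the hand moves, both objects have been translated to follow the anchor location, but only the first object has taken on the hand's rotation; the second object keeps its original orientation.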

In some embodiments, a computer system modifies the appearance of a shared view of a communication session corresponding to a shared three-dimensional environment between a plurality of users. For example, the computer system displays selectable controls that allow a user to adjust various applications and other settings related to a shared three-dimensional environment shared with other users during a communication session.

In some embodiments, a computer system displays user interface elements for a user in a three-dimensional environment and automatically removes certain user interface elements that have not been placed in the three-dimensional environment in response to the user looking away from the user interface elements. For example, when the user looks away from the user interface element, the user interface element that has been placed in and anchored to the three-dimensional environment is maintained, while when the user is not paying attention to the user interface element, the user interface element that has not been placed in and anchored to the three-dimensional environment is removed from the current view.
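
A minimal sketch of this attention-based behavior is given below. The names `AnchorType`, `UIObject`, and `on_attention_lost` are hypothetical and introduced only for illustration; how attention is detected (e.g., via gaze tracking) is outside the sketch.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AnchorType(Enum):
    VIEWPOINT = auto()    # not yet placed; positioned relative to the user's viewpoint
    ENVIRONMENT = auto()  # placed in and anchored to the three-dimensional environment


@dataclass
class UIObject:
    name: str
    anchor: AnchorType
    visible: bool = True


def on_attention_lost(obj: UIObject) -> None:
    # Objects that were never placed in the environment are removed from the view.
    if obj.anchor is AnchorType.VIEWPOINT:
        obj.visible = False
    # Objects anchored to the environment remain displayed; nothing to do.


unplaced_controls = UIObject("unplaced controls", AnchorType.VIEWPOINT)
placed_controls = UIObject("placed controls", AnchorType.ENVIRONMENT)
for ui_object in (unplaced_controls, placed_controls):
    on_attention_lost(ui_object)
```

The design choice illustrated here is that the outcome depends only on how the object is anchored, so no further user input is needed to dismiss transient controls.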

FIGS. 1-6 provide a description of an exemplary computer system for providing an XR experience to a user. The user interfaces in FIGS. 7A-7J are used to illustrate the processes in FIGS. 8-10, respectively.

The processes described below enhance operability of the device and make the user-device interface more efficient (e.g., by helping the user provide appropriate input and reducing user error in operating/interacting with the device) through various techniques, including providing improved visual, audible, and/or tactile feedback to the user, reducing the number of inputs required to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without requiring further user input, and/or additional techniques. These techniques also reduce power usage and extend battery life of the device by enabling a user to use the device faster and more efficiently.

In some embodiments, as shown in FIG. 1, an XR experience is provided to a user via an operating environment 100 comprising a computer system 101. Computer system 101 includes a controller 110 (e.g., a processor of a portable electronic device or a remote server), a display generation component 120 (e.g., a Head Mounted Device (HMD), a display, a projector, and/or a touch screen), one or more input devices 125 (e.g., eye tracking device 130, hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, haptic output generator 170, and other output devices 180), one or more sensors 190 (e.g., image sensor, light sensor, depth sensor, haptic sensor, orientation sensor, proximity sensor, temperature sensor, position sensor, motion sensor, and/or speed sensor), and optionally one or more peripheral devices 195 (e.g., home appliances and/or wearable devices). In some implementations, one or more of the input device 125, the output device 155, the sensor 190, and the peripheral device 195 are integrated with the display generating component 120 (e.g., in a head-mounted device or a handheld device).

In describing an XR experience, various terms are used to distinguish several related but different environments that a user may sense and/or interact with (e.g., through inputs detected by the computer system 101 that generates the XR experience, such inputs causing the computer system to generate audio, visual, and/or tactile feedback corresponding to the various inputs provided to computer system 101). The following are a subset of these terms:

Physical environment: a physical environment refers to a physical world that people can sense and/or interact with without the assistance of an electronic system. Physical environments, such as a physical park, include physical objects such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with a physical environment, such as through sight, touch, hearing, taste, and smell.

Extended reality: in contrast, an extended reality (XR) environment refers to a fully or partially simulated environment that people sense and/or interact with via an electronic system. In XR, a subset of a person's physical movements, or representations thereof, is tracked, and in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner consistent with at least one law of physics. For example, an XR system may detect a person's head rotation and, in response, adjust the graphical content and sound field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some cases (e.g., for accessibility reasons), the adjustment of the characteristics of a virtual object in the XR environment may be made in response to a representation of physical movement (e.g., a voice command). A person may use any of their senses to sense and/or interact with XR objects, including sight, hearing, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides a perception of point audio sources in 3D space. As another example, an audio object may enable audio transparency that selectively introduces environmental sounds from a physical environment with or without computer-generated audio. In some XR environments, a person may sense and/or interact with only audio objects.

Examples of XR include virtual reality and mixed reality.

Virtual reality: a Virtual Reality (VR) environment refers to a simulated environment designed to be based entirely on computer-generated sensory input for one or more senses. The VR environment includes a plurality of virtual objects that a person can sense and/or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the presence of the person within the computer-generated environment and/or through a simulation of a subset of the physical movements of the person within the computer-generated environment.

Mixed reality: in contrast to a VR environment, which is designed to be based entirely on computer-generated sensory input, a Mixed Reality (MR) environment refers to a simulated environment designed to incorporate sensory input from a physical environment, or a representation thereof, in addition to including computer-generated sensory input (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end. In some MR environments, the computer-generated sensory input may be responsive to changes in sensory input from the physical environment. In addition, some electronic systems for rendering MR environments may track position and/or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical objects or representations thereof from the physical environment). For example, the system may account for movement so that a virtual tree appears stationary relative to the physical ground.
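
The virtual tree example can be sketched as a coordinate transform. This is a hedged illustration, not a method described in the source: it assumes a two-dimensional ground plane, a tracked viewer position and yaw angle, and a hypothetical helper named `world_to_view`. The tree's position in the environment stays fixed, and only its position relative to the viewer is recomputed as the viewer moves.

```python
import math


def world_to_view(world_xy, viewer_xy, viewer_yaw_rad):
    """Express a fixed environment position in the viewer's own frame (hypothetical helper)."""
    # Undo the viewer's translation.
    dx = world_xy[0] - viewer_xy[0]
    dy = world_xy[1] - viewer_xy[1]
    # Undo the viewer's rotation by rotating through the opposite angle.
    c = math.cos(-viewer_yaw_rad)
    s = math.sin(-viewer_yaw_rad)
    return (c * dx - s * dy, s * dx + c * dy)


tree = (0.0, 5.0)  # the virtual tree's position never changes in the environment

at_start = world_to_view(tree, (0.0, 0.0), 0.0)
after_step = world_to_view(tree, (0.0, 2.0), 0.0)
after_turn = world_to_view(tree, (0.0, 0.0), math.pi / 2)
```

Because the tracked movement is subtracted out each time, the tree is drawn closer after the viewer steps toward it and shifts sideways in the view after the viewer turns, which is what makes it appear stationary relative to the ground.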

Examples of mixed reality include augmented reality and augmented virtuality.

Augmented reality: an Augmented Reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment or a representation of a physical environment. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present the virtual object on a transparent or semi-transparent display such that a person perceives the virtual object superimposed over the physical environment with the system. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or videos of the physical environment, which are representations of the physical environment. The system combines the image or video with the virtual object and presents the composition on an opaque display. A person utilizes the system to indirectly view the physical environment via an image or video of the physical environment and perceive a virtual object superimposed over the physical environment. As used herein, video of a physical environment displayed on an opaque display is referred to as "pass-through video," meaning that the system captures images of the physical environment using one or more image sensors and uses those images when rendering an AR environment on the opaque display. Further alternatively, the system may have a projection system that projects the virtual object into the physical environment, for example as a hologram or on a physical surface, such that a person perceives the virtual object superimposed on top of the physical environment with the system. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. 
For example, in providing pass-through video, the system may transform one or more sensor images to apply a selected perspective (e.g., a viewpoint) that is different from the perspective captured by the imaging sensors. As another example, the representation of the physical environment may be transformed by graphically modifying (e.g., magnifying) portions thereof, such that the modified portions are representative but not photorealistic versions of the originally captured images. As a further example, the representation of the physical environment may be transformed by graphically eliminating or blurring portions thereof.

Augmented virtuality: an Augmented Virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from a physical environment. The sensory input may be a representation of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but the faces of people are photorealistically reproduced from images taken of physical people. As another example, a virtual object may take the shape or color of a physical object imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

Hardware: there are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields integrated with display capabilities, windows integrated with display capabilities, displays formed as lenses designed for placement on a human eye (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smart phones, tablet computers, and desktop/laptop computers. The head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, the head-mounted system may be configured to accept an external opaque display (e.g., a smart phone). The head-mounted system may incorporate one or more imaging sensors for capturing images or video of the physical environment, and/or one or more microphones for capturing audio of the physical environment. The head-mounted system may have a transparent or translucent display instead of an opaque display. The transparent or translucent display may have a medium through which light representing an image is directed to the eyes of a person. The display may utilize digital light projection, OLED, LED, uLED, liquid crystal on silicon, laser scanning light sources, or any combination of these techniques. The medium may be an optical waveguide, a holographic medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to selectively become opaque. Projection-based systems may employ retinal projection techniques that project a graphical image onto a person's retina. The projection system may also be configured to project the virtual object into the physical environment, for example as a hologram or on a physical surface. 
In some embodiments, the controller 110 is configured to manage and coordinate the XR experience of the user. In some embodiments, controller 110 includes suitable combinations of software, firmware, and/or hardware. The controller 110 is described in more detail below with reference to fig. 2. In some implementations, the controller 110 is a computing device that is in a local or remote location relative to the scene 105 (e.g., physical setting/environment). For example, the controller 110 is a local server located within the scene 105. In another example, the controller 110 is a remote server (e.g., a cloud server, a central server, or another server) located outside of the scene 105. In some implementations, the controller 110 is communicatively coupled with the display generation component 120 (e.g., HMD, display, projector, and/or touch-screen) via one or more wired or wireless communication channels 144 (e.g., bluetooth, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within a housing (e.g., a physical enclosure) of the display generation component 120 (e.g., an HMD or portable electronic device including a display and one or more processors), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or shares the same physical housing or support structure with one or more of the above.

In some embodiments, display generation component 120 is configured to provide an XR experience (e.g., at least a visual component of the XR experience) to a user. In some embodiments, display generation component 120 includes suitable combinations of software, firmware, and/or hardware. The display generation component 120 is described in more detail below with respect to fig. 3. In some embodiments, the functionality of the controller 110 is provided by and/or combined with the display generation component 120.

According to some embodiments, display generation component 120 provides an XR experience to a user when the user is virtually and/or physically present within scene 105.

In some embodiments, the display generation component 120 is worn on a portion of the user's body (e.g., on his/her head, on his/her hand). As such, display generation component 120 includes one or more XR displays provided for displaying XR content. For example, in various embodiments, the display generation component 120 encloses a field of view of a user. In some embodiments, display generation component 120 is a handheld device (such as a smart phone or tablet computer) configured to present XR content, and the user holds the device with a display facing the user's field of view and a camera facing scene 105. In some embodiments, the handheld device is optionally placed within a housing that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., tripod) in front of the user. In some embodiments, display generation component 120 is an XR chamber, enclosure, or room configured to present XR content, wherein the user does not wear or hold display generation component 120. Many of the user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) may be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with XR content triggered based on interactions occurring in a space in front of a handheld device or a tripod-mounted device may similarly be implemented with an HMD, where the interactions occur in the space in front of the HMD and responses to the XR content are displayed via the HMD.
Similarly, a user interface showing interaction with XR content triggered based on movement of a handheld device or tripod-mounted device relative to a physical environment (e.g., a scene 105 or a portion of a user's body (e.g., a user's eye, head, or hand)) may similarly be implemented with an HMD, where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a portion of the user's body (e.g., a user's eye, head, or hand)).

While relevant features of the operating environment 100 are shown in fig. 1, those of ordinary skill in the art will recognize from this disclosure that various other features are not shown for the sake of brevity and so as not to obscure more relevant aspects of the exemplary embodiments disclosed herein.

Fig. 2 is a block diagram of an example of a controller 110 according to some embodiments. While certain specific features are shown, those of ordinary skill in the art will appreciate from the disclosure that various other features are not shown for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To this end, as a non-limiting example, in some embodiments, the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, etc.), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., Universal Serial Bus (USB), IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, Global System for Mobile Communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), Global Positioning System (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these components and various other components.

In some embodiments, one or more of the communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and the like.

Memory 220 includes high-speed random access memory such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate random access memory (DDR RAM), or other random access solid state memory devices. In some embodiments, memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 220 optionally includes one or more storage devices located remotely from the one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220 or a non-transitory computer-readable storage medium of memory 220 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 230 and XR experience module 240.

Operating system 230 includes instructions for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR experience module 240 is configured to manage and coordinate single or multiple XR experiences of one or more users (e.g., single XR experiences of one or more users, or multiple XR experiences of a respective group of one or more users). To this end, in various embodiments, the XR experience module 240 includes a data acquisition unit 242, a tracking unit 244, a coordination unit 246, and a data transmission unit 248.

In some embodiments, the data acquisition unit 242 is configured to acquire data (e.g., presentation data, interaction data, sensor data, and/or location data) from at least the display generation component 120 of fig. 1, and optionally from one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data acquisition unit 242 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

In some embodiments, tracking unit 244 is configured to map scene 105 and track at least the location/position of display generation component 120 relative to scene 105 of fig. 1, and optionally the location of one or more of input device 125, output device 155, sensor 190, and/or peripheral device 195. For this purpose, in various embodiments, tracking unit 244 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics. In some embodiments, tracking unit 244 includes a hand tracking unit 245 and/or an eye tracking unit 243. In some embodiments, the hand tracking unit 245 is configured to track the location/position of one or more portions of the user's hand, and/or the motion of one or more portions of the user's hand relative to the scene 105 of fig. 1, relative to the display generation component 120, and/or relative to a coordinate system defined relative to the user's hand. The hand tracking unit 245 is described in more detail below with respect to fig. 4. In some embodiments, the eye tracking unit 243 is configured to track the position or movement of the user's gaze (or, more generally, the user's eyes, face, or head) relative to the scene 105 (e.g., relative to the physical environment and/or relative to the user (e.g., the user's hand)) or relative to XR content displayed via the display generation component 120. The eye tracking unit 243 is described in more detail below with respect to fig. 5.

In some embodiments, coordination unit 246 is configured to manage and coordinate XR experiences presented to a user by display generation component 120, and optionally by one or more of output device 155 and/or peripheral device 195. For this purpose, in various embodiments, coordination unit 246 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

In some embodiments, the data transmission unit 248 is configured to transmit data (e.g., presentation data and/or location data) to at least the display generation component 120, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 248 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

While the data acquisition unit 242, tracking unit 244 (e.g., including the eye tracking unit 243 and hand tracking unit 245), coordination unit 246, and data transmission unit 248 are shown as residing on a single device (e.g., controller 110), it should be understood that in other embodiments, any combination of the data acquisition unit 242, tracking unit 244 (e.g., including the eye tracking unit 243 and hand tracking unit 245), coordination unit 246, and data transmission unit 248 may be located in separate computing devices.

Furthermore, fig. 2 is intended more as a functional description of various features that may be present in a particular implementation, as opposed to a structural schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, items shown separately may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 2 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.

Fig. 3 is a block diagram of an example of display generation component 120 according to some embodiments. While certain specific features are shown, those of ordinary skill in the art will appreciate from the disclosure that various other features are not shown for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. For the purposes of this description, as a non-limiting example, in some embodiments, HMD 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and the like), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional inwardly-facing and/or outwardly-facing image sensors 314, memory 320, and one or more communication buses 304 for interconnecting these components and various other components.

In some embodiments, one or more communication buses 304 include circuitry for interconnecting and controlling communications between various system components. In some embodiments, the one or more I/O devices and sensors 306 include an Inertial Measurement Unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, and/or blood glucose sensor), one or more microphones, one or more speakers, a haptic engine, and/or one or more depth sensors (e.g., structured light, time of flight, etc.), and the like.

In some embodiments, one or more XR displays 312 are configured to provide an XR experience to a user. In some embodiments, one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid crystal display (LCD), liquid crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), microelectromechanical systems (MEMS), and/or similar display types. In some embodiments, one or more XR displays 312 correspond to diffractive, reflective, polarizing, holographic, etc. waveguide displays. For example, the HMD 120 includes a single XR display. In another example, the HMD 120 includes an XR display for each eye of the user. In some embodiments, one or more XR displays 312 are capable of presenting MR and VR content. In some implementations, one or more XR displays 312 can present MR or VR content.

In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's face including the user's eyes (and may be referred to as an eye tracking camera). In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's hand and optionally the user's arm (and may be referred to as a hand tracking camera). In some implementations, the one or more image sensors 314 are configured to face forward in order to acquire image data corresponding to a scene that a user would see in the absence of the HMD 120 (and may be referred to as a scene camera). The one or more optional image sensors 314 may include one or more RGB cameras (e.g., with Complementary Metal Oxide Semiconductor (CMOS) image sensors or Charge Coupled Device (CCD) image sensors), one or more Infrared (IR) cameras, and/or one or more event-based cameras, etc.

Memory 320 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 320 optionally includes one or more storage devices located remotely from the one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320 or a non-transitory computer readable storage medium of memory 320 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 330 and XR presentation module 340.

Operating system 330 includes processes for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR presentation module 340 is configured to present XR content to a user via one or more XR displays 312. For this purpose, in various embodiments, the XR presentation module 340 includes a data acquisition unit 342, an XR presentation unit 344, an XR map generation unit 346, and a data transmission unit 348.

In some embodiments, the data acquisition unit 342 is configured to acquire data (e.g., presentation data, interaction data, sensor data, and/or location data) from at least the controller 110 of fig. 1. For this purpose, in various embodiments, the data acquisition unit 342 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

In some embodiments, XR presentation unit 344 is configured to present XR content via one or more XR displays 312. For this purpose, in various embodiments, XR presentation unit 344 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

In some embodiments, XR map generation unit 346 is configured to generate an XR map based on the media content data (e.g., a 3D map of a mixed reality scene or a map of a physical environment in which computer-generated objects may be placed to generate an augmented reality). For this purpose, in various embodiments, XR map generation unit 346 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

In some embodiments, the data transmission unit 348 is configured to transmit data (e.g., presentation data and/or location data) to at least the controller 110, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 348 includes instructions and/or logic for those instructions, as well as heuristics and metadata for those heuristics.

Although the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 are shown as residing on a single device (e.g., the display generation component 120 of fig. 1), it should be understood that in other embodiments, any combination of the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 may be located in separate computing devices.

Furthermore, fig. 3 is used more as a functional description of various features that may be present in a particular embodiment, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 3 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.

Fig. 4 is a schematic illustration of an exemplary embodiment of a hand tracking device 140. In some embodiments, the hand tracking device 140 (fig. 1) is controlled by the hand tracking unit 245 (fig. 2) to track the position/location of one or more portions of the user's hand, and/or the movement of one or more portions of the user's hand relative to the scene 105 of fig. 1 (e.g., relative to a portion of the physical environment surrounding the user, relative to the display generating component 120, or relative to a portion of the user (e.g., the user's face, eyes, or head), and/or relative to a coordinate system defined relative to the user's hand). In some implementations, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., in a separate housing or attached to a separate physical support structure).

In some implementations, the hand tracking device 140 includes an image sensor 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras) that captures three-dimensional scene information including at least a human user's hand 406. The image sensor 404 captures the hand image with sufficient resolution to enable the fingers and their respective locations to be distinguished. The image sensor 404 typically also captures images of other parts of the user's body, or possibly of the entire body, and may have a zoom capability or a dedicated sensor with increased magnification to capture images of the hand with a desired resolution. In some implementations, the image sensor 404 also captures 2D color video images of the hand 406 and other elements of the scene. In some implementations, the image sensor 404 is used in conjunction with other image sensors to capture the physical environment of the scene 105, or itself serves as the image sensor that captures the physical environment of the scene 105. In some embodiments, the image sensor 404, or a portion thereof, is positioned relative to the user or the user's environment such that the field of view of the image sensor defines an interaction space in which hand movements captured by the image sensor are treated as input to the controller 110.

In some embodiments, the image sensor 404 outputs a sequence of frames containing 3D mapping data (and possibly color image data as well) to the controller 110, which extracts high-level information from the mapping data. This high-level information is typically provided via an application program interface (API) to an application program running on the controller, which drives the display generation component 120 accordingly. For example, a user may interact with software running on the controller 110 by moving his hand 406 and changing his hand pose.

In some implementations, the image sensor 404 projects a speckle pattern onto a scene that includes the hand 406 and captures an image of the projected pattern. In some implementations, the controller 110 calculates 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation based on lateral offsets of the spots in the pattern. This approach is advantageous because it does not require the user to hold or wear any kind of beacon, sensor, or other marker. The method gives the depth coordinates of points in the scene relative to a predetermined reference plane at a specific distance from the image sensor 404. In this disclosure, it is assumed that the image sensor 404 defines an orthogonal set of x, y, and z axes such that the depth coordinates of points in the scene correspond to the z-component measured by the image sensor. Alternatively, the hand tracking device 140 may use other 3D mapping methods, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
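As a non-limiting illustration, the relationship between the lateral offset of a spot and its depth relative to the reference plane can be sketched as follows. The disclosure does not specify a formula; the sketch assumes a rectified pinhole camera and a projector separated by a known baseline, and the function name, parameter names, and sign convention (a positive offset means the point is closer than the reference plane) are all hypothetical.

```python
def depth_from_speckle_shift(shift_px: float,
                             ref_depth_m: float,
                             focal_px: float,
                             baseline_m: float) -> float:
    """Estimate the z-coordinate (metres) of a scene point by triangulation.

    shift_px    -- lateral offset of the spot, in pixels, relative to where the
                   same spot appears when it falls on the reference plane
                   (assumed convention: positive means closer than the plane)
    ref_depth_m -- distance of the reference plane from the image sensor
    focal_px    -- camera focal length in pixels (assumed pinhole model)
    baseline_m  -- projector-to-camera separation (assumed known)
    """
    # Under the pinhole model: shift = focal * baseline * (1/z - 1/z_ref).
    inverse_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inverse_depth <= 0.0:
        raise ValueError("offset is inconsistent with a point in front of the sensor")
    return 1.0 / inverse_depth
```

A spot with zero offset lies on the reference plane itself; larger positive offsets map to points progressively closer to the sensor.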

In some implementations, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand as the user moves his hand (e.g., the entire hand or one or more fingers). Software running on a processor in the image sensor 404 and/or in the controller 110 processes the 3D mapping data to extract image block descriptors of the hand in these depth maps. The software may match these descriptors with image block descriptors stored in database 408, based on a previous learning process, in order to estimate the pose of the hand in each frame. The pose typically includes the 3D positions of the user's hand joints and fingertips.

The software may also analyze the trajectory of the hand and/or fingers over multiple frames in the sequence to identify gestures. The pose estimation functions described herein may alternate with motion tracking functions, such that image block-based pose estimation is performed only once every two (or more) frames, while tracking is used to find changes in the pose that occur over the remaining frames. Pose, motion, and gesture information is provided via the API described above to an application running on the controller 110. The program may, for example, move and modify images presented on the display generation component 120 in response to pose and/or gesture information, or perform other functions.
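As a non-limiting illustration, the alternation between image block-based pose estimation and motion tracking can be sketched as a simple per-frame schedule. The two callables stand in for the estimator and the tracker, which the disclosure does not define; their names and signatures, and the `keyframe_interval` parameter, are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Pose = TypeVar("Pose")


def estimate_poses(frames: Iterable[Frame],
                   estimate_from_patches: Callable[[Frame], Pose],
                   track_from_previous: Callable[[Pose, Frame], Pose],
                   keyframe_interval: int = 2) -> List[Pose]:
    """Run the full estimator once every `keyframe_interval` frames and the
    (presumably cheaper) tracker on the frames in between."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be at least 1")
    poses: List[Pose] = []
    for index, frame in enumerate(frames):
        if index % keyframe_interval == 0:
            # Keyframe: estimate the pose from scratch via descriptor matching.
            pose = estimate_from_patches(frame)
        else:
            # In-between frame: update the previous pose by tracking changes.
            pose = track_from_previous(poses[-1], frame)
        poses.append(pose)
    return poses
```

With `keyframe_interval=2`, the estimator runs on frames 0, 2, 4, and so on, and the tracker carries the pose across frames 1, 3, 5, and so on.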

In some implementations, the gesture includes an air gesture. An air gesture is a gesture that is detected without the user touching an input element that is part of a device (e.g., computer system 101, one or more input devices 125, and/or hand tracking device 140), or independently of such an input element, and that is based on detected motion of a portion of the user's body (e.g., a head, one or more arms, one or more hands, one or more fingers, and/or one or more legs) through the air. Such motion includes motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), motion relative to another portion of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one hand of the user relative to the other hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).

In some embodiments, the input gestures used in the various examples and embodiments described herein include air gestures performed by movement of a user's finger relative to other fingers or portions of the user's hand for interacting with an XR environment (e.g., a virtual or mixed reality environment). In some embodiments, the air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independently of an input element that is part of the device) and that is based on detected motion of a portion of the user's body through the air, including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), motion relative to another portion of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one hand of the user relative to the other hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).

In some embodiments where the input gesture is an air gesture (e.g., in the absence of physical contact with an input device that would provide the computer system with information about which user interface element is the target of the user input, such as contact with a user interface element displayed on a touch screen, or contact with a mouse or touchpad to move a cursor to the user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of the user input (e.g., for direct input, as described below). Thus, in embodiments involving air gestures, the input gesture is, for example, detected attention (e.g., gaze) toward a user interface element in combination (e.g., simultaneously) with movement of the user's finger and/or hand to perform pinch and/or tap inputs, as described below.

In some implementations, an input gesture directed to a user interface object is performed directly or indirectly with reference to the user interface object. For example, user input is performed directly on a user interface object when the input is performed with the user's hand at a location corresponding to the location of the user interface object in the three-dimensional environment (e.g., as determined based on the user's current viewpoint). In some implementations, an input gesture is performed indirectly on the user interface object when the user performs the input gesture while the user's hand is not at the position corresponding to the position of the user interface object in the three-dimensional environment, and while the user's attention (e.g., gaze) on the user interface object is detected. For example, for a direct input gesture, the user can direct the user's input to the user interface object by initiating the gesture at or near a location corresponding to the display location of the user interface object (e.g., within 0.5cm, 1cm, 5cm, or within a distance between 0 and 5cm measured from the outer edge of the option or the center portion of the option). For indirect input gestures, a user can direct the user's input to a user interface object by focusing on the user interface object (e.g., by looking at the user interface object), and while focusing on the user interface object, the user initiates an input gesture (e.g., at any location detectable by the computer system) (e.g., at a location that does not correspond to the display location of the user interface object).
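As a non-limiting illustration, the choice between direct and indirect targeting can be sketched as follows. The sketch assumes that object positions and the hand position are expressed in the same three-dimensional coordinate system in metres, that distance is measured to an object's center (the disclosure also permits measuring from the outer edge), and that the gazed-at object has already been resolved elsewhere. The `UIObject` type, the function name, and the 5 cm default are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class UIObject:
    name: str
    position: Vec3  # center of the object, in metres (assumed convention)


def resolve_target(hand_position: Vec3,
                   gazed_object: Optional[UIObject],
                   objects: Sequence[UIObject],
                   direct_radius_m: float = 0.05
                   ) -> Tuple[Optional[UIObject], str]:
    """Return (target, mode), where mode is 'direct', 'indirect', or 'none'."""
    # Direct input: the gesture starts at or near an object's location.
    nearest = min(objects,
                  key=lambda obj: math.dist(hand_position, obj.position),
                  default=None)
    if nearest is not None and math.dist(hand_position, nearest.position) <= direct_radius_m:
        return nearest, "direct"
    # Indirect input: the hand is elsewhere, so attention (gaze) picks the target.
    if gazed_object is not None:
        return gazed_object, "indirect"
    return None, "none"
```

In this sketch a hand within the radius always wins over gaze; a real system could weigh the two signals differently.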

In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch inputs and tap inputs for interacting with a virtual or mixed reality environment. For example, the pinch and tap inputs described below are performed as air gestures.

In some implementations, the pinch input is part of an air gesture that includes one or more of: a pinch gesture, a long pinch gesture, a pinch and drag gesture, or a double pinch gesture. For example, a pinch gesture as an air gesture includes movement of two or more fingers of a hand into contact with each other, optionally followed by an immediate (e.g., within 0 to 1 second) break in contact with each other. A long pinch gesture as an air gesture includes movement of two or more fingers of a hand into contact with each other for at least a threshold amount of time (e.g., at least 1 second) before a break in contact with each other is detected. For example, a long pinch gesture includes a user holding a pinch gesture (e.g., where two or more fingers make contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some implementations, a double pinch gesture as an air gesture includes two (or more) pinch inputs (e.g., performed by the same hand) that are detected in immediate succession (e.g., within a predefined period of time). For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between two or more fingers), and performs a second pinch input within a predefined period of time (e.g., within 1 second or within 2 seconds) after releasing the first pinch input.
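As a non-limiting illustration, pinch, long pinch, and double pinch gestures can be distinguished from the time intervals during which the fingers are in contact. The sketch assumes that contact intervals for one hand are already available as sorted `(start, end)` timestamps in seconds; the function name is hypothetical, and the threshold values are taken from the example values given above.

```python
from typing import List, Sequence, Tuple

LONG_PINCH_MIN_S = 1.0       # example threshold: contact held at least 1 second
DOUBLE_PINCH_WINDOW_S = 1.0  # example window between release and second pinch


def classify_pinches(contacts: Sequence[Tuple[float, float]],
                     long_min_s: float = LONG_PINCH_MIN_S,
                     double_window_s: float = DOUBLE_PINCH_WINDOW_S) -> List[str]:
    """Label each finger-contact interval, merging quick pairs into a double pinch."""
    labels: List[str] = []
    i = 0
    while i < len(contacts):
        start, end = contacts[i]
        # A second pinch that begins soon after the first is released
        # turns the pair into a single double pinch.
        if i + 1 < len(contacts) and contacts[i + 1][0] - end <= double_window_s:
            labels.append("double pinch")
            i += 2
            continue
        # Otherwise the contact duration separates a pinch from a long pinch.
        labels.append("long pinch" if end - start >= long_min_s else "pinch")
        i += 1
    return labels
```

For example, `classify_pinches([(0.0, 0.2), (3.0, 4.5)])` labels the first contact a pinch and the second a long pinch, because the two contacts are too far apart to form a double pinch.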

In some implementations, a pinch-and-drag gesture performed as an air gesture includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) that is performed in conjunction with (e.g., followed by) a drag input that changes the position of the user's hand from a first position (e.g., a start position of the drag) to a second position (e.g., an end position of the drag). In some implementations, the user holds the pinch gesture while the drag input is performed, and releases the pinch gesture (e.g., opens the two or more fingers) to end the drag gesture (e.g., at the second position). In some implementations, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers into contact with each other and moves the same hand to the second position in the air with the drag gesture). In some embodiments, an input gesture performed as an air gesture includes an input (e.g., a pinch and/or tap input) performed using both of the user's hands, e.g., the input gesture includes two (or more) inputs performed in conjunction with each other (e.g., simultaneously or within a predefined time period).

In some implementations, a tap input (e.g., directed to a user interface element) performed as an air gesture includes movement of the user's finger toward the user interface element, movement of the user's hand toward the user interface element (optionally, with the user's finger extended toward the user interface element), downward movement of the user's finger (e.g., mimicking a mouse click motion or a tap on a touch screen), or other predefined movement of the user's hand. In some embodiments, a tap input performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture: movement of the finger or hand away from the user's point of view and/or toward an object that is the target of the tap input, followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in the movement characteristics of the finger or hand performing the tap gesture (e.g., the end of movement away from the user's point of view and/or toward the object that is the target of the tap input, a reversal of the direction of movement of the finger or hand, and/or a reversal of the direction of acceleration of the movement of the finger or hand).

In some embodiments, the determination that the user's attention is directed to a portion of the three-dimensional environment is based on detection of gaze directed to that portion (optionally, without other conditions). In some embodiments, the determination is based on detection of gaze directed to the portion of the three-dimensional environment together with one or more additional conditions, such as requiring the gaze to be directed to the portion for at least a threshold duration (e.g., a dwell duration) and/or requiring the gaze to be directed to the portion while the point of view of the user is within a distance threshold of the portion. If one of the additional conditions is not met, the device determines that the user's attention is not directed to the portion of the three-dimensional environment to which the gaze is directed (e.g., until the one or more additional conditions are met).
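
The gaze, dwell, and distance conditions above can be combined as in the following minimal sketch. The function name `attention_on`, the representation of gaze samples as (timestamp, region) pairs, and all parameter names are assumptions made for illustration; with the default arguments the function reduces to the gaze-only case.

```python
def attention_on(region, gaze_samples, viewpoint_distance_m,
                 min_dwell_s=0.0, max_distance_m=None):
    """Decide whether attention is directed to `region`.

    gaze_samples: list of (timestamp_s, region_id) sorted by time, giving
    the region the gaze was directed to at each sample.
    viewpoint_distance_m: distance from the user's point of view to the region.
    """
    # Gaze must currently be on the region.
    if not gaze_samples or gaze_samples[-1][1] != region:
        return False

    # Walk backwards to find when the current uninterrupted gaze began.
    dwell_start = gaze_samples[-1][0]
    for timestamp, sample_region in reversed(gaze_samples):
        if sample_region != region:
            break
        dwell_start = timestamp
    dwell = gaze_samples[-1][0] - dwell_start

    # Optional additional conditions: dwell duration and distance threshold.
    if dwell < min_dwell_s:
        return False
    if max_distance_m is not None and viewpoint_distance_m > max_distance_m:
        return False
    return True


samples = [(0.0, "A"), (0.1, "B"), (0.2, "B"), (0.5, "B")]
```

With these samples, the gaze has rested on region "B" for 0.4 seconds, so a 0.3-second dwell requirement is met and a 0.5-second requirement is not.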

In some embodiments, a ready state configuration of the user or of a portion of the user is detected by the computer system. Detection of a ready state configuration of a hand is used by the computer system as an indication that the user may be preparing to interact with the computer system using one or more air gesture inputs (e.g., pinch, tap, pinch and drag, double pinch, long pinch, or other air gestures described herein) performed by the hand. For example, the ready state of the hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape in which the thumb and one or more fingers are extended and spaced apart in preparation for making a pinch or grasp gesture, or a pre-tap shape in which one or more fingers are extended and the palm faces away from the user), based on whether the hand is in a predetermined position relative to the user's point of view (e.g., below the user's head and above the user's waist and extended at least 15cm, 20cm, 25cm, 30cm, or 50cm from the body), and/or based on whether the hand has moved in a particular manner (e.g., toward an area above the user's waist and in front of the user's head, or away from the user's body or legs). In some implementations, the ready state is used to determine whether an interactive element of the user interface responds to an attention (e.g., gaze) input.
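
A ready state check of the kind described above might be sketched as follows. The description allows the shape, position, and movement criteria to be used individually or in combination; this sketch assumes, for illustration only, that both the shape and the position criteria must hold and ignores the movement criterion. The names `HandState`, `READY_SHAPES`, and `is_ready`, the shape labels, and the 20cm default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HandState:
    """Hypothetical summary of one tracked hand."""
    shape: str        # label produced by some hand-shape classifier
    height_m: float   # vertical position of the hand
    reach_cm: float   # how far the hand extends from the body


# Assumed labels for the predetermined hand shapes.
READY_SHAPES = {"pre_pinch", "fingers_extended_palm_away"}


def is_ready(hand, head_height_m, waist_height_m, min_reach_cm=20.0):
    """Return True when the hand shape and hand position both indicate readiness."""
    shape_ok = hand.shape in READY_SHAPES
    position_ok = (waist_height_m < hand.height_m < head_height_m
                   and hand.reach_cm >= min_reach_cm)
    return shape_ok and position_ok
```

A system could use the result to decide whether a gaze on an interactive element should produce a response, as noted in the preceding paragraph.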

In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or may alternatively be provided on tangible non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, database 408 is also stored in a memory associated with controller 110. Alternatively or in addition, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable Digital Signal Processor (DSP). Although controller 110 is shown in fig. 4, by way of example, as a unit separate from image sensor 404, some or all of the processing functions of the controller may be performed by a suitable microprocessor and software, or by dedicated circuitry within the housing of hand tracking device 402 or otherwise associated with image sensor 404. In some embodiments, at least some of these processing functions may be performed by a suitable processor integrated with display generation component 120 (e.g., in a television receiver, handheld device, or head mounted device) or with any other suitable computerized device (such as a game console or media player). The sensing functionality of the image sensor 404 may likewise be integrated into a computer or other computerized device to be controlled by the sensor output.

Fig. 4 also includes a schematic diagram of a depth map 410 captured by the image sensor 404, according to some embodiments. As described above, the depth map comprises a matrix of pixels having corresponding depth values. Pixels 412 corresponding to the hand 406 have been segmented from the background and wrist in the map. The brightness of each pixel within the depth map 410 is inversely proportional to its depth value (i.e., the measured z-distance from the image sensor 404), where the gray shade becomes darker with increasing depth. The controller 110 processes these depth values to identify and segment components of the image (i.e., a set of adjacent pixels) that have human hand features. These features may include, for example, overall size, shape, and frame-to-frame motion from a sequence of depth maps.
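
The depth-map handling described above can be illustrated with a minimal sketch. It assumes a depth map given as a list of rows of depth values, uses a simple linear mapping so that nearer pixels are brighter, and stands in for hand segmentation with a depth threshold followed by selection of the largest group of adjacent pixels. The actual controller is described as also using shape and frame-to-frame motion, which this sketch does not attempt; the function names are hypothetical.

```python
def depth_to_brightness(depth_map, max_brightness=255):
    """Map depth values to gray levels so that shading darkens with depth."""
    values = [d for row in depth_map for d in row]
    near, far = min(values), max(values)
    span = (far - near) or 1  # avoid division by zero on a flat map
    return [[round(max_brightness * (far - d) / span) for d in row]
            for row in depth_map]


def largest_near_component(depth_map, max_depth):
    """Return the largest 4-connected set of pixels with depth <= max_depth."""
    rows, cols = len(depth_map), len(depth_map[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or depth_map[r][c] > max_depth:
                continue
            # Flood fill from this unvisited near pixel.
            component, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                component.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and depth_map[ny][nx] <= max_depth):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


depth = [
    [900, 900, 900, 900],
    [900, 400, 410, 900],
    [900, 405, 900, 900],
    [300, 900, 900, 900],
]
```

In this example the three adjacent pixels near depth 400 form the largest near component, while the isolated pixel at depth 300 is treated as a separate, smaller component.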

Fig. 4 also schematically illustrates the hand skeleton 414 that the controller 110 ultimately extracts from the depth map 410 of the hand 406, according to some embodiments. In fig. 4, the hand skeleton 414 is superimposed over a hand background 416 that has been segmented from the original depth map. In some embodiments, key feature points of the hand, and optionally of the wrist or arm connected to the hand (e.g., points corresponding to the knuckles, fingertips, palm center, or the end of the hand connected to the wrist), are identified and located on the hand skeleton 414. In some embodiments, the controller 110 uses the positions and movements of these key feature points over multiple image frames to determine a gesture performed by the hand or a current state of the hand.

Fig. 5 illustrates an exemplary embodiment of the eye tracking device 130 (fig. 1). In some embodiments, eye tracking device 130 is controlled by eye tracking unit 243 (fig. 2) to track the positioning and movement of the user gaze relative to scene 105 or relative to XR content displayed via display generation component 120. In some embodiments, the eye tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, when display generating component 120 is a head-mounted device (such as a headset, helmet, goggles, or glasses) or a handheld device placed in a wearable frame, the head-mounted device includes both components that generate XR content for viewing by a user and components for tracking the user's gaze with respect to the XR content. In some embodiments, the eye tracking device 130 is separate from the display generation component 120. For example, when the display generating component is a handheld device or an XR chamber, the eye tracking device 130 is optionally a device separate from the handheld device or XR chamber. In some embodiments, the eye tracking device 130 is a head mounted device or a portion of a head mounted device. In some embodiments, the head-mounted eye tracking device 130 is optionally used in combination with a display generating component that is also head-mounted or a display generating component that is not head-mounted. In some embodiments, the eye tracking device 130 is not a head mounted device and is optionally used in conjunction with a head mounted display generating component. In some embodiments, the eye tracking device 130 is not a head mounted device and optionally is part of a non-head mounted display generating component.

In some embodiments, the display generation component 120 uses a display mechanism (e.g., a left near-eye display panel and a right near-eye display panel) to display frames including left and right images in front of the user's eyes, thereby providing a 3D virtual view to the user. For example, the head mounted display generating component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user's eyes. In some embodiments, the display generation component may include or be coupled to one or more external cameras that capture video of the user's environment for display. In some embodiments, the head mounted display generating component may have a transparent or translucent display and the virtual object is displayed on the transparent or translucent display through which the user may directly view the physical environment. In some embodiments, the display generation component projects the virtual object into the physical environment. The virtual object may be projected, for example, on a physical surface or as a hologram, such that an individual uses the system to observe the virtual object superimposed over the physical environment. In this case, separate display panels and image frames for the left and right eyes may not be required.

As shown in fig. 5, in some embodiments, the eye tracking device 130 includes at least one eye tracking camera (e.g., an Infrared (IR) or Near Infrared (NIR) camera) and an illumination source (e.g., an array or ring of IR or NIR light sources, such as LEDs) that emits light (e.g., IR or NIR light) toward the user's eyes. The eye tracking camera may be directed toward the user's eye to receive IR or NIR light from the light source that is reflected directly from the eye, or alternatively may be directed toward "hot" mirrors located between the user's eye and the display panel that reflect IR or NIR light from the eye to the eye tracking camera while allowing visible light to pass through. The eye tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, both eyes of the user are tracked separately by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination source.

In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the particular operating environment 100, such as 3D geometry and parameters of LEDs, cameras, hot mirrors (if present), eye lenses, and display screens. The device-specific calibration process may be performed at the factory or another facility prior to delivering the AR/VR equipment to the end user. The device-specific calibration process may be an automatic calibration process or a manual calibration process. According to some embodiments, a user-specific calibration process may include an estimation of eye parameters of a specific user, such as pupil position, foveal position, optical axis, visual axis, eye spacing, etc. According to some embodiments, once the device-specific parameters and the user-specific parameters are determined for the eye tracking device 130, the images captured by the eye tracking camera may be processed using a glint-assisted method to determine the current visual axis and gaze point of the user relative to the display.

As shown in fig. 5, the eye tracking device 130 (e.g., 130A or 130B) includes an eye lens 520 and a gaze tracking system including at least one eye tracking camera 540 (e.g., an Infrared (IR) or Near Infrared (NIR) camera) positioned on the side of the user's face on which eye tracking is performed, and an illumination source 530 (e.g., an IR or NIR light source such as an array or ring of NIR Light Emitting Diodes (LEDs)) that emits light (e.g., IR or NIR light) toward the user's eyes 592. The eye tracking camera 540 may be directed at mirrors 550, which are located between the user's eye 592 and the display 510 (e.g., a left display panel or a right display panel of a head-mounted display, or a display and/or projector of a handheld device) and which reflect IR or NIR light from the eye 592 while allowing visible light to pass (e.g., as shown in the top portion of fig. 5), or alternatively may be directed at the user's eye 592 to receive reflected IR or NIR light from the eye 592 (e.g., as shown in the bottom portion of fig. 5).

In some implementations, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses the gaze tracking input 542 from the eye tracking camera 540 for various purposes, such as for processing the frames 562 for display. The controller 110 optionally estimates the gaze point of the user on the display 510 based on the gaze tracking input 542 acquired from the eye tracking camera 540 using a glint-assisted method or other suitable method. The gaze point estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.

Several possible use cases of the current gaze direction of the user are described below and are not intended to be limiting. As an exemplary use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined according to the current gaze direction of the user than in a peripheral region. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another exemplary use case in an AR application, the controller 110 may direct an external camera used to capture the physical environment of the XR experience to focus in the determined direction. The autofocus mechanism of the external camera may then focus on an object or surface in the environment that the user is currently looking at on display 510. As another exemplary use case, the eye lens 520 may be a focusable lens, and the controller uses the gaze tracking information to adjust the focus of the eye lens 520 such that the virtual object the user is currently looking at has the appropriate vergence to match the convergence of the user's eyes 592. The controller 110 may utilize the gaze tracking information to direct the eye lens 520 to adjust the focus such that a nearby object the user is looking at appears at the correct distance.

In some embodiments, the eye tracking device is part of a head mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens 520), an eye tracking camera (e.g., eye tracking camera 540), and a light source (e.g., light source 530 (e.g., IR or NIR LED)) mounted in a wearable housing. The light source emits light (e.g., IR or NIR light) toward the user's eye 592. In some embodiments, the light sources may be arranged in a ring or circle around each of the lenses, as shown in fig. 5. In some embodiments, for example, eight light sources 530 (e.g., LEDs) are arranged around each lens 520. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used.

In some implementations, the display 510 emits light in the visible range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the position and angle of the eye tracking camera 540 is given by way of example and is not intended to be limiting. In some implementations, a single eye tracking camera 540 is located on each side of the user's face. In some implementations, two or more NIR cameras 540 may be used on each side of the user's face. In some implementations, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some implementations, a camera 540 operating at one wavelength (e.g., 850 nm) and a camera 540 operating at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.

The embodiment of the gaze tracking system shown in fig. 5 may be used, for example, in an extended reality (e.g., including virtual reality and/or mixed reality) application to provide an extended reality (e.g., including virtual reality, augmented reality, and/or augmented virtuality) experience to a user.

Fig. 6 illustrates a glint-assisted gaze tracking pipeline in accordance with some embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., an eye tracking device 130 as shown in fig. 1 and 5). The glint-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or "no". When in the tracking state, the glint-assisted gaze tracking system uses previous information from a previous frame when analyzing the current frame to track pupil contours and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect pupils and glints in the current frame and, if successful, initializes the tracking state to "yes" and continues with the next frame in the tracking state.

As shown in fig. 6, the gaze tracking camera may capture left and right images of the left and right eyes of the user. The captured images are then input to the gaze tracking pipeline for processing beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system may continue to capture images of the user's eyes, for example, at a rate of 60 to 120 frames per second. In some embodiments, each set of captured images may be input to the pipeline for processing. However, in some embodiments or under some conditions, not all captured frames are processed by the pipeline.

At 610, for the currently captured image, if the tracking state is yes, the method proceeds to element 640. At 610, if the tracking state is no, the image is analyzed to detect a user's pupil and glints in the image, as indicated at 620. At 630, if the pupil and glints are successfully detected, the method proceeds to element 640. Otherwise, the method returns to element 610 to process the next image of the user's eye.

At 640, if proceeding from element 610, the current frame is analyzed to track pupils and glints based in part on previous information from the previous frame. At 640, if proceeding from element 630, a tracking state is initialized based on the pupil and glints detected in the current frame. The results of the processing at element 640 are checked to verify that the results of the tracking or detection can be trusted. For example, the results may be checked to determine whether the pupil and a sufficient number of glints for performing gaze estimation are successfully tracked or detected in the current frame. At 650, if the results cannot be trusted, the tracking state is set to no and the method returns to element 610 to process the next image of the user's eye. At 650, if the results are trusted, the method proceeds to element 670. At 670, the tracking state is set to yes (if not already yes), and pupil and glint information is passed to element 680 to estimate the gaze point of the user.
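
The control flow of fig. 6 described above can be summarized as a small state machine. The following is a minimal sketch under stated assumptions: the function `run_gaze_pipeline` and the four callbacks it takes (`detect`, `track`, `trusted`, `estimate_gaze`) are hypothetical placeholders for the image analysis steps, and the stub implementations operate on simple dictionaries rather than real eye images.

```python
def run_gaze_pipeline(frames, detect, track, trusted, estimate_gaze):
    """Process captured frames following the element numbering of fig. 6."""
    tracking = False   # tracking state is initially "no"
    previous = None    # pupil/glint information from the previous frame
    gaze_points = []
    for frame in frames:                      # 600: next captured image
        if tracking:                          # 610: tracking state is yes
            result = track(frame, previous)   # 640: track using prior info
        else:
            result = detect(frame)            # 620: detect pupil and glints
            if result is None:                # 630: detection failed
                continue                      # back to 610 with next image
        if not trusted(result):               # 650: results not trusted
            tracking = False
            previous = None
            continue
        tracking = True                       # 670: tracking state set to yes
        previous = result
        gaze_points.append(estimate_gaze(result))  # 680: estimate gaze point
    return gaze_points


# Stub analysis steps (assumptions for illustration only).
def detect(frame):
    return frame if frame["pupil"] and frame["glints"] > 0 else None

def track(frame, previous):
    return frame

def trusted(result):
    # Assumed rule: a pupil plus at least two glints is sufficient.
    return result["pupil"] and result["glints"] >= 2

def estimate_gaze(result):
    return result["id"]


frames = [
    {"id": 0, "pupil": False, "glints": 0},  # detection fails
    {"id": 1, "pupil": True, "glints": 4},   # detected and trusted
    {"id": 2, "pupil": True, "glints": 1},   # tracked but not trusted
    {"id": 3, "pupil": True, "glints": 3},   # detected again and trusted
]
```

Running the sketch on the four stub frames yields gaze estimates for frames 1 and 3 only: frame 0 fails detection, and frame 2 is tracked but rejected at the trust check, which resets the tracking state.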

Fig. 6 is intended to serve as one example of an eye tracking technique that may be used in a particular implementation. As will be appreciated by one of ordinary skill in the art, other eye tracking techniques, currently existing or developed in the future, may be used in place of or in combination with the glint-assisted eye tracking techniques described herein in computer system 101 for providing an XR experience to a user, according to various embodiments.

In this disclosure, various input methods are described with respect to interactions with a computer system. When one input device or input method is used to provide an example and another input device or input method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the input device or input method described with respect to the other example. Similarly, various output methods are described with respect to interactions with a computer system. When one output device or output method is used to provide an example and another output device or output method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the output device or output method described with respect to the other example. Similarly, the various methods are described with respect to interactions with a virtual environment or mixed reality environment through a computer system. When examples are provided using interactions with a virtual environment, and another example is provided using a mixed reality environment, it should be understood that each example may be compatible with and optionally utilize the methods described with respect to the other example. Thus, the present disclosure discloses embodiments that are combinations of features of multiple examples, without the need to list all features of the embodiments in detail in the description of each example embodiment.

User interface and associated process

Attention is now directed to embodiments of a user interface ("UI") and associated processes that may be implemented on a computer system (such as a portable multifunction device or a head-mounted device) having a display generating component, one or more input devices, and (optionally) one or more cameras.

Fig. 7A-7J illustrate a three-dimensional environment displayed via a display generating component (e.g., display generating component 7100, display generating component 7102, or display generating component 120) and interactions occurring in the three-dimensional environment caused by user inputs directed to the three-dimensional environment and/or inputs received from other computer systems and/or sensors. In some implementations, the input is directed to the virtual object within the three-dimensional environment by a user gaze detected in an area occupied by the virtual object or by a gesture performed at a location in the physical environment corresponding to the area of the virtual object. In some implementations, the input is directed to the virtual object within the three-dimensional environment by a gesture performed (e.g., optionally, at a location in the physical environment that is independent of the area of the virtual object in the three-dimensional environment) when the virtual object has an input focus (e.g., when the virtual object has been selected by a gaze input that is detected simultaneously and/or previously, by a pointer input that is detected simultaneously or previously, and/or by a gesture input that is detected simultaneously and/or previously). In some implementations, the input is directed to a virtual object within the three-dimensional environment by an input device that has positioned a focus selector object (e.g., a pointer object or a selector object) at the location of the virtual object. In some implementations, the input is directed to a virtual object within the three-dimensional environment via other means (e.g., voice or control buttons). 
In some embodiments, the input is directed to the representation of a physical object, or to a virtual object corresponding to the physical object, by movement of the user's hand (e.g., movement of the entire hand in a corresponding gesture, movement of one portion of the hand relative to another portion of the hand, and/or relative movement between the hands) and/or by manipulation relative to the physical object (e.g., touching, swiping, tapping, opening, moving toward, and/or moving relative to the physical object). In some embodiments, the computer system displays some changes in the three-dimensional environment (e.g., displays additional virtual content, stops displaying existing virtual content, or transitions between displaying visual content at different immersion levels) based on inputs from sensors (e.g., image sensors, temperature sensors, biometric sensors, motion sensors, and/or proximity sensors) and contextual conditions (e.g., location, time, and/or presence of other people in the environment). In some embodiments, the computer system displays some changes in the three-dimensional environment (e.g., displays additional virtual content, stops displaying existing virtual content, and/or transitions between displaying visual content at different immersion levels) based on input from other computers used by other users sharing the computer-generated environment with the user of the computer system (e.g., in a shared computer-generated experience, in a shared virtual environment, and/or in a shared virtual or augmented reality environment of a communication session). 
In some embodiments, the computer system displays some changes in the three-dimensional environment (e.g., displays movements, deformations, and/or changes in visual characteristics of the user interface, virtual surface, user interface object, and/or virtual landscape) based on input from sensors that detect movements of other people and objects, and movements of the user that may not qualify as a recognized gesture input for triggering an associated operation of the computer system.

In some embodiments, the three-dimensional environment displayed via the display generation component described herein is a virtual three-dimensional environment that includes virtual objects and content at different virtual locations in the three-dimensional environment without a representation of the physical environment. In some implementations, the three-dimensional environment is a mixed reality environment that displays virtual objects at different virtual locations in the three-dimensional environment that are constrained by one or more physical aspects of the physical environment (e.g., location and orientation of walls, floors, surfaces, direction of gravity, time of day, and/or spatial relationship between physical objects). In some embodiments, the three-dimensional environment is an augmented reality environment that includes a representation of a physical environment. In some embodiments, the representations of the physical environment include respective representations of the physical objects and surfaces at different locations in the three-dimensional environment such that spatial relationships between the different physical objects and surfaces in the physical environment are reflected by spatial relationships between the representations of the physical objects and surfaces in the three-dimensional environment. In some embodiments, when a virtual object is placed relative to the position of a representation of a physical object and a surface in a three-dimensional environment, the virtual object appears to have a corresponding spatial relationship to the physical object and the surface in the physical environment. 
In some embodiments, the computer system transitions between displaying different types of environments based on user input and/or contextual conditions (e.g., transitions between rendering a computer-generated environment or experience with different levels of immersion and/or adjusting the relative saliency of audio/visual sensory input from the virtual content and from the representation of the physical environment).

In some embodiments, the display generating component includes a pass-through portion in which a representation of the physical environment is displayed. In some implementations, the pass-through portion of the display generating component is a transparent or translucent (e.g., see-through) portion of the display generating component that reveals at least a portion of the physical environment around the user or within the field of view of the user. For example, the pass-through portion is a portion of the head-mounted display or head-up display that is made translucent (e.g., less than 50%, 40%, 30%, 20%, 15%, 10%, or 5% opacity) or transparent so that the user can view the real world around the user through it without removing the head-mounted display or moving away from the head-up display. In some embodiments, the pass-through portion gradually transitions from translucent or transparent to completely opaque when displaying a virtual or mixed reality environment. In some embodiments, the pass-through portion of the display generating component displays a real-time feed of images or video of at least a portion of the physical environment captured by one or more cameras (e.g., a rear facing camera of a mobile device or a camera associated with a head mounted display, or other camera feeding image data to the computer system). In some embodiments, the one or more cameras are directed at a portion of the physical environment directly in front of the user's eyes (e.g., behind the display generating component relative to the user of the display generating component). In some embodiments, the one or more cameras are directed at a portion of the physical environment that is not directly in front of the user's eyes (e.g., in a different physical environment, or at the side or rear of the user).

In some implementations, when virtual objects are displayed at locations corresponding to the locations of one or more physical objects in the physical environment (e.g., at locations in a virtual reality environment, a mixed reality environment, or an augmented reality environment), at least some of the virtual objects are displayed in place of (e.g., replacing display of) a portion of a real-time view of the camera (e.g., a portion of the physical environment captured in the real-time view). In some embodiments, at least some of the virtual objects and content are projected onto a physical surface or empty space in the physical environment and are visible through the pass-through portion of the display generating component (e.g., visible as part of a camera view of the physical environment or visible through a transparent or translucent portion of the display generating component). In some implementations, at least some of the virtual objects and virtual content are displayed to overlay a portion of the display and to block the view of at least a portion of the physical environment visible through the transparent or translucent portion of the display generating component.

In some embodiments, the display generation component displays a different view of the three-dimensional environment according to user input or movement that changes a viewpoint of a currently displayed view of the three-dimensional environment relative to a virtual location of the three-dimensional environment. In some implementations, when the three-dimensional environment is a virtual environment, the point of view moves according to a navigation or motion request (e.g., an air gesture, or a gesture performed by movement of one portion of the hand relative to another portion of the hand) without requiring movement of the user's head, torso, and/or display generating components in the physical environment. In some embodiments, movement of the user's head and/or torso, and/or movement of the display generating component or other location-aware element of the computer system (e.g., due to the user holding the display generating component or wearing the HMD) relative to the physical environment results in corresponding movement (e.g., with corresponding movement direction, movement distance, movement speed, and/or orientation changes) of the viewpoint relative to the three-dimensional environment, thereby causing corresponding changes in the current display view of the three-dimensional environment. In some embodiments, when the virtual object has a preset spatial relationship with respect to the viewpoint (e.g., is anchored or fixed to the viewpoint), movement of the viewpoint with respect to the three-dimensional environment will cause movement of the virtual object with respect to the three-dimensional environment while maintaining the position of the virtual object in the field of view (e.g., the virtual object is said to be head-locked). 
In some embodiments, the virtual object is physically locked to the user and moves relative to the three-dimensional environment as the user moves in the physical environment as a whole (e.g., carries or wears the display generating component and/or other position sensing components of the computer system), but will not move in the three-dimensional environment in response to individual user head movements (e.g., the display generating component and/or other position sensing components of the computer system rotate about a fixed position of the user in the physical environment). In some embodiments, the virtual object is optionally locked to another portion of the user, such as the user's hand or the user's wrist, and moves in the three-dimensional environment according to movement of the portion of the user in the physical environment to maintain a preset spatial relationship between the position of the virtual object and the virtual position of the portion of the user in the three-dimensional environment. In some embodiments, the virtual object is locked to a preset portion of the field of view provided by the display generating component and moves in a three-dimensional environment according to movement of the field of view, independent of movement of the user that does not cause a change in the field of view.
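The anchoring behaviors described above can be illustrated with a minimal sketch. It is not taken from the specification: the names `Anchor`, `UserState`, and `object_position` are hypothetical, head rotation is reduced to a single yaw angle, and the head is assumed to sit at the body position. Under those assumptions, an object locked to the viewpoint follows head turns, an object locked to the user's body as a whole ignores them, and an object locked to the hand follows the hand.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto


class Anchor(Enum):
    HEAD = auto()   # head-locked: fixed relative to the viewpoint
    BODY = auto()   # locked to the user as a whole, ignores head turns
    HAND = auto()   # locked to the tracked hand
    WORLD = auto()  # fixed in the three-dimensional environment


@dataclass
class UserState:
    body_pos: tuple   # (x, y, z) of the user as a whole
    head_yaw: float   # head rotation about the vertical axis, in radians
    hand_pos: tuple   # (x, y, z) of the tracked hand


def _add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def _rotate_yaw(v, yaw):
    """Rotate vector v about the vertical (y) axis by yaw radians."""
    x, y, z = v
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)


def object_position(anchor, offset, user):
    """Return the object's position in the environment for the given anchor."""
    if anchor is Anchor.HEAD:
        # The offset turns with the head, so the object keeps its place in view.
        return _add(user.body_pos, _rotate_yaw(offset, user.head_yaw))
    if anchor is Anchor.BODY:
        # The offset moves with the body but does not turn with the head.
        return _add(user.body_pos, offset)
    if anchor is Anchor.HAND:
        return _add(user.hand_pos, offset)
    return offset  # WORLD: the offset is treated as an absolute position
```

In this sketch a head turn alone changes only the `Anchor.HEAD` result, while walking (a change in `body_pos`) changes both the `HEAD` and `BODY` results.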

In some embodiments, as shown in fig. 7E-7J, the view of the three-dimensional environment sometimes does not include representations of the user's hands, arms, and/or wrists. In some embodiments, as shown in fig. 7B-7D, representations of a user's hands, arms, and/or wrists are included in a view of a three-dimensional environment. In some embodiments, the representation of the user's hand, arm, and/or wrist is included in a view of the three-dimensional environment as part of the representation of the physical environment provided via the display generating component. In some embodiments, these representations are not part of the representation of the physical environment and are captured separately (e.g., by one or more cameras pointed at the user's hand, arm, and wrist) and displayed in the three-dimensional environment independent of the current display view of the three-dimensional environment. In some embodiments, these representations include camera images captured by one or more cameras of the computer system or stylized versions of the arm, wrist, and/or hand based on information captured by the various sensors. In some embodiments, these representations replace a display of, are overlaid on, or block a view of, a portion of the representation of the physical environment. In some embodiments, when the display generating component does not provide a view of the physical environment and provides a fully virtual environment (e.g., no camera view and no transparent pass-through portion), a real-time visual representation of one or both arms, wrists, and/or hands of the user (e.g., a stylized representation or a segmented camera image) is optionally still displayed in the virtual environment.
In some embodiments, if no representation of the user's hand is provided in the view of the three-dimensional environment, a location corresponding to the user's hand is optionally indicated in the three-dimensional environment, e.g., by changing the appearance of the virtual content (e.g., by translucency and/or simulating changes in reflectivity) at a location in the three-dimensional environment corresponding to the location of the user's hand in the physical environment. In some embodiments, when the virtual location in the three-dimensional environment corresponding to the position of the user's hand or wrist is outside the current field of view provided via the display generating component, the representation of the user's hand or wrist is outside the current display view of the three-dimensional environment; and in response to the virtual positioning corresponding to the position of the user's hand or wrist moving within the current field of view due to movement of the display generating component, the user's hand or wrist, the user's head, and/or the user as a whole, the representation of the user's hand or wrist is made visible in the view of the three-dimensional environment.

Fig. 7A-7J are block diagrams illustrating user interactions with user interface objects displayed in a three-dimensional environment, according to some embodiments. In some embodiments, one or more of the user interface objects are provided as part of a control center user interface or home page experience in a three-dimensional environment. According to various embodiments, the behaviors described with respect to the user interface objects in some examples with reference to fig. 7A-7J (and fig. 8-10) apply to the user interface objects in other examples unless otherwise indicated in the specification.

Fig. 7A-7J illustrate an exemplary computer system (e.g., device 101 or another computer system) in communication with a first display generating component (e.g., display generating component 7100 or another display generating component). In some embodiments, the first display generating component is a heads-up display. In some implementations, the first display generating component is a Head Mounted Display (HMD). In some embodiments, the first display generating means is a stand-alone display, a touch screen, a projector, or another type of display. In some embodiments, the computer system communicates with one or more input devices including cameras or other sensors and input devices that detect movement of a user's hand, movement of the user's entire body, and/or movement of the user's head in a physical environment. In some embodiments, one or more input devices detect movements of a user's hands, face, and the user's entire body, as well as current pose, orientation, and positioning. In some embodiments, the one or more input devices include an eye tracking component that detects the location and movement of the user's gaze. In some embodiments, the first display generating component, and optionally the one or more input devices and the computer system, are part of a head-mounted device (e.g., an HMD, or a pair of AR/VR goggles/glasses) that moves and rotates with the user's head in a physical environment and changes the user's point of view into a three-dimensional environment provided via the first display generating component. In some embodiments, the first display generating means is a heads-up display that does not move or rotate with the user's head or the entire body of the user, but optionally changes the user's point of view into a three-dimensional environment according to the movement of the user's head or body relative to the first display generating means. 
In some embodiments, the first display generating component is optionally moved and rotated by the user's hand relative to the physical environment or relative to the user's head, and the viewpoint of the user is changed into the three-dimensional environment according to the movement of the first display generating component relative to the user's head or face or relative to the physical environment.

Fig. 7A to 7D are block diagrams illustrating the display of user interface objects (e.g., first user interface object 7016' and second user interface object 7018') in a three-dimensional environment at first respective locations corresponding to locations at or near a user's hand (e.g., hand 7020 or another hand) in a physical environment.

For example, fig. 7A illustrates a physical environment 7000 including a user 7002 interacting with a display generating component 7100. The user 7002 has two hands: a hand 7020 and a hand 7022. Also shown is the user's left arm 7028, which is connected to the user's left hand 7020. Physical environment 7000 includes physical object 7014 and physical walls 7004 and 7006. The physical environment 7000 also includes a physical floor 7008. As shown in fig. 7A, a computer system (e.g., display generating component 7100) displays a view of a three-dimensional environment (e.g., environment 7000', a virtual three-dimensional environment, an augmented reality environment, a pass-through view of a physical environment, or a camera view of a physical environment). In some embodiments, the three-dimensional environment is a virtual three-dimensional environment without a representation of the physical environment 7000. In some embodiments, the three-dimensional environment is a mixed reality environment that is a virtual environment augmented by sensor data corresponding to a physical environment. In some embodiments, the three-dimensional environment is an augmented reality environment that includes one or more virtual objects and a representation of at least a portion of the physical environment surrounding the display generating component 7100 (e.g., representations 7004', 7006' of the walls, representation 7008' of the floor, and/or representation 7014' of the physical object). In some implementations, the representation of the physical environment includes a camera view of the physical environment. In some embodiments, the representation of the physical environment includes a view of the physical environment through a transparent or translucent portion of the first display generating component.

Fig. 7B to 7D show a three-dimensional environment 7000' displayed using the display generating component 7100. Fig. 7B illustrates a first view from a user's point of view that is based in part on the user's position and/or location in the physical environment 7000. As shown in fig. 7B, a representation 7020' of the user's hand is displayed in the three-dimensional environment 7000' in accordance with the user moving the user's hand 7020 into the field of view of the display generating component 7100. In some embodiments, the representation 7020' of the user's hand is a pass-through view of the user's hand (e.g., the display generating component displays a camera view of the user's physical hand 7020). In some implementations, the representation 7020' of the user's hand is a generated (e.g., stylized, animated, and/or otherwise virtualized) representation of the user's hand. The representation 7020' of the user's hand is updated as the user moves the user's physical hand 7020 within the physical environment relative to the user's first point of view (e.g., and/or relative to the field of view of the display generating component 7100). For example, the representation 7020' of the user's hand moves in accordance with the movement of the user 7002 and/or the display generating component 7100 in the physical environment 7000.

In some embodiments, as shown in fig. 7B, the location in which the user interface object is displayed in the three-dimensional environment 7000' is anchored to (e.g., positioned relative to) a portion of the user's body (e.g., the user's head, eyes, face, torso, hands, and/or wrists). In some implementations, one or more user interface objects (e.g., first user interface object 7016 'and second user interface object 7018') are anchored to a hand that is determined to be a non-dominant hand of the user. For example, the computer system determines which hand is the dominant hand and the non-dominant hand based on past interactions of the user (e.g., and/or based on user selections indicating the user's non-dominant hand). In the example shown in fig. 7B-7D, the left hand (e.g., hand 7020) of the user is determined to be the user's non-dominant hand (e.g., and both user interface objects 7016' and 7018 'are positioned based on the current positioning of the user's hand 7020).

For example, the first user interface object 7016' is anchored to the palm of the representation 7020' of the user's hand. In some implementations, the second user interface object 7018' is also positioned (e.g., anchored) relative to the user's hand (e.g., hovering over and/or behind the user's hand, such as at a predefined distance from the user). For example, the second user interface object 7018' is displayed to appear to be outside of the user's arm reach (e.g., the second user interface object 7018' is positioned at least a predetermined distance from the user in the three-dimensional environment). It will be appreciated that other user interface objects can be anchored to portions of the user's body and/or to the three-dimensional environment (e.g., such that the positioning of the user interface object does not move within the three-dimensional environment as the user moves in the physical environment).

In some implementations, the first user interface object 7016' and the second user interface object 7018' are displayed in the three-dimensional environment 7000' according to one or more gestures of the user's hand. For example, the first user interface object 7016' and/or the second user interface object 7018' are displayed in response to the user opening the palm of the user's hand (e.g., within the user's current viewpoint). In some implementations, the first user interface object and the second user interface object are displayed in accordance with detection of a user gaze (e.g., in conjunction with a gesture) and/or detection that the user meets another attention-based criterion. In some implementations, in response to detecting other (e.g., additional) gestures of the user's hand, the first user interface object 7016' and/or the second user interface object 7018' are no longer displayed (e.g., and are replaced with a display of other controls (such as controls on the back of the user's hand) in response to the user closing the user's palm).

In some embodiments, the second user interface object 7018' remains displayed (e.g., and the first user interface object 7016' positioned in the palm of the user's hand is no longer displayed) in accordance with the user's hand moving out of the field of view (e.g., out of the field of view of the display generating component 7100).

In some implementations, the first user interface object 7016' (when anchored to the palm of the representation of the user's hand) ceases to be displayed in accordance with the user turning (e.g., rotating about an axis corresponding to the user's wrist) and/or closing the user's hand such that the user's palm is not displayed (e.g., the representation 7020' of the user's hand is flipped over such that the user's palm faces away from the user, or the user closes the hand into a fist). Thus, if the user turns over the user's palm or closes the user's fist, the first user interface object 7016' disappears.

In some implementations, when the user moves the representation of the user's hand out of the field of view, the first user interface object 7016' (e.g., and/or the second user interface object 7018') continues to be displayed if the user's palm faces the user as the representation of the user's hand moves out of the field of view. In some embodiments, if the user's palm faces away from the user (e.g., the user's hand is flipped over) and/or the user's hand is closed (e.g., in a fist) when the representation of the user's hand moves out of the field of view, the first user interface object 7016' (e.g., and/or the second user interface object 7018') ceases to be displayed while the representation of the user's hand is not in the field of view.
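As a rough illustration of the behavior in the two preceding paragraphs, the following sketch tracks whether a palm-anchored object is shown over a sequence of hand observations. The names `HandSample` and `palm_object_visibility` are hypothetical, and the sketch assumes the hand pose is only known while the hand is in the field of view, so the decision made at the moment the hand leaves is simply retained.

```python
from dataclasses import dataclass


@dataclass
class HandSample:
    in_view: bool            # is the hand inside the current field of view?
    palm_facing_user: bool   # meaningful only while the hand is in view
    hand_open: bool          # meaningful only while the hand is in view


def palm_object_visibility(samples):
    """Return, per sample, whether the palm-anchored object is displayed."""
    visible = False
    history = []
    for sample in samples:
        if sample.in_view:
            # In view: shown only for an open palm that faces the user.
            visible = sample.palm_facing_user and sample.hand_open
        # Out of view: keep the state the hand had as it left the view.
        history.append(visible)
    return history
```

For example, an open palm that drifts out of view keeps the object displayed, whereas a hand that is flipped over before leaving the view does not.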

In some implementations, the first user interface object 7016' is a first type of user interface object that includes various system level controls (e.g., a main menu control). In some implementations, the second user interface object 7018' is a second type of user interface object that includes various session-level (e.g., application) controls. It will be appreciated that in some embodiments, the controls included in the first user interface object 7016 'and/or the second user interface object 7018' are selected by the user (e.g., to enable the user to modify the position at which the controls are displayed within the three-dimensional environment).

In some implementations, the second user interface object 7018 'is initially displayed in a representation of the user's hand prior to movement (e.g., in the palm of a representation of the user's hand) (e.g., with an animation that appears as if it is popping from the user's hand to the anchored positioning of that hand). For example, if the user opens a fist of the user (e.g., a representation of the user's hand is open), the user interface object 7018' is displayed moving from a position over the palm of the representation of the user's hand to an anchored position of that hand.

In some implementations, the first user interface object 7016' includes a first set of controls (e.g., home button and/or other main menu controls) that are anchored to a first side of the representation of the user's hand (e.g., the user's palm). In some embodiments, a second set of controls (e.g., a control center including a plurality of selectable user interface elements for controlling various device functions) is anchored to a second side of the representation of the user's hand (e.g., the back of the user's hand). This enables the user to rotate the user's hand to access different types of controls (e.g., the user views the user's palm to access the home button, and the user views the back of the user's hand to access the control center).

In some implementations, the user is enabled to select a control (e.g., via user input) from the first set of controls (e.g., in user interface object 7016') and/or from a set of controls displayed in user interface object 7018'. For example, the user is enabled to gaze at a respective control, and in response to detecting a gaze of the user directed at the respective control, the respective control is visually emphasized (e.g., additional content and/or buttons are expanded and/or shown). In some implementations, the user is enabled to select a control by looking at the control and providing input (e.g., with a gesture of the user's hand). For example, the user is enabled to perform a gaze and pinch gesture directed to the control and/or a tap input directed to the control. In some implementations, the user input is directed to the control directly (e.g., the user performs the input at a location corresponding to the control) or indirectly (e.g., the user performs the input while looking at the control, where the location of the user's hand is not at a location corresponding to the control while performing the input). For example, the user can direct the user's input to the control by initiating a gesture at or near the control (e.g., within 0.5cm, 1cm, 5cm, or within a distance between 0cm-5cm measured from the outer edge of the control or the center portion of the control). In some implementations, the user is further enabled to direct the user's input to the control by focusing on the control (e.g., looking at the control), and upon focusing on the control, the user initiates a gesture (e.g., at any location that is detectable by the computer system). For example, if the user is focusing on a control, there is no need to initiate a gesture at a location at or near the control.
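The distinction between direct and indirect input can be sketched as follows. This is an illustrative sketch only: `resolve_target` and its parameters are hypothetical names, distance is measured from the center of each control, the 5 cm threshold is one of the example distances given above, and giving direct input priority over gaze is an assumption of the sketch rather than something the description states.

```python
import math

DIRECT_THRESHOLD_M = 0.05  # 5 cm, one of the example distances in the text


def resolve_target(controls, gesture_start, gazed_control,
                   threshold=DIRECT_THRESHOLD_M):
    """Pick the control an input is directed to.

    controls: dict mapping control name -> (x, y, z) center, in meters.
    gesture_start: (x, y, z) where the hand gesture was initiated.
    gazed_control: name of the control the user is looking at, or None.
    Returns (name, "direct"), (name, "indirect"), or None.
    """
    nearest = min(
        controls,
        key=lambda name: math.dist(controls[name], gesture_start),
        default=None,
    )
    if nearest is not None and math.dist(controls[nearest], gesture_start) <= threshold:
        # The gesture began at or near a control: direct input.
        return nearest, "direct"
    if gazed_control in controls:
        # The gesture began elsewhere while the user looked at a control.
        return gazed_control, "indirect"
    return None
```

With this arrangement a pinch performed anywhere the system can detect it still reaches the control under the user's gaze, which matches the point that the gesture need not start at or near the control.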

In some implementations, the second user interface object 7018' is displayed at a location in the three-dimensional environment that appears within a predefined distance in front of the user (e.g., within 20cm or 50cm of the user, or within reach of an arm) such that the user can easily interact with the second user interface object 7018' with another hand of the user (e.g., a hand other than the hand 7020). For example, as explained below, the positioning of the second user interface object is further determined according to the current position of the user's hand 7020, such that the user can easily position the second user interface object 7018' sufficiently close to the user (by moving the user's hand 7020 towards the user) and such that the user can use the user's other hand to interact with selectable user interface elements displayed in the second user interface object 7018'.

In some implementations, the positioning of the first user interface object 7016 'and the second user interface object 7018' is updated according to translational movements of the user's hand (e.g., as the user's hand 7020 is moved along axes which extend in the x-direction, the y-direction, and the z-direction). For example, movement of the user's hand 7020 in a horizontal direction (left/right), a vertical direction (up/down), or in depth (forward/backward) in the physical environment (which causes the representation 7020' of the user's hand to move relative to the user's point of view in the three-dimensional environment displayed) causes the positioning of the first user interface object 7016' and the second user interface object 7018' to also be updated (e.g., to follow the user's hand movements). For example, as shown in fig. 7C-7D, as the user moves the user's hand 7020 closer (e.g., translational movement in the z-direction), both the first user interface object 7016' and the second user interface object 7018' are also displayed at the new location (e.g., to maintain their relative positioning with respect to the representation 7020' of the user's hand).

For example, in fig. 7D, the representation 7020 'of the user's hand is closer to the user than in fig. 7C, and is thus displayed in fig. 7D in a larger size than the representation 7020 'of the user's hand shown in fig. 7C. In some implementations, the respective sizes of the first user interface object 7016 'and the second user interface object 7018' are also updated in accordance with movement of the user's hand forward or backward relative to the user's point of view (e.g., the first user interface object 7016 'and the second user interface object 7018' are also displayed in fig. 7D in a larger respective size than in fig. 7C in accordance with movement of the user's hand 7020 closer to the user's point of view).

In some implementations, the size of the user interface object is updated (e.g., proportionally) according to the current size of the user's hand. In some implementations, the size of the user interface object is selected based on the size of the user's hand (e.g., the size of the first user interface object 7016' is determined based on the size of the user's hand 7020 such that the user interface object appears to fit within the user's hand). For example, the size of the first user interface object 7016' (e.g., and/or the second user interface object 7018 ') is updated as the user opens and/or closes the user's hand (e.g., makes a fist). For example, the user interface object 7016 'is updated to appear smaller as the user closes the user's hand. It will be appreciated that the user's hand 7020 moving in other translational movements (e.g., other lateral movements) causes the user interface objects 7016' and 7018' to be updated accordingly.

In some implementations, only one of the user interface objects changes orientation according to a change in orientation of the user's hand. For example, the first user interface object 7016' is oriented relative to the user's hand 7020. For example, as shown in fig. 7B-7C, the user's hand rotates (e.g., to the second orientation in fig. 7C, where the user's palm is more parallel to the representation 7008' of the floor). As shown in fig. 7C, the orientation of the first user interface object 7016' is updated as the orientation of the user's hand changes (while the orientation of the second user interface object 7018' is not updated as the orientation of the user's hand changes). For example, the second user interface object 7018' is displayed in an orientation that is independent of the user's hand (e.g., the second user interface object 7018' continues to be displayed parallel to the bottom edge of the representation 7008' of the floor).

Thus, while the positioning of both user interface objects 7016' and 7018' is updated based on the translational movement of the user's hand, only one of the user interface objects (e.g., user interface object 7016' displayed within the palm of the user's hand) is updated in accordance with a change in the orientation (e.g., pose) of the user's hand 7020.
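A minimal sketch of this asymmetry is given below. It is an illustration under simplifying assumptions, not the described system's implementation: `Pose` and `update_objects` are hypothetical names, orientation is stored as a tuple of Euler angles in degrees, and the second object's offset from the hand is an arbitrary example value expressed in the viewpoint's frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    position: tuple      # (x, y, z) relative to the viewpoint
    orientation: tuple   # (yaw, pitch, roll) in degrees


UPRIGHT = (0.0, 0.0, 0.0)


def update_objects(hand, second_offset=(0.0, 0.25, -0.25)):
    """Return poses of the first (palm) and second (floating) objects.

    Both objects translate with the hand; only the first also rotates with it.
    """
    first = Pose(hand.position, hand.orientation)
    second_position = tuple(h + o for h, o in zip(hand.position, second_offset))
    # The second object keeps a fixed orientation regardless of the hand's.
    second = Pose(second_position, UPRIGHT)
    return first, second
```

Rotating the hand in place changes only the first object's orientation, while moving the hand shifts both objects by the same amount.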

In some implementations, the first user interface object 7016' (e.g., and/or the second user interface object 7018') is visually deemphasized (e.g., blurred, faded, and/or shrunk) when the user's hand rotates, tilts, and/or closes (e.g., the user interface object 7016' is displayed in the palm of the user's hand and is visually deemphasized as the user closes the user's hand to cover the user's palm). In some embodiments, as shown in fig. 7C, the representation of the user's hand is optionally also visually deemphasized when the user's hand is rotated, tilted, and/or closed.

In some implementations, the second user interface object 7018' is displayed in the three-dimensional environment at a location that is offset from the location corresponding to the location at or near the user's hand 7020 in the physical environment 7000. For example, the location of the second user interface object 7018' is determined relative to a location at or near a hand of the user (e.g., hand 7020 or another hand).

In some implementations, the second user interface object 7018' is oriented such that its downward direction is aligned with gravity in the physical environment 7000 (e.g., text appears right-side up when the ground is at the bottom of the user's current view of the three-dimensional environment). For example, the second user interface object 7018' includes text such that the bottom of the text points to the floor 7008 (e.g., and to the representation 7008' of the floor). Thus, even when the user changes the orientation of the user's head (e.g., rotates it sideways), the second user interface object is still oriented downward relative to the ground.

In some implementations, the second user interface object 7018' is oriented based on the current head positioning of the user (e.g., not based on a floor). For example, when the user tilts (e.g., sideways) the user's head, the second user interface object 7018' is displayed to follow the tilt of the user's head (e.g., to always appear right-side-up relative to the user's current head pose). Thus, even when the user tilts their head, the displayed text appears right-side up to the user (e.g., such that the user's current view of the three-dimensional environment does not have a floor parallel to the bottom of the user's current view).
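The two orientation options just described (aligned with gravity versus aligned with the user's head) differ only in which reference frame the object's roll is held constant. The sketch below makes that concrete; `text_roll` and its mode strings are hypothetical, and head tilt is reduced to a single roll angle in degrees.

```python
def text_roll(mode, head_roll_deg):
    """Return (world_roll, apparent_roll) of the object's text, in degrees.

    mode "gravity": the text stays upright relative to the floor.
    mode "head":    the text stays upright relative to the user's head.
    apparent_roll is the tilt the user perceives in the current view.
    """
    if mode == "gravity":
        world_roll = 0.0
    elif mode == "head":
        world_roll = head_roll_deg
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    apparent_roll = world_roll - head_roll_deg
    return world_roll, apparent_roll
```

With the head tilted by 20 degrees, the gravity-aligned object appears tilted by 20 degrees in the opposite direction within the view, while the head-aligned object appears level.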

In some embodiments, the computer system provides the option to anchor the user interface object (e.g., the first user interface object 7016 'and/or the second user interface object 7018') to the three-dimensional environment 7000 '(e.g., instead of anchoring the user interface object to the user's hand 7020). For example, a button is provided (e.g., in the palm and/or hand of the user) that, when selected by the user, places the selected user interface object into the three-dimensional environment 7000 'such that the user interface object is positioned relative to the three-dimensional environment (e.g., independent of movement of the user's hand). For example, the user interface object is anchored to an object in the three-dimensional environment and/or the user interface object is placed at a predefined location within the current viewpoint in the three-dimensional environment 7000' (e.g., the object is displayed in the upper left corner of the current viewpoint).

In some implementations, controls are displayed in the representation 7020 'of the user's hand to switch between minimizing (e.g., stopping display of) and displaying the user interface object in the three-dimensional environment according to the user interface object being placed in the three-dimensional environment. In some implementations, in accordance with a determination that the user interface object is placed in a three-dimensional environment, the user interface object continues to be displayed in response to the user's hand moving out of the current view (e.g., until the user switches the user interface object to cease display). In some implementations, in response to determining that the user interface object is not placed in the three-dimensional environment 7000 '(e.g., and remains anchored to the user's hand), the computer system stops displaying the user interface object (e.g., until the user's hand returns to within the view) in response to the user's hand moving out of the current view.
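The display rule in this paragraph reduces to a small decision, sketched here with hypothetical names (`should_display`, `placed_in_environment`, `minimized`, `hand_in_view`). The sketch assumes the toggle control described above sets a single `minimized` flag.

```python
def should_display(placed_in_environment, minimized, hand_in_view):
    """Decide whether a user interface object is currently displayed."""
    if placed_in_environment:
        # Placed objects ignore the hand; only the toggle hides them.
        return not minimized
    # Hand-anchored objects are shown only while the hand is in view.
    return hand_in_view
```

A placed object therefore stays visible after the hand leaves the view, while a hand-anchored object reappears only when the hand returns.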

Fig. 7E-7F are block diagrams illustrating controls displayed for a user when the user is engaged in a shared communication session, according to some embodiments. Fig. 7E-7F illustrate a shared three-dimensional environment (e.g., environment 7207', another VR, AR, or XR environment described with respect to fig. 7A-7D) accessed by (e.g., displayed for) multiple users, each viewing the shared three-dimensional environment with a respective display generating component. For example, a first user (e.g., user 7002, fig. 7A) views the shared three-dimensional environment 7207' via a first display generating component (e.g., display generating component 7100, a first display generating component described with respect to fig. 7A-7D, or other display generating component) in communication with a first computer system (e.g., computer system 101, a computer system described with respect to fig. 7A-7D), and a second user views the shared three-dimensional environment via a second display generating component (e.g., display generating component 7102) in communication with a second computer system (e.g., associated with the second user).

As shown in fig. 7E-7F, the current display view of the three-dimensional environment 7207' for the first user on the first display generating component 7100 also includes one or more user interface objects (e.g., user interface object 7204', user interface object 7206', and/or other user interface objects or virtual objects) displayed at various locations in the three-dimensional environment (e.g., locations corresponding to respective locations of physical objects or surfaces, or locations not corresponding to locations of physical objects and surfaces). In some embodiments, the behavior of the user interface described with respect to fig. 7E-7F (e.g., three-dimensional environment 7207 'and user interface objects) also applies to the user interface as described in fig. 7A-7D and 7G-7J (e.g., a home page user interface object and a plurality of user interface objects displayed at locations remote from a first location corresponding to the location of the user's hand).

In some embodiments, users active in (e.g., participating in and/or viewing) the shared three-dimensional environment are represented within the shared three-dimensional environment 7207'. For example, in fig. 7E-7F, three users are active, each user having an associated representation in the three-dimensional environment 7207': a first user representation 7200', a second user representation 7203', and a third user representation 7205'.

In some embodiments, as shown in fig. 7E, representations of users are arranged within the shared three-dimensional environment 7207' relative to each other (e.g., such that a respective user views the positioning of other users relative to the respective user's point of view); such a session is referred to herein as a coexistence communication session (or a spatial communication session). For example, a first user (e.g., corresponding to the first user representation 7200') views the shared three-dimensional environment 7207' via the first display generating component 7100. The viewpoint of the first user includes a representation 7203' of the second user to the left of the representation 7205' of the third user. In some embodiments, the first display generating component 7100 further displays a representation 7200" of the first user (e.g., wherein the user's own representation is displayed in a dedicated area (e.g., upper right corner) of the display generating component). In some embodiments, representations of one or more active users in the communication session are not displayed relative to each other (e.g., when the communication session is not a coexistence communication session, but instead includes active participants displayed in a list or gallery view). In some embodiments, the communication session includes a combination of users participating in the coexistence communication session (e.g., and viewing representations of other users as being arranged in a three-dimensional environment relative to each other) and users not viewing other users as being arranged in a three-dimensional space relative to each other (e.g., users viewing other participants in a list or gallery view).

In some embodiments, content shared in the three-dimensional environment 7207' is shared among multiple users while the communication session is ongoing. For example, in addition to the user's activity in the shared three-dimensional environment 7207', the users are also enabled to communicate with each other using audio (e.g., via microphones and/or speakers in communication with the user's respective computer system). In some embodiments, in a coexistence communication session (e.g., a spatial communication session), audio received from a respective user is simulated as being received from a location corresponding to a current location of the respective user in a three-dimensional environment. For example, when a user speaks in a communication session, the user's voice sounds as if it came from a region where the user's representation was displayed in a three-dimensional environment.
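The simulated audio location described above can be illustrated with a minimal sketch. The function name, the two-dimensional (x, z) coordinates, the yaw convention, and the use of constant-power stereo panning are all assumptions made for illustration; the description does not specify how the audio is spatialized.

```python
import math

def stereo_gains(listener_pos, listener_yaw, source_pos):
    """Return (left_gain, right_gain) for a speaking user's representation.

    Hypothetical conventions: positions are (x, z) tuples on the horizontal
    plane, yaw is in radians, yaw 0 faces +z, and +x is to the listener's
    right when yaw is 0.
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    # Direction of the source relative to where the listener is facing
    # (positive values are to the listener's right).
    azimuth = math.atan2(dx, dz) - listener_yaw
    # The sine of the azimuth gives a pan value in [-1, 1] and treats
    # sources behind the listener by their lateral component only.
    pan = math.sin(azimuth)
    # Constant-power panning: the squared gains always sum to 1.
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)
```

With these conventions, a representation directly in front of the listener yields equal gains in both channels, and a representation directly to the listener's right is heard almost entirely in the right channel.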

In some embodiments, the shared three-dimensional environment 7207' is updated in real-time as users communicate with each other in a coexistence communication session (e.g., using audio, physical movement, and/or a shared application). In some embodiments, users in the coexistence communication session are not co-located with each other in the physical environment (e.g., not within a predefined physical proximity of each other), but nevertheless share the three-dimensional environment 7207'. For example, the users view the shared three-dimensional environment from different physical environments (e.g., the shared three-dimensional environment may include one or more attributes of the physical environments of one or more of the users).

In some embodiments, the physical environment of each of the users is displayed (e.g., as passthrough content), and the representations of other users active in the shared three-dimensional environment augment the displayed physical environment. For example, the first user 7002 is located in the physical environment 7000 including the physical object 7014 described with reference to fig. 7A. A representation of the physical object 7014 is displayed in the first user's view of the shared three-dimensional environment 7207' displayed on the display generating component 7100, but is not displayed in the second user's view of the shared three-dimensional environment 7207' displayed on the second display generating component 7102 (e.g., because the physical object 7014 is not present in the physical environment of the second user). Thus, in some embodiments, the shared three-dimensional environment 7207' is a mixed reality environment that includes portions of the respective physical environments of the respective users and includes representations (e.g., virtual representations) of other active users (e.g., and/or other virtual objects).

In some embodiments, as described above, the shared three-dimensional environment 7207' includes a representation for each of a plurality of users participating in the coexistence communication session. For example, as shown in three-dimensional environment 7207', representations 7200', 7203', and 7205' of users are positioned relative to one another within the three-dimensional environment, with the positioning of the representations of users maintained across each of the users' devices.

For example, as shown in fig. 7F, in accordance with a determination that virtual object 7208' (e.g., a virtual sphere) is shared in three-dimensional environment 7207', virtual object 7208' is presented to each of the users on the respective device of the user, at a location that is consistent across each of the devices. For example, user 7200 perceives virtual object 7208' as being to the right of the user representation 7205', while user 7203 perceives virtual object 7208' as being between the user representation 7200' and the user representation 7205'. Thus, the three-dimensional environment 7207' is shared among users such that when a user or other virtual object moves within the three-dimensional environment 7207', the movement is reflected in the viewpoint displayed by each of the respective devices (e.g., the display generating components 7100 and 7102).
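As a hedged illustration of how one shared location can produce different apparent arrangements for different users, the sketch below expresses a single world-space position in each viewer's local frame. The helper names, the two-dimensional coordinates, and the yaw convention are hypothetical and are not taken from the description.

```python
import math

def to_view_space(viewer_pos, viewer_yaw, world_pos):
    """Express a shared world-space point (x, z) in a viewer's local frame.

    Returns (right, forward). Assumed conventions: yaw is in radians,
    yaw 0 faces +z, and +x is to the viewer's right when yaw is 0.
    """
    dx = world_pos[0] - viewer_pos[0]
    dz = world_pos[1] - viewer_pos[1]
    right = dx * math.cos(viewer_yaw) - dz * math.sin(viewer_yaw)
    forward = dx * math.sin(viewer_yaw) + dz * math.cos(viewer_yaw)
    return right, forward

def appears_left_of(viewer_pos, viewer_yaw, a, b):
    """True if point a appears to the left of point b for this viewer.

    Compares bearings, so it is only meaningful when both points are in
    front of the viewer.
    """
    ra, fa = to_view_space(viewer_pos, viewer_yaw, a)
    rb, fb = to_view_space(viewer_pos, viewer_yaw, b)
    return math.atan2(ra, fa) < math.atan2(rb, fb)
```

Because every device stores the same world-space position for the shared object, only the viewer's own position and orientation change the result, which is why one user can perceive the object to the right of another user's representation while a second user perceives it between two representations.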

In some embodiments, as shown in fig. 7E, a representation of the user (e.g., the user himself) is displayed to the user on the display generating component. For example, for the first user 7200 of the display generating component 7100, the computer system displays a representation 7200" of the first user. In some embodiments, the representation 7200" of the user includes an avatar of the user. In some embodiments, the representation 7200" of the first user includes a real-time camera view of the user (e.g., captured by one or more cameras of the computer system).

In some embodiments, the computer system displays a control user interface object 7206' corresponding to a communication session with the other users (e.g., for adjusting settings of the communication session). In some implementations, the control user interface object 7206' is a user interface object separate from representations (e.g., representation 7203' and representation 7205') of other users in the three-dimensional environment 7104 displayed during the communication session.

In some embodiments, the control user interface object includes one or more affordances for displaying additional content related to the communication session, such as affordances for changing a virtual environment (e.g., virtual landscape) for the communication session. For example, a user is enabled to add a virtual object (e.g., virtual object 7208') to a coexistence communication session (e.g., by selecting an affordance in the control user interface object) and/or control placement of the virtual object within shared three-dimensional environment 7207'. In some embodiments, the control user interface object 7206' is displayed only in a coexistence (e.g., spatial) communication session. For example, if representations of other users are displayed in a list and/or gallery view (e.g., rather than as representations updated relative to each other within the shared three-dimensional environment 7207' as in the coexistence communication session), then the control user interface object 7206' (e.g., as well as the controls for changing the immersive experience of the shared three-dimensional environment, described below) is not displayed.

For example, the shared three-dimensional environment 7207' (e.g., in the control user interface object 7206') provides an option for adjusting virtual properties (e.g., the immersive experience) of the shared three-dimensional environment 7207'. For example, the shared three-dimensional environment can be displayed with one or more themes, each of which is referred to herein as an immersive experience applied to the three-dimensional environment 7207' (e.g., the immersive experience includes an immersive animation or environment). For example, the user is provided with options for adding, removing, and/or changing virtual scenery, virtual lighting, and/or virtual wallpaper in the three-dimensional environment. In some implementations, in response to a user selection to change the current immersive experience, the immersive experience is applied to all users participating in the coexistence communication session (e.g., a respective display generation component for each participating user displays virtual content for the immersive experience).

In some embodiments, the immersive experience is displayed only on the display generation component of the user selecting the immersive experience (e.g., the immersive experience is not shared with other users in the coexistence communication session). In some implementations, the user is also provided with one or more options to share and/or remove (e.g., cease to share) the current immersive experience. Thus, the user is enabled to share (e.g., by selecting the first control option) an immersive experience to be applied to all users participating in the coexistence communication session, and to remove the immersive experience from being applied in the user's view and/or to cease sharing the immersive experience with other users in the coexistence communication session.
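The difference between applying an immersive experience only to the selecting user's view and sharing it with all participants can be sketched as follows. The class and method names are hypothetical, and the rule that ceasing to share clears the experience only from the other participants' views is an assumption of this sketch rather than a behavior stated in the description.

```python
class CoexistenceSession:
    """Tracks which immersive experience each participant's display shows."""

    def __init__(self, participants):
        # None means no immersive experience is applied to that view.
        self.view = {p: None for p in participants}

    def select(self, user, experience, share=False):
        """Apply an experience to one user's view, or to every view if shared."""
        targets = list(self.view) if share else [user]
        for p in targets:
            self.view[p] = experience

    def remove_from_view(self, user):
        """Stop applying the current experience in this user's own view."""
        self.view[user] = None

    def stop_sharing(self, owner):
        """Clear the owner's experience from all other participants' views."""
        experience = self.view[owner]
        if experience is None:
            return
        for p in self.view:
            if p != owner and self.view[p] == experience:
                self.view[p] = None
```

For example, calling `select("first", "forest", share=True)` updates every participant's view, while a later `stop_sharing("first")` leaves the experience applied only for the first user.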

Fig. 7E illustrates a user input (e.g., or a series of user inputs) for selecting an affordance (e.g., included in control user interface object 7206') for changing a virtual landscape. For example, the user's gaze in combination with a hand gesture selects affordance 7206-1'. In some embodiments, when the user is focusing on an affordance in the control user interface object, the affordance is visually emphasized (e.g., enlarged, outlined, and/or highlighted) relative to other affordances displayed in the control user interface object to provide the user with an indication of which affordance currently has the user's focus (e.g., via the user's detected gaze). For example, when the user is looking at affordance 7206-1', affordance 7206-1' is enlarged, indicating that affordance 7206-1' will be selected in response to additional user input such as a gesture. In response to a selection of affordance 7206-1' by the user (e.g., using a gesture performed with the user's hand 7020), the immersive experience (e.g., virtual landscape) is updated, as shown in fig. 7F. For example, virtual floor 7008-t' is displayed with a different virtual illumination and virtual object 7208' is displayed. In some implementations, the virtual object 7208' is part of the immersive experience that the user selected. For example, changing the immersive experience includes updating virtual lighting, scenery, and/or wallpaper displayed in the three-dimensional environment 7207', including adding virtual objects included in the virtual scene. In some embodiments, other affordances displayed in control user interface object 7206' include other options for the immersive experience (e.g., changing the virtual scenery to include various other virtual lighting, objects, and/or wallpaper).
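The two-stage interaction above, in which gaze emphasizes an affordance and an additional gesture selects it, can be sketched as a small pure function. The function name, the string identifiers, and the boolean gesture flag are hypothetical stand-ins for whatever gaze and hand tracking signals a real system would provide.

```python
def update_controls(affordance_ids, gazed_id, gesture_detected):
    """Return (emphasized_id, selected_id) for one input update.

    affordance_ids: identifiers of affordances in the control object.
    gazed_id: identifier the user's gaze currently rests on, or None.
    gesture_detected: True if a selection gesture was detected this update.
    """
    # Gaze alone only emphasizes (e.g., enlarges or outlines) an affordance.
    emphasized = gazed_id if gazed_id in affordance_ids else None
    # Selection requires both gaze on an affordance and a gesture.
    selected = emphasized if gesture_detected else None
    return emphasized, selected
```

Keeping emphasis and selection as separate outputs mirrors the description: the emphasized affordance tells the user what a subsequent gesture would select, without committing to the selection.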

In some implementations, the control user interface object 7206' includes one or more selectable user interface objects (e.g., affordances) for controlling the level of immersion of the immersive experience. For example, a user is enabled to change the amount of the physical environment that is displayed (e.g., as passthrough content) and to change the amount of virtual content that is displayed in the three-dimensional environment. For example, a user selects a user interface object to increase the amount of passthrough content (e.g., decrease the immersion level), which causes the computer system to cease displaying at least a portion of the virtual content (and to display more of the user's physical environment).
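The trade-off between virtual content and passthrough content can be sketched with a single immersion value. Representing immersion as a number between 0 and 1, the linear split between virtual and passthrough content, and the step size are all assumptions made for illustration.

```python
def clamp_immersion(level):
    """Keep the immersion level within the assumed range [0.0, 1.0]."""
    return min(1.0, max(0.0, level))

def display_mix(level):
    """Return the fractions of the view given to virtual and passthrough content."""
    level = clamp_immersion(level)
    return {"virtual": level, "passthrough": 1.0 - level}

def decrease_immersion(level, step=0.25):
    """Lower the immersion level, which increases the passthrough fraction."""
    return clamp_immersion(level - step)
```

The same mix could back an indication of the current immersion level, since it states how much of the view is passthrough and how much is virtual.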

In some embodiments, the computer system displays an additional control 7204' for controlling the settings of the first user in the shared communication session. For example, the additional control 7204' provides buttons for the user to mute the user in the communication session (e.g., turn off the user's microphone) (and/or mute other users participating in the communication session), hide the user's representation (e.g., cease display of the avatar or representation of the user in the communication session), record the communication session, end the communication session, add a message (e.g., a text message) (e.g., addressed to another user), and/or start a collaborative drawing session.

For example, the computer system displays an option to add a message to be displayed in the shared communication session, causing the message to be displayed in the shared three-dimensional environment 7207'. In some embodiments, the message is shared with all users participating in the shared communication session. In some embodiments, messages (e.g., private messages sent to one or more other users in the shared communication session) are shared with only a subset (less than all) of the users participating in the shared communication session. In response to a user adding a message, the message is displayed in a current view displayed on a corresponding display generating component of the user with whom the message is shared.

In some embodiments, the computer system displays an option to start a collaborative drawing session, enabling participating users in the communication session to add and modify drawings to be viewed by active users in the communication session. In some embodiments, the option to start a collaborative drawing session is displayed for a spatial communication session and is not displayed for communication sessions that are not spatial communication sessions. For example, if representations of participating users are shown in a list and/or gallery view (e.g., as opposed to representations displayed relative to other participants in a shared three-dimensional environment), then no option is provided to start a collaborative drawing session. In some implementations, a user is enabled to draw in (e.g., add content to) a collaborative drawing session using air gestures (e.g., gestures performed with the user's hand 7020 and detected by one or more sensors and/or input devices of a computer system).
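The dependence of the available options on the type of communication session can be sketched as a simple lookup. The enum, the option names, and the grouping of options are hypothetical labels chosen for illustration; only the rule that the drawing option and the control user interface object appear in spatial sessions comes from the description.

```python
from enum import Enum

class SessionMode(Enum):
    SPATIAL = "spatial"   # representations arranged relative to each other
    GALLERY = "gallery"   # participants shown in a list or gallery view

def available_options(mode):
    """Return the option names offered to a user for the given session mode."""
    options = ["mute", "hide_representation", "record",
               "end_session", "add_message"]
    if mode is SessionMode.SPATIAL:
        # Only offered when users share a spatial arrangement.
        options += ["collaborative_drawing", "environment_controls"]
    return options
```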

In some embodiments, the computer system displays additional information for the shared communication session. For example, one or more applications (e.g., application windows) are shared during the communication session such that the one or more applications (e.g., application windows for the applications) are displayed in the shared three-dimensional environment 7207' (e.g., and updated in real-time on each respective display generating component for respective users participating in the shared communication session). In some embodiments, the computer system optionally displays an indication of one or more active applications (e.g., applications shared in an ongoing communication session). For example, an indication of the active application is displayed in the user's current view even though the user is not currently viewing an application window for the active application (e.g., an application window is not displayed in a portion of the shared three-dimensional environment 7207' in the user's current view). In some embodiments, an indication of the user who initiated sharing of the respective application is displayed. For example, an indication of the user (e.g., the user's name, the user's avatar, and/or the user's initials) is displayed with (e.g., overlapping or near) the corresponding active application (and/or with an indication of the active application).

In some embodiments, the computer system displays an option to remove (e.g., close) the shared application, and in response to a user selecting the option to remove the shared application, stops sharing and/or displaying the shared active application. In some embodiments, only the user sharing the application is provided with the option to remove the shared application. In some implementations, the option to remove the shared application is displayed for (e.g., and selectable by) all participating users in the shared communication session. Thus, in some embodiments, users are provided with different controls for interacting with the shared three-dimensional environment (e.g., a control displayed for the first user is not displayed for the second user).

In some embodiments, the computer system displays an indication of the current immersion level. For example, the computer system provides information about how much of the physical environment is displayed as passthrough content compared to how much of the three-dimensional environment is virtual (e.g., generated). As described above, in some embodiments, a user is enabled to control the current immersion level (e.g., and control the immersive experience).

In some embodiments, as described above, representations of other users participating in the communication session (e.g., representation 7203' of the second user and representation 7205' of the third user) are displayed, and additional controls for controlling the participants are displayed (e.g., with the representations of the participating users). For example, the display generation component displays an option to remove a user from the shared communication session (e.g., including the user himself). In some embodiments, the computer system further displays status indicators for other users in the communication session. For example, the computer system displays an indication of the type of device being used by each of the participants to participate in the communication session and/or an indication of whether the participant is currently active or inactive.

In some embodiments, the computer system enables the user to request a spatial cue that identifies where a particular application and/or user is located within the three-dimensional environment 7207'. For example, a user provides an input (e.g., a tap input) directed to a particular representation of an application and/or a representation of another user participating in a shared communication session, and in response to the input, the computer system plays a spatial audio cue (e.g., the audio cue is simulated as coming from the application and/or the other user) to indicate where the application and/or user is located within the three-dimensional environment relative to the user. For example, the application window and/or the representation of the participating user is located at a certain location in the three-dimensional environment, but is not displayed in the user's current view.

In some embodiments, the indication of the application and/or the participating user is displayed in the current view of the user such that even if the representation of the application window and/or the participating user is not within the current view of the user, the indication of the active application and/or the participant is displayed (e.g., as a list, or in a gallery view) such that the user may request a spatial audio prompt (e.g., and/or a spatial visual prompt) indicating where the representation of the application window and/or the participating user is located in the three-dimensional environment relative to the current view of the user. For example, the user perceives the spatial audio cues as audio from the location in the three-dimensional environment where the application window and/or the representation of the participating user is located. In some embodiments, the requested prompt is a visual prompt, and the computer system displays a visual prompt (e.g., an arrow or another visual indication) that indicates where the application and/or user is located within the three-dimensional environment relative to the user.
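A visual cue of the kind described above needs to know whether the target is inside the user's current view and, if not, which way to point. The sketch below computes that under stated assumptions: the function name, the two-dimensional coordinates, the yaw convention, and the 45-degree half field of view are hypothetical.

```python
import math

def cue_for_target(viewer_pos, viewer_yaw, target_pos,
                   half_fov=math.radians(45)):
    """Describe where a target lies relative to the user's current view.

    Assumed conventions: positions are (x, z) tuples, yaw is in radians,
    yaw 0 faces +z, and +x is to the viewer's right when yaw is 0.
    """
    dx = target_pos[0] - viewer_pos[0]
    dz = target_pos[1] - viewer_pos[1]
    bearing = math.atan2(dx, dz) - viewer_yaw
    # Wrap the bearing to [-pi, pi] so left and right are unambiguous.
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    in_view = abs(bearing) <= half_fov
    # No arrow is needed when the target is already visible.
    arrow = None if in_view else ("right" if bearing > 0 else "left")
    return {"bearing": bearing, "in_view": in_view, "arrow": arrow}
```

The same bearing could drive a spatial audio cue, so that the sound is perceived as coming from the location of the application window or participant.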

In some implementations, in response to a user looking away from the control user interface object 7206' (e.g., no gaze input is detected), the computer system ceases to display the control user interface object 7206'. For example, as shown in fig. 7E, the first user 7202 is looking at an affordance in the control user interface object 7206', which therefore remains displayed for the first user. The second user (e.g., using display generating component 7102) is not looking at the display generating component, and the computer system for the second user does not display the control user interface object 7206' for the second user. For example, in response to the second user gazing at the display generation component, the control user interface object 7206' is displayed for the second user on the display generation component 7102. In some implementations, the user is enabled to close the control user interface object 7206' (e.g., by performing a gesture to remove the control user interface object 7206', such as a swipe down air gesture).

Fig. 7G-7J are block diagrams illustrating user interface objects in a modal user interface object that is displayed to a user while the user is focusing on the modal user interface object, according to some embodiments. For example, if a modal user interface object is placed in front of the user within a predefined portion of a three-dimensional environment, the modal user interface object is automatically dismissed if the user looks away from the modal user interface object, whereas if the modal user interface object is placed into the three-dimensional environment (e.g., such that the modal user interface object is anchored to the three-dimensional environment), the modal user interface object is not automatically dismissed if the user looks away from the modal user interface object. In some embodiments, the modal user interface objects include control panel user interface objects (e.g., first user interface object 7016', second user interface object 7018', or control user interface object 7206').
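The dismissal rule summarized above can be written as a one-line policy. The enum and function names are hypothetical, and the sketch treats attention as a simple boolean, which a real system would derive from gaze tracking.

```python
from enum import Enum

class Anchor(Enum):
    VIEWPOINT = "viewpoint"      # placed in the predefined portion in front of the user
    ENVIRONMENT = "environment"  # anchored to a location in the three-dimensional environment

def should_auto_dismiss(anchor, user_is_attending):
    """Return True if the modal object should be dismissed automatically.

    Only a viewpoint-placed modal object is dismissed when the user looks
    away; an environment-anchored one persists.
    """
    return anchor is Anchor.VIEWPOINT and not user_is_attending
```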

As shown in fig. 7G-7J, a three-dimensional environment (e.g., environment 7000', an environment as described with respect to fig. 7A-7D and/or 7E-7F, or another VR, AR, or XR environment) is displayed via a display generating component (e.g., display generating component 7100, a first display generating component as described with respect to fig. 7A-7D and/or 7E-7F, or other display generating component) in communication with a computer system (e.g., computer system 101, a computer system as described with respect to fig. 7A-7D and/or 7E-7F). As shown in fig. 7G-7J, the currently displayed view of the three-dimensional environment 7000' includes one or more user interface objects (e.g., user interface object 7304' (e.g., a modal user interface object), one or more constituent user interface objects 7304-1 and 7304-2 of modal user interface object 7304', user interface object 7302', and/or other user interface objects or virtual objects) displayed at various locations in the three-dimensional environment 7000' (e.g., locations corresponding to respective locations of physical objects or surfaces, or locations not corresponding to locations of physical objects and surfaces).

As shown in fig. 7G-7J, the computer system displays a view of a three-dimensional environment (e.g., an environment 7000', a virtual three-dimensional environment, an augmented reality environment, a passthrough view of a physical environment, or a camera view of a physical environment), such as described above with reference to fig. 7A-7B. In some embodiments, the three-dimensional environment is a virtual three-dimensional environment without a representation of a physical environment. In some embodiments, the three-dimensional environment is a mixed reality environment that is a virtual environment augmented by sensor data corresponding to a physical environment. In some embodiments, the three-dimensional environment is an augmented reality environment that includes one or more virtual objects and a representation of at least a portion of the physical environment surrounding the first display generating component (e.g., representations 7004' and 7006' of walls, representation 7008' of a floor, and/or representation 7014' of a physical object). In some implementations, the representation of the physical environment includes a camera view of the physical environment. In some embodiments, the representation of the physical environment includes a view of the physical environment through a transparent or translucent portion of the first display generating component.

In fig. 7G, the computer system displays a first user interface object 7304' (e.g., a modal user interface object) that includes a plurality of constituent user interface objects, including user interface objects 7304-1 and 7304-2. In some implementations, the user interface object 7304' is displayed in the three-dimensional environment 7000' anchored at a location within the three-dimensional environment (e.g., the user interface object is anchored to a location in front of the representation 7004' of the wall). In some embodiments, as the user's head and/or the user's torso move (e.g., rotate or move laterally), user interface object 7304' continues to be displayed at the same location in the three-dimensional environment relative to other objects displayed in the three-dimensional environment (e.g., independent of the user's movement). In some embodiments, user interface object 7304' is positioned at greater than a predefined distance from the user in the three-dimensional environment. For example, when user interface object 7304' is anchored within the three-dimensional environment, user interface object 7304' is positioned at a distance from the user that is greater than the length of the user's arm.

Fig. 7H illustrates a second user interface object 7302' (e.g., a modal user interface object) displayed at a location determined from the current viewpoint of the user in the three-dimensional environment. In some embodiments, user interface object 7302' is anchored to the user's head and/or torso in physical environment 7000. In some embodiments, user interface object 7302' is displayed within the user's personal space (e.g., within arm's length of the user). For example, as the user's head and/or torso move (e.g., rotate or move laterally), user interface object 7302' continues to be displayed at the same location relative to the user (e.g., within the user's personal space). Thus, even when the user moves within the physical environment, the user interface object 7302' remains displayed at a location in front of the user, where the user can easily access and interact with it.

In some embodiments, user interface object 7304' is displayed while the user is looking at a portion of user interface object 7304'. For example, in fig. 7G, the user is looking at user interface object 7304-1 (e.g., a constituent user interface object within user interface object 7304'). In some embodiments, in response to detecting a gaze on a portion of user interface object 7304-1, the computer system modifies one or more visual properties of user interface object 7304-1. For example, in fig. 7G, user interface object 7304-1 is displayed with an outline (e.g., highlighted). Additional and/or alternative visual attributes may also be applied to a user interface object at which the user is currently looking (e.g., changing the size of the object (e.g., magnifying the object relative to other displayed virtual and/or physical objects), and/or changing the opacity and/or translucency of the object).

In some embodiments, user interface object 7304' continues to be displayed in the three-dimensional environment even after it is determined that the user is no longer focusing on user interface object 7304'. For example, in fig. 7H, the user has shifted the user's attention (e.g., as indicated by the dashed line from the user's eyes) to another user interface object 7302-1. Because user interface object 7304' is anchored to a location in the three-dimensional environment (e.g., user interface object 7304' is anchored to a location in front of representation 7004' of the wall), user interface object 7304' remains displayed while the portion of the three-dimensional environment to which user interface object 7304' is anchored remains within the user's current view of the three-dimensional environment.

In some implementations, as shown in fig. 7G, the user performs a selection input (e.g., a gaze and/or a gesture using the user's hand 7020) directed to user interface object 7304-1. In response to detecting the selection input, the computer system displays (e.g., opens and/or launches) a second user interface object 7302', as shown in fig. 7H. For example, user interface object 7304-1 is an application icon that, when selected, opens a user interface object (e.g., user interface object 7302') for the application associated with the application icon. In some embodiments, the user interface object 7302' is opened (e.g., displayed) at a predefined portion of the user's current view of the three-dimensional environment, the predefined portion of the user's current view corresponding to a portion of the three-dimensional environment directly in front of the user (and/or at the bottom of the user's current view).

In some implementations, the second user interface object 7302' is displayed in response to the user directing the user's attention to a predefined portion of the user's current view of the three-dimensional environment (e.g., the user gazes at a region in the three-dimensional environment directly in front of the user). For example, the second user interface object 7302' is a modal control panel that is displayed in front of the user (e.g., in front of the user at the bottom of the user's current view) when the user looks at the predefined portion of the user's current view. It will be appreciated that the predefined portion of the user's current view may be located at other locations within the three-dimensional environment (e.g., in front of the user at the top of the user's current view or at the left edge of the user's current view). In some embodiments, the predefined portion of the user's current view in which user interface object 7302' is displayed is within a predefined distance from the user (e.g., near the user), such as within reach of the user's arm.

In some embodiments, user interface object 7302-1 is visually emphasized (e.g., relative to other user interface objects displayed in user interface object 7302') when the user is focused on user interface object 7302-1 displayed in user interface object 7302'. For example, user interface object 7302-1 is enlarged and outlined (e.g., or otherwise highlighted) to indicate that the user's attention is currently detected as being directed to user interface object 7302-1.

In some embodiments, the computer system adds virtual object 7306 to the three-dimensional environment in response to the user performing user input selecting user interface object 7302-1 (e.g., by gazing and/or performing a gesture). For example, user interface object 7302-1 is a control for adding virtual object 7306 to a three-dimensional environment.

In some embodiments, as shown in fig. 7I, in response to determining that the user is not looking at user interface object 7302-1, but the user continues to focus on a portion of user interface object 7302', user interface object 7302' remains displayed at a predefined portion of the three-dimensional environment.

Fig. 7J shows the user's attention directed to a representation 7014' of a physical object. In some embodiments, in response to the user no longer directing attention to user interface object 7302', as shown in fig. 7J, user interface object 7302' is visually deemphasized. In some embodiments, user interface object 7302' disappears (e.g., is no longer displayed).
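By way of non-limiting illustration, the attention-dependent display behavior described with reference to fig. 7H-7J can be sketched as a small state function. This is a minimal sketch under stated assumptions: the names `PanelState`, `panel_state`, and `hide_when_unattended` are hypothetical and do not appear in the described system, and attention targets are represented as plain string identifiers.

```python
from enum import Enum


class PanelState(Enum):
    CONTROL_EMPHASIZED = "control_emphasized"  # attention is on a control inside the panel
    DISPLAYED = "displayed"                    # attention is on the panel, not on a control
    DEEMPHASIZED = "deemphasized"              # attention is elsewhere; panel is faded
    HIDDEN = "hidden"                          # attention is elsewhere; panel is removed


def panel_state(attention_target, panel_id, control_ids, hide_when_unattended=False):
    """Map the current attention target to a display state for the panel.

    All identifiers are illustrative; `hide_when_unattended` selects between
    the two described alternatives (deemphasize versus cease to display).
    """
    if attention_target in control_ids:
        return PanelState.CONTROL_EMPHASIZED
    if attention_target == panel_id:
        return PanelState.DISPLAYED
    return PanelState.HIDDEN if hide_when_unattended else PanelState.DEEMPHASIZED
```

For example, `panel_state("7014'", "7302'", {"7302-1"})` corresponds to the situation of fig. 7J, in which attention is on a physical object and the panel is deemphasized.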

In some embodiments, the user is enabled to change the positioning of the user interface object 7302' in the three-dimensional environment (e.g., and change the anchor point of the user interface object). For example, the user may move user interface object 7302' from the predefined portion of the three-dimensional environment (e.g., in front of the user) to another location within the three-dimensional environment. In some implementations, in response to the user moving user interface object 7302' outside of the predefined portion of the three-dimensional environment, user interface object 7302' is anchored to a location within the three-dimensional environment (e.g., rather than to a predefined position that is updated relative to the user's current view of the three-dimensional environment). For example, after the user interface object 7302' has been anchored to the three-dimensional environment, the user interface object 7302' does not follow the user's current view as the user moves in the physical environment. For example, user interface object 7302' then behaves similarly to user interface object 7304' (e.g., it is anchored to the three-dimensional environment).

In some implementations, the user selects where to place the user interface object 7302' within the three-dimensional environment (e.g., outside of the predefined portion). For example, the user selects a location at which to anchor the user interface object 7302' in the three-dimensional environment. For example, the user may select a location within the three-dimensional environment that is not shown in the current view of FIG. 7J (e.g., the user may select user interface object 7302' (e.g., using a pinch gesture), drag the user interface object to a location within the three-dimensional environment, and release the pinch gesture and/or otherwise perform a gesture that places the user interface object at the location). For example, the user interface object is anchored to the location where the user ended the drag input (e.g., where the user placed the user interface object).

In some implementations, the user is enabled to bring a user interface object that is anchored to a location in the three-dimensional environment (e.g., independent of the user's current view) into the predefined portion of the three-dimensional environment, such that the user interface object is subsequently anchored to the user's current view of the three-dimensional environment (e.g., and follows the user as the user moves in the physical environment). For example, the user may move user interface object 7304' to the predefined portion in front of the user (e.g., where user interface object 7302' is displayed) to anchor user interface object 7304' to the user's current view (with the same behavior as described above with reference to user interface object 7302', such as being visually deemphasized when the user is not directing attention to the user interface object that is anchored to the user's current view).

In some embodiments, when a user interface object is anchored to a three-dimensional environment, other users (e.g., participating in a communication session, as described above) are able to view the user interface object. For example, when a user interface object is anchored to a three-dimensional environment during a communication session, the user interface object is in a common world view.

In some embodiments, while the user interface object is anchored to the predefined portion relative to the user in the three-dimensional environment (e.g., in front of the user) during the communication session, other users cannot view the user interface object (e.g., user interface objects placed within the predefined portion anchored to the user's current view are private to the user and not shared/viewable by other users).
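By way of non-limiting illustration, the two anchoring behaviors described above, together with the sharing rule for communication sessions, can be sketched as follows. This is a minimal sketch under stated assumptions: the names `Anchor`, `anchor_after_drop`, `displayed_position`, and `visible_to_other_participants` are hypothetical, positions are simple (x, y, z) tuples, and rotation of the viewpoint is ignored for brevity.

```python
from enum import Enum


class Anchor(Enum):
    VIEW = "view"    # anchored to the predefined portion of the user's current view
    WORLD = "world"  # anchored to a location in the three-dimensional environment


def anchor_after_drop(dropped_in_predefined_portion: bool) -> Anchor:
    """Choose the anchor from where the drag input ends (illustrative rule)."""
    return Anchor.VIEW if dropped_in_predefined_portion else Anchor.WORLD


def displayed_position(anchor, stored, viewpoint):
    """Return the object's position in the environment.

    For VIEW, `stored` is an offset from the viewpoint, so the object follows
    the user. For WORLD, `stored` is a fixed location and the viewpoint is unused.
    Assumption: viewpoint rotation is ignored to keep the sketch short.
    """
    if anchor is Anchor.VIEW:
        return tuple(v + o for v, o in zip(viewpoint, stored))
    return tuple(stored)


def visible_to_other_participants(anchor: Anchor) -> bool:
    """Environment-anchored objects are shared; view-anchored objects are private."""
    return anchor is Anchor.WORLD
```

In this sketch, moving the viewpoint changes the displayed position only for a view-anchored object, which mirrors the description that an environment-anchored object does not follow the user's current view.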

In some implementations, the user is enabled to change the positioning of the user interface object 7302' from being anchored to a predefined portion of the three-dimensional environment (e.g., in front of the user) to being anchored to a portion of the user's body. For example, the user is enabled to anchor the user interface object to the user's hand (e.g., similar to user interface objects 7016' and/or 7018' described with reference to fig. 7C-7D). In some embodiments, when user interface object 7302' is anchored to the user's hand, user interface object 7302' is displayed only when the user's hand is in the user's current view of the three-dimensional environment. In some implementations, the user is enabled to pinch (e.g., select) a user interface object and drag the user interface object from its anchored position in the predefined portion of the three-dimensional environment to the user's hand (e.g., where the drag input is released over the user's hand so that the user interface object is anchored to the user's hand). In some implementations, the user is enabled to grasp the user interface object from the predefined portion (e.g., by reaching out with the user's hand 7020 and closing a fist over the user interface object), and in response to the grasp, the user interface object is anchored to the user's hand. In some embodiments, a selectable user interface icon is displayed with a representation of the user's hand, the selectable user interface icon corresponding to an option to move a user interface object so that it is anchored to the user's hand. For example, in response to a user selection (e.g., via gaze input or pinch input) of the selectable user interface icon, a user interface object anchored to the predefined portion of the three-dimensional environment (e.g., in front of the user) is moved so that it is anchored to the user's hand (e.g., without the user having to drag the user interface object to the user's hand).

In some implementations, a user is enabled to move a user interface object from anchored to a user's hand to a predefined portion of the three-dimensional environment (e.g., in front of the user) and/or to anchor the user interface object within the three-dimensional environment. In some implementations, the user input for moving the user interface object from anchored to the user's hand to being placed (relative to the user's current view) in a predefined portion of the three-dimensional environment includes pinch and grab gestures. For example, the user input includes selecting a user interface object anchored to the user's hand (e.g., using a pinch gesture) and simulating throwing the user interface object (e.g., extending the user's hand 7020 and opening a fist for release) into the three-dimensional environment (e.g., within a predefined portion of the three-dimensional environment and/or within other portions).

In some embodiments, the input gestures used in the various examples and embodiments described herein (e.g., with respect to fig. 7A-7J and 8-10) include air gestures performed by movement of a user's finger relative to other fingers or portions of the user's hand for interacting with a virtual or mixed reality environment.

In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein (e.g., with respect to fig. 7A-7J and 8-10) include pinch inputs and tap inputs for interacting with a virtual or mixed reality environment. For example, the pinch and tap inputs described below are performed as air gestures.

In some embodiments, the input gestures used in the various examples and embodiments described herein (e.g., with respect to fig. 7A-7J and 8-10) optionally include discrete, small motion gestures performed by moving a user's finger relative to other fingers or portions of the user's hand, optionally without requiring a larger movement of the user's entire hand or arm away from its natural position and posture immediately before or during the gesture, in order to perform operations for interacting with a virtual or mixed reality environment.

In some embodiments, the input gesture is detected by analyzing data or signals captured by a sensor system (e.g., sensor 190, FIG. 1; image sensor 314, FIG. 3). In some embodiments, the sensor system includes one or more imaging sensors (e.g., one or more cameras, such as a motion RGB camera, an infrared camera, and/or a depth camera). For example, the one or more imaging sensors are components of a computing system (e.g., computing system 101 in fig. 1 (e.g., display generating component 7100, 7102, or HMD)) or provide data to the computing system, the computing system including a display generating component (e.g., display generating component 120 in fig. 1, 3, and 4) (e.g., a touch screen display, a stereoscopic display, and/or a display with a transmissive portion that serves as a display and a touch-sensitive surface). In some embodiments, the one or more imaging sensors include one or more rear-facing cameras on a side of the device opposite the display of the device. In some embodiments, the input gesture is detected by a sensor system of a head-mounted system (e.g., a VR headset that includes a stereoscopic display that provides a left image for a left eye of a user and a right image for a right eye of the user). In some embodiments, one or more imaging sensors are attached to an interior surface of an automobile. In some embodiments, the sensor system includes one or more depth sensors (e.g., a sensor array). For example, the one or more depth sensors include one or more light-based (e.g., infrared) sensors and/or one or more sound-based (e.g., ultrasonic) sensors. In some embodiments, the sensor system includes one or more signal emitters, such as light emitters (e.g., infrared emitters) and/or sound emitters (e.g., ultrasonic emitters).
For example, as light (e.g., light from an infrared light emitter array having a predetermined pattern) is projected onto a hand (e.g., hand 7102), an image of the hand under illumination of the light is captured by the one or more cameras and the captured image is analyzed to determine the position and/or configuration of the hand. Using signals from an image sensor pointing at the hand to determine an input gesture, rather than using signals from a touch-sensitive surface or other direct contact mechanism or proximity-based mechanism, allows the user to freely choose whether to perform a large motion or remain relatively stationary while providing an input gesture with his/her hand, without being subject to the limitations imposed by a particular input device or input area.

In some embodiments, the tap input is optionally a tap input of the thumb of the user's hand over the index finger (e.g., on a side of the index finger adjacent to the thumb). In some embodiments, the tap input is detected without requiring the thumb to lift off from the side of the index finger. In some embodiments, the tap input is detected in accordance with a determination that a downward movement of the thumb is followed by an upward movement of the thumb, wherein the thumb is in contact with the side of the index finger for less than a threshold amount of time. In some implementations, a tap-hold input is detected in accordance with a determination that the thumb moves from the raised position to the touch-down position and remains in the touch-down position for at least a first threshold amount of time (e.g., a tap time threshold or another time threshold that is longer than the tap time threshold). In some embodiments, the computer system requires that the hand as a whole remain substantially stationary in position for at least the first threshold amount of time in order to detect a tap-hold input by the thumb on the index finger. In some embodiments, the tap-hold input is detected without requiring the hand as a whole to remain substantially stationary (e.g., the hand as a whole may move while the thumb rests on the side of the index finger). In some embodiments, a tap-hold-drag input is detected when the thumb touches down on the side of the index finger and the hand as a whole moves while the thumb rests on the side of the index finger.
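By way of non-limiting illustration, the timing-based distinction between a tap input, a tap-hold input, and a tap-hold-drag input can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the parameter names, and all threshold values are hypothetical (the description does not specify numeric thresholds), and the sketch adopts the variant in which hand movement during a hold is classified as a drag rather than disqualifying the hold.

```python
def classify_thumb_contact(contact_duration_s, hand_displacement_m,
                           tap_time_threshold_s=0.3,
                           hold_time_threshold_s=0.5,
                           stationary_threshold_m=0.02):
    """Classify one thumb-on-index-finger contact.

    contact_duration_s: how long the thumb stayed on the side of the index finger.
    hand_displacement_m: how far the hand as a whole moved during the contact.
    All threshold defaults are placeholder assumptions, not values from the source.
    """
    if contact_duration_s < tap_time_threshold_s:
        # Short contact: downward movement followed promptly by upward movement.
        return "tap"
    if contact_duration_s >= hold_time_threshold_s:
        if hand_displacement_m > stationary_threshold_m:
            # The thumb rested on the finger while the whole hand moved.
            return "tap-hold-drag"
        return "tap-hold"
    # Between the two thresholds the sketch declines to decide.
    return "undetermined"
```

Here the hold threshold is chosen to be longer than the tap threshold, which corresponds to the alternative in which the first threshold amount of time is another time threshold longer than the tap time threshold.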

In some implementations, the flick gesture is optionally a push or flick input by movement of the thumb across the index finger (e.g., from the palm side to the back side of the index finger). In some embodiments, the extension movement of the thumb is accompanied by an upward movement away from the side of the index finger, for example, as in an upward flick input by the thumb. In some embodiments, during the forward and upward movement of the thumb, the index finger moves in a direction opposite to that of the thumb. In some embodiments, a reverse flick input is performed by movement of the thumb from an extended position to a retracted position. In some embodiments, during the backward and downward movement of the thumb, the index finger moves in a direction opposite to that of the thumb.

In some embodiments, the swipe gesture is optionally a swipe input by movement of the thumb along the index finger (e.g., along a side of the index finger adjacent to the thumb or on that side of the palm). In some embodiments, the index finger is optionally in an extended state (e.g., substantially straight) or a curled state. In some embodiments, during movement of the thumb in the swipe input gesture, the index finger moves between the extended state and the curled state.

In some embodiments, different phalanges of the various fingers correspond to different inputs. Tap inputs of the thumb over various phalanges of various fingers (e.g., index finger, middle finger, ring finger, and, optionally, little finger) are optionally mapped to different operations. Similarly, in some embodiments, different push or flick inputs performed by the thumb across different fingers and/or different portions of the fingers trigger different operations in the respective user interface contexts. Similarly, in some embodiments, different swipe inputs performed by the thumb along different fingers and/or in different directions (e.g., toward the distal or proximal ends of the fingers) trigger different operations in the respective user interface contexts.

In some implementations, the computer system treats tap input, flick input, and swipe input as different types of input based on the type of movement of the thumb. In some implementations, the computer system treats input having different finger positions tapped, touched, or swiped by a thumb as different sub-input types (e.g., proximal, middle, distal sub-types, or index, middle, ring, or little finger sub-types) of a given input type (e.g., tap input type, flick input type, or swipe input type). In some embodiments, the amount of movement performed by moving a finger (e.g., thumb) and/or other movement metrics associated with movement of the finger (e.g., speed, initial speed, ending speed, duration, direction, and/or movement pattern) are used to quantitatively affect the operation triggered by the finger input.

In some embodiments, the computer system identifies combination input types that combine a series of movements by the thumb, such as a tap-swipe input (e.g., a press of the thumb on the finger followed by a swipe along that side of the finger), a tap-flick input (e.g., a press of the thumb over the finger followed by a flick across the finger from the palm side to the back side of the finger), and a double-tap input (e.g., two consecutive taps on that side of the finger at about the same location).
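By way of non-limiting illustration, the recognition of combination input types from two consecutive thumb movements can be sketched as follows. This is a minimal sketch under stated assumptions: the `ThumbEvent` record and the `combine` function are hypothetical, each primitive input is assumed to have already been classified by type and by the finger and phalanx it touched, and the phrase "about the same location" is approximated by requiring the same finger and the same phalanx.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThumbEvent:
    kind: str     # input type: "tap", "flick", or "swipe"
    finger: str   # sub-type by finger: "index", "middle", "ring", or "little"
    segment: str  # sub-type by phalanx: "proximal", "middle", or "distal"


def combine(first: ThumbEvent, second: ThumbEvent) -> Optional[str]:
    """Return a combination input type for two consecutive events, or None."""
    same_place = (first.finger, first.segment) == (second.finger, second.segment)
    if first.kind == "tap" and second.kind == "tap" and same_place:
        return "double-tap"
    if (first.kind == "tap" and second.kind in ("swipe", "flick")
            and first.finger == second.finger):
        # A touch-down on a finger followed by a swipe along it or a flick across it.
        return f"tap-{second.kind}"
    return None
```

A dictionary keyed by the tuple (kind, finger, segment) could then map each input type and sub-type to a different operation in a given user interface context; that mapping is left out because the description does not enumerate specific operations.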

In some implementations, the gesture input is performed by the index finger instead of the thumb (e.g., the index finger performs a tap or swipe on the thumb, or the thumb and index finger move toward each other to perform a pinch gesture). In some implementations, a wrist movement (e.g., a flick of the wrist in a horizontal or vertical direction) performed immediately before the finger movement input, immediately after the finger movement input (e.g., within a threshold amount of time), or concurrently with the finger movement input triggers additional, different, or modified operations in the current user interface context, as compared to the finger movement input without the modifying input by the wrist movement. In some embodiments, a finger input gesture performed with the user's palm facing the user's face is considered a different type of gesture than a finger input gesture performed with the user's palm facing away from the user's face. For example, an operation performed in response to a flick gesture performed with the user's palm facing the user's face has increased (or decreased) privacy protection compared to an operation (e.g., the same operation) performed in response to a flick gesture performed with the user's palm facing away from the user's face.

Although in the examples provided in this disclosure one type of finger input may be used to trigger a certain type of operation, in other embodiments other types of finger input are optionally used to trigger the same type of operation.

Additional description regarding fig. 7A-7J is provided below with reference to methods 8000, 9000 and 10000 described in connection with fig. 8-10 below.

FIG. 8 is a flow chart of a method 8000 for displaying a first user interface object and a second user interface object in respective spatial relationships with respect to an anchor location corresponding to a user's hand, and, in response to detecting movement of the user's hand, translating both the first user interface object and the second user interface object in accordance with a translational movement of the user's hand and rotating the first user interface object, but not the second user interface object, in accordance with a rotational movement of the user's hand, according to some embodiments.

In some embodiments, method 8000 is performed at a computer system (e.g., computer system 101 in fig. 1, or a computer system described with respect to fig. 7A-7D, 7E-7F, 7G-7J). In some embodiments, the computer system is in communication with a first display generating component (e.g., display generating component 7100, another display generating component, a heads-up display, a head-mounted display (HMD), a display, a touch screen, and/or a projector) and one or more input devices (e.g., a camera or other sensor and input device that detects movement of a user's hand in a physical environment, movement of the user's entire body in a physical environment, and/or movement of the user's head in a physical environment; e.g., a controller, touch-sensitive surface, joystick, buttons, glove, watch, motion sensor, and/or orientation sensor). In some embodiments, the first display generating component is the display generating component 7100 described with respect to fig. 7A-7D, 7E-7F, and 7G-7J. In some embodiments, the first display generating component is a heads-up display that does not move or rotate with the user's head or the user's entire body, but optionally changes the user's viewpoint into the three-dimensional environment according to the movement of the user's head or body relative to the first display generating component. In some embodiments, the first display generating component is optionally moved and rotated by the user's hand relative to the physical environment or relative to the user's head, and the user's viewpoint into the three-dimensional environment changes according to the movement of the first display generating component relative to the user's head or face or relative to the physical environment. According to some embodiments, many of the features of method 8000 are described with respect to fig. 7A-7D.

Method 8000 involves displaying different user interface elements that are tied to the position of the user's hand that is also displayed such that one of the user interface elements changes as the orientation of the user's hand changes, while the other user interface element does not change orientation as the orientation of the user's hand changes. Automatically adjusting the size and/or position and (optionally) orientation of certain user interface objects relative to the user's hand without adjusting the orientation of other user interface objects relative to the user's hand provides real-time visual feedback as the user moves their hand. Providing improved visual feedback to the user enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user provide proper input and reducing user error in operating/interacting with the system), which in turn reduces power usage and extends battery life of the system by enabling the user to use the system more quickly and efficiently.

In method 8000, the computer system displays (8002), via the first display generating component, a first user interface object and a second user interface object in a first view of a three-dimensional environment (e.g., an augmented reality environment and/or a virtual reality environment). A respective characteristic location of the first user interface object in the three-dimensional environment has a first spatial relationship to a first anchor location in the three-dimensional environment that corresponds to the position of a first hand of the user in the physical environment, and a respective characteristic location of the second user interface object in the three-dimensional environment has a second spatial relationship to the first anchor location (e.g., the first user interface object is displayed at a first location relative to the representation 7020' of the user's hand, and the second user interface object is displayed at a second location relative to the representation 7020' of the user's hand). In some embodiments, the first user interface object includes one or more user interface objects (e.g., user interface objects corresponding to different operations of the computer system, such as system-level operations and/or application-level operations) in a predetermined layout (e.g., a two-dimensional layout or a three-dimensional layout). In some implementations, the first user interface object is a user interface of a corresponding application (e.g., a messaging application, a communication application, or a news application), a user interface of an operating system (e.g., a launch pad or control panel), or a preconfigured set of icons, avatars, graphics, user interface objects, and/or controls disposed in a container object or over a background in a layout.
In some implementations, the second user interface object is a user interface of a corresponding application (e.g., a messaging application, a communication application, or a news application), a user interface of an operating system (e.g., a launch pad or control panel), or a preconfigured set of icons, avatars, graphics, user interface objects, and/or controls (e.g., also referred to herein as session user interface objects) disposed in a container object or over a background in a layout. In some implementations, the first user interface object and the second user interface object correspond to the same application. In some implementations, the first user interface object and the second user interface object correspond to different levels of an operational hierarchy in an operating system. In some implementations, the respective characteristic locations of the first user interface object and/or the second user interface object are geometric centers of a two-dimensional arrangement of constituent objects of the first user interface object and/or the second user interface object, respectively. In some embodiments, the characteristic location of the first user interface object and/or the second user interface object is a geometric center of a front side of the three-dimensional arrangement of constituent objects of the first user interface object and/or the second user interface object, respectively. In some embodiments, the first anchor location in the three-dimensional environment corresponding to the location of the first hand of the user is a virtual location of the representation of the first hand. In some embodiments, the first anchor location is the center of a representation of the palm of the user's hand or of the back of the user's hand in the three-dimensional environment.

While the first user interface object and the second user interface object are displayed in the first view of the three-dimensional environment (e.g., concurrently displayed with their respective characteristic locations having the first spatial relationship and the second spatial relationship to the first anchor location corresponding to the position of the first hand of the user), the computer system detects (8004), via the one or more input devices, a first movement of the first hand in the physical environment. The first movement of the first hand corresponds to translational movement (e.g., sideways, up/down, and/or in depth) and rotational movement (e.g., about an axis corresponding to a wrist of the user) relative to a viewpoint corresponding to the first view of the three-dimensional environment. In some embodiments, the positioning of the viewpoint corresponds to a virtual positioning of the user's face or eyes in the three-dimensional environment. In some embodiments, the positioning of the viewpoint for a respective view of the three-dimensional environment is a virtual positioning from which the respective view of the three-dimensional environment would be visible to a virtual viewer.

In response to detecting (8006) the first movement of the first hand in the physical environment, the computer system translates (8008) the first user interface object and the second user interface object in the three-dimensional environment relative to the viewpoint in accordance with the translational movement of the first hand in the physical environment (e.g., such that the first spatial relationship between the respective characteristic location of the first user interface object and the first anchor location, and the second spatial relationship between the respective characteristic location of the second user interface object and the first anchor location, are maintained in the three-dimensional environment) (e.g., the first user interface object and the second user interface object follow a lateral movement of the representation of the first hand relative to the viewpoint corresponding to the first view of the three-dimensional environment).

In response to detecting (8006) the first movement of the first hand in the physical environment, the computer system rotates (8010) the first user interface object in the three-dimensional environment relative to the viewpoint in accordance with the rotational movement of the first hand in the physical environment, without rotating the second user interface object in the three-dimensional environment (e.g., the first user interface object follows rotation of the first hand relative to the viewpoint, but the second user interface object does not follow rotation of the first hand relative to the viewpoint corresponding to the first view of the three-dimensional environment). In some embodiments, the first spatial relationship shows the first user interface object overlaid on (e.g., or replacing a display of) a portion of a representation of a surface of the first hand of the user, while the second spatial relationship shows the second user interface object having a three-dimensional relationship with respect to the surface of the first hand of the user (e.g., not overlapping with a portion of the surface of the first hand of the user). In some embodiments, the second user interface object is closer to the viewpoint than the representation of the first hand of the user and is in front of the representation of the first hand of the user relative to the viewpoint. In some embodiments, the second user interface object is farther from the viewpoint than the representation of the first hand of the user and is behind the representation of the first hand of the user relative to the viewpoint. This is shown in fig. 7B-7D, which illustrate, for example, a first user interface object (e.g., first user interface object 7016') and a second user interface object (e.g., second user interface object 7018').
In response to detecting the first movement of the first hand (e.g., as shown by movement of the representation 7020' of the user's hand), the computer system translates the first user interface object and the second user interface object in the three-dimensional environment relative to the viewpoint in accordance with the translational movement of the first hand in the physical environment (e.g., translates the first user interface object 7016' and the second user interface object 7018' in accordance with the translational movement of the representation 7020' of the user's hand). However, the computer system rotates only the first user interface object in the three-dimensional environment relative to the viewpoint in accordance with the rotational movement of the first hand in the physical environment, and does not rotate the second user interface object in the three-dimensional environment (e.g., the first user interface object 7016' rotates as the representation 7020' of the user's hand rotates, as shown in the various rotational states in fig. 7B-7D, but the second user interface object 7018' does not rotate).
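By way of non-limiting illustration, operations 8008 and 8010 can be sketched as a single update applied to both objects. This is a minimal sketch under stated assumptions: the `Pose` record and the `apply_hand_movement` function are hypothetical, rotation is reduced to a single roll angle about the wrist axis, and the hand's translation is assumed to be supplied as the displacement of the first anchor location relative to the viewpoint.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    position: Tuple[float, float, float]  # location relative to the viewpoint
    roll_deg: float                       # rotation about the viewing axis, in degrees


def apply_hand_movement(first: Pose, second: Pose,
                        hand_translation: Tuple[float, float, float],
                        hand_rotation_deg: float) -> Tuple[Pose, Pose]:
    """Translate both objects with the hand; rotate only the first object."""
    def shift(p):
        return tuple(a + b for a, b in zip(p, hand_translation))

    # First object: follows both the translation and the rotation of the hand.
    new_first = Pose(shift(first.position), first.roll_deg + hand_rotation_deg)
    # Second object: follows the translation only; its orientation is unchanged.
    new_second = Pose(shift(second.position), second.roll_deg)
    return new_first, new_second
```

Because both objects receive the same displacement, their offsets from the anchor location are preserved, which corresponds to maintaining the first and second spatial relationships while the hand moves.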

In some implementations, the first spatial relationship requires that the respective characteristic location of the first user interface object be at the same distance from the viewpoint as, or closer to the viewpoint than, the first anchor location, and the second spatial relationship requires that the respective characteristic location of the second user interface object be farther from the viewpoint than the first anchor location. For example, in some embodiments, the first user interface object appears to overlay or replace a display of a portion of a surface of the representation of the first hand of the user, while the second user interface object appears to be displayed behind the representation of the first hand relative to the viewpoint (e.g., beyond arm's length from the viewpoint and/or at a viewing distance from the viewpoint). In some embodiments, the first user interface object moves toward or away from the viewpoint according to movement of the user's hand toward or away from the user's face, while the second user interface object maintains its distance from the viewpoint independent of movement of the user's hand toward or away from the user's face. In some embodiments, the second user interface object moves laterally and vertically relative to the viewpoint according to lateral and vertical movement of the first hand of the user, and does not move according to movement of the first hand toward or away from the face of the user or rotation about a joint connected to the first hand. This is illustrated in fig. 7B-7D, where a first user interface object (e.g., first user interface object 7016') appears closer to the viewpoint than the first anchor location (e.g., representation 7020' of the user's hand) and a second user interface object (e.g., second user interface object 7018') is farther from the viewpoint than the first anchor location.
Automatically adjusting the size and/or position and (optionally) orientation of certain user interface objects relative to the user's hand without adjusting the orientation of other user interface objects relative to the user's hand provides real-time visual feedback as the user moves their hand in a physical environment, thereby providing improved visual feedback to the user.
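By way of non-limiting illustration, the embodiment in which the second user interface object maintains its distance from the viewpoint can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `follow_hand_with_fixed_depth` is hypothetical, and positions are expressed in a viewpoint-relative frame in which x is lateral, y is vertical, and z is the distance from the viewpoint.

```python
def follow_hand_with_fixed_depth(first_pos, second_pos, hand_delta):
    """Update both object positions for one hand displacement.

    first_pos, second_pos, hand_delta: (x, y, z) tuples in a viewpoint-relative
    frame (assumed convention: x lateral, y vertical, z depth from the viewpoint).
    """
    dx, dy, dz = hand_delta
    # First object: follows the hand in all three directions, including depth.
    new_first = (first_pos[0] + dx, first_pos[1] + dy, first_pos[2] + dz)
    # Second object: follows lateral and vertical movement only; depth is kept.
    new_second = (second_pos[0] + dx, second_pos[1] + dy, second_pos[2])
    return new_first, new_second
```

In this variant, moving the hand toward the user's face brings the first object closer while the second object stays at its viewing distance behind the hand.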

In some embodiments, the second user interface object is oriented in the first view of the three-dimensional environment according to a direction in the physical environment (e.g., a downward direction of a layout of the second user interface object is aligned with the direction and/or a major axis of the second user interface object has a relationship to the direction), such as a downward direction in the physical environment (e.g., a gravitational direction and/or a downward direction relative to the first display generating component) or a downward direction relative to an orientation of the user in the physical environment (e.g., when the user is lying face up on a bed, lying on their side on a bed, standing facing forward, or standing facing left). In some embodiments, the second user interface object maintains an upright orientation relative to the user's head and/or relative to the first display generating component in a currently displayed view of the three-dimensional environment, independent of rotational movement of the user's first hand (e.g., relative to a wrist connected to the first hand in the physical environment). This is illustrated, for example, in fig. 7B-7D, where a second user interface object (e.g., second user interface object 7018') is oriented in the first view of the three-dimensional environment according to a direction in the physical environment (e.g., a downward direction of the layout of second user interface object 7018' is aligned with the direction of gravity such that "downward" points to representation 7008' of the floor).
Automatically displaying certain user interface objects with an orientation that remains consistent relative to the physical environment, or relative to the user's current perspective of the physical environment, even when the perspective of the user changes (e.g., by maintaining the orientation of the user interface objects with respect to the physical environment as the user's view of the physical space rotates), provides real-time visual feedback as the user changes orientation with respect to the physical environment, thereby providing improved visual feedback to the user.

In some embodiments, the second user interface object is oriented in the first view of the three-dimensional environment according to an upright direction of the first display generating component in the physical environment (e.g., a downward direction of a layout of the second user interface object is aligned with the direction and/or a major axis of the second user interface object has a relationship with the direction) (e.g., an upright direction of the HMD that corresponds to an upright direction of a head of a user wearing the HMD, an upright direction of the head-up display that corresponds to an upright direction of a head of a user viewing content displayed via the head-up display, or an upright direction of a device having a touch screen display that corresponds to an upright direction of a head of a user lifting the device to view content displayed via the touch screen). In some embodiments, the upright direction of the first display generating component in the physical environment need not be aligned with the direction of gravity in the physical environment, and is optionally angled relative to the direction of gravity depending on the angle of inclination of the first display generating component relative to the direction of gravity in the forward/rearward and clockwise/counter-clockwise directions. In some embodiments, in contrast to the second user interface object, the first user interface object is oriented independent of the orientation of the first display generating component in the physical environment, and is oriented according to the orientation of a surface of the first hand of the user (e.g., the palm of the user's hand or the back of the user's hand) in the physical environment. For example, as described with reference to fig. 7C-7D, the second user interface object 7018' is oriented such that text displayed on the second user interface object 7018' appears right-side-up based on the current orientation of the user's head (e.g., or the orientation of the HMD).
Automatically displaying certain user interface objects with an orientation that is consistent relative to the orientation of the display (such that the orientation of those user interface objects is maintained when the user rotates the viewpoint (e.g., by rotating the display)) provides real-time visual feedback as the user moves the display, thereby providing improved visual feedback to the user.
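
The orientation references discussed above (the direction of gravity, the upright direction of the display generating component, and the surface of the hand) can be contrasted with a small sketch. It is illustrative only and makes simplifying assumptions: orientation is reduced to a single roll angle in the plane of the view, clockwise is taken as positive, and the function name `object_roll_in_view` and its mode strings are hypothetical.

```python
def object_roll_in_view(mode: str, display_roll_deg: float, hand_roll_deg: float) -> float:
    """Return the roll (degrees) at which an object is drawn in the current view.

    display_roll_deg: tilt of the display's upright direction relative to gravity.
    hand_roll_deg:    rotation of the hand surface as seen in the view.
    """
    if mode == "display":
        # Upright relative to the display: always drawn unrotated in the view.
        return 0.0
    if mode == "gravity":
        # Upright relative to gravity: counter-rotate by the display tilt.
        return -display_roll_deg
    if mode == "hand":
        # Attached to the hand surface: follow the hand's rotation.
        return hand_roll_deg
    raise ValueError(f"unknown orientation mode: {mode}")
```

For a display tilted 20 degrees and a hand rotated 35 degrees, a display-aligned object is drawn at 0 degrees, a gravity-aligned object at -20 degrees, and a hand-aligned object at 35 degrees.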

In some implementations, when the second user interface object is displayed in a respective view of the three-dimensional environment (e.g., concurrently with the first user interface object in the first view of the three-dimensional environment, or when the first user interface object is not displayed in the respective view of the three-dimensional environment), the computer system detects a first user input corresponding to a request to display the second user interface object in the three-dimensional environment independent of the second spatial relationship with the first anchor location in the three-dimensional environment. In response to detecting the first user input, the computer system moves the second user interface object away from the respective characteristic location having the second spatial relationship with the first anchor location to a location that is independent of the position of the first hand of the user in the physical environment. In some implementations, the location that is independent of the position of the first hand of the user corresponds to a location in the physical environment (e.g., a location selected by the first user input and/or a default location for the second user interface object in the physical environment). In some embodiments, the computer system, via the first display generating component, continues to show the second user interface object anchored to the location that is independent of the position of the first hand of the user in the physical environment as the viewpoint of the currently displayed view of the three-dimensional environment moves. In some implementations, the first anchor location moves (e.g., the representation of the user's hand moves as the user moves the hand in the physical environment) and the second user interface object does not move.
For example, the second user interface object (e.g., a conversational user interface object) is placed on a representation of a physical object, or on a virtual object, that moves in the three-dimensional environment. Before the second user interface object (e.g., user interface object 7018') is placed on such an object (e.g., a representation of a physical object or a virtual object), movement of the object in the three-dimensional environment does not move the second user interface object, which remains bound to the representation of the hand. In some embodiments, after the second user interface object is placed on the object, the second user interface object moves with the object and no longer moves with the hand. Allowing the user to easily change the anchor point of the user interface object using input (e.g., gestures) detected by the computer system (such that the user interface object will no longer follow the user as the user moves) allows the user to control the anchor point of the user interface object using gestures without requiring the user to navigate through a complex menu hierarchy, thereby providing the user with additional control options without cluttering the user's view with additional displayed controls.

In some implementations, detecting a first user input corresponding to a request to display a second user interface object in the three-dimensional environment without regard to a second spatial relationship to a first anchor location in the three-dimensional environment includes detecting a first gesture (e.g., an air tap gesture, or a pinch gesture in which two fingers move away from each other from a touch state and/or two fingers move into a touch state) of a second hand of the user. In some embodiments, the first gesture is directed to a first control object in the three-dimensional environment that is displayed at a location corresponding to a position of a first hand of the user in the physical environment (e.g., the first gesture is detected when a gaze input is detected at the first control object displayed at the location corresponding to the position of the first hand, when the first gesture is performed by the second hand at or near a portion of the first hand corresponding to the displayed location of the first control object, and/or when the first control object has an input focus). In some implementations, the first control object is a constituent object of the first user interface object. In some embodiments, the first control object is a constituent object of the second user interface object. In some embodiments, the first control object is separate from the first user interface object and the second user interface object in the three-dimensional environment. For example, as described with reference to fig. 7H-7I, the user selects user interface object 7302-1, and in response to user input, object 7306 is displayed with representation 7014 'of the physical object (e.g., wherein in some embodiments, the positioning of object 7306 was previously tied to the positioning of the representation of the user's hand). 
Allowing a user to change the anchor point of certain user interface objects so that the objects are set (e.g., placed) within the three-dimensional environment, rather than remaining anchored to an initial anchor point or to another user interface control (which may be positioned relative to the user's hand), provides the user with additional control options without requiring the user to navigate through a complex menu hierarchy and without cluttering the user's current view, and provides the user with improved visual feedback without requiring additional user input.

In some implementations, when the second user interface object is displayed at a location that is independent of the position of the first hand of the user in the physical environment, the computer system displays the second control object at a respective location in the three-dimensional environment that corresponds to the position of the first hand in the physical environment. In some embodiments, the second control object is the same as the first control object used to send the second user interface object to a location in the three-dimensional environment that is independent of the position of the first hand of the user. In some embodiments, the second control object is a component object in the first user interface object that has a first spatial relationship with a first anchor location corresponding to a position of the first hand in the physical environment. In some embodiments, the second control object is a newly displayed object that is separate from the first user interface object and the second user interface object and is displayed at a respective location anchored to the location of the first hand (e.g., the location of the palm of the first hand, the second hand, or any of the user's hands). 
In some embodiments, the computer system detects a second gesture (e.g., an air tap gesture, or a pinch gesture in which two fingers move away from each other and/or two fingers move into a touch state) of a second hand of the user when the second control object is displayed at a respective location in the three-dimensional environment corresponding to the location of the first hand in the physical environment, wherein the second gesture points to the second control object displayed at the respective location in the three-dimensional environment corresponding to the location of the first hand of the user (e.g., the second gesture is detected when a gaze input is detected at the second control object displayed at the location corresponding to the location of the first hand, when the second gesture is performed by the second hand at or near a portion of the first hand corresponding to the display location of the second control object, and/or when the second control object has an input focus). In some implementations, in response to detecting the second gesture directed to the second control object, the computer system stops displaying the second user interface object at a location that is independent of a position of the first hand of the user in the physical environment. For example, in some embodiments, the computer system stops displaying the second user interface object in the three-dimensional environment. In some embodiments, the computer system moves the second user interface object back to a position having a second spatial relationship to the first anchor position corresponding to the position of the first hand of the user. For example, the second user interface object is moved back to a position anchored to the position of the first hand of the user and moved according to the lateral movement of the first hand, but the orientation of the second user interface object is maintained independent of the rotational movement of the first hand. 
For example, as described with reference to fig. 7C-7D, after a user sends a user interface object (e.g., user interface object 7018') into the world, a control is displayed at the representation of the user's hand so that the user can toggle (e.g., by selecting the control using an air gesture, a tap input, a pinch input, or another input) whether the object is displayed in the three-dimensional environment. Providing a button or another user interface control that is accessible at (e.g., tied to) the user's hand and that allows the user to seamlessly minimize (e.g., stop displaying) and/or redisplay a user interface object anchored to the three-dimensional environment provides the user with additional controls displayed at the user's hand without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls, and providing improved visual feedback to the user without requiring additional user input.
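
The anchoring behavior described in the preceding paragraphs (sending the second user interface object to a location independent of the hand, then hiding, redisplaying, or re-anchoring it via a hand-anchored control) can be summarized as a small state sketch. The class and method names (`SecondObject`, `send_to_world`, `on_hand_control`) and the string-valued anchor states are hypothetical and are used here only to illustrate the sequence of states, not to describe an actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SecondObject:
    anchor: str = "hand"  # "hand" or "world" (hypothetical state labels)
    world_position: Optional[Tuple[float, float, float]] = None
    visible: bool = True

    def send_to_world(self, position: Tuple[float, float, float]) -> None:
        """First gesture: detach from the hand and pin at a world location."""
        self.anchor = "world"
        self.world_position = position

    def on_hand_control(self, reattach: bool = False) -> None:
        """Second gesture, directed to the control anchored to the hand."""
        if self.anchor != "world":
            return  # nothing to do while the object still follows the hand
        if reattach:
            # Embodiment in which the object moves back to the hand anchor.
            self.anchor = "hand"
            self.world_position = None
            self.visible = True
        else:
            # Embodiment in which the control toggles display of the object.
            self.visible = not self.visible
```

A typical sequence in this sketch is `send_to_world(...)`, then `on_hand_control()` to hide the object, `on_hand_control()` again to redisplay it, and `on_hand_control(reattach=True)` to return it to the hand.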

In some implementations, the computer system detects, via the one or more input devices, a second movement of the first hand in the physical environment when the second user interface object is displayed in a respective view of the three-dimensional environment (e.g., concurrently with the first user interface object in the first view of the three-dimensional environment, or when the first user interface object is not displayed in the respective view of the three-dimensional environment) (e.g., when the second user interface object is displayed at a location having the second spatial relationship to the first anchor location, or when the second user interface object is displayed at a respective location in the three-dimensional environment that is independent of the position of the first hand of the user), wherein the second movement of the first hand causes the representation of the first hand to move out of the respective view of the three-dimensional environment. In some embodiments, in accordance with a determination that the second movement of the first hand has caused the representation of the first hand to move out of the respective view of the three-dimensional environment, and in accordance with a determination that the second user interface object is currently displayed at its characteristic location having the second spatial relationship to the first anchor location, the computer system ceases to display the second user interface object in the respective view of the three-dimensional environment.
In accordance with a determination that the second movement of the first hand has caused the representation of the first hand to move out of the respective view of the three-dimensional environment, and in accordance with a determination that the second user interface object is currently displayed at a location that is independent of the position of the first hand, the computer system maintains a display of the second user interface object in the respective view of the three-dimensional environment (e.g., independent of whether the representation of the first hand is within the respective view of the three-dimensional environment). For example, as described above with reference to fig. 7C-7D, after the user has placed the user interface object in a three-dimensional environment, the object remains displayed even when the user's hand moves out of the current view. Automatically maintaining the display of certain user interface objects as the user's hand moves out of view while not maintaining the display of other user interface objects as the user's hand moves out of view, depending on whether the respective user interface object is anchored to the user's hand, provides real-time visual feedback as the user moves the user's hand out of view, and automatically removes certain user interface objects bound to the user's hand, thereby providing improved visual feedback to the user without further user input when a set of conditions has been met.
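
The two determinations described above reduce to a simple rule, sketched below for illustration. The function name `should_display_second_object` and its boolean parameters are hypothetical simplifications of the anchoring state and of whether the representation of the hand is within the current view.

```python
def should_display_second_object(anchored_to_hand: bool, hand_in_view: bool) -> bool:
    """Decide whether the second object remains displayed in the current view."""
    if anchored_to_hand:
        # A hand-anchored object disappears when the hand leaves the view.
        return hand_in_view
    # An object placed independent of the hand stays displayed regardless.
    return True
```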

In some embodiments, when displaying a second user interface object in a first view of the three-dimensional environment (e.g., with their respective property positions having a first spatial relationship and a second spatial relationship corresponding to a position of a first hand of a user displayed simultaneously), the computer system detects a third movement of the first hand in the physical environment via the one or more input devices, wherein the third movement of the first hand corresponds to a movement of the first anchor position toward or away from a point of view corresponding to the first view of the three-dimensional environment. In some embodiments, the positioning of the viewpoint corresponds to a virtual positioning of the user's face or eyes in a three-dimensional environment. In some embodiments, the positioning of the viewpoint corresponds to virtual positioning of one or more cameras or display generating components in a three-dimensional environment. In some embodiments, the positioning of the viewpoint for the respective view of the three-dimensional environment is a virtual positioning from which the respective view of the three-dimensional environment will be visible to a virtual viewer. In response to detecting a third movement of the first hand in the physical environment, the computer system changes a size of the second user interface object in accordance with the movement of the first anchor location toward or away from the point of view while maintaining a second spatial relationship between the respective characteristic locations of the second user interface object and the first anchor location in the three-dimensional environment (e.g., the second user interface object is enlarged or reduced in size while following the first hand moving toward or away from the user's face). 
In some implementations, the change in size of the second user interface object caused by the movement of the first hand is greater than the change attributable to the changed viewing distance of the second user interface object alone. For example, according to some embodiments, the angular extent of the second user interface object is enlarged or reduced as a result of movement of the second user interface object toward or away from the viewpoint caused by movement of the first hand toward or away from the user's face. This is illustrated, for example, in fig. 7B and 7C, where the third movement of the hand corresponds to movement of the first anchor location away from the viewpoint corresponding to the first view of the three-dimensional environment (e.g., as reflected in movement of the representation 7020' of the user's hand away from the viewpoint, as illustrated in fig. 7B-7C), and in response, the computer system changes the size of the second user interface object according to the movement of the first anchor location away from the viewpoint (e.g., the computer system reduces the size of the second user interface object 7018' in fig. 7C as compared to fig. 7B) while maintaining the second spatial relationship between the respective characteristic location of the second user interface object and the first anchor location in the three-dimensional environment (e.g., the second user interface object 7018' maintains the same distance from the representation 7020' of the user's hand). Automatically updating the size of certain user interface objects as the user's hand moves closer to or farther from the user's field of view to maintain the relative proportion between the user interface objects and the user's hand provides real-time visual feedback as the user moves the user's hand closer to or farther from the user's field of view, thereby providing improved visual feedback to the user.
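
The relationship between hand distance and the apparent size of the second user interface object can be illustrated numerically. The sketch below assumes a simple pinhole-style model in which angular size is 2·atan(width / (2·distance)); the function names, the fixed `offset` between the hand and the object, the reference distance, and the `gain` exponent (a hypothetical way of making the size change exceed what perspective alone produces) are all assumptions introduced for illustration.

```python
import math


def angular_size_deg(width: float, distance: float) -> float:
    """Angle (degrees) subtended at the viewpoint by an object of given width."""
    return math.degrees(2.0 * math.atan(width / (2.0 * distance)))


def displayed_width(base_width: float, hand_distance: float,
                    ref_distance: float = 0.4, gain: float = 0.0) -> float:
    """World-space width of the object for a given hand distance.

    gain = 0 keeps the width fixed (pure perspective); gain > 0 shrinks or
    enlarges the object by more than perspective alone would.
    """
    return base_width * (ref_distance / hand_distance) ** gain


def second_object_angular_size(base_width: float, hand_distance: float,
                               offset: float = 0.1, gain: float = 0.0) -> float:
    """Angular size of an object kept at a fixed offset beyond the hand."""
    width = displayed_width(base_width, hand_distance, gain=gain)
    return angular_size_deg(width, hand_distance + offset)
```

With these assumptions, moving the hand from 0.3 to 0.6 units away reduces the object's angular size, and a positive `gain` reduces it further than the change in viewing distance alone.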

In some implementations, the size of the second user interface object in the three-dimensional environment is selected based at least in part on the size of the first hand (e.g., when the second user interface object is displayed at a respective characteristic location having the second spatial relationship with the first anchor location). In some implementations, the size of the first user interface object in the three-dimensional environment is selected based on the size of the first hand (e.g., when the first user interface object is displayed at a respective characteristic location having the first spatial relationship with the first anchor location). In some implementations, the computer system dynamically resizes the first user interface object and/or the second user interface object according to a spatial extent of the first hand in the physical environment (e.g., as the first hand opens or closes), as described above with reference to fig. 7D. Automatically displaying certain user interface objects at a size determined based on the current spatial extent of the user's hand (such that the user interface objects appear in a predefined proportion relative to the user's hand) provides real-time visual feedback when the user opens or closes the hand (e.g., which enlarges or reduces the spatial extent of the hand), thereby providing improved visual feedback to the user.

In some implementations, the orientation of the first user interface object (e.g., as well as of controls and/or icons displayed in the first user interface object) is selected based on a characteristic orientation of the first hand (e.g., the hand orientation when the first user interface object is initially displayed in the three-dimensional environment, or an average hand orientation over a period of time prior to the initial display of the first user interface object). For example, in some embodiments, the direction in which the fingers of the first hand extend away from the palm of the first hand is defined as the upright direction of the first user interface object, which is displayed overlaying (e.g., or replacing) a portion of the representation of the palm of the first hand. In some embodiments, the direction in which the thumb of the first hand extends away from the palm of the first hand is defined as the upright direction of the first user interface object. In some embodiments, the first user interface object appears to be overlaid on or attached to a surface of the representation of the first hand (e.g., a surface of the palm or a surface of the back of the hand) in a view of the three-dimensional environment. In some embodiments, the first user interface object appears to be positioned perpendicular to a surface of the representation of the first hand (e.g., perpendicular to a surface of the palm or a surface of the back of the hand). This is illustrated, for example, in fig. 7B-7D, where the orientation of the first user interface object (e.g., first user interface object 7016') is selected based on the characteristic orientation of the first hand (e.g., as reflected by the various rotational states of representation 7020' of the user's hand in fig. 7B-7D).
Automatically displaying certain user interface objects as the user's hand moves with an orientation updated based on the orientation of the user's hand provides real-time visual feedback as the user moves the user's hand, thereby providing improved visual feedback to the user.

In some embodiments, the first user interface object is a control panel that includes a plurality of user interface objects corresponding to different device control functions (e.g., volume control, display brightness control, network connection control, and/or media player control) of the computer system (e.g., and/or input devices, output devices, network equipment, peripheral devices, and/or other devices in communication with the computer system). In some embodiments, the first user interface object comprising the control panel is displayed as a representation overlaying or replacing the back of the first hand in the first view of the three-dimensional environment. In some embodiments, when the representation of the palm of the first hand is displayed in the first view of the three-dimensional environment, the computer system stops displaying the control panel and displays another user interface object (e.g., a launch pad, dock, and/or home button) that overlays the representation of the palm of the first hand or replaces the display of the representation. For example, as described with reference to fig. 7B-7D, a second set of controls (e.g., a control center including a plurality of selectable user interface elements for controlling various device functions) is anchored to the back of the user's hand (e.g., while another set of controls is anchored to the user's palm). Buttons or other user interface controls that provide system functionality for controlling the device that is easily accessible by and tied to the user's hand provide the user with additional controls that are easily accessible by the user moving the user's hand into view without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls, and providing improved visual feedback to the user without requiring additional user input.

In some embodiments, the first user interface object is a dock that includes a plurality of user interface objects corresponding to different applications or experiences (e.g., an application icon, an icon that when activated causes display of a corresponding computer-generated real-world experience, and/or an icon that when activated causes initiation of a communication session with a user), wherein respective user interface objects in the dock when activated cause the computer system to initiate display of the respective application or computer-generated real-world experience. In some embodiments, the first user interface object comprising a dock is displayed as a representation of a palm overlaying a first hand in a first view of the three-dimensional environment or as a display replacing the representation. In some embodiments, when displaying the representation of the back of the first hand in the first view of the three-dimensional environment, the computer system stops displaying the dock and displays another user interface object (e.g., a control panel and/or media player control) that overlays or replaces the display of the representation of the back of the first hand, as described above with reference to fig. 7B-7D. In some embodiments, the placement of the control panel and dock is reversed (e.g., the control panel is displayed when the palm of the first hand is displayed and the dock is displayed when the back of the first hand is displayed). 
Providing different sets of user interface controls (one set being made available on the front of the user's hand (e.g., in the user's palm) and the other set being made available on the back of the user's hand) provides additional controls for the user so that the user can easily access desired controls by flipping the user's hand to display either the front or back of the user's hand to switch the displayed set of controls without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls, and providing improved visual feedback to the user without requiring additional user input.
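
The pairing of hand sides with sets of controls described above, including the embodiment in which the placement is reversed, can be expressed as a simple lookup. The function name `panel_for_side` and the string labels are hypothetical and serve only to illustrate the mapping.

```python
def panel_for_side(visible_side: str, reversed_layout: bool = False) -> str:
    """Return which set of controls to show for the visible side of the hand.

    visible_side: "palm" or "back" (hypothetical labels).
    reversed_layout: True for the embodiment in which the placement of the
    dock and the control panel is swapped.
    """
    if visible_side not in ("palm", "back"):
        raise ValueError(f"unknown hand side: {visible_side}")
    shows_dock = (visible_side == "palm") != reversed_layout
    return "dock" if shows_dock else "control_panel"
```

Flipping the hand changes `visible_side`, and therefore which set of controls is returned, without any menu navigation.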

In some implementations, when a first user interface object (and optionally a second user interface object) is displayed in a first view of the three-dimensional environment (e.g., displayed with their respective property positions having a first spatial relationship and a second spatial relationship corresponding to the position of a first hand of a user), the computer system detects a second movement of the first hand in the physical environment via the one or more input devices. In some embodiments, the second movement of the first hand causes a first side of the first hand (e.g., a side of the first hand that is currently visible in the first view and/or a side of the first hand that faces the user) to rotate away from a viewpoint corresponding to the first view of the three-dimensional environment. In response to detecting the second movement of the first hand in the physical environment and in accordance with a determination that the visibility of the first side of the first hand in the first view of the three-dimensional environment is below a threshold amount of visibility (e.g., the less than threshold portion of the first side of the first hand remains in the first view and/or the angle of the first side of the first hand is outside of the range of angles relative to the point of view), the computer system displays a third user interface object different from the first user interface object and the second user interface object, the third user interface object overlaying a portion of the second side of the first hand in the three-dimensional environment or replacing a display of the portion (and, optionally, the computer system stops displaying the first user interface object in the three-dimensional environment). In some embodiments, the first user interface object fades out gradually as the first side of the first hand rotates away from the viewpoint and the second side of the first hand rotates toward the viewpoint. 
In some embodiments, in response to detecting the second movement of the first hand in the physical environment and in accordance with a determination that the visibility of the first side of the first hand in the first view of the three-dimensional environment is above the threshold amount of visibility (e.g., more than a threshold portion of the first side of the first hand remains in the first view and/or the angle of the first side of the first hand is within the range of angles relative to the viewpoint), the computer system forgoes displaying the third user interface object in the three-dimensional environment (and, optionally, rotates the first user interface object in accordance with the rotation of the first side of the first hand, e.g., such that the first spatial relationship between the respective characteristic location of the first user interface object and the first anchor location is maintained in the three-dimensional environment, and the first user interface object rotates as the representation of the first side of the first hand rotates away from the viewpoint). For example, as described with reference to fig. 7C, the first user interface object 7016' is visually de-emphasized (e.g., blurred and/or faded out) as the user's hand rotates. Automatically changing which user interface object is displayed, or rotating the displayed user interface object, according to the amount of rotation of the user's hand provides real-time visual feedback as the user rotates the hand, thereby providing improved visual feedback to the user without requiring additional user input.
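
One possible way to express the visibility threshold discussed above is sketched below. It is illustrative only: visibility is approximated by the cosine of the angle between the outward normal of the first side of the hand and the direction from the hand to the viewpoint, and the threshold value of 0.3, the function names, and the object labels are assumptions rather than values taken from this description. Both vectors are assumed to be nonzero.

```python
import math
from typing import Sequence


def side_visibility(side_normal: Sequence[float], to_viewpoint: Sequence[float]) -> float:
    """Approximate visibility of a hand side as a value in [0, 1].

    Returns the cosine of the angle between the side's outward normal and the
    direction to the viewpoint, clamped at 0 when the side faces away.
    """
    dot = sum(a * b for a, b in zip(side_normal, to_viewpoint))
    norm = math.hypot(*side_normal) * math.hypot(*to_viewpoint)
    return max(0.0, dot / norm)


def object_for_visibility(visibility: float, threshold: float = 0.3) -> str:
    """Choose which object overlays the hand for a given first-side visibility."""
    if visibility < threshold:
        # First side mostly turned away: show the third object on the second side.
        return "third"
    # First side still sufficiently visible: keep (and rotate) the first object.
    return "first"
```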

In some implementations, when at least one of the first user interface object and the second user interface object is displayed in a first view of the three-dimensional environment (e.g., displayed with their respective characteristic positions having a first spatial relationship and a second spatial relationship corresponding to a position of the first hand of the user), the computer system detects a third movement of the first hand in the physical environment via the one or more input devices, wherein the third movement of the first hand reduces a spatial extent of the representation of the first side of the first hand (e.g., a side of the first hand currently visible in the first view and/or a side of the first hand facing the user) in the first view of the three-dimensional environment (e.g., due to rotation/tilt of the first hand and/or closure of the first hand). In some implementations, in response to detecting the third movement of the first hand in the physical environment, the computer system reduces a degree of visual prominence of at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment (e.g., by blurring, darkening, fading, increasing the translucency of, and/or shrinking at least one of the first user interface object and the second user interface object, wherein a magnitude of the change in the degree of visual prominence is based on a magnitude of the third movement of the first hand). In some implementations, only one of the first user interface object and the second user interface object (e.g., the first user interface object instead of the second user interface object, or the second user interface object instead of the first user interface object) experiences the above-described reduction in visual prominence. In some embodiments, both the first user interface object and the second user interface object experience the above-described reduction in visual prominence.
For example, as described with reference to fig. 7C, the first user interface object 7016' is visually deemphasized (e.g., blurred, faded, and/or shrunk) when the user's hand is tilted or closed. Automatically displaying a particular user interface object with a different visual effect (such as fading the object in and out) according to an orientation or positioning change of the user's hand provides real-time visual feedback as the user tilts the user's hand or performs other movements with the user's hand, thereby providing improved visual feedback to the user.
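
As a minimal sketch of a change in visual prominence whose magnitude is based on the magnitude of the hand movement, the following maps the remaining spatial extent of the first side of the hand to an opacity value. The function name `prominence_after_movement` and the linear mapping are assumptions made for illustration; the description leaves the exact relationship open.

```python
def prominence_after_movement(initial_extent: float, current_extent: float) -> float:
    """Return an opacity in [0.0, 1.0] for a hand-anchored object.

    The extents are assumed measurements (e.g., visible area) of the first
    side of the hand before and after the movement. The opacity falls
    linearly as the visible extent shrinks (an illustrative choice).
    """
    if initial_extent <= 0:
        raise ValueError("initial_extent must be positive")
    ratio = current_extent / initial_extent
    # Clamp so that a hand that grows in view never exceeds full prominence.
    return max(0.0, min(1.0, ratio))
```

The same ratio could instead drive blur radius, translucency, or scale; opacity is used here only because it is the simplest to show.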

In some implementations, when at least one of the first user interface object and the second user interface object (e.g., the first user interface object but not the second user interface object, the second user interface object but not the first user interface object, or both the first user interface object and the second user interface object) is displayed in a first view of the three-dimensional environment (e.g., displayed with their respective characteristic positions having a first spatial relationship and a second spatial relationship corresponding to a position of a first hand of a user), the computer system detects a fourth movement of the first hand in the physical environment via the one or more input devices. In some implementations, in response to detecting the fourth movement of the first hand in the physical environment, in accordance with determining that the fourth movement of the first hand causes the representation of the first hand to leave the first view of the three-dimensional environment (e.g., due to the first hand being translated out of the field of view of the camera), and determining that the first side of the first hand (e.g., palm side or back side of the hand) faces the viewpoint of the first view of the three-dimensional environment when the representation of the first hand leaves the first view of the three-dimensional environment, the computer system maintains display of at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment (e.g., at an edge of the first view and/or at a default location in the first view).
In some implementations, in accordance with determining that the fourth movement of the first hand causes the representation of the first hand to leave the first view of the three-dimensional environment (e.g., due to the first hand being translated out of the field of view of the camera), and determining that the first side of the first hand (e.g., palm side or back side of the hand) does not face the viewpoint of the first view of the three-dimensional environment when the representation of the first hand leaves the first view of the three-dimensional environment (e.g., due to rotation or closing of the first hand), the computer system ceases to display at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment. In some implementations, the computer system ceases to display at least one of the first user interface object and the second user interface object in accordance with determining that the first side of the first hand (e.g., the palm side or back side of the hand) is no longer facing the viewpoint of the first view of the three-dimensional environment (e.g., due to rotation or closing of the first hand), without requiring the representation of the first hand to leave the first view of the three-dimensional environment, as described with reference to fig. 7B-7D.
Automatically maintaining the display of certain user interface objects as the user's hand moves out of view based on the hand facing (e.g., turning to) the first direction and not maintaining the display of other user interface objects as the user's hand faces the second direction provides real-time visual feedback as the user moves the user's hand to face a different direction and out of view, and automatically updating the user's current view based on the positioning of the user's hand, thereby automatically providing improved visual feedback to the user without further user input when a set of conditions has been met.
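
The conditions described above for maintaining or ceasing display when the hand leaves the view can be summarized, under stated assumptions, by the following sketch. The function name and parameters are hypothetical; the flag `dismiss_without_exit` models the variant in which display ceases as soon as the first side of the hand no longer faces the viewpoint, without requiring the hand to leave the view.

```python
def should_keep_displaying(hand_left_view: bool,
                           first_side_faces_viewpoint: bool,
                           dismiss_without_exit: bool = False) -> bool:
    """Decide whether the hand-anchored objects remain displayed."""
    if hand_left_view:
        # The hand has left the view: maintain display only if the first
        # side faced the viewpoint at the moment the hand left.
        return first_side_faces_viewpoint
    if dismiss_without_exit and not first_side_faces_viewpoint:
        # Variant: cease display when the first side turns away, even
        # though the hand is still in view.
        return False
    # Otherwise the hand is still in view and display is unchanged.
    return True
```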

In some embodiments, the first hand is a non-dominant hand associated with the user, and in accordance with a determination that the first hand is a non-dominant hand associated with the user, the computer system displays the first user interface object and the second user interface object in a first spatial relationship and a second spatial relationship corresponding to a first anchor location in the three-dimensional environment corresponding to a position of the first hand, while both the first hand and the second hand of the user are visible in a first view of the three-dimensional environment. For example, in some embodiments, the computer system registers or detects that the first hand is a non-dominant hand of the user, and places one or both of the first user interface object and the second user interface object relative to the representation of the first hand, even though the representations of both hands of the user may be within the first view of the three-dimensional environment (e.g., at that moment). In some implementations, the computer system defaults to positioning one of the first user interface object and the second user interface object (e.g., the first user interface object but not the second user interface object, or the second user interface object but not the first user interface object) relative to the position of the representation of the non-dominant hand in the three-dimensional environment, and does not impose such a limitation on the other of the first user interface object and the second user interface object. In some embodiments, the non-dominant hand is automatically selected by the computer system based on the monitored use of the hands. In some embodiments, the non-dominant hand is set by the user (e.g., in a settings interface), or selected during initial setup of the computer system, as described above with reference to fig. 7B.
Automatically determining the user's non-dominant hand and assigning the user interface object as bound to the user's non-dominant hand provides the user with real-time visual feedback that automatically displays the user interface object when the user's non-dominant hand is detected to be within the user's current view, thereby performing an operation when a set of conditions has been met without further user input.
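
One possible way to select the anchoring hand is sketched below. Both helper names and the usage-count heuristic are assumptions for illustration; the description states only that the non-dominant hand may be selected automatically based on monitored use or set by the user.

```python
from typing import Dict, Iterable, Optional


def infer_non_dominant_hand(use_counts: Dict[str, int]) -> str:
    """Pick the less-used hand from monitored usage counts (assumed heuristic)."""
    return min(use_counts, key=use_counts.get)


def select_anchor_hand(visible_hands: Iterable[str],
                       non_dominant_hand: str) -> Optional[str]:
    """Return the hand to which the objects are anchored, if it is in view.

    The objects are anchored to the non-dominant hand even when both hands
    are visible; None means there is currently no hand to anchor to.
    """
    return non_dominant_hand if non_dominant_hand in set(visible_hands) else None
```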

In some implementations, before displaying the second user interface object at its respective characteristic position having the second spatial relationship to the first anchor position in the three-dimensional environment, the computer system moves the second user interface object from an initial position to the respective characteristic position in the first view of the three-dimensional environment, wherein the initial position of the second user interface object is closer to the first anchor position in the three-dimensional environment than the respective characteristic position having the second spatial relationship to the first anchor position. For example, as described with reference to fig. 7C-7D, in some embodiments, the computer system shows the second user interface object popping up from the representation of the first hand and moving to a position having the second spatial relationship with the first anchor position on the representation of the first hand. Automatically displaying user interface objects as emerging from the representation of the user's hand before the user interface objects follow the representation of the user's hand provides real-time visual feedback as the user moves the user's hand within the three-dimensional environment, thereby providing improved visual feedback to the user.
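
The movement from an initial position near the anchor to the characteristic position can be sketched as a simple interpolation. The function name `popup_position`, the choice of starting exactly at the anchor, and the linear easing are assumptions; the description requires only that the initial position be closer to the anchor than the final position.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def popup_position(anchor: Vec3, offset: Vec3, progress: float) -> Vec3:
    """Interpolate the object's position while it emerges from the hand.

    anchor   -- the first anchor position on the hand
    offset   -- the displacement that defines the second spatial relationship
    progress -- animation progress, clamped to [0.0, 1.0]
    """
    t = max(0.0, min(1.0, progress))
    # At t == 0 the object sits at the anchor; at t == 1 it reaches the
    # characteristic position (anchor + offset).
    return (anchor[0] + offset[0] * t,
            anchor[1] + offset[1] * t,
            anchor[2] + offset[2] * t)
```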

In some implementations, when at least one of the first user interface object and the second user interface object (e.g., the first user interface object but not the second user interface object, the second user interface object but not the first user interface object, or both the first user interface object and the second user interface object) is displayed in a first view of the three-dimensional environment (e.g., displayed with their respective characteristic positions having a first spatial relationship and a second spatial relationship corresponding to a position of a first hand of a user), the computer system detects, via the one or more input devices, an input directed to a respective user interface object of the at least one of the first user interface object and the second user interface object. In response to detecting the input directed to the respective user interface object, and in accordance with a determination that the input satisfies an activation criterion, the computer system performs an operation corresponding to activation of the respective user interface object, wherein the activation criterion is capable of being satisfied by: a gaze input directed to the respective user interface object in conjunction with a first gesture of a second hand of the user detected at a location remote from the characteristic position of the respective user interface object, or a second gesture detected at a location corresponding to the characteristic position of the respective user interface object, as described above with reference to fig. 7C-7D.
Automatically performing an operation in response to detecting any one of a plurality of possible user inputs (e.g., gestures) provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy and provides the user with flexibility to perform the operation in response to the plurality of gestures, thereby providing the user with improved visual feedback and reducing the amount of input required to perform the operation.
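
The two alternative ways of satisfying the activation criterion can be sketched as follows. The `DetectedInput` type, its field names, and the gesture labels "pinch" and "tap" are hypothetical stand-ins for the first (indirect) gesture and the second (direct) gesture; they are not taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectedInput:
    gaze_target: Optional[str]     # object the gaze is directed to, if any
    gesture: Optional[str]         # "pinch" or "tap" (hypothetical labels)
    gesture_target: Optional[str]  # object at the gesture's location, or
                                   # None if performed away from any object


def meets_activation_criterion(inp: DetectedInput, object_id: str) -> bool:
    """Return True if the input activates the given object."""
    # Indirect path: gaze at the object plus a gesture performed remotely
    # from the object's characteristic position.
    indirect = (inp.gaze_target == object_id
                and inp.gesture == "pinch"
                and inp.gesture_target != object_id)
    # Direct path: a gesture at the location corresponding to the object.
    direct = inp.gesture == "tap" and inp.gesture_target == object_id
    return indirect or direct
```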

In some implementations, when at least one of the first user interface object and the second user interface object (e.g., the first user interface object but not the second user interface object, the second user interface object but not the first user interface object, or both the first user interface object and the second user interface object) is displayed in a first view of the three-dimensional environment (e.g., displayed with their respective characteristic positions having a first spatial relationship and a second spatial relationship corresponding to a position of a first hand of the user), the computer system detects, via the one or more input devices, a first gaze input directed to a respective user interface object of the at least one of the first user interface object and the second user interface object. In some implementations, in response to detecting the first gaze input directed to the respective user interface object, the computer system displays an expanded version of the respective user interface object (e.g., enlarges the respective user interface object, shows additional information, and/or displays additional constituent objects of the respective user interface object), as described above with reference to fig. 7C-7D. Displaying an expanded version of the user interface object in response to the user gazing at the user interface object, without requiring additional user input (e.g., such that the expanded user interface object includes additional controls), provides real-time visual feedback to the user as the user gazes at various objects and provides additional controls to the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user and reducing the amount of input required to perform the operation.

It should be understood that the particular order in which the operations in fig. 8 are described is merely an example and is not intended to suggest that the order is the only order in which the operations may be performed. Those of ordinary skill in the art will recognize a variety of ways to reorder the operations described herein. In addition, it should be noted that the details of other processes described herein with respect to other methods described herein (e.g., methods 9000 and 10000) apply in a similar manner to method 8000 described above with respect to fig. 8 as well. For example, the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generation components, surfaces, representations of physical objects, virtual objects, and/or animations described above with reference to method 8000 optionally have one or more of the characteristics of the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generation components, surfaces, representations of physical objects, virtual objects, and/or animations described herein with reference to other methods described herein (e.g., methods 9000 and 10000). For the sake of brevity, these details are not repeated here.

Fig. 9 is a flow diagram of a method 9000 for modifying an appearance of a shared view of a communication session (e.g., a shared three-dimensional environment) between multiple users, according to some embodiments.

In some embodiments, method 9000 is performed at a first computer system (e.g., computer system 101 in fig. 1, the computer systems described with respect to fig. 7A-7D, 7E-7F, and 7G-7J). In some embodiments, the first computer system communicates with a first display generating component (e.g., display generating component 7100, another display generating component, a heads-up display, a head-mounted display (HMD), a display, a touch screen, and/or a projector) and one or more input devices (e.g., cameras or other sensors and input devices that detect movement of a user's hand in a physical environment, movement of the user's entire body in a physical environment, and/or movement of the user's head in a physical environment; e.g., a controller, a touch-sensitive surface, a joystick, buttons, gloves, watches, motion sensors, and/or orientation sensors). In some embodiments, the first display generating component is the display generating component 7100 described with respect to fig. 7A-7D, 7E-7F, and 7G-7J. In some embodiments, the first display generating component is a heads-up display that does not move or rotate with the user's head or the user's entire body, but that optionally changes the user's viewpoint into the three-dimensional environment according to the movement of the user's head or body relative to the first display generating component. In some embodiments, the first display generating component is optionally moved and rotated by the user's hand relative to the physical environment or relative to the user's head, and the user's viewpoint into the three-dimensional environment is changed according to the movement of the first display generating component relative to the user's head or face or relative to the physical environment. According to some embodiments, many of the features of method 9000 are described with respect to fig. 7E to 7F.

The method 9000 disclosed herein relates to displaying selectable controls that allow a user to adjust various applications and other settings related to other users currently interacting with the user (e.g., in a communication session). Displaying selectable options for activating the application and receiving activation inputs selecting and/or pointing to the selectable options (including the option to modify the shared three-dimensional environment during the communication session) provides real-time visual feedback when the user selects various options, real-time visual feedback when the user activates various options to modify the shared three-dimensional environment, and additional controls to the user while the user is participating in the communication session without requiring the user to browse a complex menu hierarchy. Providing improved visual feedback to the user enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user provide proper input and reducing user error in operating/interacting with the system), which in turn reduces power usage and extends battery life of the system by enabling the user to use the system more quickly and efficiently.

In method 9000, the first computer system displays (9002), via the first display generating component, a view of a communication session (e.g., the first user and the second user participating in a meeting and/or game in a shared three-dimensional environment) between a first user of the first display generating component and a second user of a second display generating component different from the first display generating component, wherein the view of the communication session comprises a view of the three-dimensional environment comprising at least some virtual content shared between the first user and the second user. Displaying the view of the three-dimensional environment of the communication session includes displaying a respective representation of the second user in the view of the three-dimensional environment, and determining the respective representation of the second user based on a virtual spatial relationship between the first user and the second user in the three-dimensional environment.

When displaying a view of the communication session, the first computer system displays (9004) a user interface for controlling the communication session via the first display generation component (e.g., within the view of the communication session, within the view of the three-dimensional environment, or alongside the view of the three-dimensional environment). In some embodiments, a user interface for controlling a communication session includes a first control object that, when activated by a first user (e.g., by a gesture (e.g., an indirect gesture) combined with a gaze input directed to the first control object, or by a gesture (e.g., a direct gesture) detected at a location corresponding to a location of the first control object), causes the first computer system to perform a respective operation (e.g., changing virtual wallpaper of the three-dimensional environment, changing virtual lighting in the three-dimensional environment, adding, removing, and/or changing virtual scenery, virtual lighting, and/or virtual wallpaper in the three-dimensional environment) that modifies an appearance of a three-dimensional area of the three-dimensional environment.

When a view of the three-dimensional environment is displayed, the first computer system detects (9006) a first user input activating a first control object.

In response to detecting (9008) the first user input to activate the first control object, the first computer system modifies (9010) an appearance of a three-dimensional region of the three-dimensional environment (e.g., changes a virtual enhancement of a representation of a physical environment in a view of the three-dimensional environment) for the first user of the first display generation component (e.g., changes a virtual wallpaper of the three-dimensional environment, changes virtual lighting in the three-dimensional environment, adds, removes, or changes a virtual landscape, virtual lighting, and/or virtual wallpaper in the three-dimensional environment).

The first computer system initiates (9012) a process for modifying (e.g., causing modification of), for a second user of the second display generating component, an appearance of the three-dimensional region of the three-dimensional environment displayed at the second display generating component. For example, in some embodiments, the same modifications applied to the appearance of the three-dimensional region of the three-dimensional environment shown to the first user via the first display generating component (e.g., the changes to the virtual wallpaper, virtual lighting, and/or virtual scenery of the three-dimensional region) are also applied to the three-dimensional region shown to the second user via the second display generating component. In some embodiments, the modification applied to the appearance of the three-dimensional region shown to the second user via the second display generating component is determined in accordance with the modification (e.g., the change in virtual wallpaper, virtual lighting, and/or virtual scenery) applied to the appearance of the three-dimensional region shown to the first user via the first display generating component, based on a correspondence between the first user and the second user (e.g., a timing offset (e.g., a difference in time of day or season between the locations of the first user and the second user), a direction offset (e.g., facing directions that are opposite to each other, or that are selected based on the current facing directions of the first user and the second user), a spatial position offset (e.g., a separation according to the virtual positions of the first user and the second user), or an offset in another attribute between the first user and the second user). This is illustrated in fig. 7E and 7F, for example, where the first computer system displays a user interface (e.g., user interface 7206') for controlling the communication session, including a first control object (e.g., affordance 7206-1').
When the first control object is activated, the first computer system modifies the appearance of the three-dimensional region of the three-dimensional environment for the first user of the first display generating component (e.g., the computer system modifies the virtual floor 7008-t' as shown on the display generating component 7100). The first computer system also initiates a process for modifying, for a second user of the second display generating component, the appearance of the three-dimensional region of the three-dimensional environment displayed at the second display generating component (e.g., modifying the virtual floor 7008-t' also for the display generating component 7102).
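
Operations 9010 and 9012 can be sketched, under stated assumptions, as applying a modification for the first user and then applying the same or a correspondingly offset modification for the other participants. The `Participant` type, the `activate_control` function, and the `transform` callback are hypothetical; in particular, the process initiated for the second display generating component is modeled here as a direct assignment, whereas an actual system would presumably send a request to another device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Participant:
    name: str
    # Maps a region name (e.g., "virtual floor") to its current appearance.
    region_appearance: Dict[str, str] = field(default_factory=dict)


def activate_control(first_user: Participant,
                     other_users: List[Participant],
                     region: str,
                     appearance: str,
                     transform: Callable[[str], str] = lambda a: a) -> None:
    """Apply a region modification locally, then propagate it.

    transform models an optional correspondence (e.g., a time-of-day
    offset) between what the first user sees and what others see; by
    default the same modification is applied.
    """
    # Operation 9010: modify the region for the first user.
    first_user.region_appearance[region] = appearance
    # Operation 9012: initiate modification for the other participants.
    for user in other_users:
        user.region_appearance[region] = transform(appearance)
```

For example, passing a `transform` that maps "noon" lighting to "midnight" would model a timing offset between participants in different time zones.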

In some embodiments, when displaying a view of the communication session, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, or beside the view of the three-dimensional environment) a respective user interface of one or more applications shared in the communication session (e.g., applications that are currently accessible to (e.g., viewable, visible, and/or controllable by) the participants of the communication session). In some embodiments, when a respective application is shared by one participant of a communication session, a user interface of the respective application is displayed for multiple participants of the communication session in a view of the communication session. In some embodiments, respective user interfaces of one or more shared applications are displayed along with respective indications corresponding to identities of the users sharing the applications. For example, an avatar or color scheme of a first user is displayed on a user interface of a first application shared by the first user and/or an avatar or color scheme of a second user is displayed on a user interface of a second application shared by the second user. In some embodiments, an application shared in a communication session accepts user input provided by multiple participants (e.g., a first user and a second user) of the communication session via respective input devices of the multiple participants. In some implementations, the application user interface is visible to multiple participants, but accepts user input from only one of the participants (e.g., the user sharing the application or another user having current control of the application). For example, as described with reference to fig. 7E-7F, the computer system optionally displays an indication of one or more active applications (e.g., applications shared in an ongoing communication session).
The automatic display of an active application currently being used or shared via a communication session with another user and an indication of the user sharing the respective application provides real-time visual feedback as the shared active application is updated in real-time by the user or another user, thereby providing improved visual feedback to the user.

In some embodiments, when a view of the communication session is displayed, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) a second control object that, when activated (e.g., by a gesture combined with a gaze input directed to the second control object, or by a gesture detected at a location corresponding to the location of the second control object), causes the first computer system to perform a respective operation that ceases sharing a first application currently shared in the communication session. In some embodiments, the first computer system detects user input from the first user activating the second control object when displaying the view of the three-dimensional environment. In some implementations, in response to detecting the user input that activates the second control object, the first computer system initiates a process for ceasing to display the user interface of the first application at the second display generating component (e.g., in accordance with a determination that the first user is the user who initially shared the first application in the communication session). In some embodiments, in response to detecting the user input, the first computer system optionally closes the first application at the first computer system, stops displaying the first application in a view of the three-dimensional environment at the first display generating component, or moves the user interface of the first application to another three-dimensional region of the three-dimensional environment that is not shared with other participants of the communication session (e.g., a region that is private to the first user, or a region that is private to a smaller subset of the participants identified by the first user).
In some embodiments, the first computer system displays the second control object in a user interface for controlling the communication session in response to a first user's request to share the first application in the communication session and in conjunction with sharing the first application in the communication session. In some embodiments, the first computer system stops displaying the second control object in the user interface for controlling the communication session in response to the first user's request to stop sharing the first application in the communication session (e.g., by activating the second control object) and in conjunction with performing the operation of stopping sharing the first application in the communication session. For example, as described with reference to fig. 7E-7F, the computer system displays an option to remove (e.g., close) the shared application. Providing buttons or other user interface controls that allow a user to remove a display of a shared application while engaged in a communication session provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.
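
A minimal sketch of the stop-sharing control follows. The `SharingSession` class and its methods are hypothetical; the sketch models only the example in which the control is available to, and effective for, the user who initially shared the application, and in which the control disappears once sharing stops.

```python
from typing import Dict


class SharingSession:
    """Tracks which user shared which application in a communication session."""

    def __init__(self) -> None:
        self._shared: Dict[str, str] = {}  # application -> sharing user

    def share(self, app: str, user: str) -> None:
        self._shared[app] = user

    def has_stop_control(self, app: str, user: str) -> bool:
        # The second control object is shown in conjunction with sharing.
        return self._shared.get(app) == user

    def stop_sharing(self, app: str, user: str) -> bool:
        """Stop sharing if the requester is the original sharer."""
        if self._shared.get(app) != user:
            return False
        # Other participants cease to see the application, and the stop
        # control is no longer displayed because the entry is removed.
        del self._shared[app]
        return True
```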

In some embodiments, when displaying the view of the communication session, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) a respective identifier (e.g., a virtual wallpaper, a color scheme, an icon and/or a textual description of an environment, theme, and/or virtual scenery) corresponding to the currently selected virtual augmentation applied to the three-dimensional environment. In some embodiments, the first computer system allows the first user to customize the virtual augmentation applied to the three-dimensional environment shown at the first display generating component and/or shared among the plurality of participants of the communication session. In some embodiments, the virtual augmentation includes various virtual elements and/or visual effects applied to a view of a physical environment in a three-dimensional environment or a view of a virtual three-dimensional environment of a communication session. In some embodiments, the virtual augmentation includes virtual objects, surfaces, coverings, lighting, and/or scenery designed to give the three-dimensional environment a look and feel of a selected theme, natural environment, and/or physical scene. In some embodiments, the respective identifier corresponding to the currently selected virtual enhancement is displayed with a selection indicator (e.g., a check mark in a check box, or a selector that visually distinguishes the respective identifier from other non-selected virtual enhancement identifiers). 
In some embodiments, selecting the respective identifier by user input causes the first computer system to display the respective identifier for the currently selected virtual enhancement along with a list of identifiers corresponding to different virtual enhancements available for selection and application to the three-dimensional environment. In some embodiments, multiple sets of virtual enhancements are applied simultaneously to the three-dimensional environment or a view of the three-dimensional environment shown at the first display generating component, and respective identifiers of the multiple sets of virtual enhancements are visually indicated in the view of the communication session (e.g., in a user interface for controlling the communication session and/or overlaying the view of the three-dimensional environment). In some embodiments, each set of virtual enhancements is optionally displayed along with a corresponding sound effect (e.g., sound of nature, waves, wind, insects, animals, traffic, or music) corresponding to the visual elements of the virtual enhancements. This is shown in fig. 7F, for example, where the first computer system displays a respective identifier corresponding to the currently selected virtual enhancement applied to the three-dimensional environment (e.g., affordance 2016-1' is displayed with an outline to indicate that it is the currently selected virtual enhancement). The automatic display of an indication of attributes describing the currently active virtual environment provides real-time visual feedback as the user views the different virtual environments, thereby providing improved visual feedback to the user.

In some embodiments, when displaying the view of the communication session, the first computer system displays a third control object (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within the user interface for controlling the communication session) that, when activated, causes the first computer system to remove a first set of currently selected virtual enhancements (e.g., a set of virtual wallpaper, color schemes, environments, virtual decals, and/or virtual scenery corresponding to a respective theme) applied to the three-dimensional environment. In some embodiments, the first computer system allows the first user to customize the virtual enhancements applied to a three-dimensional environment shown at the first display generating component or shared among a plurality of participants of the communication session. In some embodiments, the virtual enhancements include various virtual elements and/or visual effects applied to a view of a physical environment in a three-dimensional environment or to a view of a virtual three-dimensional environment of a communication session. In some embodiments, the virtual enhancements include virtual objects, surfaces, coverings, lighting, and/or scenery designed to give the three-dimensional environment the look and feel of a selected theme, natural environment, and/or physical scene. In some embodiments, the first computer system displays the respective identifier corresponding to a currently selected set of virtual enhancements with a respective selection indicator (e.g., a check mark in a check box, or a selector that visually distinguishes the respective identifier from the identifiers of other, non-selected virtual enhancements), and the selection indicator may be used as a control object to individually remove the corresponding set of virtual enhancements. 
In some embodiments, the first computer system provides a control object that, when activated, removes from the three-dimensional environment all or a subset of the currently applied virtual enhancements. For example, as described with reference to fig. 7E, a user is enabled to change the immersive experience, including removing the immersive experience from the current view (e.g., or reducing the level of immersion). Providing buttons or other user interface controls that allow the user to remove the display of the currently active virtual environment provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.
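
By way of illustration only, the bookkeeping behind the identifiers, selection indicators, and removal controls described above may be sketched as follows. The class `EnhancementPanel`, its method names, and the theme names are hypothetical and are not taken from the figures; the sketch merely assumes that several sets of virtual enhancements can be applied at once and removed individually or all together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EnhancementPanel:
    """Hypothetical model of the enhancement identifiers in the session controls."""

    available: list[str]
    applied: list[str] = field(default_factory=list)

    def apply(self, name: str) -> None:
        # Several sets of virtual enhancements may be applied simultaneously.
        if name in self.available and name not in self.applied:
            self.applied.append(name)

    def remove(self, name: str) -> None:
        # Activating one set's selection indicator removes only that set.
        if name in self.applied:
            self.applied.remove(name)

    def remove_all(self) -> None:
        # A single control object that clears every applied set.
        self.applied.clear()

    def identifiers(self) -> list[tuple[str, bool]]:
        # Each identifier is paired with the state of its selection indicator.
        return [(name, name in self.applied) for name in self.available]
```

In this sketch, the list returned by `identifiers()` is what a control region would render: one entry per available set, with a flag indicating whether its selection indicator is shown.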

In some embodiments, when a view of the communication session is displayed, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) a fourth control object that, when activated, causes the first computer system to perform respective operations that share a respective set of virtual enhancements available for application to the three-dimensional environment (e.g., a set of virtual wallpaper, color schemes, environments, virtual decals, and/or virtual scenery corresponding to a respective theme) with at least one other participant of the communication session (e.g., the second user and/or a third user using a third display generating component). In some embodiments, after at least one other participant (e.g., a subset of fewer than all of the other participants, or all of the other participants) accepts the sharing of the respective set of virtual enhancements, and in response to the first user selecting the respective set of virtual enhancements to modify the appearance of the three-dimensional environment, the first computer system modifies the appearance of the view of the three-dimensional environment shown at the first display generating component in accordance with the respective set of virtual enhancements, and initiates a process that causes the computer system of the at least one other participant to modify the view of the three-dimensional environment displayed to the at least one other participant in accordance with the respective set of virtual enhancements. For example, as described with reference to fig. 7E, the user is enabled to change the virtual scenery of the shared three-dimensional environment 7207'. 
Providing buttons or other user interface controls that allow a user to share a currently active virtual environment with one or more other users in a communication session with the user provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.

In some embodiments, when displaying the view of the communication session, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) an indicator of a current immersion level at which the view of the communication session is displayed, wherein the current immersion level is selected from a first immersion level and a second immersion level, and wherein the second immersion level comprises reduced passthrough of the physical environment in the view of the communication session (e.g., a reduced amount or extent of content from the physical environment that is passed through into the three-dimensional environment). In some implementations, the first immersion level is an augmented reality view of a physical environment surrounding the first user, and the second immersion level is a virtual reality view without a representation of the physical environment. In some embodiments, the first immersion level and the second immersion level are both augmented reality views of the physical environment, but the view of the communication session displayed at the second immersion level includes more virtual content modifying the view of the physical environment than the view of the communication session displayed at the first immersion level. In some embodiments, the first immersion level has more audio passthrough of sound from the physical environment, and the second immersion level has less audio passthrough of sound from the physical environment. 
In some embodiments, the first computer system displays a respective control object (e.g., a button, selector, switch, or slider control) that, according to user input activating the respective control object, either switches discretely between different immersion levels for the view of the communication session or gradually adjusts the immersion level of the view of the communication session, as described above with reference to fig. 7E. Automatically displaying an indication of the current immersion level of the virtual environment, and providing the user with control options to change the current immersion level, provides real-time visual feedback as the user changes the amount of passthrough for the current session, thereby providing improved visual feedback to the user.
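
As a non-limiting sketch, a control object that either switches discretely between two immersion levels or adjusts the level gradually may be modeled as shown below. The numeric scale (0.0 for full passthrough, 1.0 for no passthrough), the constants `FIRST_LEVEL` and `SECOND_LEVEL`, and the class `ImmersionControl` are assumptions made for illustration; the description above does not prescribe any particular values.

```python
FIRST_LEVEL = 0.0   # assumed: full passthrough of the physical environment
SECOND_LEVEL = 1.0  # assumed: no passthrough (fully virtual view)


class ImmersionControl:
    """Hypothetical switch/slider for the immersion level of the session view."""

    def __init__(self, level: float = FIRST_LEVEL) -> None:
        self.level = level

    def toggle(self) -> None:
        # Discrete switching: jump to whichever named level is farther away.
        midpoint = (FIRST_LEVEL + SECOND_LEVEL) / 2.0
        self.level = FIRST_LEVEL if self.level > midpoint else SECOND_LEVEL

    def set_slider(self, value: float) -> None:
        # Gradual adjustment: clamp the slider value to the supported range.
        self.level = min(max(value, FIRST_LEVEL), SECOND_LEVEL)

    @property
    def passthrough(self) -> float:
        # Higher immersion means less passthrough of the physical environment.
        return 1.0 - self.level

    def indicator(self) -> str:
        # Text for the displayed indicator of the current immersion level.
        return f"Immersion {round(self.level * 100)}%"
```

The same `passthrough` value could drive both the visual and the audio passthrough, although an implementation might equally keep the two independent.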

In some embodiments, when displaying the view of the communication session, the first computer system displays respective representations of one or more participants of the communication session in a control region of the communication session (e.g., displays a list of avatars of users participating in the communication session in the control region, where the control region is displayed within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session). In some embodiments, the respective representations of the one or more participants shown in the control region are different from the respective representations of the one or more participants shown in the view of the three-dimensional environment (e.g., the respective representation of the second user that is displayed in the view of the three-dimensional environment according to the virtual spatial relationship between the first user and the second user in the three-dimensional environment). In some embodiments, as users join and/or leave the communication session, the first computer system changes the appearance of the respective representations of the users shown in the control region to indicate whether the users are currently in the communication session (e.g., their representations are lit or highlighted) or have left the communication session (e.g., their representations are darkened or grayed out). In some embodiments, the first computer system changes the appearance of the representations of the participants to indicate whether they are currently active in the communication session. 
For example, if a participant is currently performing an action, moving, and/or speaking, the respective representations of the participants in the control region are highlighted (e.g., lit or flashing), and if the participant ceases to actively participate in the communication session (e.g., stops speaking, mutes, is lost from the camera view, and/or does not participate for a period of time), the respective representations of the participants in the control region are no longer highlighted relative to the respective representations of the other participants in the control region. For example, as described with reference to fig. 7E, a representation of active users in a communication session is displayed (e.g., with a status indicator). The automatic display of an indication describing attributes of a currently active user in the communication session provides real-time visual feedback as the user's liveness in the communication session changes, thereby providing improved visual feedback to the user.
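
The appearance rules described above for representations in the control region may be sketched, purely for illustration, as a single function. The enumeration `Appearance`, the function `roster_appearance`, and the five-second idle timeout are hypothetical choices; the description does not specify how long a participant may be inactive before the highlight is removed.

```python
from enum import Enum


class Appearance(Enum):
    HIGHLIGHTED = "highlighted"  # currently active (e.g., speaking or moving)
    NORMAL = "normal"            # in the session but not currently active
    GRAYED_OUT = "grayed out"    # has left the session


def roster_appearance(
    in_session: bool,
    muted: bool,
    seconds_since_activity: float,
    idle_timeout: float = 5.0,  # assumed threshold, not from the description
) -> Appearance:
    """Pick how a participant's representation is drawn in the control region."""
    if not in_session:
        return Appearance.GRAYED_OUT
    if muted or seconds_since_activity > idle_timeout:
        # Muted or idle participants lose the highlight but stay visible.
        return Appearance.NORMAL
    return Appearance.HIGHLIGHTED
```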

In some embodiments, when displaying a view of the communication session, the first computer system displays a first selectable option in association with a visual indication of a respective participant in the control region, wherein the first selectable option removes the respective participant from the communication session when activated. In some embodiments, selectable options are displayed in association with respective participants in the control region in response to selection of the respective representations of the participants. In some embodiments, the selectable option is displayed when the participant is active in the communication session (e.g., in response to selection of a corresponding representation of the participant) and is not displayed when the participant is inactive in the communication session. In some embodiments, the first computer system detects a user input activating the first selectable option in association with the second user, and in response to the user input, the first computer system removes the second user from the communication session, stops displaying a corresponding representation of the second user in the view of the three-dimensional environment and stops displaying the second user as active in the communication session, and causes the representation of the second user to be removed from the view of the three-dimensional environment shown to the other participants of the communication session. For example, as shown in fig. 7E, a control 7204' for controlling settings of a user (and other participants) in a communication session is displayed. Providing buttons or other user interface controls that allow a user to remove one or more currently active users that are participating in a communication session provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.

In some embodiments, when displaying the view of the communication session, the first computer system displays a second selectable option in association with a visual indication of the respective participant in the control region, wherein the second selectable option mutes audio input from the respective participant in the communication session when activated. In some embodiments, selectable options are displayed in association with respective participants in the control region in response to selection of the respective representations of the participants. In some embodiments, the selectable option is displayed when the participant is active in the communication session or generating an audio output (e.g., in response to selection of a respective representation of the participant), and is not displayed when the participant is inactive in the communication session or does not generate an audio output. In some embodiments, the first computer system detects user input associated with a second user that activates the second selectable option; and responsive to the user input, the first computer system mutes audio input of a second user in the communication session, thereby no longer generating audio output for the first user and other participants of the communication session. In some implementations, activating the second selectable option in association with the second user does not mute the second user for other participants of the communication session, and mutes the second user only for users (e.g., the first user) that activate the second selectable option in association with the second user. For example, as described with reference to fig. 7E, additional controls 7204' include buttons for a user to mute the user and/or mute other users participating in the communication session. 
Providing a button or other user interface control that allows a user to mute the user and/or one or more other users in a communication session with the user provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls.
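
The two muting behaviors described above (muting a participant for every listener, or only for the user who activated the option) may be sketched as follows. The class `AudioRouter` and its methods are hypothetical names introduced for illustration and do not correspond to any element of the figures.

```python
class AudioRouter:
    """Hypothetical record of who can hear whom in a communication session."""

    def __init__(self, participants):
        self.participants = set(participants)
        self.muted_for_all = set()
        # Maps each listener to the speakers that listener has muted locally.
        self.muted_by = {p: set() for p in self.participants}

    def mute(self, actor, target, for_everyone):
        if for_everyone:
            # Session-wide mute: no participant receives the target's audio.
            self.muted_for_all.add(target)
        else:
            # Local mute: only the activating user stops hearing the target.
            self.muted_by[actor].add(target)

    def hears(self, listener, speaker):
        if listener == speaker:
            return False
        return (
            speaker not in self.muted_for_all
            and speaker not in self.muted_by[listener]
        )
```

Keeping the two sets separate lets a local mute persist even if a session-wide mute is applied and later lifted.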

In some embodiments, when displaying the view of the communication session, the first computer system displays a third selectable option in association with a visual indication of the respective participant in the control region, wherein the third selectable option, when activated, causes the respective representation of the respective participant to be removed from the view of the three-dimensional environment. In some embodiments, the respective participant continues to participate in the communication session using his/her computer system and the display generating component, and the respective participant's computer system continues to show a view of the three-dimensional environment including respective representations of other participants in the communication session. In some embodiments, each participant of the communication session is allowed to individually control which other participants are hidden or shown in its own three-dimensional view of the environment. For example, based on the selection of the first user, the first computer system shows the representation of the second user and the representation of the third user in a view of the three-dimensional environment displayed to the first user, and hides the representation of the fourth user in a view of the three-dimensional environment displayed to the first user. Meanwhile, based on the selection of the second user, the second computer system shows the representation of the first user and the representation of the fourth user in a view of the three-dimensional environment displayed to the second user, and hides the representation of the third user in a view of the three-dimensional environment displayed to the second user. Allowing the user to choose which other participants to see in his/her own view of the three-dimensional environment reduces visual clutter and makes the user's experience in the three-dimensional environment more efficient and reduces input errors. 
In some embodiments, selectable options are displayed in association with respective participants in the control region in response to selection of the respective representations of the participants. In some embodiments, the selectable option is displayed when the participant is active in the communication session or generating an audio output (e.g., in response to selection of a respective representation of the participant), and is not displayed when the participant is inactive in the communication session or does not generate an audio output. In some embodiments, the first computer system detects a user input activating the third selectable option in association with the second user and, in response to the user input, the first computer system stops displaying the representation of the second user in the view of the three-dimensional environment shown by the first display generating component (e.g., when the second user continues to participate in the communication session and when the view of the representation of the second user is unaffected at the display generating component of the other participants of the communication session). In some embodiments, respective representations of the hidden participants from view of the three-dimensional environment remain displayed in the control region and respective selectable options are provided to the first user to cause the hidden participants to un-hide. For example, as described with reference to FIG. 7E, controls 7204' include controls for hiding the avatar (e.g., of the user and/or of the participant). 
Providing buttons or other user interface controls that allow a user to remove an avatar or other visual representation of the user and/or of one or more other users with whom the user is in a communication session provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls.
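
The per-viewer hiding described above, in which the first user hides the fourth user while the second user hides the third user, may be illustrated with the following minimal sketch. The function name `visible_representations` and the `hidden_by` mapping are hypothetical.

```python
def visible_representations(viewer, participants, hidden_by):
    """Return the participants whose representations the viewer sees.

    hidden_by maps each viewer to the set of participants that viewer has
    chosen to hide; hiding affects only that viewer's own view.
    """
    hidden = hidden_by.get(viewer, set())
    return [p for p in participants if p != viewer and p not in hidden]


participants = ["first", "second", "third", "fourth"]
hidden_by = {"first": {"fourth"}, "second": {"third"}}

first_view = visible_representations("first", participants, hidden_by)
second_view = visible_representations("second", participants, hidden_by)
third_view = visible_representations("third", participants, hidden_by)
```

Because the mapping is keyed by viewer, a hidden participant remains in the session and continues to appear in the views of participants who have not hidden that participant.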

In some embodiments, when the view of the communication session is displayed, the first computer system displays a fifth control object (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) that, when activated, causes the first computer system to display an input area (e.g., a text input area with a virtual keyboard or dictation tool) in the view of the communication session for composing a text message to be displayed in a first three-dimensional area of the three-dimensional environment (e.g., to be displayed in a corresponding view of the three-dimensional area shown to a different participant of the communication session). In some embodiments, after the first computer system detects input from the first user activating the fifth control object, the first computer system displays a text input area for the first user to compose a text message. Upon detection (e.g., in response to detection) of a request by a first user to send a composed message, the first computer system causes the message to be displayed in a three-dimensional region of the three-dimensional environment (e.g., a location at or near the representation of the first user, a message board that has been displayed in the three-dimensional environment, and/or a new message board that has been displayed in the three-dimensional environment in response to input by the first user) such that participants (including the first user, the second user, and optionally other users) of the communication session see the message in respective views of the three-dimensional environment presented to those participants, as described with reference to fig. 7E. 
Providing buttons or other user interface controls that allow a user to compose a text message to share with other users engaged in a communication session and automatically display the composed message to other users provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options to the user without cluttering the user's view with additional displayed controls, and providing improved visual feedback to participants capable of viewing the shared message.

In some embodiments, when the view of the communication session is displayed, the first computer system displays a sixth control object (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) that, when activated, causes the first computer system to display an input area (e.g., a sketch pad for drawing input and/or handwriting input) in a second three-dimensional area of the three-dimensional environment. In some embodiments, the input region is configured to receive drawing input from a plurality of participants of the communication session and present the drawing input in a second three-dimensional region of the three-dimensional environment (e.g., in the input region and/or in another output region in the three-dimensional environment) (e.g., displayed in a respective view of the three-dimensional region shown to a different participant of the communication session). In some embodiments, after the first computer system detects input from the first user activating the sixth control object, the first computer system displays a shared drawing area for participants of the communication session to cooperatively create the drawing or sketch. In some embodiments, sketch inputs from multiple participants are displayed simultaneously in a shared sketch region in a respective view of a second three-dimensional region shown by a display generation component used by the participants, as described with reference to fig. 7E. 
Providing buttons or other user interface controls that allow a user to initiate a collaborative drawing session with one or more other participants in the user's communication session, and automatically updating a shared drawing pad in real time with drawings entered by the participants, provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy and provides real-time visual feedback to the user, thereby providing additional control options and improved visual feedback to the user.

In some embodiments, when the view of the communication session is displayed, in accordance with a determination that the view of the communication session includes respective representations of the participants of the communication session at respective first locations in the view of the three-dimensional environment (e.g., the spatial view of the communication session) in accordance with virtual spatial relationships of the participants in the three-dimensional environment, the first computer system displays a sixth control object (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) that, when activated, causes the first computer system to display an input region (e.g., a sketch pad for drawing input and/or handwriting input) in a second three-dimensional region of the three-dimensional environment. In some embodiments, the input region is configured to receive drawing input from a plurality of participants of the communication session and present the drawing input in the second three-dimensional region of the three-dimensional environment. In some embodiments, in accordance with a determination that the view of the communication session includes representations of the participants of the communication session at respective second locations in the view of the three-dimensional environment (e.g., in a list view and/or in a gallery view of the communication session), the first computer system foregoes displaying the sixth control object. 
For example, as described with reference to fig. 7E-7F, if representations of participating users are shown in a list and/or gallery view (e.g., as opposed to representations of participants displayed at locations relative to other participants within a shared three-dimensional environment), then no option to begin a collaborative drawing session is provided. Automatically removing the option to initiate the collaborative drawing session when the communication session is not a spatial communication session (e.g., when the user views the other participants in a list or gallery view, rather than maintaining a spatial relationship with the other participants in the three-dimensional environment) provides the user with real-time visual feedback that varies based on the current type of communication session, and provides a different set of control options to avoid cluttering the current view of the three-dimensional environment, thereby providing the user with improved visual feedback.
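
The conditional display of the drawing control described above may be sketched as follows. The enumeration `ViewMode`, the function `session_controls`, and the names of the base controls are hypothetical; the sketch only assumes that the drawing option is offered when participants are laid out spatially and withheld otherwise.

```python
from enum import Enum


class ViewMode(Enum):
    SPATIAL = "spatial"  # participants placed by virtual spatial relationship
    GALLERY = "gallery"
    LIST = "list"


def session_controls(mode: ViewMode) -> list:
    """Return the control objects to display for the current view mode."""
    # Hypothetical base controls that are available in every mode.
    controls = ["mute", "hide avatar", "compose message"]
    if mode is ViewMode.SPATIAL:
        # The shared drawing option is offered only in the spatial view.
        controls.append("shared drawing")
    return controls
```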

In some embodiments, when displaying a view of the communication session, the first computer system displays respective status indicators for one or more participants of the communication session (e.g., displays, in a control area, a list of avatars of users who are participating in the communication session, along with their status indicators (e.g., active, talking, showing video, participating in spatial mode or non-spatial mode, or another current status), where the control area is displayed within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session). In some embodiments, the respective status indicators of the one or more participants indicate respective manners in which the one or more participants are participating in the communication session. In some implementations, the status indicator indicates a device type and/or mode (e.g., spatial mode, non-spatial mode, voice-only mode, or audio-video mode) used by the participant to participate in the communication session. In some embodiments, the first computer system updates the status indicator in response to detecting a change in the type and/or mode of the device used by the participant to participate in the communication session. For example, as described with reference to fig. 7E, status indicators are displayed for participants in a communication session. Automatically displaying status indicators, such as device types, for users engaged in a communication session, and updating the status indicators in accordance with the current status of those users, provides real-time visual feedback that changes based on the current status of other users, so that the user is aware of the different features available depending on the status of other users, thereby providing improved visual feedback to the user.

In some implementations, the first computer system detects a first input (e.g., an air tap input, a tap input in combination with a gaze input, or an up-slide gesture) directed to a respective representation of a first participant of the one or more participants of the communication session when displaying the respective representation of the one or more participants of the communication session in the control region of the communication session (e.g., displaying a list of avatars of users participating in the communication session in the control region of the communication session). In response to detecting the first input, the first computer system provides a first spatial cue (e.g., a spatial audio cue and/or a visual cue) corresponding to a first spatial location of the first participant in the three-dimensional environment. In some implementations, providing the first spatial cue includes displaying a visual indication of a spatial direction of the representation of the first participant in the three-dimensional environment relative to the current display area of the three-dimensional environment in a view of the communication session (e.g., when the representation of the first participant is not in the current display area of the three-dimensional environment, e.g., to the left or right of the current display area of the three-dimensional environment). In some implementations, providing the spatial cue includes highlighting a representation of the first participant in the three-dimensional environment in a current display area of the three-dimensional environment in a view of the communication session. For example, as described with reference to fig. 7E-7F, a user is enabled to request spatial cues that identify a particular application and/or where the user is located within the shared three-dimensional environment. 
Providing the user with the option of selecting a representation of the other participants in the communication session and automatically providing spatial cues indicating the relative positioning of the selected participants as compared to the user within the three-dimensional environment allows the user to request spatial cues for the respective participants without browsing through a complex menu hierarchy, thereby providing the user with additional control options without cluttering the user's view using additional displayed controls.
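
One possible way to choose between highlighting a participant's representation and indicating a spatial direction is sketched below. The top-down coordinate convention (x to the right, z forward, yaw measured clockwise from the z axis), the 90-degree field of view, and the function `spatial_cue` are assumptions made for illustration only.

```python
import math


def spatial_cue(viewer_xz, viewer_yaw_deg, target_xz, fov_deg=90.0):
    """Return "highlight" if the target is in view, else "left" or "right"."""
    dx = target_xz[0] - viewer_xz[0]
    dz = target_xz[1] - viewer_xz[1]
    # Bearing of the target in the environment, in degrees from the z axis.
    bearing = math.degrees(math.atan2(dx, dz))
    # Angle of the target relative to the viewing direction, in [-180, 180).
    relative = (bearing - viewer_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(relative) <= fov_deg / 2.0:
        # The representation is in the current display area: highlight it.
        return "highlight"
    # Otherwise point toward the side on which the representation lies.
    return "right" if relative > 0 else "left"
```

A spatial audio cue could reuse the same relative angle to position the sound, so that the visual and audio cues agree.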

In some implementations, providing a first spatial cue corresponding to a first spatial location of a first participant in the three-dimensional environment includes outputting a first spatial audio output having a first virtual location corresponding to the first spatial location of the first participant in the three-dimensional environment. In some implementations, the spatial audio is audio having a frame of reference separate from the frame of reference of the audio output device. Depending on the particular use scenario, the frame of reference of the audio may be tied to a three-dimensional environment, moving objects in a three-dimensional environment, the user's head, the user's hand, physical objects in a physical environment, an HMD, a user, and/or a viewpoint. For example, as described with reference to fig. 7E-7F, in response to a request for a spatial cue, the computer system generates and outputs the spatial cue. Automatically providing an audio indication of the position of a selected participant relative to the user within the three-dimensional environment provides real-time audio feedback to the user as the user and/or the participant moves around in the three-dimensional environment, thereby providing improved audio feedback to the user.

Existing stereo audio output modes and mono audio output modes provide audio with reference to a frame of reference that is tied to the audio output device. For a stationary audio output device, sound sounds from the position of the audio output device in the physical environment independent of the user's movement in the physical environment and independent of changes in the visual content of the computer-generated experience (e.g., changes in the three-dimensional environment of the computer-generated experience due to movement of virtual sound sources and/or movement of viewpoints). For a wearable audio output device that remains stationary relative to a portion of a user's body (e.g., an ear or head), sound sounds locked relative to the portion of the user's body independent of changes in visual content of the computer-generated experience in a three-dimensional environment of the computer-generated experience (e.g., changes due to movement of a virtual sound source and/or changes due to movement of a point of view (e.g., movement of a point of view caused by and not corresponding to a movement of the user or a portion of the user's body). In some cases, the audio output device and the display generating component of the computer system are housed separately and are movable relative to one another in a physical environment during presentation of the computer-generated content via the audio output device and the display generating component. 
In such cases, the sound still appears to originate from the audio output device, independent of changes in the visual content of the computer-generated experience (e.g., changes in the three-dimensional environment of the computer-generated experience due to movement of the virtual sound source (e.g., the representation of the first participant in such cases) and/or movement of the viewpoint (e.g., movement caused by a request for movement in the displayed environment, or movement in response to and in accordance with movement of the user or a portion thereof, or of the display generating component, in the physical environment)). Generally, when a stereo audio output mode or a mono audio output mode is used to provide audio content of a computer-generated experience to a user, it provides a less realistic and less immersive listening experience than the spatial audio output mode.

In some embodiments, the reference frame is a reference frame based on a virtual three-dimensional environment of a computer-generated experience provided via a display generation component of the computer system. In some implementations, where the frame of reference is based on a virtual three-dimensional environment (e.g., an environment of a virtual three-dimensional movie, three-dimensional game, or virtual office), the one or more perceived sound sources have respective spatial locations in the virtual three-dimensional environment. In some implementations, as the audio output device moves in the physical environment, the audio output from the audio output device is adjusted such that the audio continues to sound as if from one or more perceived sound sources at respective spatial locations in the virtual three-dimensional environment. In the case where the one or more perceived sound sources are moving sources that move through a sequence of spatial locations surrounding the physical environment, the audio output from the audio output device is adjusted so that the audio continues to sound as if from the one or more perceived sound sources at the sequence of spatial locations in the virtual three-dimensional environment. In some embodiments, when audio content is output using a spatial audio output mode and a frame of reference of a three-dimensional environment based on a computer-generated experience, a viewpoint of a current display view of the three-dimensional environment changes according to movement of a user and/or display generating component in a physical environment; and the user will perceive sound as if it came from the virtual location of the virtual sound source and experience the visual content of the three-dimensional environment in the same frame of reference. 
In some embodiments, when audio content is output using a spatial audio output mode and a frame of reference of a three-dimensional environment based on a computer-generated experience, the viewpoint of a current display view of the three-dimensional environment varies according to a user-provided movement request and/or movement of a user and/or display generating component in a physical environment; and the user will perceive sound as if it came from the virtual location of the virtual sound source and experience the visual content of the three-dimensional environment in the same frame of reference, with the user's virtual location tied to the viewpoint of the current display view.
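The environment-based frame of reference described above can be illustrated with a minimal sketch. It assumes a simplified 2D layout and a constant-power stereo pan as a crude stand-in for actual spatial audio rendering; the names `ListenerPose`, `source_azimuth_deg`, and `stereo_gains` are hypothetical and do not come from the described system. The point is only that the sound source keeps its location in the environment while the rendered direction is recomputed whenever the listener (viewpoint) moves or turns.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ListenerPose:
    """Hypothetical listener (viewpoint) pose on the floor plane of the environment."""
    x: float
    z: float
    yaw_deg: float  # 0 = facing +z; positive = turned toward +x (to the right)


def source_azimuth_deg(listener: ListenerPose, src_x: float, src_z: float) -> float:
    """Direction of an environment-locked source relative to the listener's heading.

    Returns an angle in [-180, 180); positive means the source is to the right.
    """
    dx = src_x - listener.x
    dz = src_z - listener.z
    bearing = math.degrees(math.atan2(dx, dz))  # bearing in the environment frame
    relative = bearing - listener.yaw_deg       # re-express in the listener frame
    return (relative + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)


def stereo_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains; an assumed simplification, not real spatialization."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # 0 = full left, pi/2 = full right
    return math.cos(angle), math.sin(angle)


# The source stays at (0, 2) in the environment; only the listener pose changes.
ahead = source_azimuth_deg(ListenerPose(0.0, 0.0, 0.0), 0.0, 2.0)
turned = source_azimuth_deg(ListenerPose(0.0, 0.0, 90.0), 0.0, 2.0)
```

In this sketch, turning the listener to the right moves the rendered source toward the left ear even though the source itself has not moved, which is the behavior a device-locked stereo or mono mode cannot reproduce.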

In some embodiments, the frame of reference for the spatial audio output mode is fixed to an electronic device, such as a display generating component, that outputs visual content corresponding to the audio content output via the audio output device (e.g., the sound follows the display generating component). For example, movement of the position of the simulated audio source in the physical environment corresponds to movement of the display generating component in the physical environment (e.g., when a representation of the first participant is displayed in a peripheral portion of the field of view provided by the HMD), but not to movement of the audio output device in the physical environment. For example, in some embodiments, the display generating component is a head mounted display device or a handheld display device, while the audio output device is placed in the physical environment and does not follow the movements of the user. In some embodiments, the frame of reference of the spatial audio effect is fixed to the display generating component, and indirectly to the user, as the display generating component and the user move in the physical environment relative to the audio output device. In some embodiments, when audio content is output using a spatial audio output mode and a frame of reference of a three-dimensional environment based on a computer-generated experience, the viewpoint of a current display view of the three-dimensional environment varies according to a user-provided movement request and/or movement of a user and/or display generating component in a physical environment; and the user will perceive sound as if it came from the virtual location of the virtual sound source and experience the visual content of the three-dimensional environment in the same frame of reference, with the user's virtual location tied to the viewpoint of the current display view.

In some embodiments, providing the first spatial cue corresponding to the first spatial location of the first participant in the three-dimensional environment includes visually highlighting the representation of the first participant in the view of the three-dimensional environment (e.g., illuminating a virtual spotlight over the representation of the first participant in the view of the three-dimensional environment, displaying a contour line around the representation of the first participant in the view of the three-dimensional environment, and/or displaying a blinking arrow pointing to the representation of the first participant in the view of the three-dimensional environment). For example, as described with reference to fig. 7E-7F, the computer system provides visual (e.g., spatial) cues. Automatically providing a visual indication of the relative positioning of the selected participant as compared to the user within the three-dimensional environment provides real-time visual feedback to the user as the user and/or participant moves back and forth in the shared three-dimensional environment, thereby providing improved visual feedback to the user.

In some embodiments, when displaying the view of the communication session, the first computer system displays (e.g., within the view of the communication session, within the view of the three-dimensional environment, beside the view of the three-dimensional environment, or within a user interface for controlling the communication session) a respective representation of one or more applications shared in the communication session (e.g., a list of application icons for applications shared in the communication session in a control area of the communication session). While displaying respective representations of one or more applications shared in the communication session, the first computer system detects a second input (e.g., an air tap input, a tap input in combination with a gaze input, or an up-slide gesture) directed to respective representations of a first application of the one or more applications shared in the communication session. In response to detecting the second input, the first computer system provides a second spatial cue (e.g., a spatial audio cue and/or a visual cue) corresponding to a second spatial location of the first application in the three-dimensional environment. In some implementations, providing the second spatial cue includes displaying a visual indication of a spatial orientation of the representation of the first application in the three-dimensional environment relative to the current display area of the three-dimensional environment in a view of the communication session (e.g., when the user interface of the first application is not in the current display area of the three-dimensional environment, e.g., to the left or right of the current display area of the three-dimensional environment). 
In some embodiments, providing the second spatial cue includes highlighting a user interface of the first application in the three-dimensional environment in a current display area of the three-dimensional environment in a view of the communication session, as described above with reference to fig. 7E-7F. Providing the user with an option to select a representation of a shared application within a communication session, and automatically providing an audio and/or visual spatial indication of the relative positioning of the selected application as compared to the user within the three-dimensional environment, provides real-time audio and/or visual feedback to the user as the user selects different applications, thereby providing improved audio and/or visual feedback to the user.
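The choice between highlighting the application in place and showing an orientation indication can be sketched as follows. This is a minimal illustration only: the function name `visual_cue`, the returned labels, and the 45-degree half field of view are assumptions, and the input is taken to be the application's horizontal direction relative to the center of the current display area (positive to the right).

```python
def visual_cue(relative_azimuth_deg: float, half_fov_deg: float = 45.0) -> str:
    """Pick a visual spatial cue for a selected shared application.

    relative_azimuth_deg: assumed horizontal angle of the application's user
    interface relative to the center of the current display area.
    """
    if abs(relative_azimuth_deg) <= half_fov_deg:
        # The user interface is inside the current display area: highlight it.
        return "highlight"
    # Otherwise indicate on which side of the current display area it lies.
    return "indicator_right" if relative_azimuth_deg > 0 else "indicator_left"


cues = [visual_cue(a) for a in (10.0, 80.0, -120.0)]
```

A spatial audio cue could be driven from the same angle; the sketch covers only the visual branch described in this paragraph.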

In some implementations, providing a second spatial cue corresponding to a second spatial location of the first application in the three-dimensional environment includes outputting a second spatial audio output having a second virtual location corresponding to the second spatial location of the first application in the three-dimensional environment, as described above with reference to fig. 7E-7F. Automatically providing an audio indication of the relative positioning of the selected application as compared to the user within the three-dimensional environment provides real-time audio feedback to the user, as the user moves back and forth in the three-dimensional environment, to indicate the current relative positioning between the user and the selected application, thereby providing improved audio feedback to the user.

In some embodiments, providing a second spatial cue corresponding to a second spatial location of the first application in the three-dimensional environment includes visually highlighting a user interface of the first application in the view of the three-dimensional environment (e.g., illuminating a virtual spotlight on the user interface of the first application in the view of the three-dimensional environment, displaying a contour line around an application window of the first application in the view of the three-dimensional environment, and/or displaying a blinking arrow pointing to the application window of the first application in the view of the three-dimensional environment), as described above with reference to fig. 7E-7F. Automatically providing a visual indication of the relative positioning of the selected application as compared to the user within the three-dimensional environment provides real-time visual feedback to the user, as the user moves back and forth in the three-dimensional environment, to indicate the current relative positioning between the user and the selected application, thereby providing improved visual feedback to the user.

In some embodiments, when a user interface for controlling a communication session is displayed, the first computer system detects movement of a gaze input, previously directed to the user interface, away from the user interface. In response to detecting movement of the gaze input away from the user interface, the first computer system stops displaying the user interface in a view of the three-dimensional environment. In some embodiments, the user interface for controlling the communication session has the characteristics and behavior of the second user interface object 7018', as described above with reference to fig. 7A-7D and method 8000. In some embodiments, other user interface objects, selectable controls, and/or control objects described herein are also displayed in the user interface for controlling the communication session. This is illustrated in fig. 7I-7J, for example, where in response to detecting movement of the gaze input away from the user interface (the dashed line from the user's eyes shown in fig. 7I no longer points to user interface object 7302' in fig. 7J), the computer system stops displaying the user interface in the view of the three-dimensional environment (e.g., visually deemphasizing user interface object 7302' in fig. 7J (e.g., while transitioning to no longer being displayed)). Automatically ceasing to display certain user interface objects in response to the user no longer gazing at or focusing on them provides real-time visual feedback to the user as the user's gaze moves away from the user interface object, and reduces the amount of input required by the user to remove the user interface object, thereby providing improved visual feedback to the user without requiring additional user input.

In some implementations, the first computer system detects movement of the first user's hand in a direction (e.g., sliding down in the air or sliding sideways in the air) when displaying a user interface for controlling the communication session. In response to detecting movement of the first user's hand in the direction, the first computer system stops displaying the user interface in the view of the three-dimensional environment. In some embodiments, the user interface for controlling the communication session has the characteristics and behavior of the second user interface object 7018', as described above with reference to fig. 7A-7D and method 8000. In some embodiments, other user interface objects, selectable controls, and/or control objects described herein are also displayed in the user interface for controlling the communication session. Providing the user with the option to remove certain user interface objects in response to the user performing a gesture provides the user with additional controls for interacting with the user interface objects without displaying additional control buttons or requiring the user to navigate through a complex hierarchy, thereby reducing the amount of input required to perform the operation.

In some embodiments, the first computer system receives a request to display a control for a communication session while the communication session is ongoing. In response to receiving a request to display a control for a communication session, in accordance with a determination that a view of the communication session includes respective representations of participants located at respective first locations in a view of the three-dimensional environment (e.g., a spatial view of the communication session) in accordance with virtual spatial relationships of the participants of the communication session in the three-dimensional environment, the first computer system displays a user interface for controlling the communication session that includes a first control object. In some embodiments, in accordance with a determination that the view of the communication session includes a representation of a participant of the communication session located at a respective second location in the view of the three-dimensional environment, the first computer system displays a different user interface for controlling the communication session, the different user interface including a plurality of control objects but not including a control object for modifying an appearance of the three-dimensional region of the three-dimensional environment. In some embodiments, the plurality of control objects are displayed in a list or gallery view of the communication session. 
In some embodiments, as described with reference to fig. 7E-7F, other user interface objects, selectable controls, and/or control objects described herein that cause changes to the view of the three-dimensional environment (e.g., avatars of participants shown in the control area, representations of shared and/or available applications shown in the control area, and/or representations of shared and/or available environments) are also displayed in the spatial mode of the communication session, but not in the non-spatial mode of the communication session. In some embodiments, the different user interface for controlling the communication session includes one or more of the same controls described above for controlling the communication session. Automatically displaying different buttons and control options for the user based on whether the communication session is of a spatial or non-spatial type provides additional control options for the user, and provides real-time visual feedback to the user based on the type of communication session the user is currently participating in, thereby providing improved visual feedback to the user.

It should be understood that the particular order in which the operations in fig. 9 are described is merely an example and is not intended to suggest that the order is the only order in which the operations may be performed. Those of ordinary skill in the art will recognize a variety of ways to reorder the operations described herein. In addition, it should be noted that the details of other processes described herein with respect to other methods described herein (e.g., methods 8000 and 10000) are equally applicable in a similar manner to method 9000 described above with respect to fig. 9. For example, the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, and/or animations described above with reference to method 9000 optionally have one or more of the characteristics of the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, and/or animations described herein with reference to other methods described herein (e.g., methods 8000 and 10000). For the sake of brevity, these details are not repeated here.

Fig. 10 is a flowchart of a method 10000 for determining, when a user's attention is no longer directed to a first user interface object, whether to stop displaying at least a portion of the first user interface object in a view of a three-dimensional environment or to maintain display of the first user interface object in the three-dimensional environment, based on the object type of the first user interface object and its spatial relationship relative to the viewpoint of the view of the three-dimensional environment, according to some embodiments.

In some embodiments, method 10000 is performed at a computer system (e.g., computer system 101 in fig. 1, or a computer system described with respect to fig. 7A-7D, 7E-7F, 7G-7J). In some embodiments, the computer system is in communication with a first display generating component (e.g., display generating component 7100, another display generating component, a heads-up display, a head-mounted display (HMD), a display, a touch screen, and/or a projector) and one or more input devices (e.g., a camera or other sensor and input device that detects movement of a user's hand in a physical environment, movement of the user's entire body in a physical environment, and/or movement of the user's head in a physical environment; e.g., a controller, touch-sensitive surface, joystick, buttons, glove, watch, motion sensor, and/or orientation sensor). In some embodiments, the first display generating component is the display generating component 7100 described with respect to fig. 7A-7D, 7E-7F, and 7G-7J. In some embodiments, the first display generating component is a heads-up display that does not move or rotate with the user's head or the user's entire body, but optionally changes the user's viewpoint into the three-dimensional environment according to the movement of the user's head or body relative to the first display generating component. In some embodiments, the first display generating component is optionally moved and rotated by the user's hand relative to the physical environment or relative to the user's head, and changes the user's viewpoint into the three-dimensional environment according to the movement of the first display generating component relative to the user's head or face or relative to the physical environment. According to some embodiments, many of the features of method 10000 are described with respect to fig. 7G-7J.

The method 10000 disclosed herein relates to displaying user interface elements for a user in a three-dimensional environment and to automatically removing certain user interface elements that have not been placed in the three-dimensional environment in response to the user looking away from those user interface elements. Other user interface elements that have been placed in the three-dimensional environment are maintained when the user looks away from them.

Maintaining the display of one or more user interface elements that have been placed in the three-dimensional environment (e.g., anchored to the three-dimensional environment) even when the user looks away from them, while not maintaining the display of other user interface elements that have not been placed in the three-dimensional environment when the user looks away from them, provides real-time visual feedback as the user looks away from the various user interface elements. Providing improved visual feedback to the user enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user provide proper input and reducing user error in operating/interacting with the system), which in turn reduces power usage and extends battery life of the system by enabling the user to use the system more quickly and efficiently.

In method 10000, the computer system displays (10002) a first three-dimensional computer-generated experience (e.g., an application experience, a home or dashboard experience, and/or a coexistence experience) (e.g., an augmented reality experience, a mixed reality experience, or a virtual reality experience) in a view of a three-dimensional environment (e.g., an augmented reality environment or a virtual reality environment).

When the first three-dimensional computer-generated experience is displayed in the view of the three-dimensional environment, the computer system detects (10004) a first event (e.g., detects user input corresponding to a request to open a conversational user interface for a currently displayed computer-generated experience, detects a system event that automatically opens the conversational user interface, detects user input that opens a new window, and/or detects user input that moves a window from near the viewpoint (e.g., a location anchored to the viewpoint, or a virtual location of a portion of the user (e.g., a head, torso, and/or hand)) to a location within the three-dimensional environment that is anchored to the physical environment).

In response to detecting the first event, the computer system displays (10006) a first user interface object (e.g., a conversational user interface (e.g., a window) corresponding to a currently displayed computer-generated experience) in a view of the three-dimensional environment, wherein the first user interface object includes one or more user interface objects (e.g., user interface objects corresponding to different operations of the computer system (such as system-level operations and/or application-level operations)) that, when activated, cause the computer system to perform respective operations (e.g., changing virtual wallpaper, changing visual enhancement applied to a representation of the physical environment, and/or providing additional user interface objects for additional functions and parameters of the first computer-generated experience) that modify at least one aspect of the display of the first computer-generated experience in the three-dimensional environment.

When a first user interface object is displayed in a view of the three-dimensional environment, the computer system detects (10008) that the attention of the user (e.g., the user in a position to view the first computer-generated experience via the display generating component) is no longer directed to the first user interface object (e.g., based on movement and/or location of the user's gaze and/or based on head movement of the user or the display generating component indicating that the attention of the user previously directed to the first user interface object is no longer directed to the first user interface object).

In response to detecting (10010) that the user's attention is no longer directed to the first user interface object, and in accordance with a determination that the first user interface object is a first object type having a first spatial relationship relative to a viewpoint of a view of the three-dimensional environment (e.g., a respective object of the first object type has a position that is anchored to the viewpoint of the view of the three-dimensional environment and thus moves as the viewpoint of the user changes), the computer system stops (10012) displaying at least a portion of the first user interface object in the view of the three-dimensional environment (e.g., dismissing the first user interface object from the view of the three-dimensional environment, reducing a size of the first user interface object, or minimizing the first user interface object). In some implementations, the computer system stops displaying at least a portion of the first user interface object in the view of the three-dimensional environment in accordance with a determination that the first user interface object is an object type whose position in the three-dimensional environment is anchored to a position of a portion of the user (e.g., eyes, face, head, torso, and/or hands). For example, as described above with reference to fig. 7I-7J, the computer system stops displaying at least a portion of user interface object 7302' (e.g., visually deemphasizing user interface object 7302' or ceasing to display user interface object 7302') in response to detecting that the user's attention is no longer directed to user interface object 7302' (e.g., as indicated by the dashed line from the user's eyes shown in fig. 7I, which is no longer directed to user interface object 7302' in fig. 7J). 
For example, user interface object 7302' is an object of the first object type having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment (e.g., user interface object 7302' is displayed at a location relative to a portion of the user (e.g., relative to a viewpoint of the user)), such that user interface object 7302' is displayed at a location that is independent of (e.g., not anchored to) a portion of the three-dimensional environment.

In response to detecting (10010) that the user's attention is no longer directed to the first user interface object, in accordance with a determination that the first user interface object is a second object type having a second spatial relationship relative to the three-dimensional environment that is different from the first spatial relationship (e.g., is a representation of a respective object that is fixed with respect to a portion of the three-dimensional environment (e.g., a physical surface, a physical object, a point in free space, a virtual surface, or a virtual object in the three-dimensional environment), instead of a respective object that is fixed with respect to a virtual location corresponding to a portion of the user), the computer system maintains (10014) display of the first user interface object in the three-dimensional environment (e.g., such that a spatial relationship between the characteristic location of the first user interface object and the three-dimensional environment is maintained). For example, as described above with reference to fig. 7G-7H, the computer system maintains the display of the first user interface object (e.g., user interface object 7304') in response to detecting that the user's attention is no longer directed to the first user interface object (e.g., the dashed line from the user's eyes shown in fig. 7G is no longer directed to user interface object 7304' in fig. 7H, but the display of user interface object 7304' is maintained). In some embodiments, user interface object 7304' is a second object type having a second spatial relationship with respect to the three-dimensional environment (e.g., user interface object 7304' is anchored to the three-dimensional environment).
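The branch taken in operations 10010 through 10014 can be summarized in a minimal sketch. The `Anchor` enum, the `UIObject` class, and the `on_attention_lost` function are hypothetical names introduced for illustration; the sketch also assumes the simplest variant of operation 10012, in which the whole object ceases to be displayed rather than being reduced or minimized.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Anchor(Enum):
    VIEWPOINT = auto()    # first object type: follows the viewpoint or a portion of the user
    ENVIRONMENT = auto()  # second object type: placed in the three-dimensional environment


@dataclass
class UIObject:
    name: str
    anchor: Anchor
    visible: bool = True


def on_attention_lost(obj: UIObject) -> UIObject:
    """Handle detection that the user's attention has left `obj` (10010)."""
    if obj.anchor is Anchor.VIEWPOINT:
        obj.visible = False  # 10012: stop displaying (assumed: the entire object)
    # 10014: an environment-anchored object is left displayed where it was placed.
    return obj


panel = on_attention_lost(UIObject("7302'", Anchor.VIEWPOINT))
window = on_attention_lost(UIObject("7304'", Anchor.ENVIRONMENT))
```

Here `panel` ends up hidden and `window` remains visible, mirroring the behavior described for user interface objects 7302' and 7304'.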

In some implementations, respective user interface objects of a first object type having a first spatial relationship with respect to a viewpoint of a view of a three-dimensional environment are displayed at predefined locations within the view of the three-dimensional environment. For example, in some implementations, the predefined location within the view of the three-dimensional environment includes a sub-portion (e.g., bottom edge portion, top edge portion, center portion, upper left corner, lower right corner, and/or another sub-portion) of the field of view provided by the display generating component. In some implementations, the predefined location within the view of the three-dimensional environment is at a depth (e.g., arm's length, 15 centimeters, 20 inches, or a user-selected distance) from the viewpoint of the view of the three-dimensional environment. In some embodiments, the viewpoint provided via the HMD is locked to the user's head or torso and therefore moves as the user's head or torso moves. For example, user interface object 7302' (e.g., an object of the first object type) is displayed at a location "in front of" the user such that user interface object 7302' is locked to the user's viewpoint and moves with the viewpoint (which moves with the user's head or torso). For example, in fig. 7H, user interface object 7302' is anchored to the viewpoint of the view of the three-dimensional environment and displayed at a predefined location within the view of the three-dimensional environment (e.g., at the bottom edge portion of the field of view provided by display generating component 7100). 
Displaying a user interface object at a predefined location in front of the user based on the current viewpoint of the user (such that the user interface object is anchored at a predefined location relative to the viewpoint of the user) provides real-time visual feedback to the user as the user moves the viewpoint of the user, thereby providing improved visual feedback to the user.
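How a viewpoint-locked object might be positioned can be sketched as follows. The function name, the yaw-only orientation, and the coordinate convention (y up, yaw 0 facing +z) are assumptions made for illustration; the sketch only shows that the object's location in the environment is recomputed from the current viewpoint, so it stays at the same predefined depth and offset in the field of view.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def viewpoint_locked_position(view_pos: Vec3, yaw_deg: float,
                              depth: float, vertical_offset: float = 0.0) -> Vec3:
    """Environment-space position of an object held `depth` in front of the viewpoint.

    yaw_deg: assumed heading of the viewpoint (0 = facing +z, 90 = facing +x).
    vertical_offset: negative values place the object toward the bottom of the view.
    """
    yaw = math.radians(yaw_deg)
    forward_x, forward_z = math.sin(yaw), math.cos(yaw)
    return (view_pos[0] + depth * forward_x,
            view_pos[1] + vertical_offset,
            view_pos[2] + depth * forward_z)


# The same predefined placement, evaluated for two different headings of the user.
facing_z = viewpoint_locked_position((0.0, 1.6, 0.0), 0.0, depth=0.5, vertical_offset=-0.2)
facing_x = viewpoint_locked_position((0.0, 1.6, 0.0), 90.0, depth=0.5, vertical_offset=-0.2)
```

As the heading changes, the computed position swings around the user, which is why such an object appears fixed within the field of view rather than fixed in the room.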

In some implementations, respective user interface objects of a second object type having a second spatial relationship with respect to the three-dimensional environment are displayed at a user-selected location among a plurality of eligible locations within the view of the three-dimensional environment. For example, a user may move a respective user interface object of the second object type across a surface or path or from one location to another within a view of the three-dimensional environment, choose to place the respective user interface object at any of a plurality of locations during movement of the respective user interface object, and fix the respective user interface object to that location (e.g., until the respective user interface object is moved again or ceases to be displayed due to user input). In some embodiments, the viewpoint may move relative to the three-dimensional environment and objects in the three-dimensional environment as the user's head or torso moves in the physical environment. For example, user interface object 7304' is displayed at a location in the three-dimensional environment that is in front of representation 7004' of the wall (e.g., at a corresponding height from representation 7008' of the floor and at a corresponding distance from the corner where representation 7006' of the wall intersects representation 7004' of the wall). The user is enabled to move user interface object 7304' to any other part of the three-dimensional environment. For example, the user is enabled to anchor user interface object 7304' at a location that overlaps with representation 7014' of the physical object. 
Allowing the user to select an anchor location for the user interface object (such that the user interface object is displayed at another location in the three-dimensional environment) and automatically updating the display of the user interface object (such that the user interface object is anchored to the selected location) provides real-time visual feedback to the user when the user selects a new anchor point for the user interface object, thereby providing improved visual feedback to the user.
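
The difference between an object of the first object type (anchored to the viewpoint) and an object of the second object type (anchored to the three-dimensional environment) can be illustrated with a minimal sketch. The names `UIObject` and `displayed_position`, and the translation-only model of the viewpoint, are assumptions made for illustration and are not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise sum of two 3D vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


@dataclass
class UIObject:
    # Hypothetical labels: "viewpoint" for the first object type,
    # "environment" for the second object type.
    anchor: str
    # Offset from the viewpoint, or a fixed position in the environment.
    value: Vec3


def displayed_position(obj: UIObject, viewpoint: Vec3) -> Vec3:
    """Return where the object appears for the current viewpoint."""
    if obj.anchor == "viewpoint":
        # Follows the viewpoint: a fixed offset in front of the user.
        return add(viewpoint, obj.value)
    if obj.anchor == "environment":
        # Stays at the user-selected location regardless of the viewpoint.
        return obj.value
    raise ValueError(f"unknown anchor: {obj.anchor}")
```

In this sketch, moving the viewpoint changes the result only for the viewpoint-anchored object; the environment-anchored object stays at its selected location.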

In some embodiments, the first user interface object is a first object type having a first spatial relationship with respect to a viewpoint of a view of the three-dimensional environment. Before displaying the first user interface object in the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment, the computer system displays the first user interface object at a first location having a third spatial relationship relative to a location in the three-dimensional environment corresponding to a position of a first hand of a user in the physical environment. When the first user interface object is displayed at the first location in the three-dimensional environment, the computer system detects a first event (e.g., detection of a gesture provided by the first hand, a gesture provided by a second hand of the user, a voice command, and/or activation of the user interface object by the user). In response to detecting the first event, the computer system moves the first user interface object from the first location in the three-dimensional environment to a second location having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment (e.g., switches from anchoring the first user interface object to the location of the representation of the first hand of the user to anchoring the first user interface object to the viewpoint of the view of the three-dimensional environment). In some embodiments, the computer system displays an animated transition showing the first user interface object moving from the representation of the first hand to a position in front of the viewpoint in the three-dimensional environment. For example, as described with reference to fig. 7G-7J, a user is enabled to move a user interface object (e.g., user interface object 7302') from being anchored to the user's hand to being anchored to a predefined portion of the three-dimensional environment in front of the user.
Providing the user with the option of changing whether the user interface object is anchored relative to the user's hand or relative to a predefined portion of the three-dimensional environment that changes as the user's point of view changes provides real-time visual feedback to the user and additional controls to the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.

In some implementations, detecting the first event includes detecting a gesture provided by the second hand of the user at a location remote from the first hand of the user. In some implementations, the gesture is a pinch gesture followed by a drag gesture or a throw gesture performed by the second hand of the user. In some implementations, the gesture is a pinch and drag or throw gesture detected in conjunction with gaze input directed to the first user interface object when the first user interface object is displayed at a first location having a third spatial relationship relative to a location corresponding to a location of the first hand of the user. In some implementations, the position of the pinch gesture corresponds to a first location of the first user interface object, and the first user interface object is displayed at the first location in the three-dimensional environment. In some implementations, the position of the pinch gesture corresponds to a location of a handle of a first user interface object (e.g., a user interface object on a top edge, bottom edge, or corner) displayed in a view of the three-dimensional environment. For example, as described with reference to fig. 7J, the user selects user interface object 7302' and drags the user interface object to a location within the three-dimensional environment. Automatically detecting a plurality of different gestures, including pinch and drag or throw gestures, and other user inputs that cause the system to perform different operations and automatically performing operations in response to the respective detected gestures provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby reducing the amount of input required to perform the operations.

In some implementations, detecting the first event includes detecting a throwing gesture provided by the first hand of the user (e.g., the first hand moving in a direction away from the user at a speed and/or over a distance greater than a threshold amount). In some embodiments, the throwing gesture includes initially grabbing the object and then releasing the object at the end of (or during) the throwing movement, as described with reference to fig. 7J-7G. Automatically detecting a plurality of different gestures (including a throwing gesture) that cause the system to perform different operations, and automatically performing the operations in response to the respective detected gestures, provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby reducing the amount of input required to perform the operations.
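
As a minimal sketch of the threshold test described above, the following function classifies a tracked hand movement as a throwing gesture when the hand moves away from the user faster than a speed threshold or farther than a distance threshold. The function name `is_throw`, the sample format, and both threshold values are hypothetical; the description does not specify them.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Sample = Tuple[float, Vec3]  # (timestamp in seconds, hand position in meters)


def is_throw(
    samples: List[Sample],
    user_position: Vec3,
    speed_threshold: float = 1.0,      # m/s, assumed value
    distance_threshold: float = 0.15,  # m, assumed value
) -> bool:
    """Return True if the hand moved away from the user past a threshold."""
    if len(samples) < 2:
        return False
    (t_start, p_start), (t_end, p_end) = samples[0], samples[-1]
    duration = t_end - t_start
    if duration <= 0:
        return False
    # Positive when the hand ends farther from the user than it started.
    outward = math.dist(p_end, user_position) - math.dist(p_start, user_position)
    if outward <= 0:
        return False
    speed = outward / duration
    # "Speed and/or distance" is modeled here as either threshold sufficing.
    return speed > speed_threshold or outward > distance_threshold
```

Treating the two thresholds as alternatives is one reading of "speed and/or distance"; an implementation could equally require both.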

In some implementations, detecting the first event includes detecting a gesture provided by a second hand of the user at a location on the first hand of the user. In some embodiments, the gesture is a tap gesture performed by the second hand of the user. For example, as described with reference to fig. 7J-7G, in some embodiments the tap gesture is detected at a location on the first hand corresponding to a location within the first user interface object. In some implementations, the tap gesture is detected at a location on the first hand corresponding to a location of a control of the first user interface object. Automatically detecting a plurality of different gestures that cause the system to perform different operations (including tap gestures at predefined locations on the user's hand), and automatically performing the operations in response to the respective detected gestures, provides the user with additional controls without requiring the user to navigate through a complex menu hierarchy, thereby reducing the amount of input required to perform the operations.

In some embodiments, the first user interface object is a first object type having a first spatial relationship with respect to a viewpoint of a view of the three-dimensional environment. In some implementations, the computer system detects a second event (e.g., detection of a gesture provided by the first hand, a gesture provided by the second hand of the user, a voice command, or activation of the user interface object by the user) when the first user interface object is displayed in the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment. In response to detecting the second event, the computer system moves the first user interface object from a third location in the three-dimensional environment, having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment, to a fourth location having a third spatial relationship relative to a location corresponding to a position of the first hand of the user in the physical environment (e.g., switches from anchoring the first user interface object to the viewpoint of the view of the three-dimensional environment to anchoring the first user interface object to the location of the representation of the first hand of the user). For example, as described with reference to fig. 7G-7J, the user is enabled to move user interface object 7302' from being anchored to a predefined portion of the three-dimensional environment (e.g., in front of the user) to being anchored to a portion of the user's body (e.g., the user's hand). In some embodiments, the computer system displays an animated transition showing the first user interface object flying from its location in front of the viewpoint in the three-dimensional environment to the representation of the first hand in the three-dimensional environment.
Providing the user with an option to change whether the user interface object is anchored relative to the user's hand or to a predefined location within the three-dimensional environment that is independent of the user's hand and automatically detecting the user input to determine where the user has selected to anchor the user interface object provides real-time visual feedback to the user and provides the user with additional controls without requiring the user to navigate through a complex menu hierarchy, thereby providing the user with improved visual feedback without requiring additional user input.

In some embodiments, after moving the first user interface object to the fourth location having the third spatial relationship with respect to the location corresponding to the position of the first hand of the user in the physical environment, the computer system detects movement of the first hand of the user to a second position in the physical environment, and in response to detecting the movement of the first hand of the user, the computer system moves the first user interface object relative to the viewpoint of the view of the three-dimensional environment to maintain the third spatial relationship with respect to the location corresponding to the second position of the first hand of the user in the physical environment. This is illustrated in fig. 7B-7D, for example, where, after the first user interface object is moved to the fourth location, an orientation of the first user interface object (e.g., user interface object 7016') is selected based on the characteristic orientation of the first hand (e.g., as the user rotates the user's hand, as described above with reference to fig. 7B-7D). Moving the first user interface object relative to the viewpoint of the view of the three-dimensional environment to maintain the third spatial relationship relative to the location corresponding to the second position of the first hand of the user in the physical environment provides real-time visual feedback to the user as the first hand of the user moves, thereby providing improved visual feedback to the user.
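
Maintaining the third spatial relationship as the hand moves can be sketched as recomputing the object's placement from the current hand pose on every update. The `HandPose` type, the fixed-offset model of the spatial relationship, and the use of a single yaw angle as the hand's characteristic orientation are simplifying assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class HandPose:
    position: Vec3   # location corresponding to the hand, in meters
    yaw_deg: float   # stand-in for the hand's characteristic orientation


def place_on_hand(hand: HandPose, offset: Vec3) -> Tuple[Vec3, float]:
    """Return (position, orientation) for an object anchored to the hand.

    The spatial relationship is modeled as a constant offset from the
    hand location; the object's orientation is taken from the hand.
    """
    position = (
        hand.position[0] + offset[0],
        hand.position[1] + offset[1],
        hand.position[2] + offset[2],
    )
    return position, hand.yaw_deg
```

Calling `place_on_hand` again after each detected hand movement keeps the object at the same offset from the hand, which is the behavior the paragraph above describes.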

In some implementations, detecting the second event includes detecting a gesture provided by the first hand of the user (e.g., at a location remote from a location corresponding to the third location of the first user interface object). In some embodiments, the gesture is a pinch gesture followed by a flick gesture performed by the first hand of the user. In some implementations, the gesture is a pinch and grab gesture detected in conjunction with a gaze input directed to the first user interface object when the first user interface object is displayed at the third location having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment. In some implementations, the position of the pinch gesture corresponds to the third location of the first user interface object, and the first user interface object is displayed at the third location in the three-dimensional environment. In some implementations, the position of the pinch gesture corresponds to a location of a handle of the first user interface object (e.g., a user interface object on a top edge, bottom edge, or corner) displayed in the view of the three-dimensional environment. This is shown in fig. 7G and 7H, for example, where detecting the second event includes detecting a gesture provided by the first hand of the user (e.g., the user's hand 7020). Automatically detecting a plurality of different gestures (including pinch and drag gestures) and other user inputs that cause the system to perform different operations, and automatically performing the operations in response to the respective detected gestures, provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing additional control options without cluttering the user's view with additional displayed controls.

In some implementations, detecting the second event includes detecting a gesture provided by the second hand of the user at a location on the first hand of the user. In some embodiments, the gesture is a tap gesture performed by the second hand of the user. In some implementations, the tap gesture is detected on the first hand at a location corresponding to the location of a control of the first user interface object, as described with reference to fig. 7G-7J. Automatically detecting a plurality of different gestures that cause the system to perform different operations (including tap gestures at predefined locations on the user's hand), and automatically performing the operations in response to the respective detected gestures, provides additional controls for the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback and additional control options without cluttering the user's view with additional displayed controls.

In some embodiments, the first user interface object is a first object type having a first spatial relationship with respect to a viewpoint of a view of the three-dimensional environment. In such an embodiment, the computer system detects a third event (e.g., detection of a gesture provided by the first hand, a gesture provided by the second hand of the user, a voice command, and/or activation of the user interface object by the user) when the first user interface object is displayed in the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment. In some implementations, in response to detecting the third event, the computer system moves the first user interface object from a third location in the three-dimensional environment, having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment, to a fifth location having a second spatial relationship relative to the three-dimensional environment (e.g., switches from anchoring the first user interface object to the viewpoint of the view of the three-dimensional environment to anchoring the first user interface object to the three-dimensional environment, as described with reference to fig. 7G-7J). In some embodiments, the computer system displays an animated transition showing the first user interface object flying from its position in front of the viewpoint in the three-dimensional environment to a position that is fixed relative to the three-dimensional environment (e.g., a position in the three-dimensional environment corresponding to a wall or desktop in the physical environment).
Providing the user with an option to change whether the user interface object is anchored relative to the current viewpoint of the user or to a location within the three-dimensional environment that is independent of the viewpoint of the user, and automatically detecting the user input to determine where the user has selected to anchor the user interface object, provides real-time visual feedback to the user when the user changes the anchor point of the user interface object and provides the user with additional controls without requiring the user to navigate through a complex menu hierarchy, thereby providing the user with improved visual feedback without requiring additional user input.
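
The anchor changes described for the first, second, and third events can be summarized as a small transition table. This is a sketch under stated assumptions: the string labels and the `next_anchor` function are hypothetical, and only the three transitions described above are modeled.

```python
# Maps (current anchor, detected event) to the new anchor.
TRANSITIONS = {
    ("hand", "first_event"): "viewpoint",         # hand-anchored -> viewpoint-anchored
    ("viewpoint", "second_event"): "hand",        # viewpoint-anchored -> hand-anchored
    ("viewpoint", "third_event"): "environment",  # viewpoint-anchored -> environment-anchored
}


def next_anchor(current: str, event: str) -> str:
    """Return the anchor after an event; unmodeled pairs leave it unchanged."""
    return TRANSITIONS.get((current, event), current)
```

For instance, `next_anchor("hand", "first_event")` yields `"viewpoint"`, while an event that is not modeled for the current anchor leaves the anchor as it was.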

In some implementations, a first three-dimensional computer-generated experience (e.g., an application experience, a home or dashboard experience, or a coexistence experience) (e.g., an augmented reality experience, a mixed reality experience, or a virtual reality experience) displayed in a view of a three-dimensional environment is a shared experience between a user corresponding to the computer system and at least one other user corresponding to a second computer system. In such an embodiment, when the first user interface object is displayed at the third location having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment, the computer system keeps at least a portion of the content in the first user interface object private to the user corresponding to the computer system. In response to detecting the third event, the computer system presents the portion of the content in the first user interface object to the at least one other user corresponding to the second computer system after the first user interface object moves from the third location in the three-dimensional environment to the fifth location having the second spatial relationship with respect to the three-dimensional environment. In some embodiments, a user shares a three-dimensional environment with one or more other users (e.g., participants). For example, the user and the other participants may view user interface objects that have been shared with (e.g., are not private from) the other participants. In some embodiments, the other participants cannot view the portion of the content that is private to the user. For example, in addition to the user's private content, the user may also see "world" content (e.g., public content), while each other participant may view only the "world" content and that participant's own private content, but not the user's private content.
In some implementations, a user is enabled to move various user interface objects and/or portions of content between being displayed in public content and being displayed in private content. For example, the user selects the first object from the public content to move the first object to the private content of the user and/or selects the first object from the private content of the user to move to the public content, as described with reference to fig. 7G-7J. Providing the user with the option of changing whether the user interface object is shared with other users that are participating in the communication session or whether the user interface object is privately displayed to the user without sharing the user interface object with other users improves visual feedback provided to the user when the user selects which user interface object is shared with or without other users, and provides additional controls to the user without requiring the user to navigate through a complex menu hierarchy, thereby providing improved visual feedback to the user without requiring additional user input.
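
The visibility rule for shared sessions (each participant sees the public "world" content plus only that participant's own private content) can be sketched as a simple filter. The `ContentItem` type and the helper names are hypothetical and are introduced only to illustrate the rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentItem:
    name: str
    owner: Optional[str] = None  # None marks public "world" content


def visible_to(items: List[ContentItem], participant: str) -> List[str]:
    """Names of items a participant can see: public items plus their own."""
    return [i.name for i in items if i.owner is None or i.owner == participant]


def share(item: ContentItem) -> None:
    """Move an item from private content to public content."""
    item.owner = None


def make_private(item: ContentItem, participant: str) -> None:
    """Move an item from public content to a participant's private content."""
    item.owner = participant
```

Under these assumptions, calling `share` on a private item makes it appear in every participant's `visible_to` result, and `make_private` reverses that.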

In some embodiments, a respective object of the first object type, having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment, has a display depth, measured from the viewpoint of the view of the three-dimensional environment, that is greater than a threshold distance corresponding to the arm length of the user. For example, the user interface object 7302' described with reference to fig. 7G-7J is displayed within the three-dimensional environment at a depth perceived to be outside the reach of the user (e.g., at a distance from the user's viewpoint). Automatically displaying the user interface object at a location relative to the user's arm (such that the object appears to be out of reach of the user) provides real-time feedback to the user by maintaining the display of the user interface object even while the user moves the user's arm, thereby providing improved visual feedback to the user without requiring additional user input.
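
One simple way to keep a viewpoint-anchored object beyond the user's reach is to enforce a minimum display depth. The sketch below is an assumption-laden illustration: the function name, the default arm length, and the safety margin are hypothetical values that the description does not specify.

```python
def display_depth(
    requested_depth_m: float,
    arm_length_m: float = 0.7,  # assumed arm length, in meters
    margin_m: float = 0.05,     # assumed margin so depth exceeds the threshold
) -> float:
    """Return a depth from the viewpoint that is beyond arm's length."""
    minimum_depth = arm_length_m + margin_m
    # Depths already beyond reach are kept; nearer depths are pushed out.
    return max(requested_depth_m, minimum_depth)
```

A requested depth that is already beyond the threshold is returned unchanged; a nearer depth is replaced by the minimum depth.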

It should be understood that the particular order in which the operations in fig. 10 are described is merely an example and is not intended to suggest that the order is the only order in which the operations may be performed. Those of ordinary skill in the art will recognize a variety of ways to reorder the operations described herein. In addition, it should be noted that the details of the other processes described herein with respect to the other methods described herein (e.g., methods 8000 and 9000) apply in a similar manner to method 10000 described above with respect to fig. 10. For example, the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, and/or animations described above with reference to method 10000 optionally have one or more of the characteristics of the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, and/or animations described herein with reference to other methods described herein (e.g., methods 8000 and 9000). For the sake of brevity, these details are not repeated here.

The operations described above with reference to fig. 8, 9 and 10 are optionally implemented by the components depicted in fig. 1-6. In some embodiments, aspects/operations of methods 8000, 9000 and 10000 may be interchanged, substituted, and/or added between those methods. For the sake of brevity, these details are not repeated here.

Furthermore, in methods described herein where one or more steps are contingent upon one or more conditions having been met, it should be understood that the described method can be repeated in multiple repetitions so that, over the course of the repetitions, all of the conditions upon which steps in the method are contingent have been met in different repetitions of the method. For example, if a method requires performing a first step if a condition is satisfied, and a second step if the condition is not satisfied, then a person of ordinary skill would appreciate that the stated steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that are contingent upon one or more conditions having been met could be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of system or computer-readable medium claims where the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions, and thus is capable of determining whether the contingency has or has not been satisfied without explicitly repeating steps of a method until all of the conditions upon which steps in the method are contingent have been met. A person having ordinary skill in the art would also understand that, similar to a method with contingent steps, a system or computer-readable storage medium can repeat the steps of a method as many times as are needed to ensure that all of the contingent steps have been performed.

The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention and various described embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method, comprising:

at a computer system in communication with a first display generating component and one or more input devices:

Displaying, via the first display generating component, a first user interface object and a second user interface object in a first view of a three-dimensional environment, wherein respective characteristic locations of the first user interface object in the three-dimensional environment have a first spatial relationship with a first anchor location in the three-dimensional environment corresponding to a position of a first hand of a user in a physical environment, and respective characteristic locations of the second user interface object in the three-dimensional environment have a second spatial relationship with the first anchor location in the three-dimensional environment corresponding to the position of the first hand of the user in the physical environment, and wherein the first user interface object comprises one or more user interface objects in a predetermined layout;

Detecting, via the one or more input devices, a first movement of the first hand in the physical environment while the first user interface object and the second user interface object are displayed in the first view of the three-dimensional environment, the first movement of the first hand corresponding to translational and rotational movement relative to a point of view corresponding to the first view of the three-dimensional environment; and

In response to detecting the first movement of the first hand in the physical environment:

Translating the first user interface object and the second user interface object in the three-dimensional environment relative to the viewpoint according to the translational movement of the first hand in the physical environment; and

Rotating the first user interface object in the three-dimensional environment relative to the viewpoint, in accordance with the rotational movement of the first hand in the physical environment, without rotating the second user interface object in the three-dimensional environment.

2. The method of claim 1, wherein the first spatial relationship requires the respective characteristic locations of the first user interface object to have a same distance to the viewpoint or to be closer to the viewpoint than the first anchor locations, and the second spatial relationship requires the respective characteristic locations of the second user interface object to be farther from the viewpoint than the first anchor locations.

3. The method of any of claims 1-2, wherein the second user interface object is oriented in the first view of the three-dimensional environment according to a direction in the physical environment.

4. The method of claim 3, wherein the second user interface object is oriented in the first view of the three-dimensional environment according to an upright direction of the first display generating component in the physical environment.

5. The method according to any one of claims 1 to 4, comprising:

detecting a first user input corresponding to a request to display the second user interface object in the three-dimensional environment without regard to the second spatial relationship to the first anchor location in the three-dimensional environment while the second user interface object is displayed in a respective view of the three-dimensional environment; and

In response to detecting the first user input:

Moving the second user interface object away from the respective characteristic location having the second spatial relationship with the first anchor location to a location that is independent of the position of the first hand of the user in the physical environment.

6. The method of claim 5, wherein detecting the first user input corresponding to the request to display the second user interface object in the three-dimensional environment without regard to the second spatial relationship with the first anchor location in the three-dimensional environment comprises:

Detecting a first gesture of a second hand of the user, wherein the first gesture is directed to a first control object displayed in the three-dimensional environment at a location corresponding to a position of the first hand of the user in the physical environment.

7. The method according to any one of claims 5 to 6, comprising:

Displaying a second control object in the three-dimensional environment at a respective location corresponding to the location of the first hand in the physical environment when the second user interface object is displayed at the location independent of the location of the first hand in the physical environment;

Detecting a second gesture of a second hand of the user while the second control object is displayed in the three-dimensional environment at the respective location corresponding to the position of the first hand in the physical environment, wherein the second gesture is directed to the second control object displayed in the three-dimensional environment at the respective location corresponding to the position of the first hand of the user in the physical environment; and

In response to detecting the second gesture directed to the second control object, ceasing to display the second user interface object at the location independent of the position of the first hand of the user in the physical environment.

8. The method according to any one of claims 5 to 7, comprising:

detecting, via the one or more input devices, a second movement of the first hand in the physical environment while the second user interface object is displayed in a respective view of the three-dimensional environment, wherein the second movement of the first hand causes a representation of the first hand to move out of the respective view of the three-dimensional environment; and

In accordance with a determination that the second movement of the first hand has caused the representation of the first hand to move out of the respective view of the three-dimensional environment:

In accordance with a determination that the second user interface object is currently displayed at the respective characteristic location having the second spatial relationship with the first anchor location, ceasing to display the second user interface object in the respective view of the three-dimensional environment; and

In accordance with a determination that the second user interface object is currently displayed at a location that is independent of the position of the first hand, maintaining display of the second user interface object in the respective view of the three-dimensional environment.

9. The method according to any one of claims 1 to 8, comprising:

detecting, via the one or more input devices, a third movement of the first hand in the physical environment when the second user interface object is displayed in the first view of the three-dimensional environment, wherein the third movement of the first hand corresponds to movement of the first anchor location toward or away from the point of view, the point of view corresponding to the first view of the three-dimensional environment; and

In response to detecting the third movement of the first hand in the physical environment:

Changing the size of the second user interface object in accordance with the movement of the first anchor location toward or away from the viewpoint while maintaining the second spatial relationship between the respective characteristic locations of the second user interface object and the first anchor location in the three-dimensional environment.

10. The method of any of claims 1-9, wherein a size of the second user interface object in the three-dimensional environment is selected based at least in part on a size of the first hand.

11. The method of any of claims 1 to 10, wherein the orientation of the first user interface object is selected based on a characteristic orientation of the first hand.

12. The method of any of claims 1-11, wherein the first user interface object is a control panel comprising a plurality of user interface objects corresponding to different device control functions of the computer system.

13. The method of any of claims 1-11, wherein the first user interface object is a dock comprising a plurality of user interface objects corresponding to different applications or experiences, wherein respective user interface objects in the dock, when activated, cause the computer system to initiate display of the respective applications or computer-generated real-world experiences.

14. The method according to any one of claims 1 to 13, comprising:

Detecting, via the one or more input devices, a second movement of the first hand in the physical environment while the first user interface object is displayed in the first view of the three-dimensional environment, the second movement of the first hand causing a first side of the first hand to rotate away from the viewpoint corresponding to the first view of the three-dimensional environment; and

In response to detecting the second movement of the first hand in the physical environment:

In accordance with a determination that the visibility of the first side of the first hand in the first view of the three-dimensional environment is below a threshold amount of visibility, displaying a third user interface object different from the first user interface object and the second user interface object, the third user interface object overlaying or replacing a display of a portion of the second side of the first hand in the three-dimensional environment; and

In accordance with a determination that the visibility of the first side of the first hand in the first view of the three-dimensional environment is above the threshold amount of visibility, forgoing display of the third user interface object in the three-dimensional environment.
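The two branches of claim 14 can be read as a single threshold test. The following is a minimal sketch only, not the claimed implementation: the function name, the 0-to-1 visibility scale, and the default threshold of 0.5 are all assumptions made for illustration, and the behavior at exactly the threshold is not specified by the claim.

```python
def should_display_third_object(first_side_visibility: float,
                                threshold: float = 0.5) -> bool:
    """Return True when the third user interface object would be displayed.

    first_side_visibility: assumed fraction (0.0 to 1.0) of the first side of
    the hand that is visible in the first view. The scale is hypothetical.
    threshold: assumed threshold amount of visibility (hypothetical value).
    """
    # Below the threshold: display the third object over the second side.
    # At or above the threshold: forgo display (the "at" case is an assumption).
    return first_side_visibility < threshold
```

For example, under these assumptions a hand whose first side is 20% visible would trigger display of the third object, while one that is 90% visible would not.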

15. The method according to any one of claims 1 to 14, comprising:

detecting, via the one or more input devices, a third movement of the first hand in the physical environment when at least one of the first user interface object and the second user interface object is displayed in the first view of the three-dimensional environment, wherein the third movement of the first hand reduces a spatial extent of a first side of the first hand represented in the first view of the three-dimensional environment; and

In response to detecting the third movement of the first hand in the physical environment, reducing a visual prominence of the at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment.

16. The method according to any one of claims 1 to 15, comprising:

Detecting, via the one or more input devices, a fourth movement of the first hand in the physical environment while at least one of the first user interface object and the second user interface object is displayed in the first view of the three-dimensional environment; and

In response to detecting the fourth movement of the first hand in the physical environment:

In accordance with determining that the fourth movement of the first hand causes the representation of the first hand to leave the first view of the three-dimensional environment, and determining that a first side of the first hand is facing toward the viewpoint of the first view of the three-dimensional environment when the representation of the first hand leaves the first view of the three-dimensional environment, maintaining the display of the at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment; and

In accordance with a determination that the fourth movement of the first hand causes the representation of the first hand to leave the first view of the three-dimensional environment, and a determination that the first side of the first hand does not face the viewpoint of the first view of the three-dimensional environment when the representation of the first hand leaves the first view of the three-dimensional environment, ceasing to display the at least one of the first user interface object and the second user interface object in the first view of the three-dimensional environment.
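The decision in claim 16 depends on two conditions: whether the representation of the hand left the view, and which way the first side was facing at that moment. A hedged sketch of that decision follows; the enum and function names are hypothetical, and the `NO_CHANGE` case (the hand stays in view) is an assumption, since the claim does not address it.

```python
from enum import Enum


class DisplayAction(Enum):
    MAINTAIN = "maintain"    # keep displaying the object(s)
    CEASE = "cease"          # stop displaying the object(s)
    NO_CHANGE = "no_change"  # assumption: claim 16 is silent on this case


def on_fourth_movement(hand_left_view: bool,
                       first_side_faced_viewpoint_on_exit: bool) -> DisplayAction:
    """Map the two determinations of claim 16 to a display action."""
    if not hand_left_view:
        return DisplayAction.NO_CHANGE
    if first_side_faced_viewpoint_on_exit:
        return DisplayAction.MAINTAIN
    return DisplayAction.CEASE
```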

17. The method of any of claims 1-16, wherein the first hand is a non-dominant hand associated with the user, and in accordance with a determination that the first hand is the non-dominant hand associated with the user, the computer system displays the first user interface object and the second user interface object in the first spatial relationship and the second spatial relationship corresponding to the first anchor location in the three-dimensional environment corresponding to the location of the first hand while the first hand and the second hand of the user are both visible in the first view of the three-dimensional environment.

18. The method according to any one of claims 1 to 17, comprising:

Before displaying the second user interface object at the respective characteristic location of the second user interface object having the second spatial relationship with the first anchor location in the three-dimensional environment, moving the second user interface object from an initial location to the respective characteristic location in the first view of the three-dimensional environment, wherein the initial location of the second user interface object is closer to the first anchor location in the three-dimensional environment than the respective characteristic location having the second spatial relationship with the first anchor location.

19. The method according to any one of claims 1 to 18, comprising:

While at least one of the first user interface object and the second user interface object is displayed in the first view of the three-dimensional environment, detecting, via the one or more input devices, input directed to a respective user interface object of the at least one of the first user interface object and the second user interface object; and

In response to detecting the input directed to the respective user interface object of the at least one of the first user interface object and the second user interface object, in accordance with a determination that the input satisfies an activation criterion, performing an operation corresponding to the activation of the respective user interface object of the at least one of the first user interface object and the second user interface object, wherein the activation criterion can be satisfied by any one of:

A gaze input directed to the respective user interface object in conjunction with a first gesture of a second hand of the user detected at a location away from the characteristic location of the respective user interface object, or

A second gesture detected at a location corresponding to the characteristic location of the respective user interface object.
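The activation criterion of claim 19 is a disjunction of an indirect path (gaze plus a gesture made away from the object) and a direct path (a gesture at the object's location). A minimal sketch, assuming the three determinations have already been reduced to booleans; the `InputEvent` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    gaze_on_object: bool            # gaze input directed to the object
    remote_gesture_detected: bool   # first gesture, second hand, away from the object
    direct_gesture_at_object: bool  # second gesture at the characteristic location


def satisfies_activation_criterion(event: InputEvent) -> bool:
    """Either input path alone is sufficient to activate the object."""
    indirect = event.gaze_on_object and event.remote_gesture_detected
    direct = event.direct_gesture_at_object
    return indirect or direct
```

Note that in this sketch gaze alone, or a remote gesture without gaze, does not satisfy the criterion.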

20. The method according to any one of claims 1 to 19, comprising:

while at least one of the first user interface object and the second user interface object is displayed in the first view of the three-dimensional environment, detecting, via the one or more input devices, a first gaze input directed to a respective user interface object of the at least one of the first user interface object and the second user interface object; and

In response to detecting the first gaze input directed to the respective user interface object of the at least one of the first user interface object and the second user interface object, displaying an expanded version of the respective user interface object.

21. A computer system, comprising:

a first display generation component;

One or more input devices;

One or more processors; and

A memory storing one or more programs, wherein the one or more programs are configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-20.

22. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer system comprising a first display generating component and one or more input devices, cause the computer system to perform any of the methods of claims 1-20.

23. A graphical user interface on a computer system comprising a first display generation component, one or more input devices, a memory, and one or more processors to execute one or more programs stored in the memory, the graphical user interface comprising user interfaces displayed in accordance with any of the methods of claims 1-20.

24. A computer system, comprising:

a first display generation component;

one or more input devices; and

Means for performing any one of the methods of claims 1 to 20.

25. An information processing apparatus for use in a computer system including a first display generating component and one or more input devices, the information processing apparatus comprising:

Means for performing any one of the methods of claims 1 to 20.

26. A method, comprising:

At a first computer system in communication with a first display generating component and one or more input devices:

Displaying, via the first display generating component, a view of a communication session between a first user of the first display generating component and a second user of a second display generating component different from the first display generating component, wherein the view of the communication session comprises a view of a three-dimensional environment comprising at least some virtual content shared between the first user and the second user, wherein displaying the view of the three-dimensional environment of the communication session comprises displaying a respective representation of the second user in the view of the three-dimensional environment, and wherein the respective representation of the second user is determined based on a virtual spatial relationship between the first user and the second user in the three-dimensional environment;

Displaying, via the first display generating component, a user interface for controlling the communication session while the view of the communication session is displayed, wherein the user interface for controlling the communication session comprises a first control object which, when activated by the first user, causes the first computer system to perform a respective operation of modifying an appearance of a three-dimensional region of the three-dimensional environment;

Detecting a first user input activating the first control object while the view of the three-dimensional environment is displayed; and

In response to detecting the first user input activating the first control object:

modifying the appearance of the three-dimensional region of the three-dimensional environment for the first user of the first display generating component; and

initiating a process for modifying the appearance of the three-dimensional region of the three-dimensional environment displayed at the second display generating component for the second user of the second display generating component.
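The response in claim 26 has two parts: the appearance changes immediately for the first user, and a process is started so that the same region changes for the second user. The sketch below illustrates that split under stated assumptions: the `Region` and `Session` types, the message format, and the `outbox` list (standing in for whatever transport connects the two systems) are all hypothetical and do not come from the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    appearance: str = "default"


@dataclass
class Session:
    local_region: Region = field(default_factory=Region)
    # Hypothetical stand-in for a channel to the second user's system.
    outbox: List[Dict[str, str]] = field(default_factory=list)

    def activate_first_control(self, new_appearance: str) -> None:
        # Part 1: modify the region's appearance for the first user.
        self.local_region.appearance = new_appearance
        # Part 2: initiate the process for the second user by queuing a request.
        self.outbox.append({"type": "modify_region",
                            "appearance": new_appearance})
```

Usage under these assumptions: after `Session().activate_first_control("dimmed")`, the local region is already dimmed while the remote change is only pending in the outbox, which mirrors the claim's distinction between modifying and initiating a process for modifying.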

27. The method of claim 26, comprising:

while displaying the view of the communication session, displaying respective user interfaces of one or more applications shared in the communication session.

28. The method of any one of claims 26 to 27, comprising:

while displaying the view of the communication session, displaying a second control object that, when activated, causes the first computer system to perform a respective operation that stops sharing a first application currently shared in the communication session.

29. The method of any one of claims 26 to 28, comprising:

while displaying the view of the communication session, displaying a respective identifier corresponding to a currently selected virtual enhancement applied to the three-dimensional environment.

30. The method of any one of claims 26 to 29, comprising:

while displaying the view of the communication session, displaying a third control object that, when activated, causes the first computer system to remove a first set of currently selected virtual enhancements applied to the three-dimensional environment.

31. The method of any one of claims 26 to 30, comprising:

while displaying the view of the communication session, displaying a fourth control object that, when activated, causes the first computer system to perform a respective operation that shares, with at least one other participant of the communication session, a respective set of virtual enhancements available for application to the three-dimensional environment.

32. The method of any one of claims 26 to 31, comprising:

while displaying the view of the communication session, displaying an indicator of a current immersion level at which the view of the communication session is displayed, wherein the current immersion level is selected from a first immersion level and a second immersion level, and wherein the second immersion level comprises a reduced transmissibility of a physical environment in the view of the communication session.

33. The method of any one of claims 26 to 32, comprising:

while displaying the view of the communication session, displaying respective representations of one or more participants of the communication session in a control region, wherein the respective representations of the one or more participants shown in the control region are different from the respective representations of the one or more participants shown in the view of the three-dimensional environment.

34. The method of claim 33, comprising:

while displaying the view of the communication session, displaying a first selectable option in association with a visual indication of a respective participant in the control region, wherein the first selectable option, when activated, removes the respective participant from the communication session.

35. The method of any one of claims 33 to 34, comprising:

while displaying the view of the communication session, displaying a second selectable option in association with a visual indication of a respective participant in the control region, wherein the second selectable option, when activated, mutes audio input from the respective participant in the communication session.

36. The method of any one of claims 33 to 35, comprising:

while displaying the view of the communication session, displaying a third selectable option in association with a visual indication of a respective participant in the control region, wherein the third selectable option, when activated, causes the respective representation of the respective participant to be removed from the view of the three-dimensional environment.

37. The method of any one of claims 26 to 36, comprising:

while displaying the view of the communication session, displaying a fifth control object that, when activated, causes the first computer system to display an input area in the view of the communication session for composing a text message to be displayed in a first three-dimensional region of the three-dimensional environment.

38. The method of any one of claims 26 to 37, comprising:

while displaying the view of the communication session, displaying a sixth control object that, when activated, causes the first computer system to display an input area in a second three-dimensional region of the three-dimensional environment, wherein the input area is configured to receive drawing input from a plurality of participants of the communication session and to present the drawing input in the second three-dimensional region of the three-dimensional environment.

39. The method of claim 38, comprising, when displaying the view of the communication session:

In accordance with a determination that the view of the communication session includes a respective representation of a participant of the communication session at a respective first location in the view of the three-dimensional environment in accordance with a virtual spatial relationship of the participant in the three-dimensional environment, displaying a sixth control object that, when activated, causes the first computer system to display an input area in a second three-dimensional region of the three-dimensional environment, wherein the input area is configured to receive drawing input from a plurality of participants of the communication session and to present the drawing input in the second three-dimensional region of the three-dimensional environment; and

In accordance with a determination that the view of the communication session includes a representation of the participant of the communication session at a respective second location in the view of the three-dimensional environment, forgoing display of the sixth control object.

40. The method of any one of claims 26 to 39, comprising:

while displaying the view of the communication session, displaying respective status indicators for one or more participants of the communication session, wherein the respective status indicators of the one or more participants indicate respective manners in which the one or more participants participate in the communication session.

41. The method of any one of claims 26 to 40, comprising:

detecting a first input directed to a respective representation of a first participant of the one or more participants of the communication session while displaying the respective representation of the one or more participants of the communication session in the control region of the communication session; and

In response to detecting the first input, providing a first spatial cue corresponding to a first spatial location of the first participant in the three-dimensional environment.

42. The method of claim 41, wherein providing the first spatial cue corresponding to the first spatial location of the first participant in the three-dimensional environment comprises outputting a first spatial audio output having a first virtual location corresponding to the first spatial location of the first participant in the three-dimensional environment.

43. The method of any of claims 41-42, wherein providing the first spatial cue corresponding to the first spatial location of the first participant in the three-dimensional environment comprises visually highlighting a representation of the first participant in the view of the three-dimensional environment.

44. The method of any one of claims 26 to 43, comprising:

When the view of the communication session is displayed, displaying respective representations of one or more applications shared in the communication session;

Detecting, while displaying the respective representations of one or more applications shared in the communication session, a second input directed to the respective representations of a first application of the one or more applications shared in the communication session; and

In response to detecting the second input, providing a second spatial cue corresponding to a second spatial location of the first application in the three-dimensional environment.

45. The method of claim 44, wherein providing the second spatial cue corresponding to the second spatial location of the first application in the three-dimensional environment comprises outputting a second spatial audio output having a second virtual location corresponding to the second spatial location of the first application in the three-dimensional environment.

46. The method of any of claims 44-45, wherein providing the second spatial cue corresponding to the second spatial location of the first application in the three-dimensional environment includes visually highlighting a user interface of the first application in the view of the three-dimensional environment.

47. The method of any one of claims 26 to 46, comprising:

detecting movement of gaze input directed to the user interface away from the user interface while the user interface for controlling the communication session is displayed; and

In response to detecting the movement of the gaze input away from the user interface, ceasing to display the user interface in the view of the three-dimensional environment.

48. The method of any one of claims 26 to 47, comprising:

Detecting movement of the first user's hand in a direction while displaying the user interface for controlling the communication session; and

In response to detecting the movement of the hand of the first user in the direction, ceasing to display the user interface in the view of the three-dimensional environment.

49. The method of any one of claims 26 to 48, comprising:

receiving a request to display a control for the communication session while the communication session is ongoing; and

In response to receiving the request to display a control for the communication session:

In accordance with a determination that the view of the communication session includes a respective representation of a participant of the communication session at a respective first location in the view of the three-dimensional environment in accordance with a virtual spatial relationship of the participant in the three-dimensional environment, displaying the user interface for controlling the communication session including the first control object; and

In accordance with a determination that the view of the communication session includes a representation of the participant of the communication session at a respective second location in the view of the three-dimensional environment, displaying a different user interface for controlling the communication session, the different user interface including a plurality of control objects but not including a control object for modifying an appearance of a three-dimensional region of the three-dimensional environment.
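Claim 49 selects between two control user interfaces depending on how the participant is represented in the view. The sketch below is one possible reading, in which the "respective first location" case is treated as the participant being placed according to a virtual spatial relationship; that reading, the boolean flag, and all control names are assumptions made for illustration.

```python
from typing import List

# Hypothetical control names; the claims do not enumerate them.
COMMON_CONTROLS = ["mute", "leave_session"]
# Stands in for the first control object of claim 26.
REGION_APPEARANCE_CONTROL = "modify_region_appearance"


def controls_to_display(participant_is_spatially_placed: bool) -> List[str]:
    """Pick the set of controls shown when controls are requested."""
    if participant_is_spatially_placed:
        # First branch: the user interface includes the first control object.
        return COMMON_CONTROLS + [REGION_APPEARANCE_CONTROL]
    # Second branch: several controls, but none for modifying the region.
    return list(COMMON_CONTROLS)
```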

50. A computer system, comprising:

a first display generation component;

One or more input devices;

One or more processors; and

A memory storing one or more programs, wherein the one or more programs are configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 27-49.

51. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer system comprising a first display generating component and one or more input devices, cause the computer system to perform any of the methods of claims 27-49.

52. A graphical user interface on a computer system comprising a first display generation component, one or more input devices, a memory, and one or more processors to execute one or more programs stored in the memory, the graphical user interface comprising user interfaces displayed in accordance with any of the methods of claims 27-49.

53. A computer system, comprising:

a first display generation component;

one or more input devices; and

Means for performing any one of the methods of claims 27-49.

54. An information processing apparatus for use in a computer system including a first display generating component and one or more input devices, the information processing apparatus comprising:

means for performing any one of the methods of claims 27-49.

55. A method, comprising:

At a computer system in communication with a display generation component and one or more input devices:

displaying a first three-dimensional computer-generated experience in a view of a three-dimensional environment;

detecting a first event when the first three-dimensional computer-generated experience is displayed in the view of the three-dimensional environment;

In response to detecting the first event, displaying a first user interface object in the view of the three-dimensional environment, wherein the first user interface object comprises one or more user interface objects that, when activated, cause the computer system to perform a respective operation that modifies at least one aspect of the display of the first computer-generated experience in the three-dimensional environment;

detecting that the user's attention is no longer directed to the first user interface object when the first user interface object is displayed in the view of the three-dimensional environment;

In response to detecting that the attention of the user is no longer directed to the first user interface object:

In accordance with a determination that the first user interface object is a first object type having a first spatial relationship with respect to a viewpoint of the view of the three-dimensional environment, ceasing to display at least a portion of the first user interface object in the view of the three-dimensional environment; and

In accordance with a determination that the first user interface object is a second object type having a second spatial relationship different from the first spatial relationship with respect to the three-dimensional environment, maintaining display of the first user interface object in the three-dimensional environment.
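The final two elements of claim 55 branch on the object type once the user's attention leaves the object. A hedged sketch of that branch follows; the enum member names "viewpoint-locked" and "environment-locked" are labels chosen here for the first and second object types and do not appear in the claims.

```python
from enum import Enum


class ObjectType(Enum):
    VIEWPOINT_LOCKED = 1    # first object type: spatial relationship to the viewpoint
    ENVIRONMENT_LOCKED = 2  # second object type: spatial relationship to the environment


def stays_fully_displayed_after_attention_loss(object_type: ObjectType) -> bool:
    """Return True if display is maintained once attention moves away.

    For the first object type the claim only requires that at least a
    portion cease to be displayed, so False here means "not fully displayed".
    """
    return object_type is ObjectType.ENVIRONMENT_LOCKED
```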

56. The method of claim 55, wherein respective user interface objects of the first object type having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment are displayed at predefined locations within the view of the three-dimensional environment.

57. The method of any of claims 55-56, wherein respective user interface objects of the second object type having the second spatial relationship with respect to the three-dimensional environment are displayed at user-selected locations among a plurality of eligible locations within the view of the three-dimensional environment.

58. The method of any of claims 55-57, wherein the first user interface object is the first object type having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment, and wherein the method comprises:

Prior to displaying the first user interface object in the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment:

displaying the first user interface object at a first location having a third spatial relationship to a location in the three-dimensional environment corresponding to a position of the first hand of the user in the physical environment;

detecting a first event when the first user interface object is displayed at the first location in the three-dimensional environment; and

In response to detecting the first event, moving the first user interface object from the first location in the three-dimensional environment to a second location having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment.

59. The method of claim 58, wherein detecting the first event comprises detecting a gesture provided by a second hand of the user at a location remote from the first hand of the user.

60. The method of claim 58, wherein detecting the first event comprises detecting a throwing gesture provided by the first hand of the user.

61. The method of claim 58, wherein detecting the first event comprises detecting a gesture provided by a second hand of the user at a location on the first hand of the user.

62. The method of any of claims 55-61, wherein the first user interface object is the first object type having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment, and wherein the method comprises:

Detecting a second event when the first user interface object is displayed in the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment; and

In response to detecting the second event, moving the first user interface object from a third corresponding location in the three-dimensional environment having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment to a fourth location having a third spatial relationship relative to a location corresponding to a position of the first hand of the user in the physical environment.

63. The method of claim 62, further comprising, after moving the first user interface object to the fourth location having the third spatial relationship relative to a location corresponding to a position of the first hand of the user in the physical environment:

Detecting movement of the first hand of the user in the physical environment to a second location; and

In response to detecting the movement of the first hand of the user, moving the first user interface object relative to the point of view of the three-dimensional environment to maintain the third spatial relationship relative to the location corresponding to the second location of the first hand of the user in the physical environment.
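Maintaining the third spatial relationship in claim 63 means the object follows the hand as the hand moves. As a simplified illustration only, the sketch below models that relationship as a fixed translation offset from the location corresponding to the hand; the claims do not limit the relationship to a pure offset, and rotation is ignored here. The function name and tuple-based coordinates are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def object_location_for_hand(hand_location: Vec3, offset: Vec3) -> Vec3:
    """Place the object at a constant offset from the hand's location.

    Assumption: the third spatial relationship is modeled as a fixed
    translation; a real system could also account for orientation.
    """
    hx, hy, hz = hand_location
    ox, oy, oz = offset
    return (hx + ox, hy + oy, hz + oz)
```

Under this assumption, when the hand moves to a second location, calling the function again with the same offset yields the object's new location, so the offset between the two stays constant.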

64. The method of claim 62, wherein detecting the second event comprises detecting a gesture provided by the first hand of the user.

65. The method of claim 62, wherein detecting the second event comprises detecting a gesture provided by a second hand of the user at a location on the first hand of the user.

66. The method of any of claims 55-65, wherein the first user interface object is the first object type having the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment, and wherein the method comprises:

Detecting a third event when the first user interface object is displayed in the first spatial relationship with respect to the viewpoint of the view of the three-dimensional environment; and

In response to detecting the third event, moving the first user interface object from a third corresponding location in the three-dimensional environment having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment to a fifth location having the second spatial relationship relative to the three-dimensional environment.

67. The method of claim 66, wherein the first three-dimensional computer-generated experience displayed in the view of the three-dimensional environment is a shared experience between the user corresponding to the computer system and at least one other user corresponding to a second computer system, and the method comprises:

When displaying the first user interface object at the third location having the first spatial relationship relative to the viewpoint of the view of the three-dimensional environment, keeping at least a portion of content in the first user interface object private to the user corresponding to the computer system; and

In response to detecting the third event, presenting the portion of the content in the first user interface object to the at least one other user corresponding to the second computer system after the first user interface object moves from the third corresponding location in the three-dimensional environment to the fifth location having the second spatial relationship with respect to the three-dimensional environment.

68. The method of any of claims 55-67, wherein respective objects of the first object type having the first spatial relationship with the viewpoint of the view of the three-dimensional environment have a display depth away from the viewpoint of the view of the three-dimensional environment that is greater than a threshold distance corresponding to an arm's length of the user.

69. A computer system, comprising:

A display generation component;

One or more input devices;

One or more processors; and

A memory storing one or more programs, wherein the one or more programs are configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 56-68.

70. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer system comprising a first display generation component and one or more input devices, cause the computer system to perform any of the methods of claims 56-68.

71. A graphical user interface on a computer system with a display generation component, one or more input devices, a memory, and one or more processors to execute one or more programs stored in the memory, the graphical user interface comprising user interfaces displayed in accordance with any of the methods of claims 56-68.

72. A computer system, comprising:

A display generation component;

one or more input devices; and

Means for performing any one of the methods of claims 56-68.

73. An information processing apparatus for use in a computer system including a first display generating component and one or more input devices, the information processing apparatus comprising:

means for performing any one of the methods of claims 56-68.