
CN118435158A - Device, method and graphical user interface for capturing and displaying media - Google Patents


Info

Publication number
CN118435158A
Authority
CN
China
Prior art keywords
media
representation
user
computer system
cameras
Prior art date
Legal status
Pending
Application number
CN202280080213.2A
Other languages
Chinese (zh)
Inventor
A·达维加
林家仰
J·B·曼扎里
A·孟席斯
T·里克
B·L·施米特钦
W·A·索伦帝诺三世
J·拉瓦兹
I·马尔科维奇
A·S·Y·张
A·莫林
L·S·布劳顿
S·O·勒梅
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Priority claimed from US 17/992,789 (published as US 2023/0336865 A1)
Application filed by Apple Inc
Publication of CN118435158A

Landscapes

  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates generally to techniques and user interfaces for capturing media, displaying previews of the media, displaying recording indicators, displaying a camera user interface, and/or displaying previously captured media.

Description

Device, method and graphical user interface for capturing and displaying media
Cross Reference to Related Applications
The present application claims priority from U.S. patent application Ser. No. 17/992,789, titled "DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING AND DISPLAYING MEDIA", filed 11/22/2022, U.S. provisional patent application Ser. No. 63/409,690, titled "DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING AND DISPLAYING MEDIA", filed 5/2022, U.S. provisional patent application Ser. No. 63/338,864, titled "DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING AND DISPLAYING MEDIA", and U.S. provisional patent application Ser. No. 63/285,897, titled "DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING AND DISPLAYING MEDIA", filed 12/2021. The contents of each of these patent applications are incorporated herein by reference in their entirety.
Technical Field
The present disclosure relates generally to computer systems that are in communication with a display generation component and, optionally, one or more input devices and one or more cameras, and that provide a computer-generated experience, including, but not limited to, electronic devices that provide virtual reality and mixed reality experiences via a display.
Background
In recent years, the development of computer systems for capturing and/or displaying media in various environments, such as an augmented reality environment, has increased significantly. An example augmented reality environment includes at least some virtual elements that replace or augment the physical world. Input devices (such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch screen displays) for computer systems and other electronic computing devices are used to interact with the virtual/augmented reality environment. Exemplary virtual elements include virtual objects such as digital images, video, text, icons, and control elements (such as buttons and other graphics).
Disclosure of Invention
Some methods and interfaces for capturing and/or displaying media in a variety of environments are cumbersome, inefficient, and limited. For example, systems that provide inadequate visual feedback for capturing media, systems that require a series of complex inputs to perform the media capturing process, and systems in which the display of the media is complex, cumbersome, and error-prone can place a significant cognitive burden on the user and detract from the experience of the virtual/augmented reality environment. In addition, these methods take longer than necessary, wasting the computer system's energy. This latter consideration is particularly important in battery-powered devices.
Accordingly, there is a need for a computer system with improved methods and interfaces to provide a user with a computer-generated experience, thereby making capturing and/or displaying media in a variety of environments more efficient and intuitive for the user. Such methods and interfaces optionally complement or replace conventional methods for capturing and/or displaying media in a variety of environments. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user by helping the user understand the association between the inputs provided and the response of the device to those inputs, thereby forming a more efficient human-machine interface.
The above-described drawbacks and other problems associated with user interfaces of computer systems are reduced or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook, tablet, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device such as a watch or a head-mounted device). In some embodiments, the computer system has a touch pad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has a touch-sensitive display (also referred to as a "touch screen" or "touch screen display"). In some embodiments, the computer system has one or more eye tracking components. In some embodiments, the computer system has one or more hand tracking components. In some embodiments, the computer system has, in addition to the display generation component, one or more output devices including one or more haptic output generators and/or one or more audio output devices. In some embodiments, a computer system has a Graphical User Interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing multiple functions. In some embodiments, the user interacts with the GUI through stylus and/or finger contacts and gestures on the touch-sensitive surface, movements of the user's eyes and hands in space relative to the GUI (and/or computer system) or the user's body (as captured by cameras and other motion sensors), and/or voice input (as captured by one or more audio input devices). In some embodiments, the functions performed through these interactions optionally include image editing, drawing, presentation, word processing, spreadsheet making, game playing, phone calls, video conferencing, email sending and receiving, instant messaging, workout support, digital photography, digital video recording, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are optionally included in a transitory and/or non-transitory computer readable storage medium or other computer program product configured for execution by one or more processors.
There is a need for an electronic device with improved methods and interfaces for capturing and/or displaying media in a variety of environments. Such methods and interfaces may supplement or replace conventional methods for capturing and/or displaying media. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user and result in a more efficient human-machine interface. For battery-powered computing devices, such methods and interfaces conserve power, increase the time between battery charges, and reduce processing requirements.
According to some embodiments, a method performed at a computer system in communication with a display generation component and one or more cameras is described. The method comprises the following steps: while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and in response to detecting the request to display the media capture user interface, displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and in response to detecting the request to display the media capture user interface, displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and in response to detecting the request to display the media capture user interface, displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
According to some embodiments, a computer system in communication with a display generation component and one or more cameras is described. The computer system includes: one or more processors; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for: while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and in response to detecting the request to display the media capture user interface, displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
According to some embodiments, a computer system in communication with a display generation component and one or more cameras is described. The computer system includes: means for detecting a request to display a media capture user interface while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and means for displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras in response to detecting the request to display the media capture user interface, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras. The one or more programs include instructions for: while displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and in response to detecting the request to display the media capture user interface, displaying, via the display generation component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, wherein: the media capture preview indicates a boundary of media that will be captured in response to detecting a media capture input while the media capture user interface is displayed; the media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment was visible prior to detecting the request to display the media capture user interface; and the media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as the portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
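For illustration only, the following Swift sketch models the behavior summarized above: the passthrough representation of the physical environment keeps updating, while the media capture preview replaces one portion of it and doubles as the boundary of what a capture input would record. All type and function names are assumptions introduced for this sketch; they are not drawn from the disclosure or from any real framework.

```swift
// Illustrative model only; names are assumed, not from the disclosure or a real API.
struct Rect {
    var x, y, width, height: Double
    func contains(_ px: Double, _ py: Double) -> Bool {
        px >= x && px < x + width && py >= y && py < y + height
    }
}

/// The passthrough representation of the physical environment.
struct EnvironmentRepresentation { var bounds: Rect }

/// The capture preview replaces one region of the passthrough; its frame also
/// indicates the boundary of the media that a capture input would record.
struct MediaCapturePreview { var frame: Rect }

/// For a point on the display, report whether it shows live passthrough (the
/// "first portion", still updating with the environment and the viewpoint) or
/// the camera preview that replaced the "second portion".
func contentAt(_ px: Double, _ py: Double,
               environment: EnvironmentRepresentation,
               preview: MediaCapturePreview?) -> String {
    if let preview = preview, preview.frame.contains(px, py) {
        return "camera preview (capture boundary)"
    }
    return environment.bounds.contains(px, py) ? "live passthrough" : "outside the display"
}

// Example: a preview centered in a 100 x 100 passthrough view.
let environment = EnvironmentRepresentation(bounds: Rect(x: 0, y: 0, width: 100, height: 100))
let preview = MediaCapturePreview(frame: Rect(x: 30, y: 30, width: 40, height: 40))
print(contentAt(50, 50, environment: environment, preview: preview)) // camera preview (capture boundary)
print(contentAt(10, 10, environment: environment, preview: preview)) // live passthrough
```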
According to some embodiments, a method performed at a computer system in communication with a display generation component and one or more cameras is described. The method comprises the following steps: when a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and in response to detecting the change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: when a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and in response to detecting the change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: when a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and in response to detecting the change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
According to some embodiments, a computer system is described in communication with a display generation component and one or more cameras. The computer system includes: one or more processors; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: when a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and in response to detecting the change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
According to some embodiments, a computer system is described in communication with a display generation component and one or more cameras. The computer system includes: means for displaying, via a display generating component, an augmented reality user interface when a viewpoint of a user is in a first pose, the augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; means for detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and means for, in response to detecting the change in the pose of the view of the user from the first pose to the second pose, offsetting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the offsetting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is offsetting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component. The one or more programs include instructions for: when a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration relative to the viewpoint of the user; detecting a change in the pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and in response to detecting the change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the field of view of the one or more cameras away from the corresponding spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the view of the user, the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user.
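As a rough illustration of the two-speed behavior described above (the environment representation tracks the viewpoint directly while the camera preview drifts back into its spatial configuration more slowly), here is a small Swift sketch; the single-axis pose, the follow rate, and every name in it are simplifying assumptions, not the disclosed implementation.

```swift
import Foundation

// Illustrative model only; names and the single-axis pose are assumptions.
struct Pose { var yaw: Double }   // one rotational axis is enough for the sketch

struct CameraPreviewFollower {
    /// Offset the preview keeps relative to the viewpoint when at rest
    /// (its corresponding "spatial configuration").
    let restingOffset: Double
    /// Fraction of the remaining gap closed per frame; values below 1 make the
    /// preview lag behind the viewpoint (the "first speed").
    let followRate: Double
    private(set) var previewYaw: Double

    init(restingOffset: Double, followRate: Double, initialViewpoint: Pose) {
        self.restingOffset = restingOffset
        self.followRate = followRate
        self.previewYaw = initialViewpoint.yaw + restingOffset
    }

    /// Called once per display frame with the current viewpoint pose. The
    /// environment representation would be rendered directly from `viewpoint`
    /// (the "second speed"), while the preview eases toward its target.
    mutating func update(viewpoint: Pose) {
        let target = viewpoint.yaw + restingOffset
        previewYaw += (target - previewYaw) * followRate
    }
}

// Example: the viewpoint turns 30 degrees in a single step; the preview trails it.
var follower = CameraPreviewFollower(restingOffset: 0, followRate: 0.15,
                                     initialViewpoint: Pose(yaw: 0))
for _ in 0..<5 {
    follower.update(viewpoint: Pose(yaw: 30))
    print(String(format: "preview yaw: %.1f", follower.previewYaw))
}
```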
According to some embodiments, a method is performed at a computer system in communication with a display generation component. The method comprises the following steps: upon displaying the augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and in response to detecting the request to display the captured media, displaying the captured media as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component, the one or more programs comprising instructions for: upon displaying the augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and in response to detecting the request to display the captured media, displaying the captured media as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, the one or more programs comprising instructions for: upon displaying the augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and in response to detecting the request to display the captured media, displaying the captured media as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
According to some embodiments, a computer system in communication with a display generation component is described. The computer system includes: one or more processors; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: upon displaying the augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and in response to detecting the request to display the captured media, displaying the captured media as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
According to some embodiments, a computer system in communication with a display generation component is described. The computer system includes: means for detecting, while displaying the augmented reality environment user interface, a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and means for displaying the captured media as a three-dimensional representation of the captured media in response to detecting the request to display the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component. The one or more programs include instructions for: upon displaying the augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and in response to detecting the request to display the captured media, displaying the captured media as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that the first viewpoint of the user is outside of a respective range of the one or more viewpoints.
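The placement rule described above can be illustrated with a short Swift sketch that positions the three-dimensional representation far enough from the user that the user's current viewpoint starts outside the range of viewpoints producing the surround cues. The immersive radius, the margin, and all names are assumptions made for this sketch only.

```swift
// Illustrative model only; the immersive radius, margin, and names are assumptions.
struct Point3 { var x, y, z: Double }

struct ImmersiveMedia {
    /// Viewpoints within this distance of the media's center receive the cues
    /// that the viewer is at least partially surrounded by the content.
    var immersiveRadius: Double
}

/// Choose a center for the media's three-dimensional representation along the
/// user's forward direction, far enough away that the user's current viewpoint
/// starts outside the immersive range.
func placementCenter(for media: ImmersiveMedia,
                     userPosition: Point3,
                     forward: Point3,          // assumed to be unit length
                     margin: Double = 0.5) -> Point3 {
    let distance = media.immersiveRadius + margin
    return Point3(x: userPosition.x + forward.x * distance,
                  y: userPosition.y + forward.y * distance,
                  z: userPosition.z + forward.z * distance)
}

// Example: media with a 1.5 m immersive radius placed 2.0 m in front of the user.
let media = ImmersiveMedia(immersiveRadius: 1.5)
let center = placementCenter(for: media,
                             userPosition: Point3(x: 0, y: 0, z: 0),
                             forward: Point3(x: 0, y: 0, z: -1))
print(center) // Point3(x: 0.0, y: 0.0, z: -2.0): the user starts outside the immersive range
```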
According to some embodiments, a method performed at a computer system in communication with a display generation component and one or more cameras is described. The method comprises the following steps: displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras, the one or more programs comprising instructions for: displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
According to some embodiments, a computer system configured to communicate with a display generation component and one or more cameras is described. The computer system includes: one or more processors; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
According to some embodiments, a computer system configured to communicate with a display generation component and one or more cameras is described. The computer system includes: means for displaying an augmented reality camera user interface via the display generating component, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component and the one or more cameras. The one or more programs include instructions for: displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising: a representation of the physical environment; and a recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases in a visible portion of the recording indicator by a plurality of different values of the visual parameter, wherein the value of the parameter gradually decreases with increasing distance from the first edge area of the recording indicator.
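As an illustration of the feathered edge described above, the following Swift sketch treats opacity as the visual parameter and steps it down through a range of values as distance from the indicator's first edge region increases; the linear falloff and all names are assumptions, not the disclosed design.

```swift
// Illustrative model only; the parameter choice (opacity) and falloff are assumptions.
struct RecordingIndicatorStyle {
    var edgeWidth: Double     // width of the feathered edge region, in points
    var maxOpacity: Double    // value of the visual parameter at the edge itself
}

/// Opacity of the indicator at a given distance inward from its first edge region.
/// Within `edgeWidth` the value steps down through many intermediate values;
/// beyond it the indicator is effectively transparent.
func indicatorOpacity(atDistance distance: Double, style: RecordingIndicatorStyle) -> Double {
    guard distance >= 0 else { return style.maxOpacity }
    guard distance < style.edgeWidth else { return 0 }
    let t = distance / style.edgeWidth       // 0 at the edge, 1 at the inner boundary
    return style.maxOpacity * (1 - t)        // linear falloff; any decreasing curve works
}

// Example: sample the fade at a few distances from the edge region.
let style = RecordingIndicatorStyle(edgeWidth: 20, maxOpacity: 0.8)
for distance in stride(from: 0.0, through: 25.0, by: 5.0) {
    print("distance \(distance) pt -> opacity \(indicatorOpacity(atDistance: distance, style: style))")
}
```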
According to some embodiments, a method performed at a computer system in communication with a display generation component, one or more input devices, and one or more cameras is described. The method comprises the following steps: detecting, via the one or more input devices, a request to display a camera user interface; and in response to detecting the request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component, the one or more input devices, and the one or more cameras, the one or more programs comprising instructions for: detecting, via the one or more input devices, a request to display a camera user interface; and in response to detecting the request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for: detecting, via the one or more input devices, a request to display a camera user interface; and in response to detecting the request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
According to some embodiments, a computer system is described. The computer system includes: one or more processors, wherein the computer system is configured to communicate with the display generation component, the one or more input devices, and the one or more cameras; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: detecting, via the one or more input devices, a request to display a camera user interface; and in response to detecting the request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
According to some embodiments, a computer system is described. The computer system is configured to communicate with the display generation component, the one or more input devices, and the one or more cameras, and the computer system comprises: means for detecting, via the one or more input devices, a request to display a camera user interface; and means for displaying the camera user interface in response to detecting the request to display the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs including instructions for: detecting, via the one or more input devices, a request to display a camera user interface; and in response to detecting the request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises: in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how media is captured with the computer system while the camera user interface is displayed; and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without displaying the tutorial.
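A minimal Swift sketch of the conditional tutorial follows. The disclosure does not spell out the criteria here, so the sketch substitutes assumed examples (tutorial not yet shown, no captures taken with this interface) purely to show the branch between displaying the camera user interface with or without the tutorial.

```swift
// Illustrative model only; the criteria shown are assumed examples, not the
// criteria used by the disclosed system.
struct CameraUIState {
    var hasShownTutorialBefore: Bool
    var capturesTakenWithThisInterface: Int
}

enum CameraUIPresentation {
    case withTutorial      // camera user interface plus an in-interface tutorial
    case withoutTutorial   // camera user interface only
}

/// Decide how to present the camera user interface when a request to display it
/// is detected: the tutorial is shown only while the set of criteria is met.
func presentation(for state: CameraUIState) -> CameraUIPresentation {
    let criteriaMet = !state.hasShownTutorialBefore && state.capturesTakenWithThisInterface == 0
    return criteriaMet ? .withTutorial : .withoutTutorial
}

// Example:
print(presentation(for: CameraUIState(hasShownTutorialBefore: false, capturesTakenWithThisInterface: 0))) // withTutorial
print(presentation(for: CameraUIState(hasShownTutorialBefore: true, capturesTakenWithThisInterface: 3)))  // withoutTutorial
```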
According to some embodiments, a method performed at a computer system in communication with a display generation component, one or more input devices, and one or more cameras is described. The method comprises the following steps: displaying, via the display generating means, a user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; while displaying the user interface, detecting a first request to capture media via the one or more input devices; and in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
According to some embodiments, a non-transitory computer readable storage medium is described. The non-transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with the display generation component, the one or more input devices, and the one or more cameras, the one or more programs comprising instructions for: displaying, via the display generating means, a user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; while displaying the user interface, detecting a first request to capture media via the one or more input devices; and in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
According to some embodiments, a transitory computer readable storage medium is described. The transitory computer readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for: displaying, via the display generating means, a user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; while displaying the user interface, detecting a first request to capture media via the one or more input devices; and in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
According to some embodiments, a computer system is described. The computer system includes: one or more processors, wherein the computer system is configured to communicate with the display generation component, the one or more input devices, and the one or more cameras; and a memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the display generating means, a user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; while displaying the user interface, detecting a first request to capture media via the one or more input devices; and in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
According to some embodiments, a computer system is described. The computer system is configured to communicate with the display generation component, one or more input devices, and one or more cameras, and the computer system comprises: means for displaying a user interface via the display generating means, the user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; means for detecting a first request to capture media via the one or more input devices while the user interface is displayed; and means for, in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
According to some embodiments, a computer program product is described. The computer program product includes one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs including instructions for: displaying, via the display generating means, a user interface comprising: a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and a viewfinder, wherein the viewfinder includes a boundary; while displaying the user interface, detecting a first request to capture media via the one or more input devices; and in response to detecting the first request to capture media: capturing, using the one or more cameras, a first media item comprising at least a first portion of a representation of a physical environment; and changing an appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises: changing the appearance of the first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side being different from the first side of the boundary of the viewfinder.
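For concreteness, the following is a minimal sketch, in Swift, of the viewfinder behavior summarized above: in response to a capture request, a media item is captured and the appearance of content within a threshold distance of two different sides of the viewfinder boundary is changed. The Viewfinder type, the threshold value, and the opacity change are illustrative assumptions; the disclosure does not prescribe a particular implementation.

```swift
import Foundation
import CoreGraphics

// Hypothetical model of the viewfinder: a boundary rectangle plus a threshold
// distance from each edge within which displayed content changes appearance.
struct Viewfinder {
    var boundary: CGRect
    var edgeThreshold: CGFloat = 12.0   // assumed value; not specified in the disclosure

    // Regions of content lying within `edgeThreshold` of two different sides
    // of the boundary (here, the left and top sides).
    func edgeRegions() -> (first: CGRect, second: CGRect) {
        let first = CGRect(x: boundary.minX, y: boundary.minY,
                           width: edgeThreshold, height: boundary.height)   // near the first side
        let second = CGRect(x: boundary.minX, y: boundary.minY,
                            width: boundary.width, height: edgeThreshold)   // near a different side
        return (first, second)
    }
}

// Hypothetical capture handler: captures a media item that includes the portion
// of the physical environment inside the capture area, then changes the
// appearance of content near two sides of the viewfinder boundary.
func handleCaptureRequest(viewfinder: Viewfinder,
                          captureMediaItem: () -> Void,
                          setContentOpacity: (CGRect, CGFloat) -> Void) {
    captureMediaItem()
    let regions = viewfinder.edgeRegions()
    setContentOpacity(regions.first, 0.5)    // first content portion, near the first side
    setContentOpacity(regions.second, 0.5)   // second content portion, near the second side
}
```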
It is noted that the various embodiments described above may be combined with any of the other embodiments described herein. The features and advantages described in this specification are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
Drawings
For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram illustrating an operating environment for a computer system for providing an XR experience, according to some embodiments.
FIG. 2 is a block diagram illustrating a controller of a computer system configured to manage and coordinate a user's XR experience, according to some embodiments.
FIG. 3 is a block diagram illustrating a display generation component of a computer system configured to provide a visual component of an XR experience to a user, according to some embodiments.
FIG. 4 is a block diagram illustrating a hand tracking unit of a computer system configured to capture gesture inputs of a user, according to some embodiments.
Fig. 5 is a block diagram illustrating an eye tracking unit of a computer system configured to capture gaze input of a user, according to some embodiments.
Fig. 6 is a flow diagram illustrating a glint-assisted gaze tracking pipeline in accordance with some embodiments.
Fig. 7A-7Q illustrate exemplary techniques for capturing and/or displaying media in some environments according to some embodiments.
Fig. 8 is a flow chart of a method of capturing media according to some embodiments.
FIG. 9 is a flow chart of a method of displaying a media preview according to some embodiments.
Fig. 10 is a flow chart of a method for displaying previously captured media, according to some embodiments.
Fig. 11A-11D illustrate exemplary techniques for displaying a representation of a physical environment having a recording indicator, according to some embodiments.
Fig. 12 is a flow chart of a method for displaying a representation of a physical environment having a recording indicator, according to some embodiments.
Fig. 13A-13J illustrate exemplary techniques for displaying a camera user interface.
Fig. 14 is a flow chart of a method for displaying information related to capturing media, according to some embodiments.
Fig. 15A-15B are flowcharts of methods for changing the appearance of a viewfinder according to some embodiments.
Detailed Description
According to some embodiments, the present disclosure relates to a user interface for providing an extended reality (XR) experience to a user.
Fig. 1-6 provide a description of an exemplary computer system for providing an XR experience to a user. Fig. 7A-7Q illustrate exemplary techniques for capturing and/or displaying media in various environments according to some embodiments. Fig. 8 is a flow chart of a method of capturing and viewing media according to various embodiments. Fig. 9 is a flow chart of a method of displaying a media preview according to various embodiments. Fig. 10 is a flow chart of a method of displaying previously captured media, according to various embodiments. The user interfaces in fig. 7A to 7Q illustrate the processes in fig. 8, 9 and 10. Fig. 11A-11D illustrate exemplary techniques for displaying a representation of a physical environment having a recording indicator, according to some embodiments. Fig. 12 is a flow chart of a method of displaying a representation of a physical environment having a recording indicator, according to some embodiments. The user interfaces of fig. 11A to 11D illustrate the process in fig. 12. Fig. 13A-13J illustrate exemplary techniques for displaying a camera user interface according to some embodiments. Fig. 14 is a flow chart of a method for displaying information related to capturing media, according to some embodiments. Fig. 15A-15B are flowcharts of methods for changing the appearance of a viewfinder according to some embodiments. The user interfaces in fig. 13A to 13J illustrate the processes in fig. 14, 15A, and 15B.
The processes described below enhance operability of a device and make a user-device interface more efficient (e.g., by helping a user provide appropriate input and reducing user errors in operating/interacting with the device) through various techniques, including by providing improved visual feedback to the user, reducing the number of inputs required to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without further user input, improving privacy and/or security, providing a richer, more detailed and/or more realistic user experience while conserving storage space, and/or additional techniques. These techniques also reduce power usage and extend battery life of the device by enabling a user to use the device faster and more efficiently. Saving on battery power, and thus weight, improves the ergonomics of the device. These techniques also enable real-time communication, allow fewer and/or less accurate sensors to be used, resulting in a more compact, lighter, and cheaper device, and enable the device to be used under a variety of lighting conditions. These techniques reduce energy usage and, thus, heat emitted by the device, which is particularly important for wearable devices, where a device that generates too much heat can become uncomfortable for the user to wear even while operating entirely within the operating parameters of the device components.
Furthermore, in methods described herein in which one or more steps are contingent upon one or more conditions having been met, it should be understood that the method may be repeated in multiple iterations so that, over the course of those iterations, all of the conditions upon which steps of the method depend have been met in different iterations of the method. For example, if a method requires performing a first step if a condition is satisfied and a second step if the condition is not satisfied, a person of ordinary skill would appreciate that the stated steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that depend on one or more conditions having been met could be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of a system or computer-readable-medium claim, because the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions and thus is capable of determining whether the contingency has or has not been satisfied without explicitly repeating the steps of the method until all of the conditions upon which steps in the method depend have been met. A person of ordinary skill in the art will also appreciate that, similar to a method with contingent steps, a system or computer-readable storage medium can repeat the steps of a method as many times as needed to ensure that all of the contingent steps have been performed.
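As an illustration only (not part of the claimed subject matter), the repetition described above can be pictured as a loop that re-runs a method with one contingent step per branch until every branch has executed at least once. The sketch below is hypothetical and assumes the condition eventually takes both values across iterations.

```swift
// Illustrative only: a method whose first step runs when `condition()` holds and
// whose second step runs when it does not. Repeating the method until both
// branches have executed (in no particular order) exercises every contingent step.
// Assumes the condition eventually evaluates both true and false across iterations.
func repeatUntilAllContingentStepsPerformed(condition: () -> Bool,
                                            firstStep: () -> Void,
                                            secondStep: () -> Void) {
    var performedFirst = false
    var performedSecond = false
    while !(performedFirst && performedSecond) {
        if condition() {
            firstStep()
            performedFirst = true
        } else {
            secondStep()
            performedSecond = true
        }
    }
}
```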
In some embodiments, as shown in fig. 1, an XR experience is provided to a user via an operating environment 100 comprising a computer system 101. The computer system 101 includes a controller 110 (e.g., a processor or remote server of a portable electronic device), a display generation component 120 (e.g., a Head Mounted Device (HMD), a display, a projector, a touch screen, etc.), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., a speaker 160, a haptic output generator 170, and other output devices 180), one or more sensors 190 (e.g., an image sensor, a light sensor, a depth sensor, a haptic sensor, an orientation sensor, a proximity sensor, a temperature sensor, a position sensor, a motion sensor, a speed sensor, etc.), and optionally one or more peripheral devices 195 (e.g., a household appliance, a wearable device, etc.). In some implementations, one or more of the input device 125, the output device 155, the sensor 190, and the peripheral device 195 are integrated with the display generating component 120 (e.g., in a head-mounted device or a handheld device).
In describing an XR experience, various terms are used to refer differently to several related but different environments that a user may sense and/or interact with (e.g., interact with inputs detected by computer system 101 that generated the XR experience, such inputs causing the computer system that generated the XR experience to generate audio, visual, and/or tactile feedback corresponding to various inputs provided to computer system 101). The following are a subset of these terms:
Physical environment: a physical environment refers to a physical world in which people can sense and/or interact without the assistance of an electronic system. Physical environments such as physical parks include physical objects such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with a physical environment, such as by visual, tactile, auditory, gustatory, and olfactory.
Extended reality: Conversely, an extended reality (XR) environment refers to a fully or partially simulated environment in which people sense and/or interact via an electronic system. In XR, a subset of the physical movements of the person, or a representation thereof, is tracked, and in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner consistent with at least one physical law. For example, an XR system may detect a person's head rotation and, in response, adjust the graphical content and sound field presented to the person in a manner similar to the manner in which such views and sounds change in a physical environment. In some cases (e.g., for accessibility reasons), the adjustment of the characteristics of the virtual object in the XR environment may be made in response to a representation of the physical motion (e.g., a voice command). A person may utilize any of his senses to sense and/or interact with XR objects, including vision, hearing, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides perception of a point audio source in 3D space. As another example, an audio object may enable audio transparency that selectively introduces environmental sounds from a physical environment with or without computer-generated audio. In some XR environments, a person may sense and/or interact with only audio objects.
Examples of XR include virtual reality and mixed reality.
Virtual reality: a Virtual Reality (VR) environment refers to a simulated environment designed to be based entirely on computer-generated sensory input for one or more senses. The VR environment includes a plurality of virtual objects that a person can sense and/or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the presence of the person within the computer-generated environment and/or through a simulation of a subset of the physical movements of the person within the computer-generated environment.
Mixed reality: in contrast to VR environments designed to be based entirely on computer-generated sensory input, a Mixed Reality (MR) environment refers to a simulated environment designed to introduce sensory input from a physical environment or a representation thereof in addition to including computer-generated sensory input (e.g., virtual objects). On a virtual continuum, a mixed reality environment is any condition between, but not including, a full physical environment as one end and a virtual reality environment as the other end. In some MR environments, the computer-generated sensory input may be responsive to changes in sensory input from the physical environment. In addition, some electronic systems for rendering MR environments may track the position and/or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical objects or representations thereof from the physical environment). For example, the system may cause the motion such that the virtual tree appears to be stationary relative to the physical ground.
Examples of mixed reality include augmented reality and augmented virtuality.
Augmented reality: an Augmented Reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment or a representation of a physical environment. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present the virtual object on a transparent or semi-transparent display such that a person perceives the virtual object superimposed over the physical environment with the system. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or videos of the physical environment, which are representations of the physical environment. The system combines the image or video with the virtual object and presents the composition on an opaque display. A person utilizes the system to indirectly view the physical environment via an image or video of the physical environment and perceive a virtual object superimposed over the physical environment. As used herein, video of a physical environment displayed on an opaque display is referred to as "pass-through video," meaning that the system captures images of the physical environment using one or more image sensors and uses those images when rendering an AR environment on the opaque display. Further alternatively, the system may have a projection system that projects the virtual object into the physical environment, for example as a hologram or on a physical surface, such that a person perceives the virtual object superimposed on top of the physical environment with the system. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing a passthrough video, the system may transform one or more sensor images to apply a selected viewing angle (e.g., a viewpoint) that is different from the viewing angle captured by the imaging sensor. As another example, the representation of the physical environment may be transformed by graphically modifying (e.g., magnifying) portions thereof such that the modified portions may be representative but not real versions of the original captured image. For another example, the representation of the physical environment may be transformed by graphically eliminating or blurring portions thereof.
Enhanced virtualization: enhanced virtual (AV) environment refers to a simulated environment in which a virtual environment or computer-generated environment incorporates one or more sensory inputs from a physical environment. The sensory input may be a representation of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but the face of a person is realistically reproduced from an image taken of a physical person. As another example, the virtual object may take the shape or color of a physical object imaged by one or more imaging sensors. For another example, the virtual object may employ shadows that conform to the positioning of the sun in the physical environment.
Viewpoint-locked virtual object: A virtual object is viewpoint-locked when the computer system displays the virtual object at the same location and/or position in the user's viewpoint, even as the user's viewpoint shifts (e.g., changes). In embodiments in which the computer system is a head-mounted device, the user's point of view is locked to the forward direction of the user's head (e.g., when the user looks straight ahead, the user's point of view is at least a portion of the user's field of view); thus, the user's point of view remains fixed even as the user's gaze shifts, so long as the user's head does not move. In embodiments in which the computer system has a display generating component (e.g., a display screen) that is repositionable relative to the user's head, the user's point of view is the augmented reality view presented to the user on the display generating component of the computer system. For example, a viewpoint-locked virtual object that is displayed in the upper left corner of the user's viewpoint when the user's viewpoint is in a first orientation (e.g., the user's head faces north) continues to be displayed in the upper left corner of the user's viewpoint even as the user's viewpoint changes to a second orientation (e.g., the user's head faces west). In other words, the position and/or orientation at which the viewpoint-locked virtual object is displayed in the viewpoint of the user is independent of the position and/or orientation of the user in the physical environment. In embodiments in which the computer system is a head-mounted device, the user's point of view is locked to the orientation of the user's head, such that the virtual object is also referred to as a "head-locked virtual object."
Environment-locked virtual object: A virtual object is environment-locked (alternatively, "world-locked") when the computer system displays the virtual object at a location and/or position in the user's point of view that is based on (e.g., selected in reference to and/or anchored to) a location and/or object in the three-dimensional environment (e.g., a physical environment or a virtual environment). As the user's point of view moves, the position and/or object in the environment relative to the user's point of view changes, which results in the environment-locked virtual object being displayed at a different position and/or location in the user's point of view. For example, an environment-locked virtual object that is locked to a tree immediately in front of the user is displayed at the center of the user's viewpoint. When the user's viewpoint is shifted to the right (e.g., the user's head is turned to the right) such that the tree is now to the left of center in the user's viewpoint (e.g., the tree positioning in the user's viewpoint is shifted), the environment-locked virtual object that is locked onto the tree is displayed to the left of center in the user's viewpoint. In other words, the position and/or orientation at which the environment-locked virtual object is displayed in the user's viewpoint depends on the position and/or orientation of the object and/or the position at which the virtual object is locked in the environment. In some embodiments, the computer system uses a stationary frame of reference (e.g., a coordinate system anchored to a fixed location and/or object in the physical environment) in order to determine the location at which to display the environment-locked virtual object in the viewpoint of the user. The environment-locked virtual object may be locked to a stationary portion of the environment (e.g., a floor, wall, table, or other stationary object), or may be locked to a movable portion of the environment (e.g., a representation of a vehicle, animal, person, or even a portion of a user's body such as a user's hand, wrist, arm, or foot that moves independent of the user's point of view) such that the virtual object moves as the point of view or the portion of the environment moves to maintain a fixed relationship between the virtual object and the portion of the environment.
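A rough sketch of the two anchoring modes described above, assuming simple hypothetical types; it is not the disclosed implementation. A viewpoint-locked object keeps a fixed offset in the user's viewpoint, while an environment-locked object keeps a fixed position in a stationary frame of reference and is re-projected as the viewpoint moves.

```swift
import simd

// Hypothetical anchoring modes for a virtual object.
enum AnchorMode {
    case viewpointLocked(offsetInViewpoint: SIMD3<Float>)   // fixed location in the user's viewpoint
    case environmentLocked(positionInWorld: SIMD3<Float>)   // fixed location in the stationary frame
}

// Returns where to render the object in viewpoint (camera) coordinates,
// given the current world-to-viewpoint transform.
func renderPosition(for mode: AnchorMode,
                    worldToViewpoint: simd_float4x4) -> SIMD3<Float> {
    switch mode {
    case .viewpointLocked(let offset):
        // Independent of the user's position and orientation in the physical environment.
        return offset
    case .environmentLocked(let world):
        // Depends on the locked location in the environment relative to the current viewpoint.
        let p = worldToViewpoint * SIMD4<Float>(world.x, world.y, world.z, 1)
        return SIMD3<Float>(p.x, p.y, p.z)
    }
}
```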
In some implementations, the environmentally or view-locked virtual object exhibits an inert follow-up behavior that reduces or delays movement of the environmentally or view-locked virtual object relative to movement of a reference point that the virtual object follows. In some embodiments, the computer system intentionally delays movement of the virtual object when detecting movement of a reference point (e.g., a portion of the environment, a viewpoint, or a point fixed relative to the viewpoint, such as a point between 5cm and 300cm from the viewpoint) that the virtual object is following while exhibiting inert follow-up behavior. For example, when a reference point (e.g., a portion or viewpoint of an environment) moves at a first speed, the virtual object is moved by the device to remain locked to the reference point, but moves at a second speed that is slower than the first speed (e.g., until the reference point stops moving or slows down, at which time the virtual object begins to catch up with the reference point). In some embodiments, when the virtual object exhibits inert follow-up behavior, the device ignores small movements of the reference point (e.g., ignores movements of the reference point below a threshold amount of movement, such as movements of 0 to 5 degrees or movements of 0 to 50 cm). For example, when a reference point (e.g., a portion or viewpoint of an environment to which a virtual object is locked) moves a first amount, the distance between the reference point and the virtual object increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a different viewpoint or portion of the environment than the reference point to which the virtual object is locked), and when the reference point (e.g., a portion or viewpoint of the environment to which the virtual object is locked) moves a second amount greater than the first amount, the distance between the reference point and the virtual object increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a different viewpoint or portion of the environment than the reference point to which the virtual object is locked) then decreases as the amount of movement of the reference point increases above a threshold (e.g., an "inertia following" threshold) because the virtual object is moved by the computer system so as to maintain a fixed or substantially fixed position relative to the reference point. In some embodiments, maintaining a substantially fixed position of the virtual object relative to the reference point includes the virtual object being displayed within a threshold distance (e.g., 1cm, 2cm, 3cm, 5cm, 15cm, 20cm, 50 cm) of the reference point in one or more dimensions (e.g., up/down, left/right, and/or forward/backward relative to the position of the reference point).
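The delayed following described above can be sketched as a small update rule; the dead-zone and follow-speed values below are assumptions for illustration, not values from the disclosure.

```swift
import simd

// Simplified sketch of inert (delayed) follow behavior: small movements of the
// reference point are ignored, and larger movements are followed at a reduced
// speed until the object catches up with the reference point.
struct InertFollower {
    var objectPosition: SIMD3<Float>
    var deadZone: Float = 0.05        // assumed: ignore reference-point movement below ~5 cm
    var followFactor: Float = 0.3     // assumed: fraction of the remaining gap closed per update

    mutating func update(referencePoint: SIMD3<Float>) {
        let offset = referencePoint - objectPosition
        guard simd_length(offset) > deadZone else { return }   // ignore small movements
        // Move more slowly than the reference point moved, so the distance can grow
        // before the object catches up once the reference point slows or stops.
        objectPosition += offset * followFactor
    }
}
```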
Hardware: there are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields integrated with display capabilities, windows integrated with display capabilities, displays formed as lenses designed for placement on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. The head-mounted system may include speakers and/or other audio output devices integrated into the head-mounted system for providing audio output. the head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, the head-mounted system may be configured to accept an external opaque display (e.g., a smart phone). The head-mounted system may incorporate one or more imaging sensors for capturing images or video of the physical environment and/or one or more microphones for capturing audio of the physical environment. The head-mounted system may have a transparent or translucent display instead of an opaque display. The transparent or translucent display may have a medium through which light representing an image is directed to the eyes of a person. The display may utilize digital light projection, OLED, LED, uLED, liquid crystal on silicon, laser scanning light sources, or any combination of these techniques. The medium may be an optical waveguide, a holographic medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to selectively become opaque. Projection-based systems may employ retinal projection techniques that project a graphical image onto a person's retina. The projection system may also be configured to project the virtual object into the physical environment, for example as a hologram or on a physical surface. In some embodiments, the controller 110 is configured to manage and coordinate the XR experience of the user. In some embodiments, controller 110 includes suitable combinations of software, firmware, and/or hardware. the controller 110 is described in more detail below with respect to fig. 2. In some implementations, the controller 110 is a computing device that is in a local or remote location relative to the scene 105 (e.g., physical environment). For example, the controller 110 is a local server located within the scene 105. As another example, the controller 110 is a remote server (e.g., cloud server, central server, etc.) located outside of the scene 105. In some implementations, the controller 110 is communicatively coupled with the display generation component 120 (e.g., HMD, display, projector, touch-screen, etc.) via one or more wired or wireless communication channels 144 (e.g., bluetooth, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In another example, the controller 110 is included within a housing (e.g., a physical enclosure) of the display generation component 120 (e.g., an HMD or portable electronic device including a display and one or more processors, etc.), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or shares the same physical housing or support structure with one or more of the above.
In some embodiments, display generation component 120 is configured to provide an XR experience (e.g., at least a visual component of the XR experience) to a user. In some embodiments, display generation component 120 includes suitable combinations of software, firmware, and/or hardware. The display generating section 120 is described in more detail below with respect to fig. 3. In some embodiments, the functionality of the controller 110 is provided by and/or combined with the display generating component 120.
According to some embodiments, display generation component 120 provides an XR experience to a user when the user is virtually and/or physically present within scene 105.
In some embodiments, the display generating component is worn on a portion of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, display generation component 120 includes one or more XR displays provided for displaying XR content. For example, in various embodiments, the display generation component 120 encloses a field of view of a user. In some embodiments, display generation component 120 is a handheld device (such as a smart phone or tablet computer) configured to present XR content, and the user holds the device with a display facing the user's field of view and a camera facing scene 105. In some embodiments, the handheld device is optionally placed within a housing that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., tripod) in front of the user. In some embodiments, display generation component 120 is an XR chamber, enclosure, or room configured to present XR content, wherein the user does not wear or hold display generation component 120. Many of the user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) may be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with XR content triggered based on interactions occurring in a space in front of a handheld device or a tripod-mounted device may similarly be implemented with an HMD, where the interactions occur in the space in front of the HMD and responses to the XR content are displayed via the HMD. Similarly, a user interface showing interaction with XR content triggered based on movement of a handheld device or tripod-mounted device relative to a physical environment (e.g., a scene 105 or a portion of a user's body (e.g., a user's eye, head, or hand)) may similarly be implemented with an HMD, where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a portion of the user's body (e.g., a user's eye, head, or hand)).
While relevant features of the operating environment 100 are shown in fig. 1, those of ordinary skill in the art will recognize from this disclosure that various other features are not shown for the sake of brevity and so as not to obscure more relevant aspects of the exemplary embodiments disclosed herein.
Fig. 2 is a block diagram of an example of a controller 110 according to some embodiments. While certain specific features are shown, those of ordinary skill in the art will appreciate from the disclosure that various other features are not shown for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To this end, as a non-limiting example, in some embodiments, the controller 110 includes one or more processing units 202 (e.g., microprocessors, application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), graphics Processing Units (GPUs), central Processing Units (CPUs), processing cores, etc.), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal Serial Bus (USB), IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code Division Multiple Access (CDMA), time Division Multiple Access (TDMA), global Positioning System (GPS), infrared (IR), bluetooth, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these components and various other components.
In some embodiments, one or more of the communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and the like.
Memory 220 includes high-speed random access memory such as Dynamic Random Access Memory (DRAM), static Random Access Memory (SRAM), double data rate random access memory (DDR RAM), or other random access solid state memory devices. In some embodiments, memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 220 optionally includes one or more storage devices located remotely from the one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220 or a non-transitory computer readable storage medium of memory 220 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 230 and XR experience module 240.
Operating system 230 includes instructions for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR experience module 240 is configured to manage and coordinate single or multiple XR experiences of one or more users (e.g., single XR experiences of one or more users, or multiple XR experiences of a respective group of one or more users). To this end, in various embodiments, the XR experience module 240 includes a data acquisition unit 241, a tracking unit 242, a coordination unit 246, and a data transmission unit 248.
In some embodiments, the data acquisition unit 241 is configured to acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the display generation component 120 of fig. 1, and optionally from one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. To this end, in various embodiments, the data acquisition unit 241 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
In some embodiments, tracking unit 242 is configured to map scene 105 and track at least the location/position of display generation component 120 relative to scene 105 of fig. 1, and optionally the location of one or more of input device 125, output device 155, sensor 190, and/or peripheral device 195. To this end, in various embodiments, the tracking unit 242 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics. In some embodiments, tracking unit 242 includes a hand tracking unit 244 and/or an eye tracking unit 243. In some embodiments, the hand tracking unit 244 is configured to track the location/position of one or more portions of the user's hand, and/or the motion of one or more portions of the user's hand relative to the scene 105 of fig. 1, relative to the display generating component 120, and/or relative to a coordinate system defined relative to the user's hand. The hand tracking unit 244 is described in more detail below with respect to fig. 4. In some embodiments, the eye tracking unit 243 is configured to track the positioning or movement of the user gaze (or more generally, the user's eyes, face, or head) relative to the scene 105 (e.g., relative to the physical environment and/or relative to the user (e.g., the user's hand)) or relative to XR content displayed via the display generating component 120. The eye tracking unit 243 is described in more detail below with respect to fig. 5.
In some embodiments, coordination unit 246 is configured to manage and coordinate XR experiences presented to a user by display generation component 120, and optionally by one or more of output device 155 and/or peripheral device 195. For this purpose, in various embodiments, coordination unit 246 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
In some embodiments, the data transmission unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the display generation component 120, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 248 includes instructions and/or logic for instructions as well as heuristics and metadata for heuristics.
While the data acquisition unit 241, tracking unit 242 (e.g., including eye tracking unit 243 and hand tracking unit 244), coordination unit 246, and data transmission unit 248 are shown as residing on a single device (e.g., controller 110), it should be understood that in other embodiments, any combination of the data acquisition unit 241, tracking unit 242 (e.g., including eye tracking unit 243 and hand tracking unit 244), coordination unit 246, and data transmission unit 248 may be located in separate computing devices.
Furthermore, FIG. 2 is a functional description of various features that may be present in a particular implementation, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 2 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.
Fig. 3 is a block diagram of an example of display generation component 120 according to some embodiments. While certain specific features are shown, those of ordinary skill in the art will appreciate from the disclosure that various other features are not shown for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. For this purpose, as a non-limiting example, in some embodiments, display generation component 120 (e.g., HMD) includes one or more processing units 302 (e.g., microprocessors, ASIC, FPGA, GPU, CPU, processing cores, etc.), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., ,USB、FIREWIRE、THUNDERBOLT、IEEE 802.3x、IEEE 802.11x、IEEE 802.16x、GSM、CDMA、TDMA、GPS、IR、BLUETOOTH、ZIGBEE and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional internally and/or externally facing image sensors 314, memory 320, and one or more communication buses 304 for interconnecting these components and various other components.
In some embodiments, one or more communication buses 304 include circuitry for interconnecting and controlling communications between various system components. In some embodiments, the one or more I/O devices and sensors 306 include an Inertial Measurement Unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptic engine, and/or one or more depth sensors (e.g., structured light, time of flight, etc.), and/or the like.
In some embodiments, one or more XR displays 312 are configured to provide an XR experience to a user. In some embodiments, one or more XR displays 312 correspond to holographic, digital Light Processing (DLP), liquid Crystal Displays (LCD), liquid crystal on silicon (LCoS), organic light emitting field effect transistors (OLET), organic Light Emitting Diodes (OLED), surface conduction electron emission displays (SED), field Emission Displays (FED), quantum dot light emitting diodes (QD-LED), microelectromechanical systems (MEMS), and/or similar display types. In some embodiments, one or more XR displays 312 correspond to diffractive, reflective, polarizing, holographic, etc. waveguide displays. For example, the display generation component 120 (e.g., HMD) includes a single XR display. In another example, display generation component 120 includes an XR display for each eye of the user. In some embodiments, one or more XR displays 312 are capable of presenting MR and VR content. In some implementations, one or more XR displays 312 can present MR or VR content.
In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's face including the user's eyes (and may be referred to as an eye tracking camera). In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's hand and optionally the user's arm (and may be referred to as a hand tracking camera). In some implementations, the one or more image sensors 314 are configured to face forward in order to acquire image data corresponding to a scene that a user would see in the absence of the display generating component 120 (e.g., HMD) (and may be referred to as a scene camera). The one or more optional image sensors 314 may include one or more RGB cameras (e.g., with Complementary Metal Oxide Semiconductor (CMOS) image sensors or Charge Coupled Device (CCD) image sensors), one or more Infrared (IR) cameras, and/or one or more event-based cameras, etc.
Memory 320 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 320 optionally includes one or more storage devices located remotely from the one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320 or a non-transitory computer readable storage medium of memory 320 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 330 and XR presentation module 340.
Operating system 330 includes processes for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR presentation module 340 is configured to present XR content to a user via one or more XR displays 312. To this end, in various embodiments, the XR presentation module 340 includes a data acquisition unit 342, an XR presentation unit 344, an XR map generation unit 346, and a data transmission unit 348.
In some embodiments, the data acquisition unit 342 is configured to at least acquire data (e.g., presentation data, interaction data, sensor data, location data, etc.) from the controller 110 of fig. 1. For this purpose, in various embodiments, the data acquisition unit 342 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
In some embodiments, XR presentation unit 344 is configured to present XR content via one or more XR displays 312. For this purpose, in various embodiments, XR presentation unit 344 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
In some embodiments, XR map generation unit 346 is configured to generate an XR map based on the media content data (e.g., a 3D map of a mixed reality scene or a map of a physical environment in which computer-generated objects may be placed to generate an augmented reality). For this purpose, in various embodiments, XR map generation unit 346 includes instructions and/or logic for the instructions as well as heuristics and metadata for the heuristics.
In some embodiments, the data transmission unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110, and optionally one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 348 includes instructions and/or logic for instructions and heuristics and metadata for heuristics.
Although the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 are shown as residing on a single device (e.g., the display generation component 120 of fig. 1), it should be understood that in other embodiments, any combination of the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 may be located in separate computing devices.
Furthermore, fig. 3 is used more as a functional description of various features that may be present in a particular embodiment, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 3 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.
Fig. 4 is a schematic illustration of an exemplary embodiment of a hand tracking device 140. In some embodiments, the hand tracking device 140 (fig. 1) is controlled by the hand tracking unit 244 (fig. 2) to track the position/location of one or more portions of the user's hand, and/or the movement of one or more portions of the user's hand relative to the scene 105 of fig. 1 (e.g., relative to a portion of the physical environment surrounding the user, relative to the display generating component 120, or relative to a portion of the user (e.g., the user's face, eyes, or head), and/or relative to a coordinate system defined relative to the user's hand). In some implementations, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., in a separate housing or attached to a separate physical support structure).
In some implementations, the hand tracking device 140 includes an image sensor 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras, etc.) that captures three-dimensional scene information including at least a human user's hand 406. The image sensor 404 captures the hand image with sufficient resolution to enable the fingers and their respective locations to be distinguished. The image sensor 404 typically captures images of other parts of the user's body, and possibly also all parts of the body, and may have a zoom capability or a dedicated sensor with increased magnification to capture images of the hand with a desired resolution. In some implementations, the image sensor 404 also captures 2D color video images of the hand 406 and other elements of the scene. In some implementations, the image sensor 404 is used in conjunction with other image sensors to capture the physical environment of the scene 105, or as an image sensor that captures the physical environment of the scene 105. In some embodiments, the image sensor 404, or a portion thereof, is positioned relative to the user or the user's environment in a manner that uses the field of view of the image sensor to define an interaction space in which hand movements captured by the image sensor are considered input to the controller 110.
In some embodiments, the image sensor 404 outputs a sequence of frames containing 3D mapping data (and, in addition, possible color image data) to the controller 110, which extracts high-level information from the mapping data. This high-level information is typically provided via an Application Program Interface (API) to an application program running on the controller, which drives the display generating component 120 accordingly. For example, a user may interact with software running on the controller 110 by moving his hand 406 and changing his hand pose.
In some implementations, the image sensor 404 projects a speckle pattern onto a scene that includes the hand 406 and captures an image of the projected pattern. In some implementations, the controller 110 calculates 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation based on lateral offsets of the blobs in the pattern. This approach is advantageous because it does not require the user to hold or wear any kind of beacon, sensor or other marker. The method gives the depth coordinates of points in the scene relative to a predetermined reference plane at a specific distance from the image sensor 404. In this disclosure, it is assumed that the image sensor 404 defines an orthogonal set of x-axis, y-axis, z-axis such that the depth coordinates of points in the scene correspond to the z-component measured by the image sensor. Alternatively, the image sensor 404 (e.g., a hand tracking device) may use other 3D mapping methods, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
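As a point of reference, the lateral blob offset in a structured-light setup relates to depth through a standard triangulation relation; the sketch below uses that textbook relation and is not taken from the disclosure.

```swift
// Textbook structured-light triangulation (not specified by the disclosure):
// with focal length f (pixels), projector-sensor baseline b (meters), the
// reference-plane depth zRef (meters), and the observed lateral offset of a
// blob relative to the reference pattern (disparity d, in pixels), the depth
// of the corresponding scene point follows from 1/z = 1/zRef + d / (f * b).
// The sign convention here treats positive disparity as closer than the reference plane.
func depthFromSpeckleOffset(disparity d: Float,
                            focalLength f: Float,
                            baseline b: Float,
                            referenceDepth zRef: Float) -> Float {
    let inverseDepth = 1.0 / zRef + d / (f * b)
    return 1.0 / inverseDepth
}
```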
In some implementations, the hand tracking device 140 captures and processes a time series containing a depth map of the user's hand as the user moves his hand (e.g., the entire hand or one or more fingers). Software running on the image sensor 404 and/or a processor in the controller 110 processes the 3D mapping data to extract image block descriptors of the hand in these depth maps. The software may match these descriptors with image block descriptors stored in database 408 based on previous learning processes in order to estimate the pose of the hand in each frame. The pose typically includes the 3D position of the user's hand joints and finger tips.
The software may also analyze the trajectory of the hand and/or finger over multiple frames in the sequence to identify gestures. The pose estimation functions described herein may alternate with motion tracking functions such that image-block-based pose estimation is performed only once every two (or more) frames, while tracking is used to find changes in the pose that occur over the remaining frames. Pose, motion, and gesture information are provided to an application running on the controller 110 via the APIs described above. The program may move and modify images presented on the display generation component 120, for example, in response to pose and/or gesture information, or perform other functions.
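The alternation between full pose estimation and lighter motion tracking might be scheduled as below; the types, protocol, and stride are hypothetical placeholders, not the software described above.

```swift
// Hypothetical scheduling of pose estimation versus motion tracking.
struct HandPose { /* 3D positions of hand joints and fingertips */ }

protocol HandTracker {
    // Full patch-descriptor-based estimation against a stored database.
    func estimatePose(fromDepthFrame frame: [Float]) -> HandPose
    // Cheaper frame-to-frame update that tracks changes from a prior pose.
    func trackMotion(from previous: HandPose, depthFrame: [Float]) -> HandPose
}

// Runs full estimation only once every `n` frames and tracks changes in between.
func processDepthFrames(_ frames: [[Float]], tracker: HandTracker, every n: Int = 2) -> [HandPose] {
    var poses: [HandPose] = []
    var lastPose: HandPose?
    for (index, frame) in frames.enumerated() {
        let pose: HandPose
        if index % n == 0 || lastPose == nil {
            pose = tracker.estimatePose(fromDepthFrame: frame)
        } else {
            pose = tracker.trackMotion(from: lastPose!, depthFrame: frame)
        }
        lastPose = pose
        poses.append(pose)
    }
    return poses
}
```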
In some implementations, the gesture includes an air gesture. An air gesture is a motion of a portion of a user's body (e.g., a head, one or more arms, one or more hands, one or more fingers, and/or one or more legs) through the air that is detected without the user touching an input element (or being independent of an input element that is part of a device) that is part of a device (e.g., computer system 101, one or more input devices 125, and/or hand tracking device 140) (including a motion of the user's body relative to an absolute reference (e.g., angle of the user's arm relative to the ground or distance of the user's hand relative to the ground), movement relative to another portion of the user's body (e.g., movement of the user's hand relative to the user's shoulder, movement of one hand of the user relative to the other hand of the user, and/or movement of the user's finger relative to the other finger or portion of the hand of the user), and/or absolute movement of a portion of the user's body (e.g., a flick gesture comprising a predetermined amount and/or speed of movement of the hand in a predetermined gesture, or a shake gesture comprising a predetermined speed or amount of rotation of a portion of the user's body)).
In some embodiments, according to some embodiments, the input gestures used in the various examples and embodiments described herein include air gestures performed by movement of a user's finger relative to other fingers (or portions of the user's hand) for interacting with an XR environment (e.g., a virtual or mixed reality environment). In some embodiments, the air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independent of an input element that is part of the device) and based on a detected movement of a portion of the user's body through the air, including a movement of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), a movement relative to another portion of the user's body (e.g., a movement of the user's hand relative to the user's shoulder, a movement of the user's hand relative to the other hand of the user, and/or a movement of the user's finger relative to the other finger or part of the hand of the user), and/or an absolute movement of a portion of the user's body (e.g., a flick gesture that includes a predetermined amount and/or speed of movement of the hand in a predetermined gesture that includes a predetermined gesture of the hand, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).
In some embodiments where the input gesture is an air gesture (e.g., in the absence of physical contact with the input device, the input device provides information to the computer system as to which user interface element is the target of the user input, such as contact with a user interface element displayed on a touch screen, or contact with a mouse or touchpad to move a cursor to the user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of the user input (e.g., for direct input, as described below). Thus, in embodiments involving air gestures, for example, an input gesture in combination (e.g., simultaneously) with movement of a user's finger and/or hand detects an attention (e.g., gaze) toward a user interface element to perform pinch and/or tap inputs, as described below.
In some implementations, an input gesture directed to a user interface object is performed with direct or indirect reference to the user interface object. For example, user input is performed directly on a user interface object according to performing input with a user's hand at a location corresponding to the location of the user interface object in a three-dimensional environment (e.g., as determined based on the user's current viewpoint). In some implementations, upon detecting a user's attention (e.g., gaze) to a user interface object, an input gesture is performed indirectly on the user interface object in accordance with a position of a user's hand not being at the position corresponding to the position of the user interface object in the three-dimensional environment while the user is performing the input gesture. For example, for a direct input gesture, the user can direct the user's input to the user interface object by initiating the gesture at or near a location corresponding to the display location of the user interface object (e.g., within 0.5cm, 1cm, 5cm, or within a distance between 0 and 5cm measured from the outer edge of the option or the center portion of the option). For indirect input gestures, a user can direct the user's input to a user interface object by focusing on the user interface object (e.g., by looking at the user interface object), and while focusing on an option, the user initiates an input gesture (e.g., at any location detectable by the computer system) (e.g., at a location that does not correspond to the display location of the user interface object).
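One way to picture the direct/indirect distinction is a small targeting routine like the following; the 5 cm threshold and the element model are assumptions drawn from the examples above, not a prescribed implementation.

```swift
import simd

// Hypothetical user interface element with a 3D location in the environment.
struct UIElementRef {
    let id: Int
    let position: SIMD3<Float>
}

enum ResolvedTarget {
    case direct(elementID: Int)     // gesture begins at or near the element's location
    case indirect(elementID: Int)   // gesture performed elsewhere while attention is on the element
    case none
}

func resolveTarget(handPosition: SIMD3<Float>,
                   elements: [UIElementRef],
                   attendedElementID: Int?,            // element the user's gaze is directed to, if any
                   directThreshold: Float = 0.05) -> ResolvedTarget {
    // Direct input: the gesture is initiated within the threshold of a displayed element.
    if let hit = elements.first(where: { simd_distance($0.position, handPosition) <= directThreshold }) {
        return .direct(elementID: hit.id)
    }
    // Indirect input: the gesture is performed elsewhere while attention rests on an element.
    if let attended = attendedElementID {
        return .indirect(elementID: attended)
    }
    return .none
}
```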
In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch inputs and tap inputs for interacting with a virtual or mixed reality environment. For example, the pinch and tap inputs described below are performed as air gestures.
In some implementations, the pinch input is part of an air gesture that includes one or more of: a pinch gesture, a long pinch gesture, a pinch-and-drag gesture, or a double pinch gesture. For example, a pinch gesture as an air gesture includes movement of two or more fingers of a hand to make contact with one another, optionally followed by an immediate (e.g., within 0 to 1 second) break in contact with each other. A long pinch gesture as an air gesture includes movement of two or more fingers of a hand into contact with each other for at least a threshold amount of time (e.g., at least 1 second) before a break in contact with each other is detected. For example, a long pinch gesture includes the user holding a pinch gesture (e.g., with two or more fingers making contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some implementations, a double pinch gesture as an air gesture includes two (e.g., or more) pinch inputs (e.g., performed by the same hand) detected in immediate succession (e.g., within a predefined period of time) of each other. For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between the two or more fingers), and performs a second pinch input within a predefined period of time (e.g., within 1 second or within 2 seconds) after releasing the first pinch input.
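As a rough illustration of how the pinch variants above could be distinguished from contact timing alone, consider the following Swift sketch. The thresholds simply reuse the example values given above (a hold of at least 1 second for a long pinch, and at most roughly 1 second between pinches for a double pinch); the type names are hypothetical.

```swift
enum PinchKind { case pinch, longPinch, doublePinch }

/// A single detected finger-contact interval, in seconds on a monotonic clock.
struct PinchContact {
    var start: Double   // fingers came into contact
    var end: Double     // contact was broken
    var duration: Double { end - start }
}

/// Classifies one or two consecutive contacts: a contact held for at least
/// `longPinchThreshold` is a long pinch; two contacts whose gap is at most
/// `doublePinchGap` form a double pinch; otherwise a single pinch.
func classify(_ contacts: [PinchContact],
              longPinchThreshold: Double = 1.0,
              doublePinchGap: Double = 1.0) -> PinchKind? {
    switch contacts.count {
    case 1:
        return contacts[0].duration >= longPinchThreshold ? .longPinch : .pinch
    case 2 where contacts[1].start - contacts[0].end <= doublePinchGap:
        return .doublePinch
    default:
        return nil   // not enough information to classify
    }
}
```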
In some implementations, the pinch-and-drag gesture as an air gesture includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) performed in conjunction with (e.g., followed by) a drag input that changes the position of the user's hand from a first position (e.g., a start position of the drag) to a second position (e.g., an end position of the drag). In some implementations, the user holds the pinch gesture while the drag input is performed, and releases the pinch gesture (e.g., opens their two or more fingers) to end the drag gesture (e.g., at the second position). In some implementations, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers into contact with each other and moves the same hand to the second position in the air with a drag gesture). In some implementations, the pinch input is performed by a first hand of the user and the drag input is performed by a second hand of the user (e.g., the second hand of the user moves in the air from the first position to the second position while the user continues the pinch input with the first hand). In some implementations, the input gesture as an air gesture includes an input (e.g., a pinch and/or tap input) performed using both hands of the user. For example, the input gesture includes two (e.g., or more) pinch inputs performed in conjunction with each other (e.g., simultaneously or within a predefined period of time). For example, a first pinch gesture (e.g., a pinch input, a long pinch input, or a pinch-and-drag input) is performed using a first hand of the user, and a second pinch input is performed using the other hand (e.g., the second of the user's two hands) in conjunction with the pinch input performed using the first hand. In some embodiments, the two-handed input gesture includes movement between the user's two hands (e.g., increasing and/or decreasing the distance or relative orientation between the user's two hands).
In some implementations, the tap input (e.g., directed to a user interface element) performed as an air gesture includes movement of a user's finger toward the user interface element, movement of the user's hand toward the user interface element (optionally with the user's finger extended toward the user interface element), a downward motion of the user's finger (e.g., mimicking a mouse click motion or a tap on a touch screen), or other predefined movement of the user's hand. In some embodiments, a tap input performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture: movement of the finger or hand away from the user's viewpoint and/or toward the object that is the target of the tap input, followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in the movement characteristics of the finger or hand performing the tap gesture (e.g., an end of movement away from the user's viewpoint and/or toward the object that is the target of the tap input, a reversal of the direction of movement of the finger or hand, and/or a reversal of the direction of acceleration of movement of the finger or hand).
In some embodiments, the determination that the user's attention is directed to a portion of the three-dimensional environment is based on detection of gaze directed to that portion (optionally, without other conditions). In some embodiments, the portion of the three-dimensional environment to which the user's attention is directed is determined based on detecting a gaze directed to the portion of the three-dimensional environment with one or more additional conditions, such as requiring the gaze to be directed to the portion of the three-dimensional environment for at least a threshold duration (e.g., dwell duration) and/or requiring the gaze to be directed to the portion of the three-dimensional environment when the point of view of the user is within a distance threshold from the portion of the three-dimensional environment, such that the device determines the portion of the three-dimensional environment to which the user's attention is directed, wherein if one of the additional conditions is not met, the device determines that the attention is not directed to the portion of the three-dimensional environment to which the gaze is directed (e.g., until the one or more additional conditions are met).
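A minimal sketch of the dwell-duration and viewpoint-distance conditions described above is shown below in Swift. The 0.3 second dwell and 2 meter distance thresholds, and the sampled-gaze representation, are assumptions made only for illustration.

```swift
/// One gaze sample: whether the gaze is on the region of interest, when it was
/// recorded, and how far the viewpoint is from that region. Names and units are illustrative.
struct GazeSample {
    var timestamp: Double          // seconds, monotonic clock
    var isOnRegion: Bool           // gaze is directed to the portion of the environment
    var viewpointDistance: Float   // meters from the viewpoint to that portion
}

/// Returns true when attention is considered directed to the region: gaze has rested
/// on it continuously for at least `dwell` seconds while the viewpoint stays within
/// `maxDistance`. Both thresholds are assumptions.
func attentionIsDirected(samples: [GazeSample],
                         dwell: Double = 0.3,
                         maxDistance: Float = 2.0) -> Bool {
    guard let last = samples.last, last.isOnRegion, last.viewpointDistance <= maxDistance else {
        return false
    }
    // Walk backwards to find how long the gaze has continuously satisfied both conditions.
    var dwellStart = last.timestamp
    for sample in samples.reversed() {
        guard sample.isOnRegion, sample.viewpointDistance <= maxDistance else { break }
        dwellStart = sample.timestamp
    }
    return last.timestamp - dwellStart >= dwell
}
```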
In some embodiments, detection of a ready state configuration of the user or a portion of the user is detected by the computer system. Detection of a ready state configuration of a hand is used by the computer system as an indication that the user may be ready to interact with the computer system using one or more air gesture inputs (e.g., pinch, tap, pinch and drag, double pinch, long pinch, or other air gestures described herein) performed by the hand. For example, the ready state of the hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape in which the thumb and one or more fingers are extended and spaced apart in preparation for making a pinch or grasp gesture, or a pre-tap shape in which one or more fingers are extended and the palm faces away from the user), based on whether the hand is in a predetermined position relative to the user's point of view (e.g., below the user's head and above the user's waist and extended at least 15cm, 20cm, 25cm, 30cm, or 50cm from the body), and/or based on whether the hand has moved in a particular manner (e.g., toward a region above the user's waist and in front of the user's head, or away from the user's body or legs). In some implementations, the ready state is used to determine whether an interactive element of the user interface is responsive to an attention (e.g., gaze) input.
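The ready-state factors listed above (a predetermined hand shape and a predetermined position relative to the user) could be combined along the lines of the following Swift sketch. The `HandState` type and the 0.2 meter extension value are assumptions; the 0.2 m figure is just one of the example distances mentioned above.

```swift
enum HandShape { case prePinch, preTap, relaxed, other }

/// Minimal description of tracked hand state used to evaluate a ready state.
struct HandState {
    var shape: HandShape
    var isAboveWaist: Bool
    var isBelowHead: Bool
    var extensionFromBody: Float   // meters the hand extends away from the body
}

/// A sketch of a ready-state test: a predetermined hand shape combined with a
/// predetermined position relative to the user (here, between waist and head and
/// extended at least 0.2 m from the body).
func isInReadyState(_ hand: HandState) -> Bool {
    let shapeIsReady = hand.shape == .prePinch || hand.shape == .preTap
    let positionIsReady = hand.isAboveWaist && hand.isBelowHead && hand.extensionFromBody >= 0.2
    return shapeIsReady && positionIsReady
}
```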
In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or may alternatively be provided on tangible non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, database 408 is also stored in a memory associated with controller 110. Alternatively or in addition, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable Digital Signal Processor (DSP). Although the controller 110 is shown in fig. 4, for example, as a separate unit from the image sensor 404, some or all of the processing functions of the controller may be performed by a suitable microprocessor and software or by dedicated circuitry within the housing of the image sensor 404 (e.g., a hand tracking device) or other devices associated with the image sensor 404. In some embodiments, at least some of these processing functions may be performed by a suitable processor integrated with display generation component 120 (e.g., in a television receiver, handheld device, or head mounted device) or with any other suitable computerized device (such as a game console or media player). The sensing functionality of the image sensor 404 may likewise be integrated into a computer or other computerized device to be controlled by the sensor output.
Fig. 4 also includes a schematic diagram of a depth map 410 captured by the image sensor 404, according to some embodiments. As described above, the depth map comprises a matrix of pixels having corresponding depth values. Pixels 412 corresponding to the hand 406 have been segmented from the background and wrist in the map. The brightness of each pixel within the depth map 410 is inversely proportional to its depth value (i.e., the measured z-distance from the image sensor 404), where the gray shade becomes darker with increasing depth. The controller 110 processes these depth values to identify and segment components of the image (i.e., a set of adjacent pixels) that have human hand features. These features may include, for example, overall size, shape, and frame-to-frame motion from a sequence of depth maps.
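A highly simplified Swift sketch of depth-based segmentation is shown below: pixels whose depth falls within an assumed hand-distance band are collected as hand candidates. A real pipeline, as described above, would also use overall size, shape, and frame-to-frame motion; the type names and the 0.2 to 0.8 meter band are assumptions.

```swift
/// A depth map as a flat, row-major array of depth values in meters.
struct DepthMap {
    var width: Int
    var height: Int
    var depths: [Float]
    func depth(x: Int, y: Int) -> Float { depths[y * width + x] }
}

/// Collects the pixels whose depth falls inside a band assumed to contain the hand,
/// as a crude stand-in for the segmentation described above.
func candidateHandPixels(in map: DepthMap,
                         nearPlane: Float = 0.2,
                         farPlane: Float = 0.8) -> [(x: Int, y: Int)] {
    var result: [(x: Int, y: Int)] = []
    for y in 0..<map.height {
        for x in 0..<map.width {
            let z = map.depth(x: x, y: y)
            if z >= nearPlane && z <= farPlane {
                result.append((x, y))
            }
        }
    }
    return result
}
```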
Fig. 4 also schematically illustrates the hand skeleton 414 that the controller 110 eventually extracts from the depth map 410 of the hand 406, according to some embodiments. In fig. 4, the hand skeleton 414 is superimposed over the hand background 416 that has been segmented from the original depth map. In some embodiments, key feature points of the hand, and optionally of the wrist or arm connected to the hand (e.g., points corresponding to knuckles, finger tips, the palm center, and the end of the hand connected to the wrist), are identified and located on the hand skeleton 414. In some embodiments, the controller 110 uses the positions and movements of these key feature points over a plurality of image frames to determine the gesture performed by the hand or the current state of the hand.
Fig. 5 illustrates an exemplary embodiment of the eye tracking device 130 (fig. 1). In some embodiments, eye tracking device 130 is controlled by eye tracking unit 243 (fig. 2) to track the positioning and movement of the user gaze relative to scene 105 or relative to XR content displayed via display generation component 120. In some embodiments, the eye tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, when display generating component 120 is a head-mounted device (such as a headset, helmet, goggles, or glasses) or a handheld device placed in a wearable frame, the head-mounted device includes both components that generate XR content for viewing by a user and components for tracking the user's gaze with respect to the XR content. In some embodiments, the eye tracking device 130 is separate from the display generation component 120. For example, when the display generating component is a handheld device or an XR chamber, the eye tracking device 130 is optionally a device separate from the handheld device or XR chamber. In some embodiments, the eye tracking device 130 is a head mounted device or a portion of a head mounted device. In some embodiments, the head-mounted eye tracking device 130 is optionally used in combination with a display generating component that is also head-mounted or a display generating component that is not head-mounted. In some embodiments, the eye tracking device 130 is not a head mounted device and is optionally used in conjunction with a head mounted display generating component. In some embodiments, the eye tracking device 130 is not a head mounted device and optionally is part of a non-head mounted display generating component.
In some embodiments, the display generation component 120 uses a display mechanism (e.g., a left near-eye display panel and a right near-eye display panel) to display frames including left and right images in front of the user's eyes, thereby providing a 3D virtual view to the user. For example, the head mounted display generating component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user's eyes. In some embodiments, the display generation component may include or be coupled to one or more external cameras that capture video of the user's environment for display. In some embodiments, the head mounted display generating component may have a transparent or translucent display and the virtual object is displayed on the transparent or translucent display through which the user may directly view the physical environment. In some embodiments, the display generation component projects the virtual object into the physical environment. The virtual object may be projected, for example, on a physical surface or as a hologram, such that an individual uses the system to observe the virtual object superimposed over the physical environment. In this case, separate display panels and image frames for the left and right eyes may not be required.
As shown in fig. 5, in some embodiments, the eye tracking device 130 (e.g., a gaze tracking device) includes at least one eye tracking camera (e.g., an Infrared (IR) or Near Infrared (NIR) camera) and an illumination source (e.g., an IR or NIR light source, such as an array or ring of LEDs) that emits light (e.g., IR or NIR light) toward the user's eye. The eye-tracking camera may be directed toward the user's eye to receive IR or NIR light reflected directly from the eye by the light source, or alternatively may be directed toward "hot" mirrors located between the user's eye and the display panel that reflect IR or NIR light from the eye to the eye-tracking camera while allowing visible light to pass through. The eye tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, both eyes of the user are tracked separately by the respective eye tracking camera and illumination source. In some embodiments, only one eye of the user is tracked by the respective eye tracking camera and illumination source.
In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the particular operating environment 100, such as 3D geometry and parameters of LEDs, cameras, hot mirrors (if present), eye lenses, and display screens. The device-specific calibration procedure may be performed at the factory or another facility prior to delivering the AR/VR equipment to the end user. The device-specific calibration process may be an automatic calibration process or a manual calibration process. According to some embodiments, the user-specific calibration process may include an estimation of eye parameters of a specific user, such as pupil position, foveal position, optical axis, visual axis, eye distance, etc. According to some embodiments, once the device-specific parameters and the user-specific parameters are determined for the eye-tracking device 130, the images captured by the eye-tracking camera may be processed using a flash-assist method to determine the current visual axis and gaze point of the user relative to the display.
As shown in fig. 5, the eye tracking device 130 (e.g., 130A or 130B) includes an eye lens 520 and a gaze tracking system including at least one eye tracking camera 540 (e.g., an Infrared (IR) or Near Infrared (NIR) camera) positioned on a side of the user's face on which eye tracking is performed, and an illumination source 530 (e.g., an IR or NIR light source such as an array or ring of NIR Light Emitting Diodes (LEDs)) that emits light (e.g., IR or NIR light) toward the user's eyes 592. The eye-tracking camera 540 may be directed toward a mirror 550 (which reflects IR or NIR light from the eye 592 while allowing visible light to pass) located between the user's eye 592 and the display 510 (e.g., left or right display panel of a head-mounted display, or display of a handheld device, projector, etc.) (e.g., as shown in the top portion of fig. 5), or alternatively may be directed toward the user's eye 592 to receive reflected IR or NIR light from the eye 592 (e.g., as shown in the bottom portion of fig. 5).
In some implementations, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses the gaze tracking input 542 from the eye tracking camera 540 for various purposes, such as for processing the frames 562 for display. The controller 110 optionally estimates the gaze point of the user on the display 510 based on gaze tracking input 542 acquired from the eye tracking camera 540 using a flash assist method or other suitable method. The gaze point estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.
Several possible use cases of the current gaze direction of the user are described below and are not intended to be limiting. As an exemplary use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content in a foveal region determined according to a current gaze direction of the user at a higher resolution than in a peripheral region. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another exemplary use case in an AR application, the controller 110 may direct an external camera used to capture the physical environment of the XR experience to focus in the determined direction. The autofocus mechanism of the external camera may then focus on an object or surface in the environment that the user is currently looking at on display 510. As another example use case, the eye lens 520 may be a focusable lens, and the controller uses the gaze tracking information to adjust the focus of the eye lens 520 such that the virtual object the user is currently looking at has the appropriate vergence to match the convergence of the user's eyes 592. The controller 110 may utilize the gaze tracking information to direct the eye lens 520 to adjust the focus such that the approaching object the user is looking at appears at the correct distance.
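The foveated-rendering use case above (higher resolution in the foveal region than in the periphery) can be expressed as a simple mapping from angular distance to a resolution scale, as in the Swift sketch below. The 10 degree foveal radius and the 1.0 / 0.5 / 0.25 scale steps are illustrative assumptions rather than values taken from the disclosure.

```swift
/// Returns a resolution scale factor for a region of the view, given the angle (in radians)
/// between the current gaze direction and the direction toward that region.
func resolutionScale(angleFromGaze: Float) -> Float {
    let fovealRadius: Float = 10 * .pi / 180   // ~10 degrees, an assumed foveal region
    if angleFromGaze < fovealRadius {
        return 1.0        // full resolution where the user is currently looking
    } else if angleFromGaze < 3 * fovealRadius {
        return 0.5        // reduced resolution in the near periphery
    } else {
        return 0.25       // lowest resolution in the far periphery
    }
}
```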
In some embodiments, the eye tracking device is part of a head mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens 520), an eye tracking camera (e.g., eye tracking camera 540), and a light source (e.g., light source 530 (e.g., IR or NIR LED)) mounted in a wearable housing. The light source emits light (e.g., IR or NIR light) toward the user's eye 592. In some embodiments, the light sources may be arranged in a ring or circle around each of the lenses, as shown in fig. 5. In some embodiments, for example, eight light sources 530 (e.g., LEDs) are arranged around each lens 520. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used.
In some implementations, the display 510 emits light in the visible range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the position and angle of the eye tracking camera 540 is given by way of example and is not intended to be limiting. In some implementations, a single eye tracking camera 540 is located on each side of the user's face. In some implementations, two or more NIR cameras 540 may be used on each side of the user's face. In some implementations, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some implementations, a camera 540 operating at one wavelength (e.g., 850 nm) and a camera 540 operating at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.
The embodiment of the gaze tracking system as shown in fig. 5 may be used, for example, in computer-generated reality, virtual reality, and/or mixed reality applications to provide a user with a computer-generated reality, virtual reality, augmented reality, and/or augmented virtual experience.
Fig. 6 illustrates a flash-assisted gaze tracking pipeline in accordance with some embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., eye tracking device 130 as shown in fig. 1 and 5). The flash-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or "no". When in the tracking state, the glint-assisted gaze tracking system uses previous information from a previous frame when analyzing the current frame to track pupil contours and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect pupils and glints in the current frame and, if successful, initializes the tracking state to "yes" and continues with the next frame in the tracking state.
As shown in fig. 6, the gaze tracking camera may capture left and right images of the left and right eyes of the user. The captured image is then input to the gaze tracking pipeline for processing beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system may continue to capture images of the user's eyes, for example, at a rate of 60 to 120 frames per second. In some embodiments, each set of captured images may be input to a pipeline for processing. However, in some embodiments or under some conditions, not all captured frames are pipelined.
At 610, for the currently captured image, if the tracking state is yes, the method proceeds to element 640. At 610, if the tracking state is no, the image is analyzed to detect a user's pupil and glints in the image, as indicated at 620. At 630, if the pupil and glints are successfully detected, the method proceeds to element 640. Otherwise, the method returns to element 610 to process the next image of the user's eye.
At 640, if proceeding from element 610, the current frame is analyzed to track pupils and glints based in part on previous information from the previous frame. At 640, if proceeding from element 630, a tracking state is initialized based on the pupil and glints detected in the current frame. The results of the processing at element 640 are checked to verify that the results of the tracking or detection can be trusted. For example, the results may be checked to determine whether the pupil and a sufficient number of glints for performing gaze estimation were successfully tracked or detected in the current frame. If the results cannot be trusted at 650, the tracking state is set to no at element 660 and the method returns to element 610 to process the next image of the user's eye. At 650, if the results are trusted, the method proceeds to element 670. At 670, the tracking state is set to YES (if not already YES), and the pupil and glint information is passed to element 680 to estimate the gaze point of the user.
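The per-frame tracking-state loop of fig. 6 can be sketched as a small state machine, as in the Swift code below. The closure parameters stand in for the detection (element 620) and tracking (element 640) steps, and the `FrameAnalysis` type is an assumption introduced for illustration.

```swift
/// Result of analyzing one camera frame of the user's eyes.
struct FrameAnalysis {
    var pupilAndGlintsFound: Bool    // pupil plus enough glints to estimate gaze
    var resultIsTrusted: Bool        // tracking/detection results passed the validity check
    var gazePoint: SIMD2<Float>?     // estimated gaze point on the display, if available
}

/// Sketch of the per-frame loop in fig. 6: when in the tracking state, the previous
/// frame seeds the current analysis; otherwise detection runs from scratch.
final class GlintAssistedGazeTracker {
    private var isTracking = false

    func process(detect: () -> FrameAnalysis,
                 track: () -> FrameAnalysis) -> SIMD2<Float>? {
        let analysis: FrameAnalysis
        if isTracking {
            analysis = track()                 // element 640: use previous-frame information
        } else {
            let detection = detect()           // element 620: detect pupil and glints
            guard detection.pupilAndGlintsFound else { return nil }   // element 630: try next frame
            analysis = detection
        }
        guard analysis.resultIsTrusted else {  // element 650: results not trusted
            isTracking = false                 // element 660: leave the tracking state
            return nil
        }
        isTracking = true                      // element 670: stay in / enter the tracking state
        return analysis.gazePoint              // element 680: estimate the gaze point
    }
}
```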
Fig. 6 is intended to serve as one example of an eye tracking technique that may be used in a particular implementation. As will be appreciated by one of ordinary skill in the art, other eye tracking techniques, currently existing or developed in the future, may be used in place of or in combination with the glint-assisted eye tracking techniques described herein in computer system 101 for providing an XR experience to a user, according to various embodiments.
In this disclosure, various input methods are described with respect to interactions with a computer system. When one input device or input method is used to provide an example and another input device or input method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the input device or input method described with respect to the other example. Similarly, various output methods are described with respect to interactions with a computer system. When one output device or output method is used to provide an example and another output device or output method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the output device or output method described with respect to the other example. Similarly, the various methods are described with respect to interactions with a virtual environment or mixed reality environment through a computer system. When examples are provided using interactions with a virtual environment, and another example is provided using a mixed reality environment, it should be understood that each example may be compatible with and optionally utilize the methods described with respect to the other example. Thus, the present disclosure discloses embodiments that are combinations of features of multiple examples, without the need to list all features of the embodiments in detail in the description of each example embodiment.
User interfaces and associated processes
Attention is now directed to embodiments of a user interface ("UI") and associated processes that may be implemented on a computer system, such as a portable multifunction device or a head-mounted device, in communication with a display generation component and (optionally) one or more cameras and one or more input devices.
Fig. 7A-7Q illustrate exemplary techniques for capturing and/or displaying media in various environments according to some embodiments. Fig. 8 is a flow chart of a method of capturing media according to various embodiments. FIG. 9 is a flow chart of a method of displaying a media preview according to various embodiments. Fig. 10 is a flow chart of a method for displaying previously captured media. The user interfaces in fig. 7A to 7Q are used to illustrate the processes described below, including the processes in fig. 8, 9, and 10.
Fig. 7A-7Q illustrate exemplary techniques for capturing and viewing media according to some embodiments. The schematics and user interfaces in fig. 7A-7Q are used to illustrate the processes described below, including the processes in fig. 8, 9, and 10.
Fig. 7A illustrates a user 712 holding a computer system 700 including a display 702 in a physical environment (e.g., a room in a home). The physical environment includes a sofa 709a, a picture 709b, a first individual 709c1, a second individual 709c2, a television 709d, and a table 709e. The display 702 presents a representation 704 of the physical environment (e.g., using "pass-through video" as described above). The user 712 is holding the computer system 700 such that the sofa 709a, the picture 709b, the first individual 709c1, and the second individual 709c2 are visible from the user's point of view, which, for virtual passthrough, is determined based on the location of a portion of the computer system 700 that includes one or more cameras used to obtain visual information about the physical environment and to generate the passthrough representation based on that visual information. In the embodiment of fig. 7A-7Q, the viewpoint of the user corresponds to the field of view of one or more cameras in communication with the computer system 700 (e.g., cameras on the back side of the computer system 700). Thus, for virtual passthrough, as the computer system 700 moves throughout the physical environment, the field of view of the one or more cameras changes, which causes the user's point of view to change. Because the sofa 709a, the picture 709b, the first individual 709c1, and the second individual 709c2 are visible from the user's point of view in fig. 7A, the display 702 includes depictions of the sofa 709a, the picture 709b, the first individual 709c1, and the second individual 709c2. When the user 712 looks at the display 702, the user 712 may see the representation 704 of the physical environment and one or more virtual objects that may be displayed by the computer system 700 (e.g., as shown in fig. 7B-7Q). Thus, computer system 700 presents an augmented reality environment via display 702.
Although computer system 700 is a tablet in fig. 7A, in some embodiments computer system 700 may be one or more other devices, such as a handheld device (e.g., a smartphone) and/or a head-mounted device. In some embodiments, when computer system 700 is a head-mounted device, representation 704 of the physical environment is an augmented reality environment. In some embodiments, while the representation 704 of the physical environment is an augmented reality environment, the representation 704 of the physical environment includes immersive visual properties including a display of depth data (e.g., foreground and background of the representation 704 of the physical environment are differently displayed so as to present visual effects of depth when viewed by a user of the computer system 700). In some embodiments, computer system 700 includes one or more components of computer system 101 and/or display 702 includes components of display generation component 120. In some implementations, the display 702 presents a representation of a virtual environment (e.g., rather than the physical environment at fig. 7A).
Fig. 7B-7E illustrate a method for capturing spatial (e.g., immersive) media. In fig. 7B-7E, computer system 700 is maintained in the physical environment shown in fig. 7A, as shown in schematic 701, which is discussed in more detail below. In fig. 7B-7E, computer system 700 is now shown in an enlarged view to better illustrate what is visible on display 702. As shown in fig. 7B, computer system 700 displays control center virtual object 707 (e.g., in response to a swipe gesture performed by user 712 on display 702). The control center virtual object 707 includes a plurality of virtual objects. Each virtual object included in the control center virtual object 707 is selectable. Each virtual object included in control center virtual object 707, when selected, causes computer system 700 to perform a corresponding operation (e.g., modifying a playback state of computer system 700, modifying a volume of audio output by computer system 700, causing display of an application currently installed on computer system 700, and/or any other suitable operation).
As shown in FIG. 7B, computer system 700 presents a representation 704 of a physical environment. The representation 704 of the physical environment corresponds to a viewpoint of the user (e.g., the representation 704 of the physical environment includes content visible from the viewpoint of the user). That is, as the viewpoint of the user changes, the representation 704 of the physical environment changes based on the change in viewpoint of the user of the computer system 700. In some embodiments, representation 704 of the physical environment is a transparent representation of at least a portion of the physical environment surrounding computer system 700.
As shown in fig. 7B, the representation 704 of the physical environment is visually contrasted with the display of the control center virtual object 707. The representation 704 of the physical environment includes a first amount of shadow/blur, while the display of the control center virtual object 707 is displayed with a second amount of shadow/blur (e.g., no shadow/blur) that is different from the first amount of shadow/blur. In some implementations, the representation 704 of the physical environment is not contrasted with the display of the control center virtual object 707. In some implementations, the representation 704 of the physical environment does not have any amount of blurring/shading.
Fig. 7B-7Q include a schematic 701 of the physical environment. The computer system 700 is represented by an indication 703 within the schematic 701. That is, the position and orientation of the indication 703 in the schematic 701 represent the position and orientation of the computer system 700 within the physical environment. While schematic 701 depicts the physical environment shown in fig. 7A, it should be appreciated that this is merely an example, and that the techniques described herein may be used with other types of physical environments. The schematic 701 is merely a visual aid. Computer system 700 does not display schematic 701.
Fig. 7B shows computer system 700 as having a hardware button 711a (e.g., a hardware input device/mechanism) (e.g., a physical input device) and a hardware button 711b. Further, fig. 7B shows a body portion 712a of the user 712. Body portion 712a depicts one of the fingers of user 712 (e.g., the user's index finger, ring finger, little finger, middle finger, or thumb). In some embodiments, the representation of body portion 712a is any other portion of the body (e.g., wrist, arm, hand, and/or any other suitable body portion) of user 712 that is capable of activating hardware button 711a or hardware button 711b. At fig. 7B, computer system 700 detects activation of hardware button 711a by body part 712a, or computer system 700 detects input 750b directed to camera virtual object 707a. In some implementations, the input 750b is a tap input on the camera virtual object 707a (e.g., an air tap in space corresponding to the display position of the camera virtual object 707a). In some implementations, the input 750b is a gaze input (e.g., a continuous gaze) directed toward a display direction of the camera virtual object 707a. In some implementations, the input 750b is an air-tap input in combination with detecting a gaze in a display direction of the camera virtual object 707a. In some implementations, the input 750b is a gaze and blink directed toward the display direction of the camera virtual object 707a.
As shown in fig. 7C, in response to detecting activation of the hardware button 711a by the body part 712a or input 750b directed to the camera virtual object 707a, the computer system 700 displays a media capture preview 708, a timer virtual object 713, a camera shutter virtual object 714, a relocation virtual object 716, a cancel virtual object 719, and a photo pool virtual object 715. The computer system 700 displays the media capture preview 708 as overlaid on top of the representation 704 of the physical environment. As shown in fig. 7C, the display of the media capture preview 708 is smaller than the representation 704 of the physical environment (e.g., occupies less space on the display 702). In some embodiments, computer system 700 is a head-mounted device that presents representation 704 of a physical environment and one or more virtual objects that computer system 700 displays via a display generation component that encloses (or substantially encloses) a field of view of a user. In an embodiment in which the computer system 700 is an HMD, the user's view is locked to the forward direction of the user's head such that the representation 704 of the physical environment and one or more virtual objects (such as media capture preview 708) shift as the user's head moves (e.g., because the computer system 700 also moves as the user's head moves).
Timer virtual object 713, camera shutter virtual object 714, relocation virtual object 716, cancel virtual object 719, and photo pool virtual object 715 are all anchored to media capture preview 708. That is, the display positions of timer virtual object 713, camera shutter virtual object 714, relocation virtual object 716, cancel virtual object 719, and photo pool virtual object 715 are associated with the display position of media capture preview 708. In some implementations, when the display position of the media capture preview 708 changes, the display positions of the timer virtual object 713, the camera shutter virtual object 714, the relocation virtual object 716, the cancel virtual object 719, and the photo pool virtual object 715 change (see, e.g., fig. 7F-7G). As shown in fig. 7C, the computer system 700 displays the media capture preview 708 over/on top of the camera shutter virtual object 714 and in the center of the display 702. As shown in fig. 7C, the media capture preview 708 includes a portion of the representation 704 of the physical environment that was visible before the computer system 700 displayed the media capture preview 708. For example, at fig. 7B, the representation of the physical environment (e.g., before the computer system 700 displays the media capture preview 708) includes a sofa 709a, a picture 709b, a first individual 709c1, and a second individual 709c2. Thus, as shown in FIG. 7C, the media capture preview 708 includes depictions of a sofa 709a1, a picture 709b1, a first individual 709c3, and a second individual 709c4. The media capture preview 708 provides a preview of the content that will be captured in response to the computer system 700 detecting a request to capture media. The content displayed within the media capture preview 708 is based on the field of view of the one or more cameras in communication with the computer system 700 (e.g., the content displayed within the media capture preview 708 is within the field of view of the one or more cameras in communication with the computer system 700). The content displayed within the media capture preview 708 changes based on changes in the field of view of the one or more cameras. In some embodiments, the computer system 700 includes two cameras and the content displayed within the media capture preview 708 is content that falls within the fields of view of both cameras, which enables the capture of immersive content.
In fig. 7C, the viewpoint of the user corresponding to the representation 704 of the physical environment has a wider range of viewing angles than the range of angles of the field of view corresponding to the media capture preview 708. This causes the representation 704 of the physical environment to depict a greater amount of physical environment than the amount of physical environment depicted within the media capture preview 708 (e.g., the entire sofa is visible in the representation 704 of the physical environment while only a portion of the sofa is visible in the media capture preview 708). Thus, as shown in FIG. 7C, the media captured when the media capture preview 708 appears will include only a portion of the sofa, rather than the entire sofa. In some implementations, the representation of the physical environment included in the media capture preview 708 is displayed at a first scale and the representation 704 of the physical environment is presented at a second scale that is greater than the first scale. In some implementations, the computer system 700 includes two cameras having different but overlapping fields of view, and the media capture preview 708 represents a portion of the physical environment (e.g., where the FOVs overlap) that is common to the fields of view of the two cameras, while the representation 704 of the physical environment includes content within the FOV of the first and/or second of the two cameras (e.g., both overlapping and non-overlapping). In some implementations, the display of the media capture preview 708 by the computer system 700 includes content included in the representation 704 of the physical environment. In some implementations, the representation 704 of the physical environment is displayed from an immersive perspective, and the content included in the display of the media capture preview 708 is displayed from a non-immersive perspective.
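For the two-camera case described above, the portion of the scene eligible for the media capture preview is the region common to both cameras' fields of view. The Swift sketch below flattens this to a single horizontal (yaw) axis to keep the example short; the types and the yaw-only treatment are assumptions.

```swift
/// Horizontal field of view of one camera, described as a half-angle about its optical axis.
/// `axisYaw` is the direction the camera points (radians, in a shared reference frame).
struct CameraFOV {
    var axisYaw: Float
    var halfAngle: Float
}

/// Returns the horizontal angular interval covered by both cameras, or nil when the
/// fields of view do not overlap. Content inside this interval would be eligible for the
/// capture preview; content outside it only contributes to the passthrough representation.
func overlappingYawRange(_ a: CameraFOV, _ b: CameraFOV) -> ClosedRange<Float>? {
    let lower = max(a.axisYaw - a.halfAngle, b.axisYaw - b.halfAngle)
    let upper = min(a.axisYaw + a.halfAngle, b.axisYaw + b.halfAngle)
    return lower <= upper ? lower...upper : nil
}
```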
As shown in fig. 7C, the media capture preview 708 is displayed with a visual appearance that does not include darkening and/or blurring, and the representation 704 of the physical environment is displayed as darkening and/or blurring. This provides a contrast between the display of the media capture preview 708 and the representation 704 of the physical environment. In some implementations, the representation 704 of the physical environment is not dimmed and/or blurred before the computer system 700 detects the input 750B or before the computer system 700 detects the activation of the hardware button 711a by the body part 712a at fig. 7B. In some implementations, the representation 704 of the physical environment is dimmed and/or blurred (e.g., faded out) in response to the computer system 700 detecting the input 750b or in response to the computer system 700 detecting activation of the hardware button 711a by the body part 712 a.
With respect to the virtual objects displayed anchored to the media capture preview 708: the timer virtual object 713 provides an indication of the amount of time (e.g., minutes, seconds, and/or hours) that has elapsed since the computer system 700 initiated the media capture process. Photo pool virtual object 715 includes a representation of the most recently captured media item (e.g., a still photo or video). In some implementations, the photo pool virtual object 715 includes a representation of the most recently captured media item captured by the computer system 700. In some implementations, the photo pool virtual object 715 includes a representation of the most recently captured media item captured by an external device in communication with the computer system 700. As shown in fig. 7C, photo pool virtual object 715 includes a representation of a fountain. Thus, the most recently captured media item includes a depiction of a fountain.
Selection of the camera shutter virtual object 714 initiates a process on the computer system 700 for capturing media that includes the content shown within the media capture preview 708. The relocation virtual object 716 allows the user 712 to reposition the display position of the media capture preview 708. For example, moving the display position of the relocation virtual object 716 to the left causes the display of the media capture preview 708 to move to the left. Selection of the cancel virtual object 719 causes the computer system 700 to cease displaying the media capture preview 708. In some implementations, when the media capture preview 708 ceases to be displayed, the representation 704 of the physical environment is no longer blurred and/or darkened.
In fig. 7C, the computer system 700 detects activation of the hardware button 711a by the body part 712a, or the computer system 700 detects an input 750c directed to the camera shutter virtual object 714. In some implementations, the input 750c is a tap on the camera shutter virtual object 714 (e.g., an air tap in space corresponding to the display position of the camera shutter virtual object 714). In some implementations, the input 750c is a gaze (e.g., continuous gaze) input directed in a display direction of the camera shutter virtual object 714. In some implementations, the input 750c is an air tap input in combination with detecting gaze in the display direction of the camera shutter virtual object 714. In some implementations, the input 750c is a gaze and blink directed toward the display direction of the camera shutter virtual object 714. In some implementations, the activation of the hardware button 711a by the body part 712a or the input 750c is a long press (e.g., press and hold) (e.g., the activation or the input 750c lasts for multiple seconds). In some implementations, the activation of the hardware button 711a by the body portion 712a or the input 750c is a short press (e.g., press and release) (e.g., the activation or the input 750c lasts less than one second). In some implementations, a particular air gesture identified as a request to capture media is detected (e.g., as described above with respect to selection of virtual objects in an XR environment).
At fig. 7D, computer system 700 initiates a media capturing process in response to detecting activation of hardware button 711a by body part 712a or in response to detecting input 750c. At fig. 7D, it is determined that the activation of the hardware button 711a by the body part 712a or the input 750c is a long press. Because the activation or the input 750c is determined to be a long press, video media (e.g., rather than static media) is captured. While the media capturing process is in progress, the media capturing process records the content displayed within the media capture preview 708. In some implementations, the field of view of the one or more cameras in communication with the computer system 700 changes during the media capturing process, which causes the content displayed within the media capture preview 708 to change and causes the content captured by the media capturing process to change. In some implementations, in accordance with a determination that the activation of the hardware button 711a by the body portion 712a or the input 750c is a short press, static media (e.g., a photograph) is captured via the media capturing process.
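The long-press versus short-press branch described above amounts to comparing the activation duration against a threshold, as in the Swift sketch below. The 0.5 second threshold is an assumption; the text only states that a short press lasts less than one second.

```swift
enum CaptureKind { case photo, video }

/// A sketch of the long-press versus short-press distinction: an activation held for at
/// least `longPressThreshold` seconds starts video capture, while a shorter
/// press-and-release captures a still photo.
func captureKind(forPressDuration duration: Double,
                 longPressThreshold: Double = 0.5) -> CaptureKind {
    duration >= longPressThreshold ? .video : .photo
}
```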
As shown in fig. 7D, the display of the timer virtual object 713 reads "00:05" (e.g., 5 seconds). The timer virtual object 713 in FIG. 7D indicates that five seconds have elapsed since the computer system 700 initiated the media capture process. Further, as shown in fig. 7D, the display of the camera shutter virtual object 714 includes a square. The camera shutter virtual object 714 is displayed with a square to indicate that the computer system 700 is currently recording video media. In some implementations, the shape, size, and/or color of the camera shutter virtual object 714 is updated to indicate that the computer system 700 is currently recording video media. In fig. 7D, the computer system 700 detects activation of the hardware button 711a by the body part 712a, or the computer system 700 detects an input 750d directed to the camera shutter virtual object 714. In some implementations, the input 750d is a tap input on the camera shutter virtual object 714 (e.g., an air tap in space corresponding to the display position of the camera shutter virtual object 714). In some implementations, the input 750d is a gaze input directed to a display direction of the camera shutter virtual object 714. In some implementations, the input 750d is an air tap in combination with detecting a gaze in a display direction of the camera shutter virtual object 714. In some implementations, the input 750d is a gaze and blink directed toward the display direction of the camera shutter virtual object 714.
At fig. 7E, in response to detecting activation of hardware button 711a by body part 712a or input 750d directed to camera shutter virtual object 714, computer system 700 stops the media capturing process. Because the computer system 700 is no longer performing the media capturing process, the display of the camera shutter virtual object 714 in fig. 7E does not include a square. As shown in fig. 7E, computer system 700 displays photo pool virtual object 715 with a representation of the video captured in fig. 7C-7D (e.g., video of the physical environment). Further, as shown in FIG. 7E, because the computer system 700 is no longer performing the media capturing process, the display of the timer virtual object 713 reads "00:00".
As shown in fig. 7E, the schematic 701 includes a movement indicator 721. The movement indicator 721 indicates that the computer system 700 is beginning to move within the physical environment. At FIG. 7E, computer system 700 begins to move laterally to the right within the physical environment.
Fig. 7F-7H illustrate a method by which the computer system 700 displays the media capture preview 708 as lagging (e.g., following) the movement of the computer system 700. Fig. 7E-7G depict one continuous movement of computer system 700 laterally to the right within a physical environment. In some embodiments, computer system 700 moves laterally left, up, and/or down within a physical environment. In some embodiments, computer system 700 moves in a combination of different directions (e.g., up and left and/or down and right).
At fig. 7F, computer system 700 is positioned laterally to the right of the previous positioning of computer system 700 (e.g., the positioning of computer system 700 in fig. 7E). Movement of the computer system 700 causes the user's point of view to change. As described above, the representation 704 of the physical environment corresponds to a portion of the physical environment that is visible from the user's point of view. Thus, as the viewpoint of the user changes, the representation 704 of the physical environment changes accordingly. Further, as described above, the media capture preview 708 corresponds to the field of view of the one or more cameras in communication with the computer system 700. Movement of the computer system 700 causes the field of view of the one or more cameras to change. Thus, as the field of view of the one or more cameras changes, the content displayed within the media capture preview 708 changes accordingly.
As shown in fig. 7F, the computer system 700 displays the media capture preview 708 off-center (e.g., to the right, left, above, or below the center of the display 702). At FIG. 7F, the computer system 700 is moving within the physical environment at a first speed (e.g., 1 ft/s, 3 ft/s, 5 ft/s) and the computer system 700 displays the media capture preview 708 as moving at a second speed (e.g., at a second speed relative to the physical environment) that is slower than the first speed (e.g., if the computer system is moving at 5 ft/s, the media capture preview is moving at 3 ft/s). The difference between the speed at which computer system 700 moves within the physical environment and the speed at which computer system 700 displays media capture preview 708 as moving relative to the physical environment creates a hysteretic visual effect that depicts media capture preview 708 as lagging (e.g., following) the movement of computer system 700. Strictly from the perspective of the display 702, the media capture preview 708 moves on the screen in a direction opposite to the direction of movement of the computer system 700 in order to produce the visual effect that the media capture preview 708 moves more slowly relative to the physical environment than the computer system 700. In some implementations, the representation 704 of the physical environment has a first set of parallax attributes while the computer system 700 is moving, and the computer system 700 displays the media capture preview 708 with a second set of parallax attributes that is different from the first set of parallax attributes. For example, at FIG. 7F, as the computer system 700 moves within the physical environment, there is a first offset between the depictions of the first individual 709c3 and the second individual 709c4 within the media capture preview 708 and the depiction of the picture 709b1, and a second offset between the depictions of the first individual 709c1 and the second individual 709c2 within the representation 704 of the physical environment and the depiction of the picture 709b. Because the representation 704 of the physical environment has a different set of parallax attributes than the media capture preview 708, the first offset between the depictions of the first and second individuals 709c3, 709c4 and the depiction of the picture 709b1 within the media capture preview 708 is different from the offset between the depictions of the first and second individuals 709c1, 709c2 and the depiction of the picture 709b in the representation 704 of the physical environment. In some implementations, the media capture preview 708 is not displayed with any parallax effect (e.g., the first offset is zero), while the representation 704 of the physical environment includes a non-zero degree of offset, such that the representation 704 of the physical environment is perceived as having depth while the media capture preview 708 does not (e.g., it appears flat). In some embodiments, while the computer system 700 is moving, the computer system 700 applies a first image stabilization technique to the representation 704 of the physical environment and applies a second, different stabilization technique to the display of the media capture preview 708. In some embodiments, when the computer system 700 is moving, the computer system 700 applies a smaller amount of digital image stabilization to the representation 704 of the physical environment than the amount of digital stabilization the computer system 700 applies to the media capture preview 708.
At fig. 7F, computer system 700 is moving laterally to the right. Because the computer system 700 is moving laterally to the right, the computer system 700 displays the media capture preview 708 to the left of the center of the display 702. In some implementations, the computer system 700 moves laterally to the left within the physical environment, which causes the computer system 700 to display the media capture preview 708 to the right of the center of the display 702. In some implementations, the computer system 700 moves vertically upward within the physical environment, which causes the computer system 700 to display the media capture preview 708 below the center of the display 702. In some implementations, the computer system 700 moves vertically downward within the physical environment, which causes the computer system 700 to display the media capture preview 708 over the center of the display 702. In some implementations, the computer system 700 moves forward (e.g., in the z-direction) within the physical environment (e.g., toward the inside of the page), which causes the computer system 700 to display the media capture preview 708 larger relative to the representation 704 of the physical environment for a period of time, and then to transition the size of the media capture preview 708 to have the same relative size as the representation 704 of the physical environment before movement begins; in such implementations, the rate at which the media capture preview 708 transitions between two sizes lags the forward movement speed of the computer system 700. In some implementations, the computer system 700 moves back within the physical environment (e.g., moves away from the page), which causes the computer system to display the media capture preview 708 smaller relative to the representation 704 of the physical environment for a period of time, and then to transition the size of the media capture preview 708 to have the same relative size as the representation 704 of the physical environment before movement begins; in such implementations, the rate at which the media capture preview 708 transitions between two sizes lags the forward movement speed of the computer system 700. In some implementations, as the computer system 700 moves within a physical environment, the computer system 700 detects movement of the computer system 700 with a first amount of tracking hysteresis (e.g., measured as a function of distance over time). For example, when the computer system 700 is located at the location indicated by the indication 703 in the schematic 701 of FIG. 7F, the computer system 700 detects that it is located at the location indicated by the previous location indicator 717 of FIG. 7F for an amount of time corresponding to the first amount of tracking lag. In some implementations, the second speed at which the computer system 700 displays the media capture preview 708 while moving is configured to be less than the first amount of tracking lag in order to show a lag visual effect (e.g., in order to reduce visual artifacts).
As shown in fig. 7F, the schematic 701 includes a previous location indicator 717. The previous location indicator 717 indicates the previous location of the computer system 700 (e.g., the location of the computer system 700 in fig. 7E). At fig. 7F, the computer system 700 continues to move laterally to the right within the physical environment, as indicated by the schematic 701 including movement indication 721.
At FIG. 7G, the computer system 700 is positioned farther to the right within the physical environment than the computer system 700 in FIG. 7F, and has stopped moving. As shown in fig. 7G, the schematic 701 includes a previous location indicator 717 indicating a previous location of the computer system 700 (e.g., the location of the computer system 700 in fig. 7F). The schematic 701 also includes an indication 703 indicating the current location of the computer system 700. At fig. 7G, the schematic 701 does not include the movement indicator 721 because, in contrast to fig. 7F, the computer system 700 is no longer moving.
As shown in fig. 7G, computer system 700 displays media capture preview 708 to the left of the center of display 702. That is, although computer system 700 is no longer moving in FIG. 7G, computer system 700 continues to display media capture preview 708 moving at a second speed (e.g., 3ft/s when the computer system is moving at 5 ft/s) so that media capture preview 708 can "catch up" with computer system 700. In some implementations, once the computer system 700 stops moving, the computer system 700 displays the media capture preview 708 as moving at a third speed (e.g., 5 ft/s) faster than the second speed, such that the media capture preview 708 is re-centered faster than it moves at the second speed.
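The trailing and catch-up behavior described for figs. 7F-7H can be summarized with a short sketch. The following Swift fragment is illustrative only and is not part of the disclosed embodiments; the follow factor, catch-up speed, and thresholds are assumed example values.

```swift
/// While the device moves, the preview's world-referenced position advances at a fraction
/// of the device speed, so the preview trails the device; once the device stops, the
/// remaining gap is closed at a faster, fixed speed so the preview re-centers.
struct PreviewLagModel {
    var previewPosition: Double = 0       // lateral world-space position tracked for the preview
    let followFactorWhileMoving = 0.6     // preview moves at ~60% of the device speed (assumed)
    let catchUpSpeed = 1.5                // speed used to re-center once the device stops (assumed)

    mutating func update(devicePosition: Double, deviceVelocity: Double, deltaTime: Double) {
        if abs(deviceVelocity) > 0.01 {
            // Trailing phase (fig. 7F): the preview moves slower than the device.
            previewPosition += deviceVelocity * followFactorWhileMoving * deltaTime
        } else {
            // Catch-up phase (figs. 7G-7H): close the remaining gap at the faster speed.
            let gap = devicePosition - previewPosition
            let step = min(abs(gap), catchUpSpeed * deltaTime)
            previewPosition += gap >= 0 ? step : -step
        }
    }

    /// Negative when the device has moved right faster than the preview, which places the
    /// preview to the left of the display center (and vice versa).
    func screenOffset(devicePosition: Double) -> Double {
        previewPosition - devicePosition
    }
}
```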
At fig. 7G, the representation 704 of the physical environment is updated relative to the representation 704 of the physical environment in fig. 7F (e.g., the representation 704 of the physical environment in fig. 7G includes a table 709e in the background and less of the sofa 709a in the foreground). As described above, the viewpoint of the user changes based on a change in the field of view of the one or more cameras in communication with the computer system 700. Thus, as the computer system 700 moves throughout the physical environment (e.g., this causes the field of view of the one or more cameras in communication with the computer system 700 to change), the viewpoint of the user changes, which causes the representation 704 of the physical environment to change accordingly. Further, at FIG. 7G, the display of the media capture preview 708 is updated. As the computer system 700 moves within the physical environment, the field of view of the one or more cameras in communication with the computer system changes, which causes the content displayed within the media capture preview 708 to change.
At fig. 7H, the media capture preview 708 has caught up with the previous movement of the computer system 700 (e.g., movement of the computer system 700 as described above with respect to fig. 7E-7G). As shown in fig. 7H, because the media capture preview 708 catches up with the previous movement of the computer system 700, the computer system 700 displays the media capture preview 708 in the center of the display 702. The computer system 700 displays the media capturing preview 708 in the center of the display 702 such that a first portion of the representation 704 of the physical environment that was previously visible (e.g., visible in fig. 7G) (e.g., a portion of the torso of the second individual 709c2) ceases to be visible (e.g., because the media capturing preview 708 is displayed overlaid on the first portion) and such that a second portion of the representation 704 of the physical environment that was previously not visible (e.g., the torso of the first individual 709c1) is visible (e.g., because the media capturing preview 708 is no longer displayed overlaid on the second portion). At FIG. 7H, computer system 700 detects input 750h directed to photo pool virtual object 715. In some implementations, the input 750h is a gaze input (e.g., a continuous gaze) directed to a display direction of the photo pool virtual object 715. In some implementations, the input 750h is a tap on the photo pool virtual object 715 (e.g., an air tap in a space corresponding to the display location of the photo pool virtual object). In some implementations, the input 750h is an air tap in combination with detecting a gaze in the display direction of the photo pool virtual object 715. In some implementations, the input 750h is a gaze and blink directed to the display direction of the photo pool virtual object 715.
As shown in fig. 7I, in response to detecting input 750h, computer system 700 displays previously captured media item 730. Previously captured media item 730 is the most recently captured media item of computer system 700 (e.g., the media item captured in fig. 7C and 7D). In some implementations, the previously captured media item 730 is the most recently captured media item captured by an external device in communication with the computer system 700 (e.g., a device separate from the computer system 700). Previously captured media item 730 is shown as a square. In some implementations, the previously captured media item 730 is displayed as a rectangle, triangle, or any other suitable shape.
The computer system 700 displays previously captured media item 730 along with library virtual object 731, repositioning virtual object 732, cancel virtual object 733, identifier virtual object 734, sharing virtual object 735, and projected shape virtual object 736. In some implementations, the virtual objects listed above are anchored (e.g., as described above in the description of fig. 7C) to the display of the previously captured media item 730. In some embodiments, library virtual object 731, repositioning virtual object 732, cancel virtual object 733, identifier virtual object 734, sharing virtual object 735, and projected shape virtual object 736 are displayed in a different spatial configuration than that shown in fig. 7I. In some implementations, in response to detecting input 750h, computer system 700 displays a user interface including a subset of the virtual objects displayed in fig. 7I.
Selection of the library virtual object 731 causes a plurality of representations of previously captured media items to be displayed. In some implementations, the library virtual object 731 is displayed simultaneously with the media capture preview 708. In some implementations, the display of the plurality of representations of previously captured media items replaces the display of previously captured media item 730. The repositioning virtual object 732 allows the user to reposition the display location of the representation of the previously captured media item 730 in the same manner that the media capture preview 708 may be repositioned using the repositioning virtual object 716. Selection of the cancel virtual object 733 causes the computer system 700 to cease displaying the previously captured media item 730. In some embodiments, selection of the cancel virtual object 733 causes the computer system 700 to cease displaying the library virtual object 731, the repositioning virtual object 732, the cancel virtual object 733, the identifier virtual object 734, the sharing virtual object 735, and the projected shape virtual object 736. In some implementations, selection of the cancel virtual object 733 causes the computer system 700 to display the media capture preview 708. The identifier virtual object 734 provides an indication of where and when the previously captured media item 730 was captured. In some implementations, the identifier virtual object 734 provides different information about the previously captured media item 730 (e.g., resolution of the previously captured media item, date the previously captured media item was captured). Selection of the sharing virtual object 735 initiates a process on the computer system 700 for sharing the previously captured media item 730 with an external device (e.g., a device separate from the computer system 700). At fig. 7I, computer system 700 detects an input 750i directed to projected shape virtual object 736. In some implementations, the input 750i is a gaze input directed toward a display direction of the projected shape virtual object 736. In some implementations, the input 750i is a tap input on the projected shape virtual object 736 (e.g., an air tap in space corresponding to a display location of the projected shape virtual object 736). In some implementations, the input 750i is an air-tap input in combination with detecting a gaze in a display direction of the projected shape virtual object 736. In some implementations, the input 750i is a gaze and blink directed toward a display direction of the projected shape virtual object 736.
As shown in fig. 7J, in response to detecting input 750i, computer system 700 displays a representation of previously captured media item 730 as a circle. As shown in fig. 7J, when the computer system 700 displays the previously captured media item 730 as a circle, the computer system 700 displays the library virtual object 731, the repositioning virtual object 732, the cancel virtual object 733, the identifier virtual object 734, the sharing virtual object 735, and the projected shape virtual object 736, as described above in the discussion of fig. 7I. In some implementations, in response to detecting the input 750i, the computer system 700 displays the previously captured media item 730 as a three-dimensional sphere. As shown in fig. 7J, the schematic 701 includes a movement indicator 721. The movement indicator 721 indicates that the computer system 700 is beginning to move (e.g., laterally to the left) within the physical environment. At fig. 7J, computer system 700 begins to move laterally left within the physical environment back to the initial positioning of computer system 700 (e.g., the positioning of computer system 700 in fig. 7A), as indicated by schematic 701 including movement indication 721.
Fig. 7K-7Q illustrate a method by which a user interacts with a three-dimensional representation of a previously captured media item. Fig. 7K-7Q include an elapsed time indicator 744. Elapsed time indicator 744 is a visual aid that indicates the amount of time that has elapsed between each figure. The computer system 700 does not display the elapsed time indicator 744. At fig. 7K, computer system 700 is located at an initial location in a physical environment (e.g., the location of computer system 700 in fig. 7A-7E). Between fig. 7J and 7K, computer system 700 receives a request to display space capture virtual object 740. As shown in fig. 7K, in response to receiving a request to display a space capture virtual object 740, computer system 700 displays space capture virtual object 740 as overlaid on top of representation 704 of the physical environment. The display of the spatially captured virtual object 740 obscures a portion of the representation 704 of the physical environment. The spatial capture virtual object 740 is a representation of a previously captured video media item. In some embodiments, computer system 700 is a head-mounted device that presents spatially-captured virtual object 740 as part of an augmented reality environment, wherein spatially-captured virtual object 740 is displayed with a parallax effect (e.g., as discussed above with respect to fig. 7F) that causes a media item represented by spatially-captured virtual object 740 to have an amount of depth between the foreground and the background of content in the media item when computer system 700 is in motion. In some implementations, the spatial capture virtual object 740 is a representation of previously captured static media. In some implementations, the previously captured video media items are captured using the one or more cameras in communication with the computer system 700. In some embodiments, the request to display space capture virtual object 740 includes one or more inputs directed to one or more hardware buttons in communication with computer system 700. In some embodiments, the request to display space capture virtual object 740 includes one or more inputs directed to one or more virtual objects displayed by computer system 700. In some implementations, the request to display the spatial capture virtual object 740 includes detecting a gaze (e.g., performed by a user) in a direction corresponding to one of the sub-representations of the previously captured media item. In embodiments in which the computer system 700 is an HMD, the user's view is locked to the forward direction of the user's head such that the representation 704 of the physical environment and one or more virtual objects (such as the spatially captured virtual object 740) shift as the user's head moves (e.g., because the computer system 700 also moves as the user's head moves).
Computer system 700 displays spatial capture virtual object 740 at a location in the representation 704 of the physical environment that corresponds to a first location in the physical environment (e.g., the location indicated by spatial capture indicator 740a in schematic 701). When the first location of the physical environment is within a field of view of the one or more cameras in communication with the computer system 700, the computer system 700 displays the spatially captured virtual object 740. The display of the spatially captured virtual object 740 is environment-locked to the first location in the representation 704 of the physical environment. That is, the location of the display of the spatially-captured virtual object 740 within the representation 704 of the physical environment does not change as the real-world positioning of the computer system 700 changes. However, the display of the spatial capture virtual object 740 is updated as the user's point of view changes in the physical environment. For example, as the user's point of view moves closer to the first location in the physical environment, computer system 700 displays spatial capture virtual object 740 larger (e.g., to provide a visual effect that the user has moved closer to spatial capture virtual object 740). Conversely, as the user's point of view moves farther from the first location in the physical environment, computer system 700 displays space capture virtual object 740 smaller (e.g., to provide a visual effect that the user has moved farther from space capture virtual object 740).
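An environment-locked object of this kind can be sketched informally as follows. The Swift fragment below is not part of the disclosure; the size convention, field-of-view test, and names are assumptions used only to illustrate that the anchor stays fixed while the rendered size depends on the viewpoint's distance.

```swift
/// A fixed world-space anchor is only drawn while it falls inside the cameras' field of
/// view, and its drawn size grows as the viewpoint approaches the anchor.
struct EnvironmentLockedObject {
    let anchor: SIMD3<Double>       // fixed world-space location (the "first location" of fig. 7K)
    let baseSize: Double            // rendered size at the reference distance (assumed convention)
    let referenceDistance: Double   // distance at which baseSize applies

    private func length(_ v: SIMD3<Double>) -> Double {
        (v.x * v.x + v.y * v.y + v.z * v.z).squareRoot()
    }

    /// Apparent size is inversely proportional to the viewpoint's distance from the anchor.
    func renderedSize(viewpoint: SIMD3<Double>) -> Double {
        baseSize * referenceDistance / max(length(anchor - viewpoint), 0.001)
    }

    /// Drawn only when the anchor is within the cameras' field of view.
    func isVisible(viewpoint: SIMD3<Double>, cameraForward: SIMD3<Double>, cosineOfHalfFieldOfView: Double) -> Bool {
        let toAnchor = anchor - viewpoint
        let dot = toAnchor.x * cameraForward.x + toAnchor.y * cameraForward.y + toAnchor.z * cameraForward.z
        let cosine = dot / (length(toAnchor) * length(cameraForward))
        return cosine >= cosineOfHalfFieldOfView
    }
}
```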
As shown in fig. 7K, in response to receiving a request to display a space capture virtual object 740, computer system 700 also displays a plurality of sub-representations of previously captured media items 743 while displaying space capture virtual object 740. As shown in fig. 7K, computer system 700 displays the plurality of sub-representations of previously captured media items 743 under the display of spatially captured virtual object 740. Each sub-representation of previously captured media represents a respective media item (e.g., video media or static media) previously captured by computer system 700 or an external device in communication with computer system 700. The sub-representation of a previously captured media item 743a that is displayed in the middle of the plurality of sub-representations of previously captured media items 743 corresponds to the media item that is displayed in focus (e.g., represented by the spatial capture virtual object 740). In some implementations, a user can navigate among the plurality of sub-representations of previously captured media items 743 to select a sub-representation to be displayed as focused at the location of the spatially captured virtual object 740. In some implementations, the user can switch which sub-representation is displayed in focus by performing a motion input (e.g., a pinch and drag gesture). In some implementations, while displaying the space capture virtual object 740, the computer system 700 expands the size of the display of the space capture virtual object 740 in response to the computer system 700 detecting (e.g., via the one or more cameras in communication with the computer system 700) that the user has performed a pinch gesture. In some implementations, the computer system 700 reduces the size of the display of the spatial capture virtual object 740 in response to the computer system 700 detecting (e.g., via the one or more cameras in communication with the computer system 700) that the user has performed a spread out air gesture.
As shown in fig. 7K, the schematic 701 includes a spatial capture indicator 740a. The spatially captured virtual object 740 does not actually exist within the physical environment. Instead, the spatially-captured virtual object 740 is a virtual object that is displayed only as being within the representation 704 of the physical environment. The spatial capture indicator 740a indicates the spatial orientation/positioning of the display of the spatial capture virtual object 740 within the representation 704 of the physical environment, and also indicates the location to which the spatial capture virtual object 740 is environment-locked.
At fig. 7K, while spatial capture virtual object 740 is displayed, computer system 700 plays back the video media item represented by spatial capture virtual object 740. After computer system 700 displays spatial capture virtual object 740, computer system 700 automatically (e.g., without intervening user input) initiates playback of the video media item represented by spatial capture virtual object 740. In some implementations, playback of the video media item represented by the spatially-captured virtual object 740 is not automatic. In some implementations, the computer system 700 outputs spatial audio as part of playing back the video media item represented by the spatial capture virtual object 740. In some implementations, the video media item represented by the spatial capture virtual object 740 is a stereoscopic video media item. A stereoscopic video media item presents two different images of the same scene to a user. In some embodiments, the first image is from a first camera and the second image is from a second camera, wherein the first camera and the second camera have slightly different perspectives of the scene. Thus, the first image is slightly different from the second image. When viewed by a user, the two images are superimposed on one another, creating an illusion of depth within the resulting image. In some implementations, using the techniques described above with respect to fig. 7J, a user may change the shape in which computer system 700 displays the space capture virtual object 740. In some embodiments, the user may change the shape of the spatially captured virtual object 740 from a flat stereoscopic projection to a spherical stereoscopic projection, and vice versa. In some implementations, the video media item represented by the spatially captured virtual object 740 is played back on an external electronic device that is incapable of playing back stereoscopic video. In some implementations, spatial audio is used when the video media item represented by spatial capture virtual object 740 is played back on an external electronic device that is incapable of playing back stereoscopic video. In some embodiments, in accordance with a determination that the user of computer system 700 has a different pupillary distance than the default pupillary distance setting of computer system 700, computer system 700 plays back the video media item at an offset (e.g., computer system 700 plays back the video of the media item represented by spatial capture virtual object 740 at a different rate than the video media item was captured). The pupillary distance is the distance (e.g., in millimeters) between the pupil centers of the individual's eyes. In some embodiments, in accordance with a determination that the user of computer system 700 has a different pupillary distance than the default pupillary distance setting of computer system 700, computer system 700 changes (e.g., offsets) the first image and/or the second image (e.g., to the left or right) of the stereoscopic video image based on the difference between the user's pupillary distance and the pupillary distance setting of computer system 700 without changing the scale of playback of the stereoscopic video media item.
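The pupillary-distance adjustment described above can be illustrated with a minimal sketch. The Swift fragment below is illustrative only; the sign convention, the split of the correction across the two eyes, and the points-per-millimeter factor are assumptions and are not taken from the disclosure.

```swift
/// When the viewer's pupillary distance differs from the default setting, each eye's image
/// is shifted horizontally in opposite directions, without changing the playback scale.
struct StereoFrame {
    var leftImageOffsetX: Double    // horizontal placement of the left-eye image (points)
    var rightImageOffsetX: Double   // horizontal placement of the right-eye image (points)
}

func adjusted(frame: StereoFrame, viewerIPDMillimeters: Double, defaultIPDMillimeters: Double, pointsPerMillimeter: Double) -> StereoFrame {
    let deltaPoints = (viewerIPDMillimeters - defaultIPDMillimeters) * pointsPerMillimeter
    var result = frame
    // Split the correction across the two eyes: a wider-than-default pupillary distance
    // pushes the images apart; a narrower one pulls them together (assumed convention).
    result.leftImageOffsetX -= deltaPoints / 2
    result.rightImageOffsetX += deltaPoints / 2
    return result
}
```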
The video media item represented by the spatial capture virtual object 740 is a video of an individual advancing along a path. At fig. 7K, computer system 700 plays back the video media item represented by spatially captured virtual object 740 from a non-immersive perspective. Content played back from a non-immersive perspective is not presented from multiple perspectives in response to a detected change in the orientation/position of computer system 700. When rendered from a non-immersive perspective, the video media is rendered from only one perspective, regardless of whether the orientation/position of the computer system 700 is changed. At fig. 7K, computer system 700 is located at a first distance (e.g., a virtual distance) away from the first location in the physical environment at which computer system 700 displays the spatially-captured virtual object 740 in the representation 704 of the physical environment. At fig. 7K, computer system 700 begins to move toward the first location of the physical environment, as indicated by diagram 701 including movement indication 721.
As shown in fig. 7L, elapsed time indicator 744 reads "00:01". Accordingly, 1 second has elapsed between fig. 7K and fig. 7L. At FIG. 7L, computer system 700 continues to display space capture virtual object 740 within representation 704 of the physical environment. Further, at FIG. 7L, computer system 700 is positioned closer to the first location in the physical environment than the location of computer system 700 in FIG. 7K. Thus, at FIG. 7L, because the computer system 700 is located closer to the first location of the physical environment, the computer system 700 increases the size of the display of the spatially-captured virtual object 740 (e.g., as compared to the size of the display of the spatially-captured virtual object 740 in FIG. 7K). As described above, as the viewpoint of the user changes, the display of the spatial capture virtual object 740 changes based on the changed viewpoint of the user. Because computer system 700 displays the spatially-captured virtual object 740 as larger, less of the representation 704 of the physical environment is visible. In some implementations, computer system 700 moves backward within the physical environment, which reduces the size of the display of space capture virtual object 740. In some implementations, as the computer system 700 moves laterally to a side within the physical environment, the representation 704 of the physical environment is presented with a respective visual parallax effect, and the computer system 700 displays the spatially captured virtual object 740 with the respective visual parallax effect.
At fig. 7L, computer system 700 continues to play back the video media represented by spatially-captured virtual object 740. Thus, at fig. 7L, computer system 700 updates the representation of the video media represented by spatial capture virtual object 740 (e.g., the representation of the video media shows the individual as having progressed along the path) to show that playback of the video media has progressed for one second (e.g., the amount of time that has elapsed between fig. 7K and fig. 7L). At fig. 7L, the computer system 700 continues to move toward the first location of the physical environment, as indicated by movement indication 721 within the schematic 701.
As shown in fig. 7M, elapsed time indicator 744 reads "00:05". Therefore, 4 seconds have elapsed between fig. 7L and fig. 7M. At fig. 7M, computer system 700 is positioned closer to the first location of the physical environment than the location of computer system 700 in fig. 7L. At fig. 7M, because computer system 700 is positioned closer to the first location of the physical environment, computer system 700 increases the size of the display of spatially-captured virtual object 740 (e.g., as compared to the size of the display of spatially-captured virtual object 740 in fig. 7L). Because computer system 700 displays the spatially-captured virtual object 740 as larger, less of the representation 704 of the physical environment is visible.
At fig. 7M, computer system 700 continues to play back the video media item represented by spatially captured virtual object 740. Thus, at fig. 7M, computer system 700 displays a frame of the representation of the video media item that is four seconds after the frame of the video media shown in fig. 7L to show that playback of the video media has advanced by four seconds. In some embodiments, computer system 700 stops displaying space capture virtual object 740 as computer system 700 moves past the first location of the physical environment. In some implementations, the computer system 700 is a head-mounted device that presents the spatial capture virtual object 740 as part of an augmented reality environment, wherein the closer the computer system 700 moves to the first location in the physical environment, the more immersive the video media item represented by the spatial capture virtual object 740 becomes (e.g., having a greater depth between foreground and background and/or greater responsiveness to an orientation shift of the computer system 700). In some implementations, at fig. 7M, the spatial capture virtual object 740 is displayed in a full screen configuration. When the spatial capture virtual object 740 is displayed in a full screen configuration, the representation 704 of the physical environment is visually obscured (e.g., the representation 704 of the physical environment is blurred, dimmed, and/or darkened). In some embodiments, when the space capture virtual object 740 is displayed in a full screen configuration, the computer system displays a different virtual environment (e.g., a virtual cinema environment and/or a virtual drive-in cinema environment) in place of the representation 704 of the physical environment. At fig. 7M, computer system 700 begins to rotate in a clockwise direction (e.g., 90 degrees clockwise) within the physical environment, as indicated by movement indication 721 within diagram 701.
At fig. 7N, the viewpoint of the user is rotated 90 degrees in a clockwise direction with respect to the viewpoint of the user in fig. 7M. As described above, movement of the computer system 700 causes the user's point of view to change, which causes the representation 704 of the physical environment to change. Thus, at FIG. 7N, the representation 704 of the physical environment corresponds to the changed viewpoint of the user. At fig. 7N, the first and second individuals 709c1 and 709c2, the drawing 709b, and the sofa 709a, which are part of the physical environment, are not visible from the user's point of view. Thus, the representation 704 of the physical environment at fig. 7N does not include depictions of the sofa 709a, the drawing 709b, the first individual 709c1, and the second individual 709c2. Instead, the representation 704 of the physical environment at FIG. 7N includes a depiction of the right side of the physical environment.
At fig. 7N, a first location of the physical environment (e.g., a location of the computer system 700 that displays the spatially captured virtual object 740 within the physical representation of the physical environment 704) is not within the field of view of the one or more cameras in communication with the computer system 700. Thus, at FIG. 7N, computer system 700 does not display space capture virtual object 740. In some embodiments, when spatially-captured virtual object 740 is displayed in a full-screen configuration in a virtual environment that does not represent physical environment 704 (e.g., as described above with respect to fig. 7M), the computer system displays a perspective of the virtual environment that corresponds to a viewpoint of the user rotated 90 degrees in a clockwise direction from an initial perspective of the virtual environment. At fig. 7N, computer system 700 begins to rotate counterclockwise (e.g., 90 degrees to the left) as indicated by movement indication 721 within diagram 701.
At fig. 7O, the user's viewpoint is rotated 90 degrees in a counterclockwise direction with respect to the user's viewpoint in fig. 7N. As shown in fig. 7O, elapsed time indicator 744 reads "00:06". Thus, 1 second has elapsed between fig. 7M and fig. 7O. At fig. 7O, a first location of the physical environment (e.g., a location of the computer system 700 displaying the spatially captured virtual object 740 within the representation 704 of the physical environment) is within a field of view of the one or more cameras in communication with the computer system 700. Thus, as shown in FIG. 7O, computer system 700 displays a spatially-captured virtual object 740 within representation 704 of the physical environment. At fig. 7O, computer system 700 continues to play back video media represented by spatially-captured virtual object 740. Thus, at FIG. 7O, the computer system 700 displays a frame of the video media item 1 second after the frame of the video media item shown in FIG. 7M to show that playback of the video media item has advanced for one second.
At fig. 7O, computer system 700 begins to move toward a first location of the physical environment, as indicated by diagram 701 including movement indication 721.
At fig. 7P, as indicated by the positioning of the indication 703 within the schematic 701, the computer system 700 is at a first location in the physical environment (e.g., the location where the computer system 700 displays the spatially captured virtual object 740 in the representation 704 of the physical environment). Because computer system 700 is positioned at a first location in the physical environment, computer system 700 plays back video media represented by spatially captured virtual object 740 from an immersive perspective. Immersive visual content is visual content that includes content of multiple perspectives captured from the same first point (e.g., location) in the physical environment at a given point in time. Playing back content (e.g., playback) from an immersive (e.g., first person) perspective includes playing back content from a perspective that matches the first perspective, and a plurality of different perspectives (e.g., fields of view) may be provided in response to user input. In some embodiments, when computer system 700 is at a first location in the physical environment, computer system 700 moves forward within the physical environment (e.g., past the first location in the physical environment) to an updated location, which causes the computer system to display a spatially captured virtual object 740 from a non-immersive perspective at a location within representation 704 of the physical environment that corresponds to a location of computer system 700 in front of the updated location within the physical environment. In some implementations, when computer system 700 is at a first location in the physical environment, computer system 700 moves back within the physical environment (e.g., moves such that computer system 700 is positioned in front of the first location within the physical environment), which causes computer system 700 to display space capture virtual object 740 from a non-immersive perspective at a location within representation 704 of the physical environment that corresponds to the first location. Thus, by changing the positioning of computer system 700 within a physical environment, a user has the ability to control when computer system 700 displays media items from an immersive or non-immersive perspective.
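The position-dependent choice between immersive and non-immersive playback can be summarized with a short sketch. The following Swift fragment is illustrative only; the tolerance value and names are assumptions rather than anything stated in the disclosure.

```swift
enum PlaybackPerspective { case immersive, nonImmersive }

/// Playback is immersive only while the viewpoint is at (or within a small tolerance of)
/// the location to which the media item is environment-locked; anywhere else the item is
/// presented from the non-immersive perspective.
func perspective(viewpoint: SIMD3<Double>, captureAnchor: SIMD3<Double>, tolerance: Double = 0.25) -> PlaybackPerspective {
    let d = viewpoint - captureAnchor
    let distance = (d.x * d.x + d.y * d.y + d.z * d.z).squareRoot()
    return distance <= tolerance ? .immersive : .nonImmersive
}
// Moving forward past the anchor, or backing away from it, returns playback to the
// non-immersive perspective, consistent with the behavior described for fig. 7P.
```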
At fig. 7P, playback of the video media represented by the spatially captured virtual object 740 from an immersive perspective occupies the entire display 702. When computer system 700 plays back video media represented by spatially captured virtual object 740 from an immersive perspective, representation 704 of the physical environment is not visible. In some embodiments, the representation 704 of the physical environment is visible when the computer system 700 plays back video media represented by the spatially captured virtual object 740 from an immersive perspective.
As shown in fig. 7P, elapsed time indicator 744 reads "00:07". Thus, 1 second has elapsed between fig. 7O and fig. 7P. At fig. 7P, computer system 700 continues to play back the video media represented by spatially-captured virtual object 740. Thus, at fig. 7P, computer system 700 displays a frame of the video media one second after the frame of the video media shown in fig. 7O (e.g., spatial capture virtual object 740 shows the individual as having progressed along the path) to show that playback of the video media has progressed by one second. In some embodiments, in response to detecting that the computer system is at the first location in the physical environment, the computer system restarts playback of the video media represented by the spatially captured virtual object 740 from the beginning. At fig. 7P, computer system 700 begins to rotate clockwise (e.g., 90 degrees to the right) as indicated by movement indication 721 within diagram 701.
At fig. 7Q, the viewpoint of the user is rotated 90 degrees in the clockwise direction in the physical environment with respect to the viewpoint of the user in fig. 7P. At fig. 7Q, the computer system 700 is located at the first location in the physical environment, as indicated by indication 703 within the schematic 701. As shown in fig. 7Q, in response to the user's viewpoint being rotated 90 degrees in a clockwise direction in the physical environment, computer system 700 displays playback of the video media represented by spatially-captured virtual object 740 from a viewpoint different from the viewpoint from which computer system 700 displayed the video media in fig. 7P. More specifically, computer system 700 displays the video media from a perspective facing the right side of the path shown in the video media. As described above, when computer system 700 is positioned at the first location in the physical environment, computer system 700 displays the video media represented by spatially captured virtual object 740 from an immersive perspective. That is, the perspective of playback of the media item represented by the spatially captured virtual object 740 changes based on a change in the viewpoint of the user in the physical environment.
Additional description regarding fig. 7A-7Q is provided below with reference to methods 800, 900, and 1000 described with respect to fig. 7A-7Q.
Fig. 8 is a flowchart of an exemplary method 800 for capturing media, according to some embodiments. In some embodiments, the method 800 is performed at a computer system (e.g., computer system 101 and/or computer system 700 in fig. 1) (e.g., a smartphone, a tablet, and/or a head-mounted device) that includes a display generation component (e.g., display generation component 120 in fig. 1, 3, and 4) (e.g., a heads-up display, a display controller, a touch-sensitive display system, a display (e.g., integrated and/or connected), a 3D display, a transparent display, a projector, a touch screen, and/or a projector) and one or more cameras (e.g., cameras pointed downward at a user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras pointed forward from the user's head), and optionally, a physical input mechanism. In some embodiments, the computer system is in communication with one or more gaze tracking sensors (e.g., optical and/or IR cameras configured to track a gaze direction of a user of the computer system and/or a user's attention). In some embodiments, the first camera has a FOV outside at least a portion of the FOV of the second camera. In some embodiments, the second camera has a FOV outside at least a portion of the FOV of the first camera. In some embodiments, the first camera is positioned on a side of the computer system opposite the side of the computer system where the second camera is positioned. In some embodiments, the method 800 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as the one or more processors 202 of the computer system 101) (e.g., the control 110 in fig. 1). Some of the operations in method 800 are optionally combined and/or the order of some of the operations are optionally changed.
The computer system detects (802) a request to display a media capturing user interface (e.g., a selection of 711a at 712a, and/or 750b at fig. 7B) while displaying, via a display generating component (e.g., 702), a first user interface overlaid on top of a representation of a physical environment (e.g., 704) (wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or as a viewpoint (e.g., 712) of a user changes) (e.g., the representation of the physical environment is being captured by one or more cameras in communication with the computer system) (e.g., the representation of the physical environment is part of a mixed reality and/or an augmented reality environment). In some implementations, as part of receiving a request to display a media capturing user interface, the computer system detects an input (e.g., a press, swipe, and/or tap) on a hardware button. In some implementations, as part of receiving a request to display a media capturing user interface, the computer system detects attention of a user at one or more locations of the first user interface and/or one or more movement and/or voice command inputs (e.g., via one or more microphones in communication with the computer system).
In response to detecting the request to display a media capture user interface (e.g., a user interface for capturing immersive or semi-immersive visual media, capturing three-dimensional stereoscopic media, and/or capturing spatial media) (e.g., for capturing video content capable of being rendered from multiple perspectives in response to a detected change in orientation of a user and/or the computer system), the computer system displays (804) (e.g., after initiating capture of media) a media capture preview (e.g., 708) comprising a representation (e.g., a virtual representation and/or virtual object) of a portion of the field of view of the one or more cameras with content (e.g., the live preview changes as the field of view of the one or more cameras changes and/or as the appearance of the physical environment changes) (e.g., immersive media content) that is updated as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, as discussed above with respect to fig. 7K.
The media capture preview indicates a boundary of the media that will be captured in response to detecting a media capture input (e.g., activation of 711a at 712a, and/or 750d at fig. 7D) while the media capture user interface is displayed (806).
The media capture preview (e.g., as described above with respect to fig. 7C) is displayed (808) while a first portion of the representation of the physical environment (a portion that was visible (e.g., unoccluded) before the request to display the media capture user interface was detected) remains visible.
The media capturing preview is displayed (e.g., overlaid) in place of (e.g., as part of) a second portion of the representation of the physical environment (e.g., 704 including a portion 709C 1) (e.g., as described above with respect to fig. 7C), wherein the first portion of the representation of the physical environment is updated (810) as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or a viewpoint of the user changes (e.g., as described above with respect to fig. 7C) (e.g., when the media capturing preview is displayed and/or visible, one or more other portions of the representation of the physical environment continue to be visible). In some implementations, the computer system 700 displays a media capture preview with a black border. In some implementations, in response to detecting a request to capture media, a computer system initiates capture of media (e.g., three-dimensional media, three-dimensional stereoscopic media, and/or spatial media). In some implementations, the first content captured by the first camera is different from the second content captured by the second camera. In some embodiments, the media capture preview is displayed in the center (e.g., middle) of the computer system (e.g., center of the display generating component), near the nose of the user wearing the computer system, in the center of the three-dimensional representation of the physical environment. In some implementations, the media capture preview is displayed between one or more virtual objects (e.g., a time lapse virtual object, a capture control virtual object, a camera film virtual object, a close button virtual object). In some implementations, the computer system displays the media capture preview over the shutter button virtual object in the center of the display (e.g., as described above with respect to fig. 7C). In some implementations, the captured media is played back using one or more techniques as described above with respect to fig. 7K-7Q. In some implementations, the media capture preview includes content of the physical environment, where the content is visible in a representation of the physical environment prior to displaying the media capture preview. In some implementations, multiple perspectives from points in the physical environment other than the first point may be displayed in response to user input when replaying in a non-immersive perspective. Displaying a media capture preview indicating a boundary of media to be captured in response to detecting a media capture input provides improved feedback to a user regarding what content is to be captured in a user's viewpoint. Displaying the media capture preview when a portion of the representation of the physical environment is visible provides the user with the ability to better compose and capture the desired media while also maintaining a perception of the physical environment, which improves media capture operations and reduces the risk of failing to capture transient events that may be missed if the capture operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some embodiments, the representation of the physical environment (e.g., 704 at fig. 7B) is a passthrough representation of the real world environment of the computer system (e.g., as seen in fig. 7A) (e.g., the physical environment (e.g., outside of the computer system)) (e.g., a portion of the real world environment surrounding the computer system) (e.g., the passthrough representation is virtual (e.g., a representation of camera image data captured by the one or more cameras integrated into the computer system) and/or optically transparent (e.g., light that passes directly through a portion of the system (e.g., a transparent portion) to the user)). In some embodiments, the passthrough representation is updated to reflect a change in the location and/or orientation of the computer system in response to the change in the location and/or orientation of the computer system. Providing a passthrough representation of the real world environment of the computer system provides visual feedback to the user regarding the location (e.g., position and/or orientation) of the computer system within the real world environment, which provides improved visual feedback, particularly when the passthrough representation is visible when the media capturing preview is displayed.
In some implementations, a representation (e.g., 704) of the physical environment (e.g., content in the representation) is visible (e.g., displayed or viewable via an optically transparent portion) at a first scale (e.g., 1:1, 1:2, 1:4, and/or any other suitable scale), and a representation (e.g., content in the representation) of a portion of the field of view of the one or more cameras (e.g., a second portion of the representation of the physical environment) included in the media capture preview (e.g., 708) is displayed at a second scale, and wherein the first scale is greater than the second scale (e.g., as described above with respect to fig. 7C) (e.g., the same object appearing in both representations appears greater in the representation of the physical environment). Displaying the media capture preview at a smaller scale than the scale at which the representation of the physical environment is visible provides improved visual feedback to the user as to which representation is which, thereby reducing user confusion. Doing so may also save display space, making both visible at the same time, which provides the user with the ability to better compose and capture the desired media while also maintaining a perception of the physical environment, which improves media capture operations and reduces the risk of failing to capture transient events that may be missed if the capture operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some implementations, the representation of the physical environment (e.g., 704 at fig. 7G) (e.g., the corresponding portion) includes first content (e.g., the right half of 709E at fig. 7G) that is not included in (e.g., outside of) the representation of the field of view included in the media capture preview (e.g., 708 at fig. 7G)) (e.g., not displayed within the media capture preview). Having a representation of the physical environment that includes content that is not in the current media capture preview provides the user with the ability to view additional content that may (but is not currently) included in the media capture when the user moves the viewpoint, while also enhancing the user's perception of their current physical environment when composing the media capture. Doing so improves media capture operations and reduces the risk of failing to capture transient events and/or content that may be missed if the capture operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some implementations, the representation of the physical environment (e.g., 704 at fig. 7C) includes a portion (e.g., a first portion, a particular portion) of the physical environment (e.g., a first portion of the physical environment that is different from a second portion of the physical environment (e.g., the angular range of the second portion of the physical environment is narrower than the angular range of the first portion of the physical environment)), and the representation of the field of view of the one or more cameras included in the media capture preview (e.g., 708 at fig. 7C) includes a portion (e.g., 709c3 at fig. 7C) of the physical environment (e.g., the media capture preview includes objects that are also included in the representation of the physical environment). In some implementations, the second portion of the physical environment included in the representation of the physical environment has a different visual appearance (e.g., blurry and/or darkened) than the second portion of the physical environment included in the media capture preview.
In some implementations, the computer system communicates (e.g., directly communicates (e.g., wired communication) and/or wirelessly communicates) with a physical input mechanism (e.g., 711a or 711B) (e.g., a hardware button) (e.g., a hardware input device/mechanism) (e.g., a physical input device), and the request to display the media capture user interface includes activation (e.g., actuation and/or selection (e.g., pressing a button)) of the physical input mechanism (e.g., activation of 711a at 712a of fig. 7B). Displaying a media capture preview comprising a representation of a portion of the field of view of the one or more cameras (e.g., a virtual representation and/or a virtual object) in response to detecting activation of the physical input mechanism allows the computer system to perform a display operation that provides the user with greater control over the computer system without requiring additional controls to be displayed, which provides additional control options without cluttering the user interface.
In some implementations, when a media capture preview is displayed, the computer system detects (e.g., via one or more input devices in communication with the computer system) an input (e.g., 750c and/or activation of 711a at 712a in fig. 7C) corresponding to a request to capture media. In response to detecting the input corresponding to the request to capture media and in accordance with a determination that the input corresponding to the request to capture media is of a first type (e.g., a single, rapid press of a hardware button (e.g., a virtual shutter button), a tap gesture on a touch-sensitive surface, or a rapid air gesture (e.g., an air gesture that is shorter in duration than the second type of input described below) (e.g., a pinch and release or any other suitable air gesture as described above with respect to selection of a virtual object in an XR environment)), the computer system initiates a process (e.g., as described above with respect to selection of a virtual object in an XR environment) of capturing media of a first type (e.g., a static media (e.g., a photograph) using the one or more cameras of the computer system), and in accordance with a determination that the input corresponding to the request to capture media is of a second type (e.g., different from the first type) (e.g., a long press of a hardware button, a gaze directed to the virtual shutter button in combination with a pinch and hold air gesture in a space corresponding to the display of the virtual shutter button, a touch and hold gesture, and/or any other suitable held input as described above with respect to selection of a virtual object in an XR environment and as described above with respect to fig. 7C), the computer system initiates a process of capturing media of a second type (e.g., video) (e.g., different from the first type of media content). Initiating a first type of media capturing procedure or a second type of media capturing procedure based on the type of input that is received (e.g., a short press or a long press) enables the computer system to provide the user with additional control options relative to the type of media capturing procedure being performed by the computer system without the need to display additional controls, which provides additional control options without cluttering the user interface.
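The input-type routing described above can be illustrated with a brief sketch. The Swift fragment below is not part of the disclosure; the enumeration cases and the half-second threshold are assumptions chosen purely for illustration.

```swift
enum CaptureInput {
    case buttonPress(durationSeconds: Double)
    case airPinch(held: Bool)
    case touch(held: Bool)
}

enum CaptureAction { case capturePhoto, startVideoRecording }

/// A short, rapid input starts still-photo capture, while a sustained input
/// (long press, pinch-and-hold, touch-and-hold) starts video capture.
func action(for input: CaptureInput, longPressThresholdSeconds: Double = 0.5) -> CaptureAction {
    switch input {
    case .buttonPress(let duration):
        return duration < longPressThresholdSeconds ? .capturePhoto : .startVideoRecording
    case .airPinch(let held), .touch(let held):
        return held ? .startVideoRecording : .capturePhoto
    }
}
```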
In some implementations, the representation of the portion of the field of view of the one or more cameras included in the media capture preview (e.g., 708) has a first set of visual disparity attributes, and the representation of the physical environment has a second set of visual disparity attributes that are different from the first set of visual disparity attributes (e.g., as described above with respect to fig. 7F). In some implementations, the media capture preview includes a first foreground and a first background. In some implementations, as the portion of the physical environment in the portion of the field of view of the one or more cameras changes, there is a first offset between the first foreground and the first background. In some embodiments, the representation of the physical environment includes a second foreground and a second background. In some embodiments, a second offset (e.g., different from the first offset) exists between the second foreground and the second background as the portion of the physical environment in the portion of the field of view of the one or more cameras changes (e.g., the first foreground moves at a first speed relative to the first background and the second foreground moves at a second speed (different from the first speed) relative to the second background as the portion of the physical environment in the portion of the field of view of the one or more cameras changes). Providing a first set of visual parallax attributes to the representation of the portion of the field of view of the one or more cameras included in the media capturing preview and a second set of visual parallax attributes to the representation of the physical environment provides visual feedback to the user as to whether the computer system is in motion and also provides feedback as to which representation is a preview of the media capturing and which is the physical environment, which provides improved visual feedback.
In some implementations, the representation of the physical environment (e.g., 704) is an immersive view (e.g., a first person view) (e.g., a representation of the environment is presented from multiple perspectives in response to detecting a change in orientation of the user and/or the computer system)), and the representation of the portion of the field of view of the one or more cameras included in the media capture preview is a non-immersive view (e.g., a third person view) (e.g., as described above with respect to fig. 7C). In some implementations, the immersive view includes content of a representation of the environment including content of multiple views captured from the same point (e.g., location) in the environment. Providing a representation of the physical environment from an immersive perspective and providing a portion of the field of view of the one or more cameras included in the media capture preview from a non-immersive perspective provides improved visual feedback, making clear which representation is which, and also provides feedback on how the captured media can be previewed when displayed non-immersively (e.g., in a non-immersive album).
In some implementations, prior to detecting the request to display the media capture user interface, a third portion (e.g., the same portion as or a different portion than the first portion) of the representation of the physical environment (e.g., the representation of the entire environment) (e.g., a portion of the representation of the environment that is less than the representation of the entire environment) has a first visual appearance that includes a visual characteristic (e.g., a darkening, blurring, and/or overlay filter) having a first magnitude (e.g., no amount, a low amount, or a non-zero amount (e.g., 0%, 10%, or 20% relative to a maximum amount)). In some implementations, in response to detecting the request to display the media capturing user interface (e.g., 750b and/or a selection of 711a at 712a in fig. 7B), the computer system changes the third portion of the representation of the physical environment to have a second visual appearance (e.g., different from the first visual appearance) including the visual characteristic having a second magnitude (e.g., as described above with respect to fig. 7C). In some implementations, the second magnitude of the visual characteristic is different from (e.g., greater than or less than) the first magnitude of the visual characteristic (e.g., as described above with respect to fig. 7C). In some embodiments, the magnitude of the visual characteristic is an amount of darkening applied to the third portion of the representation, the first magnitude is that no darkening is applied before the request is received, and the second magnitude is that a non-zero level of darkening is applied. Changing the visual appearance of the third portion of the representation of the physical environment in response to detecting a request to display the media capturing user interface provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has detected a request to display the media capturing user interface) and also emphasizes the media capturing preview, which provides improved visual feedback.
In some embodiments, displaying the media capturing preview includes (e.g., as part of) displaying (e.g., incorporating) one or more virtual objects (e.g., 713, 714, 715, and/or 716) in a spatial relationship with the representation of the media capturing preview (e.g., virtual objects that, when selected, cause the computer system to perform operations (e.g., capturing photos, capturing video, displaying previously captured photos, and/or any other suitable operations)), such as displaying the one or more virtual objects around the media capturing preview (e.g., the spatial relationship includes displaying the one or more virtual objects at a predefined distance and relative orientation with respect to the display of the media capturing preview, and/or displaying the one or more virtual objects at a location (e.g., above, below, and/or to the side) relative to the display of the media capturing preview) (e.g., each of the one or more virtual objects has a corresponding spatial relationship with the media capturing preview). In some embodiments, the media capturing preview and the one or more virtual objects are displayed at a first display position, and, in response to detecting a pose change of the user's viewpoint, the computer system displays the media capturing preview and the one or more virtual objects at a second display position (e.g., a position of 708, 713, 714, 715, and 716 in fig. 7F) different from the first position (e.g., a different position on the display generating component), and the computer system maintains the spatial relationship between the display of the media capturing preview and the one or more virtual objects. Maintaining a spatial relationship between the display of the media capture preview and the one or more virtual objects in response to detecting the pose change of the viewpoint provides visual feedback to the user that allows the user to easily locate the display of the one or more virtual objects, thereby providing improved visual feedback. Displaying one or more virtual objects based on the display of the media capturing preview by the computer system causes the computer system to perform a display operation that provides the user with additional control options related to the media capturing without further user input, which reduces the amount of input required to perform the operation.
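Maintaining such a spatial relationship can be sketched informally as follows. The Swift fragment below uses an assumed representation (per-control offsets stored relative to the preview) that does not appear in the disclosure; it only illustrates that a pose change moves the whole group together.

```swift
/// Each control stores only its offset from the preview, so a viewpoint pose change
/// moves the preview and its controls together and preserves their relative layout.
struct CaptureLayout {
    var previewPosition: SIMD2<Double>              // display-space position of the preview
    let controlOffsets: [String: SIMD2<Double>]     // e.g. a shutter control below, a roll control to the side

    /// Control positions are always re-derived from the preview position, so the
    /// relative arrangement never changes.
    func controlPositions() -> [String: SIMD2<Double>] {
        controlOffsets.mapValues { previewPosition + $0 }
    }

    /// On a pose change of the viewpoint, only the shared anchor position is updated.
    mutating func viewpointDidChange(newPreviewPosition: SIMD2<Double>) {
        previewPosition = newPreviewPosition
    }
}
```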
In some embodiments, the one or more virtual objects include an elapsed time virtual object (e.g., 713) that provides an indication of an amount of time (e.g., seconds, minutes, and/or hours) that has elapsed since a process for capturing media (e.g., video recording) was initiated. Displaying the elapsed time virtual object indicating the amount of time that has elapsed since the computer system has initiated the process for capturing media provides visual feedback regarding the video media recording status of the computer system, which provides improved visual feedback.
In some embodiments, the one or more virtual objects include a shutter button virtual object (e.g., 714) (e.g., a software shutter button) that, when selected (e.g., via detection of a user's gaze directed to the shutter button virtual object (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., an air pinch gesture, an open air gesture, an air tap, and/or an air swipe) (e.g., as described above with respect to selection of a virtual object in an XR environment) and/or via detection of a tap of the shutter button virtual object), causes a process for capturing media to be initiated (e.g., causes a computer system to initiate a process for capturing media) (e.g., static media or video media) (e.g., capturing media using the one or more cameras in communication with a computer system). In some implementations, the display of the shutter button virtual object is updated to indicate that the computer system is recording video media. Displaying a shutter button virtual object anchored to the display of the media capturing preview provides visual feedback regarding the display status of the computer system (e.g., the computer system is currently displaying the media capturing preview) and also provides operations that can be performed by interacting with the object, which provides improved visual feedback.
In some embodiments, the one or more virtual objects include a camera film virtual object (e.g., 715) that, when selected (e.g., via detection of a user's gaze directed to the camera film virtual object (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., air pinch gesture, air spread gesture, air tap, and/or air swipe) (e.g., as described above with respect to selection of a virtual object in an XR environment) and/or via detection of a tap of the camera film virtual object), causes a display (e.g., via a display generating component) of a previously captured media item (e.g., causes a computer system to display a previously captured media item) (e.g., still media or video media) (e.g., previously captured using the one or more cameras in communication with the computer system) (e.g., previously captured using an external device (e.g., a smart phone) (e.g., a device separate from the computer system)). In some embodiments, the camera film virtual object includes a preview (e.g., a thumbnail image) of a previously captured media item. Displaying a camera film virtual object anchored to the display of the media capturing preview provides visual feedback regarding the display state of the computer system (e.g., the computer system is currently displaying the media capturing preview) and also provides operations that can be performed by interacting with the object, which provides improved visual feedback.
In some implementations, the computer system detects a selection of a camera film virtual object (e.g., 750h) while the media capture preview is displayed. In some implementations, in response to detecting the selection of the camera film virtual object (e.g., 715 at fig. 7H), the computer system displays a representation (e.g., 730) of the previously captured media item via the display generation component (e.g., and ceases to display the media capture preview). In some implementations, displaying a representation of previously captured media includes displaying (e.g., concurrently displaying): a first dismissal virtual object (e.g., 733) that, when selected (e.g., selected via detection of a user's gaze directed to the first dismissal virtual object (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., air pinch gesture, spread air gesture, air flick, and/or air swipe) (e.g., as described above with respect to selection of virtual objects in an XR environment) and/or via a user flicking the first dismissal virtual object), causes display of the representation of the previously captured media item to cease (e.g., causes the computer system to cease displaying the previously captured media item); a shared virtual object (e.g., 735) that, when selected (e.g., via detection of a user's gaze directed at the shared virtual object (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., air pinch gesture, spread out air gesture, air tap, and/or air swipe) (e.g., as described above with respect to selection of a virtual object in an XR environment) and/or via a user tap of the shared virtual object), causes a process to be initiated for sharing the representation of the previously captured media item (e.g., sharing the representation with a contact (e.g., an external user) stored on the computer system) (e.g., causes the computer system to initiate a process for sharing the representation of the previously captured media item); a media library virtual object (e.g., 731) that, when selected (e.g., via detection of a user's gaze directed to a location where the media library virtual object is displayed (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., air pinch gesture, spread air gesture, air tap, and/or air swipe) (e.g., as described above with respect to selection of a virtual object in an XR environment) and/or via a user tapping the media library virtual object), causes a plurality of previously captured media items to be displayed (e.g., causes the computer system to display a plurality of previously captured media items (e.g., non-immersive and/or immersive media items)); and/or a resizing virtual object (e.g., 736) indicating that the representation of the previously captured media item can be resized (e.g., increased in size or decreased in size) based on detecting one or more gestures (e.g., gestures on the touch-sensitive display and/or air gestures (e.g., pinch-and-drag gestures in the air, swipes in the air, and/or flicks in the air) (e.g., as described above with respect to selection of the virtual object in an XR environment)), such as detected by the one or more cameras in communication with the computer system.
In some implementations, the magnitude of the change in size is based on a characteristic of the gesture (e.g., the amount of resizing is based on the distance of the drag motion of the pinch-and-drag gesture). In some implementations, selection of the first dismiss virtual object causes the media capture preview to be redisplayed. Displaying a plurality of virtual objects based on the display of previously captured media items provides visual feedback regarding the display state of the computer system (e.g., the computer system is currently displaying a representation of the previously captured media items) and also provides operations that may be performed by interacting with the objects, which provides improved visual feedback.
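A minimal sketch of the gesture-magnitude-based resizing described above; the mapping from drag distance to size, the clamping range, and the function name are illustrative assumptions.

```swift
import Foundation

/// Maps the drag distance of a (hypothetical) pinch-and-drag gesture to a new size
/// for the displayed media item, mirroring the resize behavior described above.
func resizedDimension(current: Double,
                      dragDistance: Double,      // meters of hand travel, signed
                      pointsPerMeter: Double = 600) -> Double {
    let proposed = current + dragDistance * pointsPerMeter
    return min(max(proposed, 200), 2_000)        // keep the item a usable size
}

print(resizedDimension(current: 800, dragDistance: 0.25))   // grows with the drag
print(resizedDimension(current: 800, dragDistance: -0.5))   // shrinks toward the minimum
```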
In some implementations, a computer system receives a set of one or more inputs that includes an input corresponding to a camera film virtual object (e.g., 715). In some implementations, the set of one or more inputs is a selection of a camera film virtual object followed by a selection of a virtual object corresponding to a subset of previously captured media items (e.g., all media items captured from an immersive perspective). In some embodiments, in response to receiving the set of one or more inputs, the computer system displays a representation (e.g., 730 at fig. 7I and 7J) of a first previously captured media item of the plurality of previously captured media items (e.g., still photo and/or video) (e.g., the first representation of the first previously captured media item is selectable) (e.g., the first previously captured media item is captured by the computer system) at a first location (e.g., a central location on the display generating component), and when the representation of the first previously captured media item is displayed at the first location, the computer system receives a request to navigate to a different previously captured media item of the plurality of previously captured media items (e.g., pinch-in-air gestures, spread-out-air gestures, flicks in the air, and/or swipes in the air) (e.g., as described above with respect to selection of virtual objects in an XR environment). In some implementations, in response to receiving a request to navigate to a different one of the plurality of previously captured media items, the computer system replaces the display of the representation of the first previously captured media item at the first location with the display of the representation of a second previously captured (e.g., different from the first previously captured) media item of the plurality of previously captured media items (e.g., as described above with respect to fig. 7K). In some implementations, replacing the display of the representation of the first previously captured media item includes ceasing to display the first previously captured media item. In some implementations, when a second previously captured media item is displayed at a first location, the first previously captured media item is displayed (e.g., at a second location different from the first location). In some embodiments, the representation is displayed in place of a portion of the representation of the physical environment. In some implementations, when displaying a second previously captured media item, the computer system receives a second request to navigate to a different previously captured media item (e.g., a media item different from the first previously captured media item and the second previously captured media item), wherein the second request is a repetition of the initial request (e.g., the initial request and the second request include the same type of gesture) and/or the second request includes a gesture that is opposite (e.g., performed in the opposite direction) to the gesture included in the initial request. In some implementations, the request to navigate to a previously captured media item of the plurality of previously captured media items is an air gesture (e.g., as described above with respect to selection of a virtual object in an XR environment), and the second previously captured media item is selected by the computer system based on a magnitude of the air gesture (e.g., direction of the air gesture, speed of the air gesture, and/or intensity of the air gesture).
Replacing the display of the representation of the first previously captured media item at the first location with the display of the representation of the second previously captured media item provides visual feedback regarding the status of the computer system (e.g., when the first previously captured media item is displayed, the computer system has received a request to navigate to a different previously captured media item), which provides improved visual feedback.
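The swipe-based navigation described above could, for example, be modeled as below; the direction/magnitude encoding and the two-item jump for fast swipes are illustrative assumptions, not the claimed behavior.

```swift
import Foundation

/// Chooses which previously captured media item to show next, based on a
/// (hypothetical) air-swipe gesture: direction decides forward/backward and a
/// larger magnitude skips further, mirroring the navigation described above.
func nextMediaIndex(current: Int,
                    count: Int,
                    swipeDirection: Int,     // +1 = forward, -1 = backward
                    swipeMagnitude: Double)  // arbitrary 0...1 scale
    -> Int {
    let step = swipeMagnitude > 0.7 ? 2 : 1   // fast swipes jump two items (assumption)
    let proposed = current + swipeDirection * step
    return min(max(proposed, 0), count - 1)
}

print(nextMediaIndex(current: 3, count: 10, swipeDirection: 1, swipeMagnitude: 0.9))  // 5
print(nextMediaIndex(current: 0, count: 10, swipeDirection: -1, swipeMagnitude: 0.3)) // 0
```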
In some embodiments, the one or more virtual objects include a second dismissal virtual object (e.g., 719) that, when selected (e.g., selected via detection of a user's gaze directed to a location at which the second dismissal virtual object is displayed (e.g., gaze and dwell), and in some embodiments, in conjunction with detection of a user performing one or more gestures (e.g., air pinch gesture, spread out air gesture, air flick, and/or air swipe) (e.g., as described above with respect to selection of virtual objects in an XR environment) and/or selected via a user flick of the second dismissal virtual object), causes a media capture preview (e.g., 708) to be stopped from being displayed (e.g., causes the computer system to stop displaying the media capture preview). In some implementations, ceasing to display the media capture preview causes a portion (e.g., a second portion) of the representation of the environment that was not displayed when the media capture preview was displayed to be displayed. Displaying a second dismissal virtual object anchored to the display of the media capturing preview provides visual feedback regarding the display state of the computer system (e.g., the computer system is currently displaying the media capturing preview) and also provides operations that can be performed by interacting with the object, which provides improved visual feedback.
In some implementations, the one or more virtual objects include a repositioned virtual object (e.g., 716), and the computer system detects a set of one or more inputs (e.g., gestures on the touch-sensitive surface and/or air gestures) that include inputs corresponding to the repositioned virtual object while the media capture preview (e.g., 708) is displayed at the first location in the media capture user interface. In some implementations, in response to detecting the set of one or more inputs including an input corresponding to repositioning the virtual object, the computer system moves the media capture preview from the first position to the second position (e.g., as described above with respect to fig. 7C) (e.g., moves the media capture preview based on a direction, magnitude, and/or speed of a request to move the display of the repositioning virtual object (e.g., along an x-axis, a y-axis, and/or a z-axis of the media capture user interface)). In some embodiments, the set of one or more inputs includes an input selecting to reposition the virtual object, followed by one or more inputs specifying a target location and/or a direction of movement. Displaying a repositioned object anchored to the display of the media capturing preview provides visual feedback regarding the display state of the computer system (e.g., the computer system is currently displaying the media capturing preview) and also provides operations that can be performed by interacting with the object, which provides improved visual feedback.
In some embodiments, while displaying the media capturing preview (e.g., 708), the computer system concurrently displays, via the display generating component, a media library virtual object (e.g., 731) that, when selected (e.g., via detection of a gaze (e.g., gaze and dwell) of the user directed to a location where the media library virtual object is displayed, and in some embodiments, in conjunction with detection of the user performing one or more gestures (e.g., an air pinch gesture, an expand air gesture, an air tap, and/or an air swipe) (e.g., as described above with respect to selection of a virtual object in an XR environment) and/or via the user tapping the media library virtual object), causes the plurality of previously captured media items to be displayed (e.g., causes the computer system to display the plurality of previously captured media items) (e.g., as described above with respect to fig. 7C). Concurrently displaying the media library virtual object with the media capturing preview provides visual feedback regarding the display state of the computer system (e.g., the computer system is currently displaying the media capturing preview) and also provides operations that can be performed by interacting with the object, which provides improved visual feedback.
In some embodiments, the portion of the field of view of the one or more cameras included in the media capture preview (e.g., 708) has a first angular range (e.g., 0-45°, 0-90°, 40°-180°, or any other suitable angular range) (e.g., the portion of the field of view of the one or more cameras included in the media capture preview is based on an overlap of the fields of view of the one or more cameras) (e.g., the media capture preview includes only data from the overlapping portion of the fields of view of the one or more cameras) (e.g., the media capture preview does not include information from the non-overlapping portions of the fields of view of the one or more cameras), and the representation of the physical environment (e.g., 704) represents a second angular range (e.g., a horizontal and/or vertical field of view of the one or more cameras) that is wider than the first angular range. In some embodiments, the first angular range is a subset of the second angular range. Displaying a representation of the physical environment that includes a wider field of view than the media capture preview provides the user with the ability to view additional content that could be (but is not currently) included in the media capture preview as the user moves the point of view, while also enhancing the user's perception of the current physical environment when composing the media capture. Doing so improves media capture operations and reduces the risk of failing to capture transient events and/or content that may be missed if the capture operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some implementations, the representation of the portion of the field of view of the one or more cameras included in the media capture preview (e.g., 708) includes first content (e.g., 709C3 and 709C4 in fig. 7F) (e.g., three-dimensional content) in the field of view of the first one of the one or more cameras (e.g., content captured by the first camera in response to detecting a request to capture media and/or content saved and/or stored by the computer system) and in the field of view of the second one of the one or more cameras (e.g., content captured by the second camera in response to detecting a request to capture media) (e.g., as described above in fig. 7C). In some implementations, portions of the physical environment that are in the field of view of the first camera but not in the field of view of the second camera are not included in the media capture preview. In some implementations, the portion of the physical environment that is in the field of view of the first camera but not in the field of view of the second camera is not included in the media capture preview, but is part of the representation of the physical environment.
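A small sketch of the field-of-view relationship described above, in which the preview is limited to the overlap of two cameras' angular ranges while the surrounding environment representation can span a wider range; the `AngularRange` type and the example angles are assumptions.

```swift
import Foundation

/// A horizontal angular range, in degrees, covered by a camera.
struct AngularRange {
    var start: Double
    var end: Double

    /// The overlap of two ranges, or nil if they do not intersect. The preview
    /// described above only shows content inside this overlap, while the
    /// surrounding environment representation can use a wider range.
    func intersection(_ other: AngularRange) -> AngularRange? {
        let lo = max(start, other.start), hi = min(end, other.end)
        return lo < hi ? AngularRange(start: lo, end: hi) : nil
    }
}

let leftCamera  = AngularRange(start: 0,  end: 90)    // illustrative values
let rightCamera = AngularRange(start: 30, end: 120)
let previewRange = leftCamera.intersection(rightCamera)   // 30°...90°
```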
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, the media items captured in method 800 may be displayed as part of method 1000. For the sake of brevity, these details are not repeated here.
FIG. 9 is a flowchart of an exemplary method 900 for displaying media previews according to some embodiments. In some embodiments, the method 900 is performed at a computer system (e.g., computer system 101 in fig. 1, and/or computer system 700) (e.g., a smart phone, a tablet, and/or a head-mounted device) in communication with a display generating component (e.g., display generating component 120 in fig. 1,3, and 4, and/or 702) (e.g., a display controller; a touch-sensitive display system; a display (e.g., integrated and/or connected), a 3D display, a transparent display, a projector, a heads-up display, and/or a head-mounted display) and one or more cameras (e.g., cameras pointing downward at the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras pointing forward from the user's head). In some embodiments, the computer system is in communication with one or more gaze tracking sensors (e.g., optical and/or IR cameras configured to track a gaze direction of a user and/or a user's attention of the computer system). In some embodiments, the computer system includes a first camera. In some embodiments, method 900 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as one or more processors 202 of computer system 101) (e.g., control 110 in fig. 1). Some operations in method 900 are optionally combined and/or the order of some operations is optionally changed.
When the viewpoint of the user (e.g., 712) (e.g., and/or of the computer system) (e.g., and/or when the head of the user is detected) is in a first pose (e.g., oriented and/or positioned in a physical environment and/or virtual environment), the computer system displays (902) an augmented reality user interface via a display generating component, the augmented reality user interface including a preview (e.g., a real-time preview) (e.g., 708 at fig. 7E) of a field of view of the one or more cameras (e.g., a virtual camera or physical camera configured to communicate with the computer system) overlaid on a three-dimensional environment (e.g., 704) (e.g., an environment visible on the display generating component), wherein the preview includes a representation (e.g., a three-dimensional representation, spatial representation, and/or two-dimensional representation) of a first portion of the three-dimensional environment and is displayed with a corresponding spatial configuration (e.g., 708 at fig. 7C-7E) relative to the viewpoint of the user. In some embodiments, the preview of the field of view of the first camera includes a portion of the field of view of the first camera, rather than the entire field of view of the first camera. In some implementations, the preview includes a representation of at least a portion of a field of view of a second camera (e.g., different from the first camera), the representation including a representation of the portion of the physical environment. In some implementations, the preview of the field of view of the second camera includes a portion of the field of view of the second camera. In some embodiments, the preview is displayed in the center (e.g., middle) of the computer system (e.g., center of the display generating component), near the nose of the user wearing the computer system, and/or at the center of the three-dimensional representation of the physical environment.
The computer system detects (904) a change in pose to a viewpoint of the user (and/or a head of the user) from a first pose to a second pose different from the first pose (e.g., as described above with respect to fig. 7E-7F) (e.g., when an augmented reality environment user interface including a preview of a field of view of the first camera is displayed, the preview is overlaid on a first location on the augmented reality user interface). In some embodiments, when the computer system and/or the user's point of view is in the second pose, the field of view of the first camera is directed toward the second portion of the physical environment while the first camera is in the second pose and/or the second portion of the physical environment is visible in the user's point of view.
In response to detecting the change in the pose of the user's viewpoint from the first pose to the second pose, the computer system shifts (906) (e.g., translates, changes, and/or updates) the preview of the field of view of the one or more cameras (e.g., 708 at fig. 7F and 7G) away from the respective spatial configuration relative to the user's viewpoint, in a direction determined based on the change in the pose of the user's viewpoint from the first pose to the second pose (e.g., based on the direction and/or speed of the pose change of the viewpoint), wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, and wherein the representation of the three-dimensional environment changes, based on the pose change of the user's viewpoint, at a second speed that is different from (e.g., faster or slower than) the first speed. In some implementations, the preview is updated at a first speed to include a representation of the second portion of the environment while moving at a second speed. In some embodiments, one or more objects in the representation of the displayed field of view are updated at the same rate as the one or more objects in the viewpoint of the physical environment. In some embodiments, the first portion of the environment ceases to be visible in response to detecting a change in the orientation of the viewpoint of the user from the first orientation to the second orientation. In some embodiments, the first portion of the environment does not surround, is not surrounded by, is separate from, is different from, and/or does not include the second portion of the environment, and vice versa. In some embodiments, the locations included in the first portion of the environment are different from the locations included in the second portion of the environment. Shifting the preview of the field of view of the one or more cameras at a first speed based on the pose of the user's viewpoint changing, while the representation of the three-dimensional environment changes at a second speed different from the first speed (e.g., in response to detecting a change in the pose of the user's viewpoint from the first pose to the second pose), allows the computer system to automatically perform operations that reduce the amount of movement between elements shown to the user and reduce the probability of motion sickness, which performs operations when a set of conditions has been met without further user input. Doing so may also prompt the user to reduce changes in viewpoint while in the media capturing mode, as the user may tend to remain focused on the preview. Reducing the change in viewpoint while capturing media may improve media capturing operations and reduce the risk of failing to capture transient events and/or content that may be missed if the capturing operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some embodiments, the computer system (e.g., 700) tracks the pose change of the viewpoint of the user (e.g., 712) with a first amount of tracking lag (e.g., as described above with respect to fig. 7E-7F) (e.g., the actual change of viewpoint and the delay between the computer system configured to track the amount of movement/degree (e.g., a delay of 0.1 meter/second, 0.2 meter/second, 0.3 meter/second, 0.4 meter/second, or 0.5 meter/second)), and the first speed introduces an amount of visual delay when updating the position of the preview (e.g., 708) of the field of view of the one or more cameras that is greater than the amount of visual delay (e.g., the detection lag is 0.1 meter/second, and the first speed is 0.09 meter/second) that would be introduced based on (e.g., based only on) the first amount of detection tracking lag (e.g., as described above with respect to fig. 7F). In some implementations, the maximum speed of the shift of the preview of the field of view of the one or more cameras is limited (e.g., upper bound) to a value that is below the amount of tracking lag (e.g., current amount). In some embodiments, the amount of tracking lag is proportional to the rate of pose change of the viewpoint). Maintaining the first speed below the amount of tracking lag reduces the occurrence of display disruption of the preview caused by the tracking lag and automatically adjusts the movement speed of the preview based on the system parameters without further user input.
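One way to realize the capped, lagged preview motion described above is a per-frame ease-toward-target step with a maximum speed held below an assumed tracking lag. The constants and function names below are illustrative assumptions, not the claimed implementation.

```swift
import Foundation

/// One frame of the "lazy follow" behavior sketched here: the preview eases
/// toward its target position (the respective spatial configuration relative to
/// the viewpoint) at a capped speed, so it intentionally trails fast head motion
/// by more than the tracking lag alone would cause. All constants are assumptions.
func previewPosition(current: Double,
                     target: Double,
                     deltaTime: Double,
                     maxSpeed: Double = 0.09,    // m/s, below an assumed 0.1 m/s lag
                     smoothing: Double = 4.0) -> Double {
    let desiredStep = (target - current) * min(smoothing * deltaTime, 1.0)
    let maxStep = maxSpeed * deltaTime
    return current + max(-maxStep, min(maxStep, desiredStep))
}

var x = 0.0
for _ in 0..<5 { x = previewPosition(current: x, target: 0.5, deltaTime: 1.0 / 90.0) }
print(x)   // creeps toward 0.5, limited by the speed cap
```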
In some implementations, when the preview of the field of view of the one or more cameras is being offset at the first speed (e.g., 708 in fig. 7F and 7G), the computer system detects that the user's point of view is changing pose by an amount less than a first threshold amount (e.g., changing, or not changing (e.g., a resting pose), at a rate less than a threshold value) (e.g., the user is currently positioned in the second pose). In some embodiments, in response to detecting that the user's viewpoint is changing pose by an amount less than the threshold amount (e.g., as described above with respect to fig. 7G), the computer system offsets the preview of the field of view of the one or more cameras toward the respective spatial configuration at a third speed greater than the first speed (e.g., as described above with respect to fig. 7H) (e.g., in response to detecting that the user has stopped moving, the preview of the one or more cameras "springs back" to its original positioning and has the corresponding spatial configuration with respect to the user's point of view). In some embodiments, when the computer system shifts the preview of the field of view of the one or more cameras toward the corresponding spatial configuration, the computer system shifts the preview in the direction opposite to the direction in which it was previously offset. Offsetting the preview of the field of view of the one or more cameras toward the respective spatial configuration in response to detecting that the viewpoint of the user is changing pose by less than a threshold amount causes the computer system to display the preview of the field of view at a location centered in the field of view of the user, and does so without further user input.
In some implementations, when the previews of the fields of view of the one or more cameras are being offset at a first speed (e.g., 708 in fig. 7F and 7G), the computer system detects that the user's point of view is changing by an amount (e.g., as described above with respect to fig. 7G) that is less than a second threshold amount (e.g., changing or not changing at a rate less than the threshold (e.g., a stationary pose)) (e.g., a non-zero amount). In some implementations, in response to detecting that the user's viewpoint is changing by an amount that is less than the second threshold amount, the computer system stops shifting (e.g., automatically (e.g., without user input)) the previews of the fields of view of the one or more cameras away from the respective spatial configurations (e.g., as described above in fig. 7H), and the computer system displays the previews of the fields of view of the one or more cameras (e.g., 708 at fig. 7H) with the respective spatial configurations relative to the user's viewpoint (e.g., the previews of the fields of view of the one or more cameras are displayed in the locations displayed by the previews of the fields of view prior to the computer system detecting movement of the pose change of the user's viewpoint). In some embodiments, after ceasing to shift the previews of the fields of view of the one or more cameras, in response to the computer system detecting that the user's viewpoint is changing pose a second time at a rate that is greater than the second threshold amount, the previews of the fields of view of the one or more cameras are shifted away from the corresponding spatial configuration a second time at a rate that is different from the rate at which the representation of the three-dimensional environment changes.
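The two speeds described above (a slow trailing speed while the viewpoint is moving, and a faster return toward the respective spatial configuration once movement falls below a threshold) could be selected as in this sketch; the threshold and speed values are assumptions.

```swift
import Foundation

/// Sketch of the two-speed behavior described above: while the viewpoint is moving
/// quickly the preview trails at a slow "follow" speed, and once the rate of
/// viewpoint change falls below a threshold it snaps back toward its respective
/// spatial configuration at a faster speed. The numbers are assumptions.
func previewSpeed(viewpointSpeed: Double,
                  restThreshold: Double = 0.05,   // m/s
                  followSpeed: Double = 0.09,     // first (slow) speed
                  returnSpeed: Double = 0.5)      // third (faster) speed
    -> Double {
    viewpointSpeed < restThreshold ? returnSpeed : followSpeed
}

print(previewSpeed(viewpointSpeed: 0.4))    // 0.09 — trailing while the head moves
print(previewSpeed(viewpointSpeed: 0.01))   // 0.5  — springing back once movement stops
```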
In some implementations, the change in pose (e.g., as described above with respect to fig. 7F) includes (e.g., corresponds to) a lateral movement along a plane of the user's viewpoint (e.g., the pose change of the user's viewpoint corresponds to the user's viewpoint moving left and/or right from the user's viewpoint). Offsetting the previews of the fields of view of the one or more cameras based on the pose change of the user's viewpoint in the lateral direction provides visual feedback regarding the direction of movement of the computer system, which provides improved visual feedback.
In some implementations, the change in pose (e.g., as described above with respect to fig. 7F) includes (e.g., corresponds to) a longitudinal movement along a plane of the user's viewpoint (e.g., the change in pose of the user's viewpoint corresponds to the user's viewpoint moving up and/or down). In some embodiments, when the user is in the first position, the user's head is located at a first distance from the ground, and when the user is in the second position, the user's head is located at a second distance from the ground, the second distance being greater/less than the first distance. Offsetting the previews of the fields of view of the one or more cameras based on the pose change of the user's viewpoint in the portrait orientation provides visual feedback regarding the direction of movement of the computer system, which provides improved visual feedback.
In some implementations, the change in pose (e.g., as described above with respect to fig. 7F) includes (e.g., corresponds to) a forward-backward movement of a plane perpendicular to the user's viewpoint (e.g., detecting that the change in pose of the user's viewpoint corresponds to the user's viewpoint moving forward and/or backward). In some embodiments, the computer system moves backward within the physical environment while the viewpoint of the user is in the first pose, which causes the computer system to cease displaying previews of the field of view of the one or more cameras. Shifting the previews of the fields of view of the one or more cameras based on the pose change of the user's viewpoint in the forward and/or backward direction provides visual feedback regarding the direction of movement of the computer system, which provides improved visual feedback.
In some embodiments, the preview of the field of view of the one or more cameras (e.g., 708) does not cover a second portion of the three-dimensional environment visible from the user's point of view (e.g., the portion of fig. 7F that includes 704 of the second individual 709c 2). In some embodiments, the three-dimensional physical environment is a real-world environment in which the user is currently located. In some embodiments, the camera preview covers at least a portion of the view of the physical world, and at least a portion of the view of the physical world is displayed near or adjacent to an edge of the camera preview. Displaying a preview of the field of view of the one or more cameras while at least a portion of the three-dimensional environment is visible provides the user with the ability to better compose and capture the desired media while also maintaining a perception of the physical environment, which improves media capture operations and reduces the risk of failing to capture transient events that may be missed if the capture operations are inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some implementations, the representation of the three-dimensional environment included in the media capturing preview (e.g., 708) changes based on the pose change of the user's viewpoint (e.g., as described above with respect to fig. 7G), and when a change in the pose of the user's viewpoint from the first pose to the second pose is detected (e.g., as described above with respect to fig. 7F), the computer system performs (e.g., automatically, without user input) a first visual stabilization on the representation of the three-dimensional environment included in the preview of the field of view of the one or more cameras (e.g., the first visual stabilization corresponds to digital image stabilization) (e.g., to average out (e.g., attenuate) involuntary head movements (e.g., head movements performed by the user that do not correspond to the change in pose of the user's viewpoint from the first pose to the second pose)). In some embodiments, the first visual stabilization is an optical image stabilization technique that adjusts one or more glass elements inside the one or more cameras in communication with the computer system based on the change in pose of the user's viewpoint. In some embodiments, the first visual stabilization includes a digital stabilization technique that involves scaling and/or cropping the representation of the three-dimensional environment included in the media capture preview based on the change in pose of the user's viewpoint. Performing the first visual stabilization causes the computer system to perform a display operation that increases the clarity of the representation of the three-dimensional environment included in the field of view of the one or more cameras without requiring user input, which reduces the amount of input required to perform the operation. Applying the first visual stabilization to the representation of the three-dimensional environment improves the media capturing operation and reduces the risk of failing to capture transient events that may be missed if the capturing operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some embodiments, performing the first visual stabilization includes applying a first amount of visual stabilization to the representation of the three-dimensional environment included in the preview (e.g., 708) of the field of view of the one or more cameras, and, when the change in pose of the viewpoint of the user from the first pose to the second pose is detected (e.g., as described above with respect to fig. 7E-7G), the computer system performs (e.g., automatically performs (e.g., without user input)) a second visual stabilization on a second portion of the representation of the three-dimensional environment (e.g., a portion that is not included in the preview of the field of view of the one or more cameras) (e.g., the second visual stabilization corresponds to digital visual stabilization) (e.g., and does not perform the second visual stabilization on the preview of the field of view of the one or more cameras). In some embodiments, the second visual stabilization applies a second amount of visual stabilization, less than the first amount of visual stabilization, to the second portion of the representation of the three-dimensional environment (e.g., as described above with respect to fig. 7F) (e.g., the second visual stabilization is performed in the same manner as the first visual stabilization that is applied to the representation of the three-dimensional environment included in the preview of the field of view of the one or more cameras, but with a smaller amount of stabilization). Performing the second visual stabilization automatically causes the computer system to perform a display operation that increases the clarity of the user's view of the second portion of the representation of the three-dimensional environment without requiring user input, which reduces the amount of input required to perform the operation. Applying less stabilization to the second portion of the representation of the three-dimensional environment provides more accurate visual feedback about the current physical environment, which improves visual feedback.
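A sketch of the two-tier stabilization described above, using a simple exponential filter as a stand-in for whatever stabilization technique is actually used; the per-region strengths are illustrative assumptions.

```swift
import Foundation

/// Applies more smoothing to the content inside the capture preview than to the
/// surrounding environment, as in the two-tier stabilization described above.
struct Stabilizer {
    var smoothedOffset = 0.0
    /// `strength` in 0...1: higher values damp head jitter more aggressively.
    mutating func stabilize(rawOffset: Double, strength: Double) -> Double {
        smoothedOffset += (rawOffset - smoothedOffset) * (1.0 - strength)
        return smoothedOffset
    }
}

var previewStabilizer = Stabilizer()      // first, larger amount of stabilization
var environmentStabilizer = Stabilizer()  // second, smaller amount

let jitter = 0.01  // meters of involuntary head movement this frame
let previewShift = previewStabilizer.stabilize(rawOffset: jitter, strength: 0.9)
let environmentShift = environmentStabilizer.stabilize(rawOffset: jitter, strength: 0.3)
// previewShift (0.001) moves far less than environmentShift (0.007)
```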
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, the media capture previews displayed in method 800 are optionally offset using the method described in method 900. For the sake of brevity, these details are not repeated here.
Fig. 10 is a flowchart of an exemplary method 1000 for displaying previously captured media, according to some embodiments. In some embodiments, method 1000 is performed at a computer system (e.g., computer system 101 in fig. 1, and/or computer system 700) (e.g., a smart phone, a tablet, and/or a head-mounted device) in communication with a display generation component (e.g., display generation component 120 in fig. 1,3, and 4, and/or 702) (e.g., a display controller; a touch-sensitive display system; a display (e.g., integrated and/or connected), a 3D display, a transparent display, a projector, a heads-up display, and/or a head-mounted display). In some embodiments, the computer system communicates with one or more gaze tracking sensors (e.g., optical and/or IR cameras configured to track a gaze direction of a user of the computer system). In some embodiments, method 1000 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as one or more processors 202 of computer system 101) (e.g., control 110 in fig. 1). Some operations in method 1000 are optionally combined and/or the order of some operations is optionally changed.
Upon displaying the augmented reality environment user interface, the computer system detects (1002) a request to display captured media (e.g., as described above with respect to fig. 7K) that includes immersive content (e.g., 730) that, when viewed from a respective range of one or more viewpoints, provides a visual response that a user is surrounded at least in part by content (e.g., immersive or semi-immersive visual media, three-dimensional, stereoscopic media, and/or spatial media) (e.g., video content that may be presented from multiple viewpoints in response to a detected change in orientation of the user and/or the computer system). In some implementations, the media content is immersive or semi-immersive visual media. In some implementations, the immersive or semi-immersive visual media is visual media that includes content of multiple perspectives captured from the same first point (e.g., location) in the physical environment at a given point in time. In some implementations, replaying (e.g., playing back) visual media from an immersive (e.g., first person) perspective includes playing media content from a perspective that matches a first point in the physical environment at a given point in time, and a plurality of different perspectives (e.g., fields of view) may be provided all from the first point in the physical environment in response to user input. In some implementations, playing back visual media from a non-immersive (e.g., third person) perspective includes playing media content from a perspective other than the first point in the physical environment (e.g., an offset of the perspective (e.g., that occurs in response to user input)). In some implementations, multiple perspectives corresponding to a first point in a physical environment are not displayed in response to user input when replayed in a non-immersive perspective. In some embodiments, as part of detecting a request to play back captured media, the computer system detects a user's attention (e.g., user gaze and/or user viewpoint pointing) to one or more locations on the computer system (e.g., one or more locations corresponding to one or more virtual objects), detects one or more inputs located on the computer system and/or on one or more hardware input mechanisms coupled to the computer system, and/or detects one or more voice commands directed to play back captured media. In some implementations, media is captured using one or more of the techniques discussed above with respect to fig. 7C-7D). In some implementations, the captured media is not non-immersive (e.g., non-immersive visual media and/or non-semi-immersive visual media). In some implementations, the non-immersive video media is video content that cannot be presented from multiple perspectives in response to a detected change in orientation of a user and/or a computer system. In some implementations, the non-immersive video media may be presented in only one perspective (e.g., a first person perspective or a third person perspective) regardless of whether the computer system detects a change in orientation of the computer system and/or a user of the computer system.
In response to detecting a request to display captured media, the computer system displays (1004) the captured media (e.g., displays the captured media being played back) as a three-dimensional representation (e.g., 740) (e.g., a non-static representation) of the captured media, the three-dimensional representation being displayed at a location in a three-dimensional environment selected by the computer system (e.g., a location of 740a in 701) such that a first viewpoint of a user (e.g., 712) is outside of a respective range of one or more viewpoints (e.g., non-immersive viewpoints) (e.g., a third person viewing angle). In some implementations, the three-dimensional representation of the captured media replaces one or more portions of a representation of a previously displayed (e.g., displayed prior to receiving a request to play back the captured media) environment (e.g., a virtual environment and/or a physical environment) (e.g., using one or more techniques as described above with respect to fig. 7C). In some implementations, the computer system displays a three-dimensional representation of the captured media over the shutter button virtual object (e.g., as described above with respect to fig. 7C) and/or in the center of the display generating component. In some implementations, in response to detecting a request to display captured media, the computer system displays a plurality of representations of previously captured media items while displaying a three-dimensional representation of the captured media. In some implementations, in response to receiving a request to play back captured media, the computer system causes a display of a representation (e.g., a static representation) of the captured media to cease. Displaying the captured media as a three-dimensional representation at a location that satisfies a set of prescribed conditions (e.g., the location is outside of a respective range of one or more viewpoints) automatically allows the computer system to render the three-dimensional representation from a non-immersive perspective without user input, which reduces the amount of input required to perform the operation. This also provides improved visual feedback that the initially displayed viewpoint is not within the corresponding range of one or more viewpoints.
In some embodiments, the location selected by the computer system is a location in a physical environment (e.g., the location of 740a in 701), the three-dimensional representation of the captured media is a virtual object that is environment-locked, and upon displaying the three-dimensional representation of the captured media, the computer system detects that the user's point of view has changed (e.g., as described above with respect to fig. 7K-7P) (e.g., the user moves left/right, up/down, and/or toward/away in a real-world environment) (e.g., the user looks up, down, right, or left) (e.g., the field of view of one or more cameras integrated into the computer system has changed) (e.g., detected via one or more cameras integrated into the computer system) (e.g., detected via an external device (e.g., a smartwatch, one or more cameras, and/or an external computer system) in communication (e.g., wireless communication) with the computer system). In some implementations, in response to detecting that the viewpoint of the user has changed, the computer system maintains a display of a three-dimensional representation of the captured media at a location in the three-dimensional environment selected by the computer system (e.g., as described above with respect to fig. 7K). In some embodiments, the three-dimensional representation is environment-locked). In some implementations, in response to detecting that the viewpoint of the user has changed, the three-dimensional representation of the captured media changes in size (e.g., the three-dimensional representation of the captured media is displayed as larger and/or smaller). In some implementations, in response to detecting that the viewpoint of the user has changed, displaying the three-dimensional representation of the captured media is stopped. Maintaining the display of the three-dimensional representation of the captured media at the location in the three-dimensional environment selected by the computer system in response to detecting that the viewpoint of the user has changed causes the computer system to automatically perform a display operation that allows the user to move and maintain a view of the three-dimensional representation of the captured media without user input, which reduces the amount of input required to perform the operation.
In some embodiments, the first viewpoint of the user corresponds to a first viewpoint location a first distance (e.g., 10 meters, 5 meters, or 3 meters) from a location selected by the computer system, and when the three-dimensional representation of the captured media is displayed at the location selected by the computer system (e.g., location 740 at fig. 7K), and when the user is at the first viewpoint location, the computer system detects a pose change of the viewpoint of the user to a second viewpoint of the user corresponding to a second viewpoint location (e.g., repositioning of the entire body of the user, repositioning of a portion of the body (e.g., head, hand, arm, and/or leg) of the user (e.g., repositioning of the user's head), as described above) that is a second distance (e.g., 5 meters, 3 meters, or 1 meter) from the location selected by the computer system (e.g., as described above in fig. 7L). In some implementations, in response to detecting a change in the pose of the user's viewpoint to the user's second viewpoint, the computer system displays a three-dimensional representation (e.g., 740 at fig. 7L) having a second set of visual cues that the user is at least partially surrounded by content. In some embodiments, the second set of visual cues includes at least a second visual cue that the user is at least partially surrounded by content that is not provided when the three-dimensional representation (e.g., 740 at fig. 7L) is displayed when viewed from the user's first viewpoint. In some implementations, as the user's point of view moves closer to the location selected by the computer system, the three-dimensional representation of the captured media becomes more immersive (although not as immersive as when the user's point of view is within the range of one or more points of view providing the first set of visual cues). In some embodiments, the second set of visual cues does not include at least a first visual cue (e.g., visual indication of depth, visual indication of view angle, visual response to an orientation shift of the user's point of view) that the user is at least partially surrounded by content included in the first set of visual cues. Displaying the three-dimensional representation with the different set of visual cues as the user moves closer to the location selected by the computer system causes the computer system to automatically perform a display operation that allows the user to perceive the three-dimensional representation of the captured media differently based on the position of the user's point of view relative to the location selected by the computer system, which performs the operation without further user input. Displaying the three-dimensional representation with the different set of visual cues as the user moves closer to the location selected by the computer system provides the user with visual feedback as to what action the user needs to take in order to display the three-dimensional representation from an immersive view, which provides improved visual feedback.
In some implementations, the computer system detects a pose change of the user's viewpoint to a third viewpoint of the user corresponding to a third viewpoint location (e.g., as described above with respect to fig. 7M) when the user is at the second viewpoint location and when the three-dimensional representation of the captured media is displayed. In some implementations, in response to detecting a pose change of a user's viewpoint to a third viewpoint of the user and in accordance with a determination that a first set of display criteria is satisfied, wherein the first set of display criteria includes the first criteria being satisfied when a distance between the third viewpoint location and a location selected by the computer system is greater than a first threshold distance (e.g., 0.1 meter, 0.5 meter, 1 meter, 2 meters, 3 meters) (e.g., the user moves past the location selected by the computer system), the computer system ceases to display a three-dimensional representation of the captured media (e.g., as described above with respect to fig. 7M). In some embodiments, the change in pose from the second view of the user to the third view of the user is in the same direction (e.g., along the same plane) as the change in pose from the first view of the user to the second view of the user (along the same path) (e.g., the direction component of the pose change vector from the first view of the user to the second view of the user is the same as the direction component of the pose change vector from the second view of the user to the third view of the user). In some embodiments, the computer system maintains the display of the three-dimensional representation of the captured media in response to detecting the pose change of the view of the user to the third view of the user and in accordance with the determination that the first set of display criteria is not met. In some implementations, the computer system stops displaying the three-dimensional representation of the captured media when the distance from the third viewpoint location to the location selected by the computer system is greater than the first threshold distance. In some implementations, the first set of display criteria includes criteria that are met when the location selected by the computer system is not visible from the third viewpoint of the user (e.g., the location selected by the computer system is no longer in front of the user). Stopping displaying the three-dimensional representation in accordance with a determination that the distance between the third viewpoint position and the position selected by the computer system is greater than the first threshold distance provides visual feedback regarding positioning (e.g., positioning of the computer system relative to the position selected by the computer system), which provides improved visual feedback.
In some implementations, the computer system detects a pose change of the user's viewpoint to a fourth viewpoint of the user corresponding to the fourth viewpoint location (e.g., as described above with respect to fig. 7P) when the user is at the second viewpoint location and when the three-dimensional representation of the captured media is displayed. In some embodiments, in response to detecting a change in pose of the user's viewpoint to the user's fourth viewpoint and in accordance with a determination that a second set of display criteria is satisfied, wherein the second set of display criteria includes a second criterion that is satisfied when a distance between the fourth viewpoint location and a location selected by the computer system is less than a second threshold distance (e.g., 0.1 meter, 0.5 meter, 1 meter, 2 meters, or 3 meters) (e.g., the user moves toward the location selected by the computer system), the computer system displays a three-dimensional representation of the captured media at a respective location selected by the computer system that is farther from the fourth viewpoint location than the location selected by the computer system (e.g., As described above with respect to fig. 7P) (e.g., the three-dimensional representation of the captured media is moved away from the user's current location). In some embodiments, the respective location is a location in the physical environment visible from a fourth viewpoint of the user. In some implementations, the second set of display criteria includes criteria that are met when the location selected by the computer system is not visible from the fourth viewpoint of the user (e.g., the location selected by the computer system is no longer in front of the user). In some embodiments, the respective location is a location visible from a fourth viewpoint of the user. In some embodiments, when the distance from the fourth viewpoint location to the location selected by the computer system is greater than the second threshold distance, the computer system displays a three-dimensional representation of the captured media at the corresponding location selected by the computer system. In some implementations, in accordance with a determination that the viewpoint location has moved past the location of the three-dimensional representation (e.g., when the distance between the fourth viewpoint location and the location selected by the computer system is greater than (or less than) the third threshold distance), the computer system displays a three-dimensional representation of the captured media at the respective location selected by the computer system that is farther from the fourth viewpoint location than the location selected by the computer system. In some embodiments, the change in pose from the second view of the user to the fourth view of the user is in the same direction as the change in pose from the first view of the user to the second view of the user (e.g., the direction component of the pose change vector from the first view of the user to the second view of the user is the same as the direction component of the pose change vector from the second view of the user to the fourth view of the user)). 
Displaying the three-dimensional representation of the captured media at a corresponding location selected by the computer system that is farther from the fourth viewpoint location than the location selected by the computer system when certain prescribed conditions are met automatically changes the display of the three-dimensional representation such that the three-dimensional representation of the captured media is readily viewable by a user, which performs an operation without further user input when a set of conditions has been met. Displaying the three-dimensional representation of the captured media at a respective location that is farther from the fourth viewpoint location than the location selected by the computer system provides visual feedback to the user regarding the positioning of the computer system (e.g., the positioning of the computer system relative to the location selected by the computer system), which provides improved visual feedback.
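By way of illustration only, a minimal sketch of the complementary behavior described above (redisplaying the representation farther away when the viewpoint comes too close); the function name, parameter names, and default distances are assumptions, not values taken from the disclosure.

    // Returns the distance (in meters) at which to place the representation.
    // When the viewpoint comes within `minimumDistance` of the selected
    // location, the representation is pushed back by `pushBackMargin` so it
    // remains comfortably viewable; otherwise the current placement is kept.
    func placementDistance(currentDistance: Double,
                           minimumDistance: Double = 1.0,
                           pushBackMargin: Double = 0.5) -> Double {
        if currentDistance < minimumDistance {
            return minimumDistance + pushBackMargin
        }
        return currentDistance
    }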
In some embodiments, the first viewpoint of the user corresponds to a fifth viewpoint location a fifth distance from the location selected by the computer system, and when the three-dimensional representation of the captured media is displayed at the computer-selected location and the user is at the fifth viewpoint location, the computer system detects a change in pose of the user's viewpoint to a sixth viewpoint of the user corresponding to a sixth viewpoint location (e.g., repositioning of the entire body of the user, repositioning of a portion of the body of the user (e.g., head, hand, arm, and/or leg)), as described above (e.g., the change in positioning of the user is detected by the one or more cameras and/or external devices, as described above), wherein the sixth viewpoint location is a sixth distance from the location selected by the computer system (e.g., as described above with respect to fig. 7P). In some embodiments, in response to detecting the change in the pose of the user's viewpoint to the user's sixth viewpoint, the computer system displays the three-dimensional representation with a third set of visual cues that the user is at least partially surrounded by the content. In some implementations, as the user's viewpoint moves farther from the location selected by the computer system, the three-dimensional representation of the captured media becomes less immersive (although not as immersive as when the user's viewpoint is within the range of one or more viewpoints providing the first set of visual cues). In some embodiments, the third set of visual cues does not include at least a third visual cue, included in the first set of visual cues, that the user is at least partially surrounded by the content. In some embodiments, the third set of visual cues does not include at least a fourth visual cue that the user is at least partially surrounded by the content, which is provided when the three-dimensional representation is displayed as viewed from the first viewpoint of the user. Displaying the three-dimensional representation with the different set of visual cues as the user moves farther from the location selected by the computer system causes the computer system to perform a display operation that allows the user to perceive the three-dimensional representation of the captured media differently based on the user's point of view without displaying additional controls, which provides additional control options without cluttering the user interface.
In some implementations, the three-dimensional representation of the captured media includes a plurality of virtual objects including a first virtual object and a second virtual object, and the computer system detects a pose change (e.g., a change in the positioning of the user's entire body, a change in the positioning of the first portion of the user's body) of the user's viewpoint to a seventh viewpoint of the user (e.g., a change in the viewpoint of the user) (e.g., a lateral movement of the user, a side-to-side movement of the user, and/or a movement of the user along a horizontal plane). In some embodiments, in response to detecting a pose change of the user's viewpoint to the seventh viewpoint of the user, the computer system displays, via the display generation component, a first virtual object (e.g., a foreground of content in 740) that moves relative to a second virtual object (e.g., a background of content in 740) based on the pose change of the user (e.g., as described above with respect to fig. 7K) (e.g., displays a parallax effect in which the first virtual object and the second virtual object are differently offset as the pose of the user's viewpoint changes) (e.g., the first virtual object is displayed in the foreground of the three-dimensional representation of the captured media and moves at a first variable speed that is based on the pose change of the user's viewpoint and the second virtual object is displayed in the background of the three-dimensional representation of the captured media and moves at a second variable speed that is based on the pose change of the user's viewpoint). Displaying the first virtual object moving relative to the second virtual object in response to detecting a pose change of the viewpoint of the user provides visual feedback to the user regarding depth data associated with the captured media, which provides improved visual feedback.
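As a non-limiting sketch of the parallax behavior described above, the following Swift function shifts a foreground object more than a background object for the same lateral viewpoint change; the simple inverse-depth model and all names are assumptions for illustration.

    // Apparent lateral shift of two virtual objects for a given viewpoint
    // shift (meters). Closer objects shift more, producing a parallax effect.
    func parallaxOffsets(viewpointShift: Double,
                         foregroundDepth: Double,
                         backgroundDepth: Double) -> (foreground: Double, background: Double) {
        let foregroundShift = viewpointShift / max(foregroundDepth, 0.01)
        let backgroundShift = viewpointShift / max(backgroundDepth, 0.01)
        return (foregroundShift, backgroundShift)
    }

For example, under this model a 0.1 meter viewpoint shift would make an object 0.5 meters away appear to shift four times as much as an object 2 meters away.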
In some implementations, the three-dimensional representation of the captured media (e.g., 740) is displayed as a projection of a first type (e.g., a shape of the first type), and upon displaying the three-dimensional representation of the captured media as the first projected shape, the computer system detects a request (e.g., a selection of 736) (e.g., a selection of a virtual arrow object) (e.g., one or more gestures corresponding to a selection of a virtual object) to display the three-dimensional representation as a projection of a second type that is different from the projection of the first type (e.g., a different shape, a different size, and/or a display at a different location). In some embodiments, in response to detecting a request to display the three-dimensional representation as a projection of the second type, the computer system displays the three-dimensional representation as a projection of the second type (e.g., as described above with respect to fig. 7K). In some embodiments, displaying the three-dimensional representation as a second type of projection includes displaying less captured media than when the three-dimensional representation is displayed as a first type of projection. In some implementations, displaying the three-dimensional representation as a first type of projection (e.g., a spherical projection) includes deforming the three-dimensional representation of the captured media along edges of the projection, and displaying the three-dimensional representation as a second type of projection (e.g., a flat projection) does not include deforming the three-dimensional representation of the captured media. In some implementations, displaying the three-dimensional representation as a first type of projection (e.g., a spherical projection) includes displaying the captured media along three axes (e.g., an x-axis, a y-axis, and a z-axis), and displaying the three-dimensional representation as a second type of projection (e.g., a flat projection) includes displaying the captured media along two axes (e.g., the y-axis and the z-axis). Displaying the three-dimensional representation as a second type of projection provides visual feedback regarding the state of the computer system (e.g., the computer system has detected a request to display the three-dimensional representation as a projection of the second type while the three-dimensional representation is being displayed as a first projected shape), which provides improved visual feedback.
In some embodiments, the first type of projection and the second type of projection are independently selected from the group consisting of: spherical stereoscopic projection (e.g., as described above with respect to fig. 7K) and flat stereoscopic projection (e.g., the shape of 730 in fig. 7I and 7J).
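Purely as an illustration of the two projection types named above, a hypothetical Swift model that toggles between them in response to a user request (e.g., selection of a virtual arrow object); the type and member names are not from the disclosure.

    enum StereoscopicProjection {
        case spherical   // media wrapped along three axes; edges deformed
        case flat        // media presented on a plane along two axes
    }

    struct MediaPresentation {
        var projection: StereoscopicProjection = .spherical

        // Switches to the other projection type in response to a request.
        mutating func toggleProjection() {
            switch projection {
            case .spherical: projection = .flat
            case .flat: projection = .spherical
            }
        }
    }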
In some implementations, the three-dimensional representation of the captured media is displayed at a first size (e.g., 740 at fig. 7K), and upon displaying the three-dimensional representation at the first size, the computer system detects (e.g., via one or more cameras integrated into the computer system) a set of one or more gestures (e.g., pinch and/or spread) (e.g., as described above with respect to selection of virtual objects in an XR environment) (e.g., one or more gestures on a touch-sensitive surface or one or more air gestures) (e.g., as described above in fig. 7K). In some implementations, in response to detecting the set of one or more gestures, the computer system expands the display of the three-dimensional representation of the captured media to a second size (e.g., a size of 740 at fig. 7L) that is larger than the first size. In some implementations, the size of the three-dimensional representation is reduced in response to detecting the set of one or more gestures. In some implementations, displaying the three-dimensional representation of the captured media at the second size covers more than a predetermined amount or all of the augmented reality environment user interface (e.g., the entire augmented reality environment user interface). Expanding the display of the three-dimensional representation of the captured media (e.g., as described above with respect to the selection of virtual objects in the XR environment) in response to detecting a set of one or more gestures (e.g., air gestures, air movements) causes the computer system to perform display operations without displaying additional controls, which provides additional control options without cluttering the user interface. Expanding the display of the three-dimensional representation of the captured media in response to detecting the set of one or more gestures provides visual feedback regarding the state of the computer system (e.g., the computer system has detected a request to expand the display of the three-dimensional representation while the three-dimensional representation is being displayed at the first size), which provides improved visual feedback.
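A minimal sketch, for illustration only, of resizing the representation in response to a pinch/spread gesture; the clamping bounds and names are assumptions.

    // Returns the new display width after applying a gesture scale factor
    // (>1 for a spread gesture, <1 for a pinch), clamped to an assumed range.
    func resizedWidth(currentWidth: Double,
                      gestureScale: Double,
                      minimumWidth: Double = 0.3,
                      maximumWidth: Double = 4.0) -> Double {
        min(max(currentWidth * gestureScale, minimumWidth), maximumWidth)
    }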
In some implementations, before detecting the set of one or more gestures, the augmented reality environment user interface includes a first portion (e.g., 709c1 at fig. 7K) of a representation (e.g., virtual representation and/or optical representation) of the physical environment (e.g., the physical environment corresponding to a location of the computer system), and expanding the display of the three-dimensional representation of the captured media to a second size that is larger than the first size includes displaying the three-dimensional representation of the captured media (e.g., a portion of the representation) as the second size (e.g., 740 at fig. 7L) in place of the first portion (e.g., 709c1 at fig. 7L) of the representation of the physical environment. In some embodiments, in response to detecting the set of one or more gestures, a representation of the physical environment (e.g., a representation of the entire physical environment) is replaced with a display of a three-dimensional representation of the second size. In some embodiments, in response to detecting the set of one or more gestures, more of the representation of the physical environment is visible. Displaying the three-dimensional representation of the captured media in place of the first portion of the representation of the physical environment provides the user with more focused visual feedback regarding the three-dimensional representation of the captured media by removing content (e.g., the first portion of the representation of the physical environment) that has not yet expressed explicit interest by the user, which provides improved visual feedback.
In some embodiments, while displaying the three-dimensional representation of the captured media (e.g., 740), the computer system detects a second set of one or more gestures that includes a movement component (e.g., a pinch (e.g., bringing two fingers together and/or moving two fingers closer together) and drag gesture (e.g., movement of the pinched hand)) (e.g., detected by one or more cameras in communication with the computer system) (e.g., a second set of one or more gestures performed by a user (e.g., a user of the computer system)) (e.g., as described above in fig. 7K). In some embodiments, in response to detecting the second set of one or more gestures, the computer system ceases to display the three-dimensional representation of the captured media and displays a second three-dimensional representation of second captured media (e.g., different from the three-dimensional representation of the captured media) (e.g., as described above with respect to selection of virtual objects in an XR environment). In some embodiments, the three-dimensional representation of the captured media and the second three-dimensional representation of the second captured media are displayed simultaneously before the set of one or more gestures corresponding to the motion input is detected (e.g., the second three-dimensional representation of the second captured media is not displayed before the set of one or more gestures corresponding to the motion input is detected). Replacing the display of the first representation of the media item with the second representation of the media in response to detecting the second set of one or more gestures (e.g., air gestures, air movements) causes the computer system to perform a display operation without displaying additional controls, which provides additional control options without cluttering the user interface.
In some implementations, when the captured media is displayed as a three-dimensional representation of the captured media (e.g., 740), the computer system receives a request to play back the captured media (e.g., play back the captured media using the computer system and/or an external device). In some implementations, in response to receiving a request to play back captured media, and in accordance with a determination that the captured media includes audio data, the computer system plays back the captured media, wherein playback of the captured media item includes outputting spatial audio corresponding to the audio data (e.g., as described above with respect to fig. 7K) (e.g., audio perceived by a user as originating from one or more fixed locations and/or directions in a physical environment, even as the viewpoint and/or positioning of the user changes) (e.g., the audio data includes various channels, wherein the user perceives the output of each channel as emanating from a respective spatial location surrounding the user's location, wherein the spatial location of each channel emanating is locked to the location of the computer system, which causes the computer system to audibly emphasize the respective channel based on movement of the user's head within the real world environment) (e.g., audio signals that have been adjusted using directional audio filters) (e.g., spatial audio output via speakers integrated into the computer system) (e.g., spatial audio output via speakers in communication with the computer system). In some implementations, the output of the spatial audio is dependent on (e.g., based on) the positioning of the computer system relative to the physical environment. Outputting spatial audio automatically provides the computer system with the ability to perform audio output operations when certain prescribed conditions are met, which allows the user to hear three-dimensional audio sounds without performing input, which performs operations when a set of conditions has been met without further user input.
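As a hedged illustration of keeping spatial audio anchored to a fixed location in the physical environment, the sketch below computes the angle from the listener's current heading to a world-locked source; real spatial-audio rendering involves considerably more (directional filters, elevation, distance attenuation), and all names here are hypothetical.

    import Foundation

    // Angle (radians) from the listener's current heading to a world-locked
    // audio source, so the sound keeps appearing to come from the same place
    // in the physical environment as the listener's viewpoint changes.
    func sourceAzimuth(listenerX: Double, listenerZ: Double,
                       listenerHeading: Double,
                       sourceX: Double, sourceZ: Double) -> Double {
        let worldAngle = atan2(sourceX - listenerX, sourceZ - listenerZ)
        var relative = worldAngle - listenerHeading
        // Normalize to (-pi, pi] so left/right panning is well defined.
        while relative > Double.pi { relative -= 2 * Double.pi }
        while relative <= -Double.pi { relative += 2 * Double.pi }
        return relative
    }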
In some embodiments, the computer system communicates with an external device (e.g., a television, a laptop, a smart phone, or a smart watch) (e.g., a device separate from the computer system), the captured media includes depth data (e.g., the captured media item is a stereoscopic media item (e.g., a media item including media captured from two different cameras (or camera groups) simultaneously)), and, upon displaying the captured media as a three-dimensional representation (e.g., 740) of the captured media, the computer system receives a request to play back the captured media on the external device.
In some implementations, playing back the captured media on the external device includes outputting spatial audio corresponding to the captured media.
In some embodiments, the computer system has a default pupillary distance value setting (e.g., a value between 58mm and 70mm, such as 60mm, 64mm, or 68mm). In some embodiments, in response to detecting a request to play back the captured media (e.g., using the computer system and/or an external device (e.g., an external device capable of playing back media items including depth data)), and in accordance with a determination that the user's eyes have a pupillary distance value that is different from (e.g., greater than and/or less than) the default pupillary distance value setting (e.g., a difference between the default pupillary distance value setting and the user's pupillary distance value is greater than a predetermined threshold value), the computer system initiates playback of the captured media item at a first visual offset (e.g., an offset at which the captured media is displayed to the user's right eye compared to the user's left eye) that is based on the user's pupillary distance value. In some embodiments, in accordance with a determination that the user's pupillary distance value is a first pupillary distance value, the computer system initiates playback of the captured media item at a second offset, and in accordance with a determination that the user's pupillary distance value is a second pupillary distance value different from the first pupillary distance value, the computer system initiates playback of the captured media item at a third offset different from the second offset. When the user's pupillary distance value does not match the default pupillary distance value setting, playing back the captured media item at the first offset causes the computer system to perform a playback operation that allows the user to more easily view the playback of the media item without requiring additional user input, which provides additional control options without cluttering the user interface.
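As an illustrative sketch only (the disclosure does not specify this computation), one way a per-eye display offset could be derived from a measured pupillary distance, falling back to a default setting; all names and numeric values are assumptions.

    // Per-eye horizontal offset (millimeters) used for stereoscopic playback.
    // If no measurement is available, or the measured interpupillary distance
    // is within `tolerance` of the default setting, the default is used.
    func stereoOffsetMillimeters(measuredIPD: Double?,
                                 defaultIPD: Double = 64.0,
                                 tolerance: Double = 1.0) -> Double {
        guard let ipd = measuredIPD, abs(ipd - defaultIPD) > tolerance else {
            return defaultIPD / 2.0
        }
        return ipd / 2.0
    }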
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, the video media item captured in method 800 may be a three-dimensional representation of the media captured in method 1000. For the sake of brevity, these details are not repeated here.
Figs. 11A-11D illustrate exemplary techniques for displaying a representation of a physical environment having a recording indicator, according to some embodiments. The user interfaces in figs. 11A to 11D are used to illustrate the processes described below, including the process in fig. 12.
FIG. 11A illustrates a user 712 holding a computer system 700 comprising a display 702 in a physical environment. The physical environment includes a sofa 709a, a drawing 709b, and a first individual 709c1. The display 702 presents a representation 704 of the physical environment (e.g., using "pass-through video" as described above). In the embodiment of fig. 11A-11D, the viewpoint of computer system 700 corresponds to the field of view of one or more cameras (e.g., cameras on the back side of computer system 700) in communication (e.g., wired or wireless communication) with computer system 700. Thus, as the computer system 700 moves throughout the physical environment, the field of view of the one or more cameras changes, which causes the point of view of the computer system 700 to change.
In fig. 11A, because the sofa 709a, the picture 709b, and the first individual 709c1 are visible from the point of view of the computer system 700, the display 702 presents a depiction of the sofa 709a, the picture 709b, and the first individual 709c 1. When the user 712 looks at the display 702, the user 712 may see the representation 704 of the physical environment and one or more virtual objects displayed by the computer system 700 (e.g., as shown in fig. 11A-11D). Thus, computer system 700 presents an augmented reality environment via display 702. In some embodiments, computer system 700 is a head-mounted device that presents representation 704 of a physical environment and one or more virtual objects that computer system 700 displays via a display generation component that encloses (or substantially encloses) a field of view of a user. In an embodiment in which computer system 700 is an HMD, the user's view is locked to the forward direction of the user's head such that representation 704 of the physical environment and one or more virtual objects, such as recording indicator 1102 (discussed below), shift as the user's head moves (e.g., because computer system 700 also moves as the user's head moves).
In fig. 11B-11D, computer system 700 is shown in an enlarged view to better illustrate what is visible on display 702. As shown in fig. 11B, computer system 700 displays a recording indicator 1102. Recording indicator 1102 is associated with a camera application installed on computer system 700. Computer system 700 displays recording indicator 1102 as faded in (e.g., recording indicator 1102 increases in brightness and/or becomes more visible over a period of time) in response to the camera application being launched. The boundaries of recording indicator 1102 indicate at least a portion of representation 704 of the physical environment to be captured via a media capturing process (e.g., a process of capturing still photographs and/or a process of capturing video). For example, as shown in fig. 11B, the boundary of recording indicator 1102 surrounds the upper body of first individual 709c1, picture 709B, and a portion of sofa 709 a. Thus, when the recording indicator 1102 is in the orientation shown in fig. 11B, the upper body of the first person 709c1, the drawing 709B, and a portion of the sofa 709a will be captured in response to the computer system 700 initiating a media capturing process. As the field of view of the one or more cameras of computer system 700 changes (e.g., the positioning of computer system 700 moves within the physical environment), the representation 704 of the physical environment surrounded by the boundaries of recording indicator 1102 changes in accordance with the change in the field of view of the one or more cameras of computer system 700. As shown in fig. 11B, computer system 700 displays recording indicator 1102 as a rectangle. In some implementations, computer system 700 displays recording indicator 1102 as a shape other than rectangular (e.g., circular, triangular, or square).
At fig. 11B, computer system 700 displays recording indicator 1102 at a fixed simulated depth, relative to the location of computer system 700 in the physical environment, within representation 704 of the physical environment. As the positioning of computer system 700 changes within the physical environment (e.g., as a user moves computer system 700 around within the physical environment), computer system 700 maintains the display of recording indicator 1102 at the fixed simulated depth within representation 704 of the physical environment. Because computer system 700 displays recording indicator 1102 at a fixed simulated depth within representation 704 of the physical environment, the display of recording indicator 1102 obscures some content located within the physical environment at a distance from computer system 700 that is greater than the simulated depth at which recording indicator 1102 is displayed. In some implementations, the display of recording indicator 1102 is obscured by content positioned within the physical environment at a distance less than the simulated depth at which computer system 700 displays recording indicator 1102.
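For illustration, a hypothetical sketch of the occlusion rule described above: content closer to the viewer than the indicator's fixed simulated depth occludes the indicator, and content beyond that depth is occluded by it. The names and the default depth are assumptions.

    enum OcclusionResult {
        case objectOccludesIndicator     // object is nearer than the indicator
        case indicatorOccludesObject     // object is at or beyond the indicator
    }

    func occlusion(objectDistance: Double,
                   simulatedIndicatorDepth: Double = 1.0) -> OcclusionResult {
        objectDistance < simulatedIndicatorDepth
            ? .objectOccludesIndicator
            : .indicatorOccludesObject
    }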
In fig. 11B, the user perceives that the content within recording indicator 1102 is brighter than the content outside of recording indicator 1102 due to the darkening of the area outside of recording indicator 1102 (e.g., optionally, even though computer system 700 does not significantly and/or systematically modify the brightness of the area within recording indicator 1102 relative to the area outside of recording indicator 1102). The visual effect aids and/or allows the user to easily view and focus on content within the boundaries of the recording indicator 1102 that will be the subject of the resulting multimedia item. The visual effect is created by displaying a recording indicator 1102 having various features (e.g., as described below), wherein a combination of these features creates the visual effect.
At fig. 11B, recording indicator 1102 includes an inner corner 1128 and an outer corner 1130. In addition, at fig. 11B, the recording indicator 1102 includes an inner edge region 1102a and an outer edge region 1102b. The inner edge region 1102a and the outer edge region 1102b form a gray gradient that transitions from a dark color at the inner edge of the recording indicator 1102 to a light color at the outer edge of the recording indicator 1102. The inner edge region 1102a includes the discrete inner edge of the recording indicator 1102 and a region of the gray gradient having minimum/zero translucency (e.g., maximum saturation). The outer edge region 1102b includes the remainder of the gray gradient with gradual translucency/saturation. The inner edge region 1102a of the recording indicator 1102 forms an inner corner 1128 and the outer edge region 1102b of the recording indicator 1102 forms an outer corner 1130. Further, at fig. 11B, recording indicator 1102 has visual parameters (e.g., brightness, translucency, and/or tonal density) that change in value from the inner edge region 1102a of recording indicator 1102 to the outer edge region 1102b of recording indicator 1102. Changing the value of the visual parameter of the recording indicator 1102 from the inner edge region 1102a of the recording indicator 1102 to the outer edge region 1102b of the recording indicator 1102 helps create a visual effect that allows the user to perceive the content within the recording indicator 1102 as brighter than the content outside of the recording indicator 1102, even if, optionally, the relative brightness of the content regions has not been modified (e.g., the effect is a perceived effect, not an actual difference in brightness). As shown in fig. 11B, the inner edge region 1102a of the recording indicator 1102 is darker than the outer edge region 1102b of the recording indicator 1102. At fig. 11B, the display of the outer edge region 1102b of the recording indicator 1102 is larger (e.g., covers more of the display 702) than the display of the inner edge region 1102a of the recording indicator 1102. The difference in the size of the inner edge region 1102a of the recording indicator 1102 and the outer edge region 1102b of the recording indicator 1102, and the transition from the dark color associated with the inner edge region 1102a of the recording indicator 1102 to the light color associated with the outer edge region 1102b of the recording indicator 1102, both help create a visual effect that allows the user to view the content within the recording indicator 1102 as brighter than the content outside of the recording indicator 1102.
At fig. 11B, the inner edge region 1102a of the recording indicator 1102 is not translucent (e.g., the inner edge region 1102a of the recording indicator 1102 is a sharp solid color), and the outer edge region 1102B of the recording indicator 1102 is partially translucent. That is, at fig. 11B, the translucency of the recording indicator 1102 increases from the inner edge region 1102a of the recording indicator 1102 to the outer edge region 1102B of the recording indicator 1102. Increasing the translucency of recording indicator 1102 from the inner edge 1102a region of recording indicator 1102 to the outer edge 1102b region of recording indicator 1102 helps create a visual effect that allows a user to treat content within the boundaries of recording indicator 1102 as brighter than content that is not within the boundaries of recording indicator 1102. In some embodiments, the inner edge region 1102a of the recording indicator 1102 is a different color than the outer edge region 1102b of the recording indicator. In some implementations, an inner edge region 1102a of the recording indicator 1102 is lighter in color (e.g., brighter) than an outer edge region 1102b of the recording indicator 1102. In an embodiment, when the inner edge region 1102a of the recording indicator 1102 is lighter in color than the outer edge region 1102b of the recording indicator 1102, the user perceives the content outside of the recording indicator 1102 as brighter than the content within the recording indicator 1102.
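As a non-authoritative sketch of the gradient described above, the function below maps distance from the indicator's inner edge to an opacity value: fully opaque across a narrow inner region, then fading linearly across a feathered outer region. The widths and the linear falloff are assumptions.

    // Opacity (1.0 = opaque, 0.0 = transparent) of the indicator border as a
    // function of distance (meters) outward from its inner edge.
    func borderOpacity(distanceFromInnerEdge: Double,
                       innerRegionWidth: Double = 0.002,
                       featherWidth: Double = 0.02) -> Double {
        if distanceFromInnerEdge <= innerRegionWidth {
            return 1.0   // inner edge region: sharp and fully opaque
        }
        let t = (distanceFromInnerEdge - innerRegionWidth) / featherWidth
        return max(0.0, 1.0 - t)   // outer edge region: increasingly translucent
    }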
As shown in fig. 11B, computer system 700 displays recording indicator 1102 with rounded corners. Displaying the recording indicator 1102 with rounded corners helps create a visual effect that causes the user to perceive content within the boundaries of the recording indicator 1102 as brighter than content that is not within the boundaries of the recording indicator 1102.
In the embodiment of fig. 11A-11B, the one or more cameras in communication with the computer system 700 have an optimal capture distance (e.g., or range of distances) that enables the one or more cameras to capture improved and/or optimal depth data, or in some embodiments, only depth data at the optimal capture distance. That is, when computer system 700 is positioned at an optimal capture distance away from an object, the resulting media item that includes the object will include depth data and/or improved or optimized depth data. The size and shape of recording indicator 1102 is selected (e.g., by the manufacturer of computer system 700) to encourage the user to place computer system 700 at the best capture distance from the subject (e.g., the size of recording indicator 1102 is selected to be smaller to encourage the user to place computer system 700 closer to the subject, or the size of recording indicator 1102 is selected to be larger to encourage the user to stand farther away from the subject). In some implementations, the computer system 700 dynamically updates the size of the recording indicator 1102 (e.g., based on data from previous media items captured using the computer system 700 and/or based on current environmental conditions) to encourage the user to place the computer system 700 at the optimal capture distance. In some embodiments, the size and shape of recording indicator 1102 is selected by a user of computer system 700.
As shown in fig. 11B, computer system 700 displays camera shutter virtual object 1124 within recording indicator 1102. Selection of the camera shutter virtual object 1124 initiates a process on the computer system 700 for capturing media, where the captured media includes a representation of the content within the recording indicator 1102. Camera shutter virtual object 1124 is anchored to recording indicator 1102. That is, as computer system 700 is repositioned within the physical environment, computer system 700 maintains the display of camera shutter virtual object 1124 within recording indicator 1102. In some implementations, the camera shutter virtual object 1124 includes one or more features discussed above with respect to the camera shutter virtual object 714.
As shown in fig. 11B, computer system 700 displays photo pool virtual object 1126 within record indicator 1102. As discussed above, the photo pool virtual object 1126 includes a representation (e.g., a still photo or video) of the most recently captured media item (e.g., coffee mug at fig. 11B). The computer system 700 displays an enlarged version of the representation of the most recently captured media item in response to detecting the selection of the photo pool virtual object 1126. Similar to the camera shutter virtual object 1124, the photo pool virtual object 1126 is anchored to the display of the recording indicator 1102. In some implementations, in response to a camera application (e.g., a camera application associated with the recording indicator 1102 as discussed above) being launched, the computer system 700 displays both the camera shutter virtual object 1124 and the photo pool virtual object 1126 as fading in simultaneously with the recording indicator 1102. In some embodiments, selection of the photo pool virtual object 1126 causes the computer system 700 to cease displaying the recording indicator 1102. In some embodiments, computer system 700 displays additional virtual objects (e.g., virtual objects for controlling the zoom level of the one or more cameras in communication with the computer system) within recording indicator 1102. In some implementations, the photo pool virtual object 1126 includes one or more features discussed above with respect to the photo pool virtual object 715.
Fig. 11B includes a depiction of a media capturing schematic 1120. Computer system 700 does not display media capturing schematic 1120. Instead, the media capturing schematic 1120 is included in fig. 11B as a visual aid to help explain the following concepts. As described above, the boundaries of recording indicator 1102 indicate what content is to be captured via the corresponding media capturing process. However, the corresponding media capturing process captures additional content that is not within the boundaries of recording indicator 1102. Content within the boundaries of the media capturing schematic 1120 is also captured via the corresponding media capturing process. As shown in fig. 11B, the boundary of media capturing schematic 1120 covers a larger area of display 702 of computer system 700 (e.g., encloses more of the representation of the physical environment) than the boundary of recording indicator 1102. Thus, the content captured by the corresponding media capturing process is not just the content surrounded by the boundaries of recording indicator 1102. In some implementations, the resulting multimedia of the respective media capturing process includes content surrounded by the boundaries of recording indicator 1102 and does not include content outside the boundaries of recording indicator 1102. In some implementations, the widths of the media capturing schematic 1120 and the recording indicator 1102 are the same, and the height of the media capturing schematic 1120 is greater than the height of the recording indicator 1102. In some implementations, the height of the media capturing schematic 1120 and the height of the recording indicator 1102 are the same, and the width of the media capturing schematic 1120 is greater than the width of the recording indicator 1102.
At fig. 11C, computer system 700 is repositioned in the physical environment (e.g., user 712 moves computer system 700 to first individual 709C 1). At fig. 11C, because computer system 700 is repositioned within the physical environment, the portion of representation 704 of the physical environment surrounded by recording indicator 1102 changes (e.g., relative to the portion of representation 704 of the physical environment surrounded by recording indicator 1102 at fig. 11B). In some embodiments, the portion of the representation 704 of the physical environment surrounded by the recording indicator 1102 changes in response to the computer system 700 detecting (e.g., via one or more sensors coupled to the computer system and/or the one or more cameras in communication with the computer system) a gradual change in the user's point of view (e.g., the user's point of view corresponds to the field of view of the one or more cameras in communication with the computer system 700). In some implementations, the portion of the representation 704 of the physical environment surrounded by the recording indicator 1102 changes in response to the computer system 700 detecting (e.g., via the one or more cameras in communication with the computer system) a gradual change in the positioning of content in the physical environment (e.g., a movement of the positioning of content in the physical environment, such as the individual 709c1, relative to the computer system 700).
At fig. 11C, it is determined that a set of display criteria is met (e.g., a distance between computer system 700 and first individual 709c1 is suitable for depth capture (e.g., the distance between computer system 700 and first individual 709c1 is greater than a first threshold and/or less than a second threshold, and/or brightness of light in the physical environment is greater than a third threshold and/or less than a fourth threshold) and/or illumination in the physical environment is suitable for depth capture). Because the set of display criteria is determined to be met, computer system 700 displays auxiliary recording indicator 1116. As shown in fig. 11C, auxiliary recording indicator 1116 includes one or more non-contiguous portions displayed at one or more interior corners 1128 of recording indicator 1102. The one or more non-contiguous portions of auxiliary recording indicator 1116 are displayed in a color (e.g., yellow, red, and/or orange) that is different from the display color of recording indicator 1102. In some implementations, computer system 700 displays auxiliary recording indicator 1116 as overlapping with both inner corner 1128 and outer corner 1130 of recording indicator 1102. In some implementations, auxiliary recording indicator 1116 is displayed at one or more outer corners 1130 of recording indicator 1102 (e.g., and not at one or more inner corners 1128 of recording indicator 1102). In some implementations, computer system 700 does not display auxiliary recording indicator 1116 on interior corner 1128 of recording indicator 1102. In some implementations in which computer system 700 does not display auxiliary recording indicator 1116 on interior corner 1128 of recording indicator 1102, computer system 700 displays auxiliary recording indicator 1116 around (e.g., adjacent to) the content within recording indicator 1102. In some implementations, computer system 700 displays auxiliary recording indicator 1116 on a subset (e.g., less than all) of the interior corners 1128 of recording indicator 1102. In some implementations, auxiliary recording indicator 1116 is displayed to indicate that other criteria are met, such as auto focus, optimal exposure conditions, and/or face detection.
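Purely for illustration, one way the set of display criteria could be evaluated before showing the auxiliary recording indicator; the ranges below are assumptions, not values from the disclosure.

    // Returns true when the subject distance and scene brightness both fall
    // inside ranges assumed to be suitable for depth capture.
    func shouldShowAuxiliaryIndicator(subjectDistanceMeters: Double,
                                      sceneBrightnessLux: Double) -> Bool {
        let distanceRange = 0.5...3.0
        let brightnessRange = 50.0...10_000.0
        return distanceRange.contains(subjectDistanceMeters)
            && brightnessRange.contains(sceneBrightnessLux)
    }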
At fig. 11C, computer system 700 displays recording indicator 1102 and auxiliary recording indicator 1116 in the same plane. As described above, computer system 700 displays recording indicator 1102 at a fixed depth relative to the location of computer system 700 within the physical environment within representation 704 of the physical environment. Because computer system 700 displays recording indicator 1102 and auxiliary recording indicator 1116 in the same plane, computer system 700 also displays auxiliary recording indicator 1116 at the same fixed depth as the display of recording indicator 1102.
At fig. 11C, the computer system 700 detects activation of the hardware button 711a by the body part 712a, or the computer system 700 detects an input 1150C directed to the camera shutter virtual object 1124. In some implementations, the input 1150c is a tap on the camera shutter virtual object 1124 (e.g., an air tap in space corresponding to the display location of the camera shutter virtual object 1124). In some implementations, the input 1150c is a gaze (e.g., continuous gaze) input directed toward a display direction of the camera shutter virtual object 1124. In some implementations, the input 1150c is an air-tap input in combination with detecting gaze in a display direction of the camera shutter virtual object 1124. In some implementations, the input 1150c is a gaze and blink directed toward the display direction of the camera shutter virtual object 1124. In some embodiments, the activation or input 1150c of the hardware button 711a by the body part 712a is a long press (e.g., presses and holds) (e.g., the duration of the activation or input 1150c of the hardware button by the body part 712a is a duration greater than a respective threshold, such as 0.25 seconds, 0.5 seconds, 1 second, 2 seconds, or 5 seconds). In some implementations, the activation or input 1150c of the hardware button 711 by the body part 712a is a short press (e.g., pressing and releasing) (e.g., the duration of the activation or input 1150c of the hardware button 711a by the body part 712a is less than one second). In some implementations, a particular air gesture identified as a request to capture media is detected (e.g., as described above with respect to selection of virtual objects in an XR environment). In some implementations, in response to detecting a long press corresponding to selection of the camera shutter virtual object 1124, the computer system 700 changes the visual appearance of the camera shutter virtual object 1124 to indicate that the one or more cameras in communication with the computer system 700 are capturing video media.
At fig. 11D, computer system 700 initiates a media capturing process in response to detecting activation of hardware button 711a by body part 712a or in response to detecting input 1150c. At fig. 11D, it is determined that the activation of hardware button 711a by body part 712a or the input 1150c is a short press. Because it is determined that the activation of hardware button 711a by body part 712a or the input 1150c is a short press, static media (e.g., rather than video media) is captured. The resulting media item includes content surrounded by the boundary of the recording indicator 1102 at fig. 11C (e.g., the resulting media includes a representation of the upper body of the first individual 709c1 and a portion of the picture 709b). At fig. 11D, because the most recently captured media item is now a representation of the physical environment, computer system 700 updates the display of photo pool virtual object 1126 to include the representation of the physical environment (e.g., the photo pool virtual object includes a representation of content surrounded by the boundaries of recording indicator 1102 at fig. 11C).
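As a final illustrative sketch (the names and the threshold value are hypothetical), classifying a shutter activation by its duration, with short presses mapped to still capture and long presses to video capture as described above.

    enum CaptureKind {
        case stillPhoto   // short press
        case video        // long press (press and hold)
    }

    // Classifies an activation of the shutter (hardware button or virtual
    // object) by how long it was held, in seconds.
    func captureKind(pressDuration: Double,
                     longPressThreshold: Double = 0.5) -> CaptureKind {
        pressDuration < longPressThreshold ? .stillPhoto : .video
    }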
At fig. 11D, computer system 700 determines that the ambient brightness in the physical environment has decreased. Because it is determined that the ambient brightness in the physical environment has decreased, computer system 700 increases the visual saliency of auxiliary recording indicator 1116 (e.g., auxiliary recording indicator 1116 is displayed more prominently than auxiliary recording indicator 1116 at fig. 11C). Computer system 700 increases the visual saliency of auxiliary recording indicator 1116 by increasing the brightness at which computer system 700 displays auxiliary recording indicator 1116 (e.g., as compared to the brightness of auxiliary recording indicator 1116 in fig. 11C). In some implementations, computer system 700 increases the visual saliency of auxiliary recording indicator 1116 by (e.g., simultaneously) decreasing the brightness of recording indicator 1102 and increasing the brightness of auxiliary recording indicator 1116. In some implementations, computer system 700 increases the visual saliency of auxiliary recording indicator 1116 by decreasing the display size of recording indicator 1102 and increasing the display size of auxiliary recording indicator 1116. In some implementations, the visual appearance of the auxiliary recording indicator 1116 indicates that the current conditions (e.g., the distance between the computer system 700 and the object and/or the illumination in the physical environment) are satisfactory for depth capture. In some implementations, computer system 700 dynamically changes the visual appearance of auxiliary recording indicator 1116 to indicate that the current conditions are satisfactory for depth capture or that the current conditions are unsatisfactory for depth capture.
Fig. 12 is a flow chart illustrating a method for displaying a representation of a physical environment having a recording indicator, according to some embodiments. The method 1200 is performed at a computer system (e.g., 100, 300, 500) in communication with a display generation component and one or more cameras. Some operations in method 1200 are optionally combined, the order of some operations is optionally changed, and some operations are optionally omitted.
A computer system (e.g., 700) displays (1200) an augmented reality camera user interface (e.g., a display of 1124, 1126, and/or 1102) (e.g., a user interface corresponding to a camera application (e.g., a third party camera application and/or a camera application installed on the computer system by a manufacturer of the computer system)) via a display generation component (e.g., 702), the augmented reality camera user interface comprising: a representation of a physical environment (1204) (e.g., the physical environment in which the computer system is located) (e.g., a physical environment within the field of view of the one or more cameras) (e.g., a physical environment within the user's field of view) (e.g., a real-time representation or a three-dimensional representation) (e.g., an optical representation (e.g., via light that passes directly through a portion of the computer system to the user) or a graphical representation displayed by the computer system); and a recording indicator (1206) (e.g., 1102) (e.g., surrounding a first portion of the representation of the physical environment but not a second portion of the representation of the physical environment) (e.g., the recording indicator is displayed at a central location on the camera user interface), the recording indicator indicating a recording area (e.g., an area within 1102) (e.g., the recording area including at least a portion of the representation of the physical environment to be captured by the one or more cameras in response to the computer system detecting a request to capture media, wherein the portion of the representation of the physical environment within the recording area is within the resulting media (e.g., static media and/or video media)), wherein the recording indicator includes at least a first edge region (e.g., 1102a) having a visual parameter (e.g., density and/or color (e.g., a single color or multi-color gradient)) (e.g., an amount of translucency (e.g., an amount of the representation of the physical environment visible behind the visual attribute)) (e.g., a visual attribute covering a first portion of the representation of the physical environment but not a second portion of the representation of the physical environment), the visual parameter being reduced in the visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter gradually (e.g., gradually or through multiple discrete steps) decreases (e.g., decreases in a single direction relative to the center of the recording region) (e.g., decreases proportionally, decreases based on a predetermined function, decreases linearly, and/or decreases non-linearly) as the distance (e.g., inches, centimeters, and/or millimeters) from the first edge region of the recording indicator (e.g., the distance in a direction perpendicular to the first edge region and extending outward from the recording region) increases (e.g., as discussed above with respect to fig. 11B) (e.g., the intensity (e.g., shadow amount) and/or density of the visual attribute decreases with increasing distance from the recording region). In some embodiments, the recording area is a portion of a field of view of at least one of the one or more cameras. In some embodiments, the recording area is a portion of the field of view (e.g., a portion of overlapping fields of view) of at least two of the one or more cameras. In some embodiments, the visual attribute is displayed on a corner of the recording indicator rather than on a side portion of the recording indicator.
In some embodiments, the visual attribute is displayed on a lateral portion of the visual indicator rather than on a corner of the visual indicator. In some embodiments, the recording indicator is a boundary between a representation of the physical environment captured by the one or more cameras and a representation of the physical environment not captured by the one or more cameras. In some embodiments, the angular extent of the recording region is the same (e.g., or substantially the same) as the angular extent of the field of view of the one or more cameras. In some embodiments, the visual attribute extends onto the recording area. In some embodiments, the visual attribute increases based on distance from the recording area. Displaying a recording indicator that includes at least a first edge region having a visual parameter that is reduced by a plurality of different values facilitates creating a visual effect that causes a user to perceive the recording region as visually emphasized (e.g., the recording region appears brighter than a representation of a physical environment outside of the recording region), which makes it easier for the user to view and focus on content within the recording region that is to be captured as a photograph or video, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, when the augmented reality camera user interface (e.g., display of 1124, 1126, and/or 1102) is displayed, the computer system (e.g., 700) detects (e.g., via the one or more cameras in communication with the computer system and/or via one or more sensors coupled to the computer system) an input (e.g., 712a or 1150 c) corresponding to a request to capture media (e.g., an air gesture (e.g., air pinch, air swipe, air expand, and/or air tap), an input corresponding to a hardware button in communication with the computer system, and/or a gaze pointing in a direction of a shutter virtual object displayed in the augmented reality camera user interface. In response to detecting an input corresponding to a request to capture media, the computer system captures media (e.g., captures media via the one or more cameras in communication with the computer system) that includes a representation of at least a portion of a physical environment within the recording area (e.g., a representation of a photograph shown in 1126 of fig. 11D). Capturing media in response to detecting input corresponding to a request to capture media causes the computer system to perform media capturing operations including capturing a representation of at least a portion of a physical environment within a recording area indicated by a recording indicator, which facilitates a user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves functionality of the computer system.
In some implementations, the captured media is static media (e.g., photos or media corresponding to a single point in time) (e.g., as described above with respect to fig. 11B).
In some implementations, the captured media is animated media (e.g., as described above with respect to fig. 11B) (e.g., video, or media corresponding to a period of time, such as a sequence of images). In some implementations, when the captured media is animated media, a request to capture the media is detected for a duration greater than a predetermined threshold (e.g., a long press of a button, a touch and hold gesture corresponding to displaying a virtual shutter button, or a continuous air gesture (e.g., a hold pinch).
In some implementations, the captured media includes a representation of the field of view of the one or more cameras (e.g., a portion of 704 within 1120) that is different from a representation of the physical environment within the recording area (e.g., a portion of 704 within 1102) (e.g., as discussed above with respect to fig. 11B) (e.g., the captured representation of the field of view of the one or more cameras includes fewer representations of the physical environment than the representation of the physical environment within the recording area, or the captured representation of the field of view of the one or more cameras includes more representations of the physical environment than the representation of the physical environment within the recording area). In some implementations, the multimedia (e.g., photo and/or video) representing the captured media does not include a portion of the representation of the physical environment within the recording area (e.g., the portion of the representation of the physical environment within the recording area is not visible in the photo and/or video). Having a representation of a physical environment that includes content that is different from the content included in the recording area provides the user with the ability to view additional content that may (but presently has not) be included in the recording area when the user is mobile in the location of the computer system, while also enhancing the user's perception of their current physical environment when the computer system is performing the media capturing process. Doing so improves media capture operations and reduces the risk of failing to capture transient events and/or content that may be missed if the capture operation is inefficient or difficult to use. Improving media capturing operations enhances the operability of the system and makes the user system interface more efficient (e.g., by helping the user provide appropriate input and reducing user errors in operating/interacting with the device).
In some embodiments, the visual parameter is a tone gradient (e.g., a grayscale gradient created by the inner edge region 1102a and the outer edge region 1102b) (or a tone density gradient) (e.g., a tone gradient ranging from 100% tone (e.g., a solid color (e.g., a given color such as black, gray, blue, or red)) to 0% tone (e.g., an absence of the given color and/or any color (e.g., substantially transparent))). Displaying the recording indicator with a tone gradient facilitates creating a visual effect that causes the user to perceive the recording region as visually emphasized (e.g., the recording region appears brighter than the representation of the physical environment outside of the recording region), which makes it easier for the user to view and focus on the content within the recording region that is to be captured as a photograph or video, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the recording indicator (e.g., 1102) includes a second edge region (e.g., 1102b) that is farther from the center of the recording region (e.g., the interior of 1102) than the first edge region (e.g., 1102a), and the second edge region is larger than the first edge region (e.g., occupies more space on a display generation component of the computer system, and/or is larger in length and/or thickness than the first edge region). In some embodiments, the size of the second edge region is a function of the size of the first edge region. Displaying the second edge region (e.g., feathered) in contrast to the sharper first edge region facilitates creating a visual effect that causes the user to perceive the recording region as visually emphasized (e.g., the recording region appears brighter than a representation of the physical environment outside of the recording region), which makes it easier for the user to view and focus on content within the recording region that is to be captured as a photograph or video, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the one or more cameras in communication with the computer system (e.g., 700) have an optimal capture distance for capturing depth data (e.g., an optimal distance between the one or more cameras and the object such that depth data can be captured) (e.g., a satisfactory distance for depth capture for one or more capture parameters such as lighting, focus, and/or sharpness, etc.), and wherein the size and/or shape of the recording indicator (e.g., 1102) facilitates (e.g., assists a user in) positioning the computer system at the optimal capture distance of the one or more cameras relative to one or more objects (e.g., detected objects) in the field of view of the one or more cameras (e.g., as discussed above with respect to fig. 11B) (e.g., the size of the recording indicator (e.g., the height of the recording indicator and the width of the recording indicator) and the shape are selected (e.g., by a manufacturer of the computer system) to encourage the user to adjust their location to the necessary distance from the object(s) intended to be captured via the one or more cameras). In some embodiments, the size and/or shape of the recording indicator is dynamically adjusted based on one or more objects identified in the FOV of the one or more cameras. In some embodiments, the optimal distance is a range of distances in which the capture parameter is within an optimal and/or acceptable range of values. Displaying the recording indicator in a selected size and/or shape so as to encourage placement of the computer system at an optimal capture distance provides the user with visual cues as to where the computer system needs to be placed in the physical environment so that the resulting photograph and/or video includes satisfactory depth data, which provides improved visual feedback and facilitates the user in properly and quickly constructing and capturing content of interest in an optimal manner, which is particularly relevant to transient events and improves the functionality of the computer system.
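As a hedged illustration of the distance relationship described above, the following sketch checks whether a subject lies within an assumed optimal depth-capture range; the range bounds and type names are placeholders, not values taken from the disclosure.

```swift
// Illustrative only: decides whether a detected subject lies within an assumed
// optimal depth-capture range. The 0.5 m to 3.0 m bounds are placeholders.
struct DepthCaptureRange {
    var minimumMeters: Double = 0.5   // assumed lower bound
    var maximumMeters: Double = 3.0   // assumed upper bound

    func isOptimal(subjectDistanceMeters d: Double) -> Bool {
        (minimumMeters...maximumMeters).contains(d)
    }
}
```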
In some implementations, the recording indicator (e.g., 1102) is displayed at a fixed simulated (e.g., virtual) depth (e.g., 1 foot, 3 feet, 5 feet, and/or 7 feet) within the representation (e.g., 704) of the physical environment (e.g., relative to a location of the computer system within the physical environment) (e.g., the recording indicator is superimposed over the representation of the physical environment at a location corresponding to the fixed depth from the location of the computer system within the physical environment) (e.g., the recording indicator remains displayed at the fixed simulated depth as the computer system moves in the physical environment). In some embodiments, the recording indicator is obscured by an object included in the representation of the physical environment when the object is at a distance (e.g., a real world distance) from the computer system that is less than the fixed simulated depth. In some embodiments, the recording indicator obscures an object included in the representation of the physical environment when the object is at a distance (e.g., a real world distance) from the computer system that is less than the fixed simulated depth. Displaying the recording indicator at a fixed analog depth within the representation of the physical environment provides a visual cue to the user so that the user can determine their location within the physical environment relative to other objects in the physical environment, which provides improved visual feedback and facilitates the user in properly and quickly constructing and capturing content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
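The mutually exclusive occlusion behaviors described above can be summarized by a small decision rule; the sketch below is illustrative only, and the names and the flag controlling which embodiment applies are assumptions.

```swift
// Illustrative only: resolves draw order between the recording indicator
// (anchored at a fixed simulated depth) and a real object in the passthrough
// view. The `objectWinsWhenCloser` flag distinguishes the two embodiments
// described above and is an assumption of this sketch.
enum DrawOrder {
    case objectOccludesIndicator
    case indicatorOccludesObject
}

func drawOrder(objectDistanceMeters: Double,
               fixedSimulatedDepthMeters: Double,
               objectWinsWhenCloser: Bool) -> DrawOrder {
    if objectDistanceMeters < fixedSimulatedDepthMeters {
        // Object is nearer to the viewer than the indicator's simulated depth.
        return objectWinsWhenCloser ? .objectOccludesIndicator : .indicatorOccludesObject
    }
    // Object is at or beyond the simulated depth, so the indicator is drawn in front.
    return .indicatorOccludesObject
}
```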
In some embodiments, the recording indicator (e.g., 1102) includes one or more corners (e.g., 1128 or 1130). In some embodiments, when the recording indicator is displayed and in accordance with a determination that a set of display criteria is met (e.g., an object within the recording indicator meets the set of display criteria), the computer system (e.g., 700) displays an auxiliary recording indicator (e.g., 1116) (e.g., a non-contiguous yellow corner) at the one or more corners of the recording indicator (e.g., at corners) via the display generating component (e.g., 702). In some embodiments, the display of the auxiliary recording indicator is aborted in response to determining that the set of display criteria is not met. In some embodiments, the auxiliary recording indicator is displayed immediately adjacent to the one or more corners of the recording indicator (e.g., on an inner radius or on an outer radius). In some embodiments, in accordance with a determination that the set of display criteria is no longer met, the display of the auxiliary recording indicator is stopped. In some embodiments, the auxiliary recording indicator continuously surrounds the entire circumference of the recording indicator. In some embodiments, the auxiliary recording indicator is displayed around one or more objects within the recording indicator (e.g., one or more objects meeting the set of display criteria) instead of at the one or more corners of the recording indicator. Displaying the auxiliary recording indicator when certain prescribed conditions are met (e.g., in accordance with a determination that a set of display criteria is met) provides the computer system with the ability to automatically perform a display operation that indicates to a user that a condition in the physical environment meets the set of display criteria, which performs the operation when a set of conditions has been met without further user input.
In some embodiments, the recording indicator (e.g., 1102) is displayed in a first plane (e.g., a plane perpendicular to the field of view of the one or more cameras in communication with the computer system) (e.g., a plane within a representation of the physical environment), and wherein the auxiliary recording indicator is displayed in the first plane (e.g., as described above with respect to fig. 11C) (e.g., both the recording indicator and the auxiliary recording indicator are displayed in the first plane at a simulated (e.g., virtual) depth (e.g., the simulated depth discussed above) within the representation of the physical environment). Displaying the recording indicator and the auxiliary recording indicator in the same plane allows the user to easily view both the recording indicator and the auxiliary recording indicator at the same time and provides additional visual cues to the user so that the user can determine their location within the physical environment relative to other objects in the physical environment, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest via depth capture, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, displaying the auxiliary recording indicator (e.g., 1116) includes: in accordance with a determination that a physical environment (e.g., a physical environment corresponding to a location of a computer system) (e.g., a physical environment corresponding to a representation of the physical environment) has a first amount of brightness (e.g., an ambient lighting condition (e.g., 1 lux, 5 lux, or 10 lux)), the auxiliary recording indicator is displayed with a first amount of visual salience (e.g., contrast) (e.g., a degree to which the auxiliary recording indicator visually contrasts with the recording indicator) relative to the recording indicator (e.g., 1102), and in accordance with a determination that the physical environment has a second amount of brightness that is less than the first amount of brightness, the auxiliary recording indicator is displayed with a second amount of visual salience relative to the recording indicator, wherein the second amount of visual salience is greater than the first amount of visual salience (e.g., as described with respect to fig. 11D). In some implementations, the visual appearance of the auxiliary recording indicator dynamically changes as the brightness of the physical environment changes, while the visual appearance of the recording indicator remains static (e.g., the visual appearance of the auxiliary recording indicator becomes larger and/or brighter in response to the brightness of the physical environment darkening). In some implementations, the visual appearance of the recording indicator dynamically changes as the brightness of the physical environment changes, while the visual appearance of the auxiliary recording indicator remains static (e.g., the visual appearance of the recording indicator becomes smaller and/or darkens in response to the brightness of the physical environment darkening). Displaying the auxiliary recording indicator in a second visually significant amount (e.g., brighter) when certain prescribed conditions are met (e.g., a small amount of light is present in the physical environment) allows the computer system to automatically perform a display operation that allows a user to easily view the auxiliary recording indicator in an environment containing a small amount of light, which performs the operation without requiring further user input when a set of conditions has been met.
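A minimal sketch of the brightness-to-salience relationship described above; the lux breakpoints and the salience scale are invented for illustration and are not taken from the disclosure.

```swift
// Illustrative only: maps ambient brightness (lux) to a visual-salience value
// for the auxiliary recording indicator, increasing salience as the room gets
// darker. All numeric values below are placeholder assumptions.
func auxiliaryIndicatorSalience(ambientLux: Double) -> Double {
    let brightLux = 10.0   // assumed "first amount of brightness"
    let darkLux = 1.0      // assumed "second amount of brightness"
    let minSalience = 0.4  // salience used in bright environments
    let maxSalience = 1.0  // salience used in dark environments
    // Clamp and linearly interpolate between the two brightness levels.
    let clamped = max(darkLux, min(brightLux, ambientLux))
    let t = (brightLux - clamped) / (brightLux - darkLux)
    return minSalience + t * (maxSalience - minSalience)
}
```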
In some embodiments, the set of display criteria includes criteria that are met when conditions (e.g., a distance between an object and a computer system and/or illumination of an environment in which the computer system is located) of one or more objects (e.g., 709a, 709b, and/or 709c1) (e.g., a person, object, and/or scene that a user intends to capture using the one or more cameras in communication with the computer system) are suitable for depth capture (e.g., as described above with respect to fig. 11C) (e.g., a process of capturing media from two or more different cameras (or camera groups) to produce a stereoscopic media item). In some embodiments, the conditions are suitable for depth capture when the distance between the computer system and the object is greater than a first threshold and/or less than a second threshold and/or the amount of light (e.g., natural light and/or ambient light in a physical environment) is greater than a third threshold and/or less than a fourth threshold. Displaying the auxiliary recording indicator when the condition of the one or more objects is suitable for depth capture provides visual feedback to the user as to whether the resulting multimedia item will include a representation of the captured depth data, which provides improved visual feedback, performs the operation without further user input when a set of conditions has been met, and facilitates the user to properly and quickly construct and capture content of interest via depth capture, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, while the auxiliary recording indicator (e.g., 1116) is displayed with a first visual appearance (e.g., the visual appearance of the auxiliary recording indicator at fig. 11C or 11D) (e.g., a first color and/or a first opacity metric) that indicates that the current conditions (e.g., the distance between the computer system and an object and/or the illumination (e.g., ambient illumination and/or artificial illumination)) are not suitable for depth capture (e.g., a process of capturing media from two or more different cameras (or camera sets) to produce a stereoscopic media item), the computer system (e.g., 700) detects a change in the conditions of a second object (e.g., 709a, 709b, and/or 709c1) (e.g., a person, object, and/or scene that the user intends to capture using the one or more cameras in communication with the computer system), and in accordance with a determination (e.g., a determination made using one or more sensors in communication (e.g., wired or wireless communication) with the computer system) that a set of one or more depth capture criteria is met (e.g., the current conditions are suitable for depth capture), the computer system changes the visual appearance of the auxiliary recording indicator from the first visual appearance to a second visual appearance that is different from the first visual appearance (e.g., changing the color of the auxiliary recording indicator, changing the size of the auxiliary recording indicator, changing the opacity of the auxiliary recording indicator, and/or changing the display position of the auxiliary recording indicator) (e.g., as discussed above with respect to fig. 11D). In some embodiments, the current conditions are not suitable for depth capture when depth capture is not possible under the current conditions. In some embodiments, the current conditions are not suitable for depth capture when capture is enabled but at a non-optimal or non-ideal level (e.g., quality level). In some embodiments, in accordance with a determination that the set of one or more depth capture criteria is not met, the first visual appearance of the auxiliary recording indicator is maintained. In some embodiments, in response to determining that the set of one or more depth capture criteria is not met, the display of the auxiliary recording indicator is stopped. Displaying the auxiliary recording indicator with the second visual appearance when the set of one or more depth capture criteria is satisfied provides visual feedback to the user as to whether conditions in the physical environment are suitable for depth capture, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest via depth capture, which is particularly relevant to transient events and improves the functionality of the computer system.
Displaying the recording indicator when certain prescribed conditions are met (e.g., in accordance with a determination that the set of depth capture criteria is met) provides the computer system with the ability to automatically perform a display operation that indicates to a user that conditions in the physical environment are suitable for depth capture, which performs the operation when a set of conditions has been met without further user input.
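The following sketch is an illustrative reading of the depth-capture criteria and the resulting appearance selection described above; the distance and lighting thresholds, type names, and the two-state appearance model are assumptions, not values taken from the disclosure.

```swift
// Illustrative only: selects an appearance for the auxiliary recording indicator
// based on whether a set of depth-capture criteria (distance and lighting within
// assumed thresholds) is met. All threshold values are placeholders.
enum AuxiliaryIndicatorAppearance {
    case first   // current conditions not suitable for depth capture
    case second  // current conditions suitable for depth capture
}

struct DepthCaptureCriteria {
    var minDistance = 0.5, maxDistance = 3.0   // meters (assumed)
    var minLux = 5.0, maxLux = 10_000.0        // lux (assumed)

    func isMet(subjectDistance: Double, ambientLux: Double) -> Bool {
        (minDistance...maxDistance).contains(subjectDistance)
            && (minLux...maxLux).contains(ambientLux)
    }
}

func appearance(for criteria: DepthCaptureCriteria,
                subjectDistance: Double,
                ambientLux: Double) -> AuxiliaryIndicatorAppearance {
    criteria.isMet(subjectDistance: subjectDistance, ambientLux: ambientLux) ? .second : .first
}
```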
In some embodiments, prior to displaying the augmented reality camera user interface (e.g., 1102, 1124, and/or 1126), the computer system (e.g., 700) detects (e.g., via the one or more cameras in communication with the computer system and/or the one or more sensors in communication with the computer system) an input corresponding to a request to display the augmented reality camera user interface (e.g., an air gesture (e.g., an air pinch, an air swipe, an air expand, and/or an air tap) and/or a tactile input (e.g., an input corresponding to a selection of a hardware button coupled to the computer system, or an input on a camera application launch icon)), and in response to detecting the input corresponding to the request to display the augmented reality camera user interface, the computer system displays the augmented reality camera user interface, wherein displaying the augmented reality camera user interface includes displaying an animation of the recording indicator fading in (e.g., as discussed above with respect to fig. 11B) (e.g., the appearance of the recording indicator gradually changes (e.g., becomes more pronounced) over a predetermined period of time). Displaying the recording indicator fading in provides visual feedback to the user regarding the status of the computer system (e.g., that the computer system has detected a request to display the augmented reality camera user interface), which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
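As an illustrative aside, a fade-in of the kind described above can be modeled as an opacity ramp over a predetermined period; the duration and the linear curve below are assumptions introduced only for this sketch.

```swift
// Illustrative only: a simple opacity ramp for fading the recording indicator in
// over a predetermined period. The 0.3 s default duration and linear curve are assumptions.
func recordingIndicatorOpacity(elapsedSeconds t: Double,
                               fadeInDuration: Double = 0.3) -> Double {
    guard fadeInDuration > 0 else { return 1 }
    return max(0, min(1, t / fadeInDuration))
}
```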
In some embodiments, the representation of the physical environment (e.g., 704) includes a third portion (e.g., the portion of 704 at fig. 11B-11C that is not surrounded by the recording indicator) that surrounds the recording indicator (e.g., 1102) (e.g., and is not surrounded by the recording indicator), and wherein the third portion of the representation of the physical environment and the recording region (e.g., the interior of the recording indicator 1102) optionally have substantially the same amount of brightness modification (e.g., as described above with respect to fig. 11B) due to the displayed user interface element (e.g., optionally have the same amount of brightness modification or have a difference in brightness modification that is less than a threshold amount, wherein the threshold is less than 5%, 4%, 3%, 2%, or 1% of the total brightness range) (e.g., an average amount of brightness; the same amount of ambient brightness; the third portion of the representation of the physical environment and the recording region have the same amount of brightness, but the user perceives the recording region as brighter than the third portion of the representation of the physical environment). In some embodiments, the third portion of the representation of the physical environment and the recording area are optionally displayed with the same amount of brightness). In some embodiments, the computer system modifies the amount of brightness of the third portion without distinction as compared to the amount of brightness of the recorded area. Displaying the third portion of the representation of the physical environment and the recording area with the same amount of brightness allows the user to easily ascertain which content in the physical environment is within the recording area and will be captured as part of the media capturing process, and which content in the physical environment is not within the recording area and will not be captured as part of the media capturing process, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest via deep capturing, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, the recording indicator (e.g., 1102) has a third edge region (e.g., 1102b) (e.g., the first edge region is closer to the center of the recording indicator than the third edge region), and wherein the first edge region (e.g., 1102a) is darker than the third edge region (e.g., the first edge region is darker in color than the third edge region) (e.g., the first edge region and the third edge region are displayed at the same brightness) (e.g., as described above with respect to fig. 11B). In some embodiments, the visual parameter is darkness level. Displaying the first edge region darker than the third edge region helps create a visual effect that causes the user to perceive the recording region as visually emphasized (e.g., the recording region appears brighter than a representation of the physical environment outside of the recording region), which makes it easier for the user to view and focus on content within the recording region that is to be captured as a photograph or video, which provides improved visual feedback and helps the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, the recording indicator (e.g., 1102) has a fourth edge region (e.g., 1102B) (e.g., the first edge region is closer to the center of the recording indicator than the fourth edge region), and wherein the fourth edge region is darker than the first edge region (e.g., 1102 a) (e.g., the fourth edge region is darker in color than the first edge region) (e.g., the fourth edge region and the first edge region are displayed at the same brightness) (e.g., as described above with respect to fig. 11B). In some embodiments, the visual parameter is a gray scale (e.g., a gray scale having a color value ranging from 0 to 255, where 0 corresponds to black and 255 corresponds to white). Displaying the fourth edge region darker than the first edge region facilitates creating a visual effect that causes the user to perceive a portion of the augmented reality camera user interface (e.g., the recording region or the portion of the augmented reality user interface surrounding the recording indicator) as appearing brighter (e.g., the recording region appears brighter than a representation of the physical environment outside of the recording indicator or the representation of the physical environment surrounding the recording indicator) allowing the user to easily view and focus on content in a particular portion of the augmented reality camera user interface, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the computer system (e.g., 700) displays a capture virtual object (e.g., 1124) within the recording indicator (e.g., 1102) that, when selected (e.g., via detection of a user's gaze directed to the capture virtual object (e.g., gaze and dwell), in some embodiments in conjunction with detection of the user performing one or more gestures (e.g., pinch gesture, expand gesture, air tap, and/or air swipe), and/or via detection of a tap at a point in space corresponding to the display of the capture virtual object), causes a process for capturing media (e.g., as described above with respect to fig. 11D) to be initiated (e.g., causes the computer system to initiate a process for capturing media) (e.g., static media or video media) (e.g., capturing media using the one or more cameras in communication with the computer system). In some embodiments, the appearance of the capture virtual object changes in response to the capture virtual object being selected. Displaying the capture virtual object within the recording indicator allows a user to easily view and access the capture virtual object when viewing the content that is to be captured via the process for capturing media after the capture virtual object is selected, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the computer system (e.g., 700) displays a camera roll virtual object (e.g., 1126) within the recording indicator (e.g., 1102) that, when selected (e.g., via detection of a user's gaze directed to the display of the camera roll virtual object (e.g., gaze and dwell), in some embodiments in conjunction with detection of the user performing one or more gestures (e.g., pinch gesture, expand gesture, air tap, air swipe, and/or an input on a hardware input device such as a touch-sensitive surface or activation of a button or rotatable input mechanism), and/or via detection of a tap at a point in space corresponding to the display of the camera roll virtual object), causes a process for displaying previously captured media (e.g., via a display generating component) to be initiated (e.g., causes the computer system to display previously captured media items) (e.g., static media or video media) (e.g., media items previously captured using the one or more cameras in communication with the computer system and/or media items previously captured using a separate device (e.g., a smartphone)) (e.g., as described above with respect to fig. 11B). In some implementations, selection of the camera roll virtual object causes the computer system to cease displaying the augmented reality camera user interface. Displaying the camera roll virtual object within the recording indicator allows a user to easily view and access the camera roll virtual object while viewing content that is to be captured as part of the process for capturing media, which provides improved visual feedback and facilitates the user to properly and quickly construct and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the recording indicator (e.g., 1102) includes one or more rounded corners (e.g., 1128 or 1130) (e.g., the radius of each of the rounded corners of the recording indicator is the same).
In some embodiments, when the recording indicator (e.g., 1102) is displayed around a fourth portion of the representation of the physical environment (e.g., a portion within 1102 at fig. 11B) (e.g., the fourth portion of the representation of the physical environment is less than the representation of the entire physical environment) (e.g., the fourth portion of the representation of the physical environment is surrounded by the recording indicator), the computer system (e.g., 700) detects (e.g., via one or more sensors in communication (e.g., wired or wireless communication) with the computer system) a change in the pose of the one or more cameras (e.g., the change in the pose of the one or more cameras corresponds to a change in the field of view of a user (e.g., a user of the computer system)) (e.g., movement of the computer system between fig. 11B and 11C). In response to detecting the change in the pose of the one or more cameras, the computer system displays the recording indicator around a fifth portion of the representation of the physical environment (e.g., the portion of 704 within 1102 at fig. 11C) without displaying the recording indicator around the fourth portion of the representation of the physical environment (e.g., stops displaying the recording indicator around the fourth portion of the representation of the physical environment) (e.g., the fourth portion of the representation of the physical environment remains visible to the user) (e.g., the computer system maintains the display of the recording indicator) (e.g., as the one or more cameras move within the physical environment, the portion of the physical environment within the recording indicator changes) (e.g., the fifth portion of the representation of the physical environment is different from the fourth portion of the representation of the physical environment). In some embodiments, there is an overlap in content included in the fourth portion of the representation of the physical environment and the fifth portion of the representation of the physical environment. In some embodiments, there is no overlap in the content included in the fourth portion of the representation of the physical environment and the fifth portion of the representation of the physical environment. In some embodiments, in response to detecting the change in the pose of the one or more cameras, the fourth portion of the representation of the physical environment ceases to be visible to the user. In some embodiments, the computer system displays the recording indicator around the fifth portion of the representation of the physical environment and does not display the recording indicator around the fourth portion of the representation of the physical environment in response to detecting a gradual shift in the user's point of view (e.g., the shift in the user's point of view corresponds to the change in the pose of the one or more cameras) (e.g., via one or more sensors coupled to the computer system and/or the one or more cameras in communication with the computer system).
In some embodiments, the computer system displays the recording indicator around a fifth portion of the representation of the physical environment and does not display the recording indicator around a fourth portion of the representation of the physical environment in response to detecting a gradual change in content in the physical environment (e.g., the content in the physical environment is shifting while the computer system is stationary) (e.g., via the one or more cameras in communication with the computer system). Stopping displaying the recording indicator around the fourth portion of the representation of the physical environment and displaying the recording indicator around the fifth portion of the representation of the physical environment in response to detecting the change in the pose of the one or more cameras provides the user with the ability to control what portion of the physical environment is to be captured via the media capturing process without displaying additional controls, which provides additional control options without cluttering the user interface.
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, a manner of shifting previews of the fields of view of the one or more cameras in response to detecting a change in the pose of the user's point of view (e.g., as described in method 900) may be applied to the recording indicator of method 1200. For the sake of brevity, these details are not repeated here.
Fig. 13A to 13J illustrate examples of displaying a camera user interface. Fig. 14 is a flow chart of an exemplary method 1400 for displaying information related to capturing media. Fig. 15A-15B are flowcharts of an exemplary method 1500 for changing the appearance of a viewfinder. The user interfaces in fig. 13A to 13J are used to illustrate the processes described below, including the processes in fig. 14 and 15A to 15B.
Fig. 13A illustrates a user 712 holding a computer system 700 comprising a display 702 in a physical environment. The above description of computer system 700 (e.g., as described above with respect to fig. 7A-7Q and 11A-11D) applies to computer system 700 depicted in fig. 13A-13J. The physical environment includes a sofa 709a, a drawing 709b, a first individual 709c1, a second individual 709c2, a television 709d, and a chair 709e. In the embodiment of fig. 13A-13J, the viewpoint of computer system 700 corresponds to the field of view of one or more cameras (e.g., cameras on the back side of computer system 700) in communication (e.g., wired or wireless communication) with computer system 700. Thus, as the computer system 700 moves throughout the physical environment, the point of view of the computer system 700 changes, which causes the field of view of the one or more cameras to change. Although fig. 13A depicts computer system 700 as a tablet, the techniques described below are also applicable to a head-mounted device. In some embodiments in which computer system 700 is a head-mounted device, computer system 700 optionally includes two displays (one for each eye of the user of computer system 700), where each display displays various content. When computer system 700 is a head-mounted device, the appearance of representation 1306 of the physical environment (e.g., as discussed in more detail below) changes based on changes in the user's point of view (e.g., the user rotates their head and/or the user adjusts their position within the physical environment). Further, when computer system 700 is a head-mounted device, content within viewfinder 1318 (e.g., as discussed in more detail below) changes based on changes in the user's point of view (e.g., the user rotates their head and/or the user adjusts their position within the physical environment).
In fig. 13B-13J, computer system 700 is shown in an enlarged view to better illustrate what is visible on display 702. As shown in FIG. 13B, computer system 700 displays a home screen user interface 1314. The home screen user interface 1314 includes a plurality of virtual objects 1310. Each virtual object of plurality of virtual objects 1310 corresponds to a respective application installed on computer system 700. Computer system 700 launches the corresponding application in response to detecting the selection of a respective virtual object of plurality of virtual objects 1310.
The camera application virtual object 1310a corresponds to a camera application installed on the computer system 700. As shown in fig. 13B, computer system 700 includes hardware button 711a. In response to computer system 700 detecting that hardware button 711a is pressed, hardware button 711a is activated. At fig. 13B, the computer system 700 detects an input 1350B (e.g., a tap) corresponding to a selection of the camera application virtual object 1310 a. In some implementations, the input 1350b corresponds to an air gesture (e.g., an air tap, an air pinch, an air expand, and/or an air swipe) detected at a point in space corresponding to the display of the camera application virtual object 1310a (e.g., via one or more cameras in communication with the computer system 700). In some implementations, the input 1350b corresponds to detecting (e.g., by one or more cameras of the computer system 700) a sustained gaze by the user in a display direction of the camera application virtual object 1310 a. In some implementations, the hardware button 711a is not visible to the user when the user is operating the computer system 700 (e.g., when the computer system 700 is an HMD, the hardware button 711a is not within the field of view of the user). In some embodiments, hardware button 711a is a rotatable button. In some embodiments, hardware button 711a is activated in response to computer system 700 detecting that hardware button 711a is rotated. In some embodiments, hardware button 711a is activated in response to computer system 700 detecting that hardware button 711a is pressed and rotated.
At fig. 13C, in response to detecting input 1350b, computer system 700 displays camera user interface 1304. The camera user interface 1304 corresponds to a camera application installed on the computer system 700. As shown in fig. 13C, the camera user interface 1304 includes a representation 1306 of the physical environment. The representation 1306 of the physical environment corresponds to a field of view of the one or more cameras in communication (e.g., wired communication and/or wireless communication) with the computer system 700. Thus, the content included in the representation 1306 of the physical environment changes based on the change in the field of view of the one or more cameras. When the user looks at the display 702, the user may see representation 1306 of the physical environment and one or more virtual objects displayed by the computer system 700 (e.g., as shown in figs. 13C-13F and 13H-13J). Thus, computer system 700 presents an augmented reality environment via display 702.
At fig. 13C, computer system 700 determines that camera user interface 1304 is displayed for the first time since computer system 700 was initially powered on (e.g., at fig. 13C, computer system 700 first displays camera user interface 1304 during the lifetime of computer system 700). Because the computer system 700 determines that the camera user interface 1304 is being displayed for the first time, the computer system 700 displays the tutorial 1308 within the camera user interface 1304. The tutorial 1308 includes a graphical representation of the computer system 700, a graphical representation of the hardware button 711a, and instructions on how to capture video and/or photos using the computer system 700.
Tutorial 1308 is a video depicting a representation of a user's finger performing an input on a representation of hardware button 711a (e.g., a looping video (e.g., playback of tutorial 1308 is displayed by computer system 700 in a repeating pattern)). The instructions included in tutorial 1308 indicate that computer system 700 captures a photo in response to detecting a tap input on hardware button 711a and that computer system 700 captures a video in response to detecting a press-and-hold input on hardware button 711a. In some embodiments, when computer system 700 first launches an application, computer system 700 displays tutorial 1308 within a user interface corresponding to a different application (e.g., a media playback application, an email application, and/or a text messaging application) installed on computer system 700. In some embodiments, tutorial 1308 includes instructions on how to use other functions of computer system 700 (e.g., how to play back media on computer system 700, how to control playback of media on computer system 700, and/or how to browse an internet browser on computer system 700).
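A minimal sketch of the tap versus press-and-hold distinction that tutorial 1308 describes; the 0.5 second threshold is an assumed value, not one taken from the disclosure.

```swift
// Illustrative only: classifies a hardware-button press into a photo capture
// (tap) or a video capture (press and hold), as the tutorial describes.
// The default hold threshold is an assumed value.
enum CaptureKind {
    case photo
    case video
}

func captureKind(forPressDuration seconds: Double,
                 holdThreshold: Double = 0.5) -> CaptureKind {
    seconds < holdThreshold ? .photo : .video
}
```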
As shown in fig. 13C, when computer system 700 displays tutorial 1308 in camera user interface 1304, computer system 700 displays representation 1306 of the physical environment with reduced brightness (e.g., as compared to the brightness with which computer system 700 displays tutorial 1308). Further, as shown in fig. 13C, when the computer system 700 displays the tutorial 1308 in the camera user interface 1304, the computer system 700 displays the tutorial 1308 as overlaid on top of the representation 1306 of the physical environment such that a portion of the representation 1306 of the physical environment remains visible.
At fig. 13C, computer system 700 detects input 1350c corresponding to activation of hardware button 711a. In some embodiments, input 1350c corresponds to a press and hold of hardware button 711a (e.g., computer system 700 detects that hardware button 711a is pressed for more than a predetermined amount of time). In some embodiments, input 1350c corresponds to a tap on hardware button 711a (e.g., computer system 700 detects that hardware button 711a is pressed for less than a predetermined amount of time). In some implementations, input 1350c corresponds to a rotation of hardware button 711a. In some implementations, the computer system 700 displays the tutorial 1308 with a degree of translucency such that the display of the tutorial 1308 does not obscure the appearance of the representation 1306 of the physical environment.
At fig. 13D, in response to detecting input 1350c, computer system 700 stops displaying tutorial 1308. That is, the computer system 700 stops displaying the tutorial 1308 in response to detecting an input (e.g., activation of the hardware button 711a) of the kind depicted by the tutorial 1308. As shown in fig. 13D, computer system 700 maintains the display of camera user interface 1304 even though tutorial 1308 is no longer displayed. Further, as shown in fig. 13D, as part of ceasing to display the tutorial 1308, the computer system 700 displays the viewfinder 1318 within the camera user interface 1304.
As shown in fig. 13D, computer system 700 displays a reticle virtual object 1320 at each corner of viewfinder 1318. The reticle virtual object 1320 indicates a capture area of the one or more cameras in communication with the computer system 700. When computer system 700 performs a media capturing operation (e.g., computer system 700 captures a photograph and/or video), content from representation 1306 of the physical environment within reticle virtual object 1320 is visible in the resulting media item. In addition, as shown in FIG. 13D, computer system 700 displays exit virtual object 1316. The computer system 700 stops displaying the camera user interface 1304 in response to the computer system 700 detecting an input corresponding to a selection to exit the virtual object 1316. In some embodiments, the reticle virtual object 1320 indicates the boundary of the viewfinder 1318. In some embodiments, viewfinder 1318 indicates the capture area of the one or more cameras in communication with computer system 700. In some embodiments, computer system 700 captures content outside of viewfinder 1318 and/or reticle virtual object 1320 as part of performing a media capture operation. In some embodiments, the reticle virtual object 1320 is a continuous line around the perimeter of the viewfinder 1318.
Note that at fig. 13D, computer system 700 does not display within viewfinder 1318 a virtual object that, when selected, causes computer system 700 to perform a media capturing operation. Instead, as explained in more detail below, the computer system 700 performs a media capturing operation in response to detecting activation of a hardware button coupled to the computer system 700. At fig. 13D, the computer system detects an input 1350d corresponding to a tap on hardware button 711a. Input 1350d corresponds to a request to capture a photograph. In some embodiments, computer system 700 displays a plurality of camera mode virtual objects outside the display of viewfinder 1318 (e.g., each of the plurality of camera mode virtual objects, when selected, causes the one or more cameras in communication with computer system 700 to be configured to capture a respective type of media item (e.g., a slow motion video, panoramic photograph, and/or portrait style photograph)). In some embodiments, computer system 700 displays exit virtual object 1316 outside viewfinder 1318. In some embodiments, computer system 700 displays tutorial 1308 within/overlaid on top of the display of viewfinder 1318. In some embodiments, when an accessibility setting of the computer system 700 is enabled, the computer system 700 displays a shutter button virtual object within the viewfinder 1318 that, when selected, causes the computer system 700 to initiate a process of capturing a media item (e.g., a photograph or video).
Figs. 13E1-13E5 depict various exemplary embodiments of how the computer system 700 responds to detecting the input 1350d (e.g., how the computer system 700 changes the appearance of the reticle virtual object 1320 and/or the representation 1306 of the physical environment in response to detecting the input 1350d). As explained in more detail below, in each of the exemplary embodiments depicted in figs. 13E1-13E5, the computer system 700 changes at least one optical property (e.g., brightness, contrast, translucency, and/or size) of the representation 1306 of the physical environment within the viewfinder 1318 in response to detecting the input 1350d. In some embodiments, computer system 700 changes two or more optical properties of representation 1306 of the physical environment within viewfinder 1318 in response to detecting input 1350d.
Fig. 13E1 illustrates a first exemplary embodiment in which the computer system 700 determines that input 1350d corresponds to a tap input on the hardware button 711a (e.g., the hardware button 711a is pressed for less than a predetermined amount of time (e.g., 0.1 seconds, 0.3 seconds, 0.5 seconds, or 0.7 seconds)). At fig. 13E1, because the computer system 700 determines that input 1350d corresponds to a tap input on the hardware button 711a, the computer system 700 performs a photo capture operation. At fig. 13E1, as part of performing the photo capture operation, the computer system 700 darkens a majority (e.g., greater than 80%) of the representation 1306 of the physical environment within the viewfinder 1318. That is, at fig. 13E1, the computer system 700 uniformly alters the appearance of the portion of the representation 1306 of the physical environment that is positioned within the viewfinder 1318 at locations greater than a threshold distance from the edges of the viewfinder 1318. At fig. 13E1, the appearance of the representation 1306 of the physical environment is not modified within the threshold distance from the respective edges of the viewfinder 1318.
In addition, at FIG. 13E1, computer system 700 stops displaying exit virtual object 1316 as part of performing the photo capture operation. In some implementations, as part of performing the photo capture operation, the computer system 700 also darkens portions of the representation 1306 of the physical environment that are not within the viewfinder 1318. In some implementations, the computer system 700 causes portions of the representation 1306 of the physical environment that are within the viewfinder 1318 to be darker than portions of the representation 1306 of the physical environment that are not within the viewfinder 1318, and vice versa.
Fig. 13E2 illustrates a second exemplary embodiment, wherein the computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711 a. At fig. 13E2, because computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711a, computer system 700 performs a photo capture operation. At fig. 13E2, as part of performing the photo capture operation, computer system 700 darkens representation 1306 of the overall physical environment within viewfinder 1318. At fig. 13E2, the representation 1306 of the total physical environment within viewfinder 1318 is changed uniformly. At fig. 13E2, computer system 700 darkens portions of representation 1306 of the physical environment that are outside viewfinder 1318. As shown in fig. 13E2, computer system 700 causes the portion of representation 1306 of the physical environment that is within viewfinder 1318 to become darker than the portion of representation 1306 of the physical environment that is outside viewfinder 1318. As shown in fig. 13E2, computer system 700 displays reticle virtual object 1320 in white (e.g., in contrast to computer system 700 displaying reticle virtual object 1320 in black at fig. 13D). At fig. 13E2, as part of performing the photo capture operation, computer system 700 changes the color of the reticle virtual object 1320 (e.g., as compared to the appearance of the reticle virtual object 1320 at fig. 13D). In some embodiments, computer system 700 changes the appearance of reticle virtual object 1320 relative to representation 1306 of the physical environment within viewfinder 1318. In some embodiments, as part of performing the photo capture operation, computer system 700 changes the appearance of one or more corners of viewfinder 1318 and/or one or more corners of reticle virtual object 1320 in a manner that is different from the appearance of computer system 700 changing one or more edges of viewfinder 1318 and/or one or more edges of reticle virtual object 1320.
Fig. 13E3 illustrates a third exemplary embodiment, wherein the computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711a (e.g., hardware button 711a is pressed for less than a predetermined amount of time). At fig. 13E3, because computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711a, computer system 700 performs a photo capture operation. At fig. 13E3, as part of performing the photo capture operation, computer system 700 darkens portions of representation 1306 of the physical environment that are within viewfinder 1318 and are a threshold distance from a respective edge of viewfinder 1318. That is, at fig. 13E3, computer system 700 uniformly changes the appearance of representation 1306 of the physical environment, which is positioned at a location within viewfinder 1318 and within a distance threshold of an edge of viewfinder 1318. At fig. 13E3, computer system 700 does not change the appearance of representation 1306 of the physical environment, which is within viewfinder 1318 and is located at a distance from the respective edge of viewfinder 1318 that is greater than a threshold distance.
At fig. 13E3, computer system 700 darkens portions of representation 1306 of the physical environment that are outside viewfinder 1318. As shown in fig. 13E3, computer system 700 darkens portions of representation 1306 of the physical environment that are within viewfinder 1318 and that are a threshold distance from a respective edge of viewfinder 1318 such that the portions are darker than portions of representation 1306 of the physical environment that are outside of viewfinder 1318. In addition, at fig. 13E3, as part of performing the photo capture operation, computer system 700 displays the reticle virtual object 1320 at an increased thickness (e.g., as compared to the thickness of the reticle virtual object 1320 at fig. 13D) and stops displaying the exit virtual object 1316.
Fig. 13E4 shows a fourth exemplary embodiment, wherein computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711 a. At fig. 13E4, because computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711a, computer system 700 performs a photo capture operation. At fig. 13E4, as part of performing the photo capture operation, computer system 700 darkens representation 1306 of the entire physical environment within viewfinder 1318. At fig. 13E4, the appearance of the portion of the representation 1306 of the physical environment that is not within the viewfinder 1318 is not modified. Further, at fig. 13E4, as part of performing the photo capture operation, computer system 700 stops displaying the reticle virtual object 1320 and the exit virtual object 1316 within viewfinder 1318. As shown in fig. 13E4, computer system 700 displays representation 1306 of the physical environment within viewfinder 1318 as darker than representation 1306 of the physical environment outside of viewfinder 1318. In some embodiments, as part of performing the photo capture operation, computer system 700 darkens portions of representation 1306 of the physical environment that are outside viewfinder 1318. In some implementations, the computer system 700 causes portions of the representation 1306 of the physical environment that are within the viewfinder 1318 to be darker than portions of the representation 1306 of the physical environment that are not within the viewfinder 1318, and vice versa.
Fig. 13E5 shows a fifth exemplary embodiment, wherein computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711 a. At fig. 13E5, because computer system 700 determines that input 1350d corresponds to a tap input on hardware button 711a, computer system 700 performs a photo capture operation. At fig. 13E5, computer system 700 darkens the display of reticle virtual object 1320 as part of performing a photo capture operation. At fig. 13E5, computer system 700 does not modify the appearance of representation 1306 of the physical environment within viewfinder 1318 or outside of viewfinder 1318.
It should be noted that the behavior of computer system 700 shown in the exemplary embodiments of fig. 13E 1-13E 5 may be combined. For example, in response to detecting input 1350d, computer system 700 can change the appearance of reticle virtual object 1320, as shown in fig. 13E2, and computer system 700 can change the appearance of the portion of representation 1306 of the physical environment, as shown in fig. 13E1 and/or fig. 13E 3.
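The edge-relative darkening in the first and third exemplary embodiments can be summarized with a single decision rule; the sketch below is illustrative only, and the names, the variant selector, and the threshold parameter are assumptions.

```swift
// Illustrative only: decides whether a point inside the viewfinder should be
// darkened during a photo capture, for two of the variants described above:
// darkening the interior away from the edges (as in fig. 13E1) or darkening a
// band near the edges (as in fig. 13E3).
enum DarkeningVariant {
    case interiorAwayFromEdges  // fig. 13E1 style
    case bandNearEdges          // fig. 13E3 style
}

func shouldDarken(distanceToNearestEdge d: Double,
                  threshold: Double,
                  variant: DarkeningVariant) -> Bool {
    switch variant {
    case .interiorAwayFromEdges:
        return d > threshold   // leave a band near the edges unmodified
    case .bandNearEdges:
        return d <= threshold  // darken only the band near the edges
    }
}
```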
At fig. 13F, computer system 700 has completed the photo capture operation initiated in response to computer system 700 detecting input 1350d. The depiction of computer system 700 as shown in fig. 13F may follow the behavior of computer system 700 as shown in any of figs. 13E1-13E5. As shown in fig. 13F, because the computer system 700 has completed the photo capture operation, the computer system 700 restores the display of reticle virtual object 1320 and representation 1306 of the physical environment to their original appearance (e.g., the appearance of reticle virtual object 1320 and representation 1306 of the physical environment as shown in fig. 13D). That is, the changes in the appearance of reticle virtual object 1320 and representation 1306 of the physical environment shown in figs. 13E1-13E5 are temporary. After the computer system 700 has completed the photo capture operation, the computer system 700 reverses the change in appearance of reticle virtual object 1320 and representation 1306 of the physical environment. In some embodiments, the computer system 700 reverses the change in appearance of reticle virtual object 1320 and representation 1306 of the physical environment after a predetermined amount of time (e.g., 1 second, 3 seconds, 5 seconds, 10 seconds, 15 seconds, or 30 seconds) has elapsed since the computer system 700 completed the photo capture operation.
As shown in fig. 13F, because the computer system 700 has completed the photo capture operation, the computer system 700 displays a photo pool virtual object 1330 within the viewfinder 1318. The photo pool virtual object 1330 includes a representation of media items most recently captured by the computer system 700. Thus, at FIG. 13F, photo pool virtual object 1330 includes a representation of a photo captured by computer system 700 in response to detecting input 1350 d. As shown in fig. 13F, once the computer system 700 completes the photo capture operation, the computer system 700 lightens the representation 1306 of the physical environment (e.g., as compared to the appearance of the representation 1306 of the physical environment at fig. 13E 1-13E 5), and the computer system 700 displays the exit virtual object 1316 within the viewfinder 1318.
At fig. 13F, computer system 700 detects input 1350f corresponding to the selection of exit virtual object 1316. In some implementations, the input 1350f corresponds to an air gesture (e.g., an air tap, an air pinch, an air expand, and/or an air swipe) detected at a point in space corresponding to the display of exit virtual object 1316 (e.g., via the one or more cameras in communication with the computer system 700). In some implementations, after the computer system 700 completes the media capturing operation, the computer system 700 displays a preview of the captured media fading into the display of the camera user interface 1304 (e.g., the computer system 700 progressively displays the preview of the captured media). In some implementations, after the computer system 700 completes the media capturing operation, the computer system 700 displays a preview of the captured media that is reduced from a first size to a second size and moved to a corner of the viewfinder 1318. In some embodiments, after the computer system 700 performs the media capturing operation, the computer system 700 replaces the display of the reticle virtual object 1320 with a display of a preview of the captured media. In some implementations, after the computer system 700 performs the media capturing operation, the computer system 700 initially displays a preview of the captured media at a size slightly larger than the size of the photo pool virtual object 1330, and the computer system 700 displays the preview of the captured media item as it is reduced in size and moved to the display position of the photo pool virtual object 1330. In some embodiments, computer system 700 captures a stereoscopic media item (e.g., a media item captured from two or more cameras positioned at different (e.g., slightly different (e.g., separated by 1 inch, 2 inches, 3 inches, and/or an average pupillary distance of a person)) locations in a physical environment, each camera capturing a different perspective of the physical environment) in response to detecting input 1350d.
At fig. 13G, in response to detecting the input 1350f, the computer system 700 stops displaying the camera user interface 1304 and displays the home screen user interface 1314. At fig. 13G, computer system 700 detects input 1350G corresponding to the selection of camera application virtual object 1310 a. In some implementations, the input 1350g corresponds to an air gesture (e.g., an air tap, an air pinch, an air expand, and/or an air swipe) detected in space at a point corresponding to the display of the camera application virtual object 1310a (e.g., via one or more cameras in communication with the computer system 700).
At fig. 13H, in response to detecting input 1350g, computer system 700 displays camera user interface 1304. At fig. 13H, computer system 700 determines that camera user interface 1304 was previously displayed by computer system 700 (e.g., at fig. 13D). Because the computer system 700 determines that the camera user interface 1304 has been previously displayed by the computer system 700, the computer system 700 displays the camera user interface 1304 without displaying the tutorial 1308. As described above, when the computer system 700 first displays the camera user interface 1304, the computer system 700 displays a tutorial 1308 within the camera user interface 1304. However, after the computer system 700 first displays the camera user interface 1304, the computer system 700 does not display the tutorial 1308 within the camera user interface 1304. At fig. 13H, computer system 700 detects input 1350H corresponding to activation of hardware button 711 a. Input 1350h corresponds to the request to capture video media.
At FIG. 13I, computer system 700 determines that input 1350h is a press and hold input to hardware button 711a (e.g., computer system 700 detects that hardware button 711a is pressed for longer than a predetermined amount of time (e.g., 0.5 seconds, 1 second, 2 seconds, 3 seconds, 4 seconds, or 5 seconds)). Because computer system 700 determines that input 1350h is a press and hold input to hardware button 711a, computer system 700 initiates a video capture operation. Thus, at FIG. 13I, computer system 700 is performing a video capture operation, and content within reticle virtual object 1320 will be visible in the resulting video media item. As shown in FIG. 13I, because computer system 700 is performing a video capture operation, computer system 700 reduces the brightness of the appearance of representation 1306 of the physical environment outside viewfinder 1318 (e.g., as compared to representation 1306 of the physical environment at FIG. 13H).
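The press-versus-press-and-hold distinction described above can be summarized as a simple duration threshold. The Swift sketch below is an illustrative assumption about how such a classification could be expressed; the threshold value and identifiers are invented for the example.

```swift
import Foundation

// Hypothetical classification of a hardware-button activation: a press shorter
// than the hold threshold is treated as a request to capture a photo, while a
// press held past the threshold initiates video capture.
enum CaptureRequest {
    case photo
    case video
}

func classifyButtonPress(durationSeconds: TimeInterval,
                         holdThreshold: TimeInterval = 1.0) -> CaptureRequest {
    durationSeconds < holdThreshold ? .photo : .video
}

// Example: a 0.2 s press maps to a photo capture, a 1.5 s press to video.
assert(classifyButtonPress(durationSeconds: 0.2) == .photo)
assert(classifyButtonPress(durationSeconds: 1.5) == .video)
```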
As shown in fig. 13I, because computer system 700 is performing a video capture operation, computer system 700 displays a recording indicator virtual object 1332 within viewfinder 1318. Recording indicator virtual object 1332 provides an indication of the amount of time that has elapsed since computer system 700 initiated the video capture operation. Thus, at fig. 13I, computer system 700 has been performing the video capture operation for two seconds. The display of recording indicator virtual object 1332 is part of the visual feedback that computer system 700 displays as part of performing a video capture operation. When the computer system 700 performs a photo capture operation, the computer system 700 displays a first type of feedback (e.g., feedback depicted in any of figs. 13E1-13E5), and when the computer system 700 performs a video capture operation, the computer system 700 displays a second type of feedback (e.g., as depicted in fig. 13I).
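As a rough illustration of the recording indicator's elapsed-time behavior, the hypothetical Swift snippet below formats the time since capture began (e.g., showing about two seconds after initiation). The type name and label format are assumptions for the sketch only.

```swift
import Foundation

// Hypothetical recording-indicator state: it tracks when video capture began
// and renders the elapsed time (e.g., "0:02" two seconds after initiation).
struct RecordingIndicator {
    let captureStart: Date

    func elapsedLabel(now: Date = Date()) -> String {
        let elapsed = Int(now.timeIntervalSince(captureStart))
        return String(format: "%d:%02d", elapsed / 60, elapsed % 60)
    }
}

let indicator = RecordingIndicator(captureStart: Date().addingTimeInterval(-2))
print(indicator.elapsedLabel())   // prints "0:02"
```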
At FIG. 13I, computer system 700 detects input 1350i corresponding to activation of hardware button 711a. In some embodiments, when computer system 700 performs a video capture operation, computer system 700 does not modify the brightness of representation 1306 of the physical environment. In some embodiments, computer system 700 performs the video capture operation for as long as computer system 700 detects input 1350h. In some embodiments, after computer system 700 detects input 1350h for more than a predetermined amount of time (e.g., 2, 3, 4, or 5 seconds), computer system 700 continues to perform the video capture operation even if computer system 700 stops detecting input 1350h. In some embodiments, computer system 700 changes the appearance of recording indicator virtual object 1332 to indicate to the user that computer system 700 will continue to perform the video capture operation even if computer system 700 stops detecting input 1350h. In some embodiments, computer system 700 displays recording indicator virtual object 1332 outside viewfinder 1318.
At fig. 13J, in response to detecting input 1350i, computer system 700 stops performing video capture operations. That is, the computer system 700 initiates a video capture operation in response to detecting a first activation of the hardware button 711a, and the computer system 700 ends the video capture operation in response to detecting a subsequent activation of the hardware button 711 a. As part of ceasing to perform the video capture operation, computer system 700 ceases to display recording indicator virtual object 1332. As described above, the display of photo pool virtual object 1330 includes a representation of the media item most recently captured by computer system 700. Thus, at fig. 13J, computer system 700 updates the display of photo pool virtual object 1330 (e.g., as compared to the appearance of the photo pool virtual object at fig. 13I) such that photo pool virtual object 1330 includes a representation of the video media item captured at fig. 13I. At fig. 13J, as part of stopping the video capture operation, computer system 700 lightens the appearance of representation 1306 of the physical environment within camera user interface 1304 (e.g., as compared to the appearance of representation 1306 of the physical environment at fig. 13I).
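The start/stop toggling of the video capture operation, the dimming of the passthrough outside the viewfinder, and the photo pool update described above can be sketched as a small state machine. The Swift below is a hypothetical model under assumed names, not the disclosed implementation.

```swift
import Foundation

// Hypothetical state machine for the described video capture flow: one
// activation of the hardware button starts recording (and dims content outside
// the viewfinder); the next activation stops recording, removes the recording
// indicator, and places the new item in the photo pool.
struct CapturedMediaItem {
    let id: UUID
    let isVideo: Bool
}

final class VideoCaptureController {
    private(set) var isRecording = false
    private(set) var passthroughDimmed = false
    private(set) var photoPoolItem: CapturedMediaItem?   // most recent capture

    func hardwareButtonActivated() {
        if isRecording {
            // Second activation: finish the video and surface it in the photo pool.
            isRecording = false
            passthroughDimmed = false
            photoPoolItem = CapturedMediaItem(id: UUID(), isVideo: true)
        } else {
            // First activation: begin recording and dim content outside the viewfinder.
            isRecording = true
            passthroughDimmed = true
        }
    }
}
```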
Additional description regarding figs. 13A-13J is provided below with reference to methods 1400 and 1500 described with respect to figs. 14 and 15A-15B.
FIG. 14 is a flowchart of an exemplary method 1400 for displaying media items, according to some embodiments. In some embodiments, the method 1400 is performed at a computer system (e.g., 700) (e.g., a smart phone, a tablet, and/or a head-mounted device) in communication with a display generation component (e.g., 702) (e.g., a display controller, a touch-sensitive display system, a display (e.g., integrated and/or connected), a 3D display, a transparent display, a projector, a heads-up display, and/or a head-mounted display), one or more input devices (e.g., 711a) (e.g., a touch-sensitive surface (e.g., a touch-sensitive display), a mouse, a keyboard, a remote control, a visual input device (e.g., a camera), an audio input device (e.g., a microphone), and/or a biometric sensor (e.g., a fingerprint sensor, a facial identification sensor, and/or an iris identification sensor)), and one or more cameras. In some embodiments, the method 1400 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as the one or more processors 202 of the computer system 101) (e.g., the control 110 in fig. 1). Some operations in method 1400 are optionally combined, and/or the order of some operations is optionally changed.
The computer system detects (1402) via the one or more input devices a request (e.g., 1350b, 1350g, 1350c, 1350 d) to display a camera user interface (e.g., 1304) (e.g., a camera user interface corresponding to a camera application installed on the computer system) (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air expand) detected by the one or more cameras), a gaze-based request, and/or activation of a hardware input mechanism in communication (e.g., wired and/or wireless communication) with the computer system.
In response to detecting the request to display the camera user interface, the computer system displays (1404) the camera user interface, wherein the camera user interface includes a reticle virtual object (e.g., 1320) (e.g., a set of one or more contiguous or non-contiguous lines or other shapes) that indicates a capture area of the one or more cameras (e.g., the capture area being a portion of the fields of view of at least two of the one or more cameras) (e.g., a portion of the overlapping fields of view), wherein displaying the camera user interface includes: in accordance with a determination (1406) that a set of one or more criteria is met, displaying the camera user interface (e.g., 1304) with a tutorial (e.g., 1308) (e.g., a tutorial explaining how to use one or more functions of the computer system) within the camera user interface (e.g., within the reticle virtual object) (e.g., the tutorial of fig. 13C) (e.g., wherein the tutorial is embedded within the camera user interface); and in accordance with a determination that the set of one or more criteria is not met, displaying the camera user interface without the tutorial. In some embodiments, the tutorial is displayed on top of (e.g., overlaid on) a representation of the physical environment (e.g., the tutorial is displayed while a representation of the physical environment is displayed to the user). In some embodiments, the display of the tutorial is environment-locked. In some embodiments, the display of the tutorial is view-locked. In some embodiments, the tutorial is a cyclic animation (e.g., the animation repeats itself). In some implementations, the appearance of the representation of the physical environment is changed (e.g., dimmed) in response to detecting a request to display the camera user interface. In some embodiments, once the computer system has completed playback of the tutorial, the computer system does not continue to play back the tutorial. In some implementations, the tutorial includes a combination of two different types of media (e.g., video, still photographs, and/or textual descriptions). In some embodiments, after the computer system completes playback of the tutorial, the computer system displays a representation of the tutorial (e.g., a still photo representation of the last video frame of the tutorial). Displaying a camera user interface with a tutorial (e.g., when initially displaying the camera user interface) when a set of conditions is met automatically allows the computer system to perform a display operation that provides information to the user about how to use the media capturing function of the computer system at a point in time when the user has limited information about how to operate the media capturing function of the computer system, which performs the operation without further user input when a set of conditions has been met.
In some embodiments, the set of one or more criteria includes criteria (e.g., as discussed above with respect to fig. 11C) that are met when the camera user interface (e.g., 1304) is initially displayed (e.g., when the camera user interface is first displayed after the computer system is first powered on or when the camera user interface is first displayed after a software update) (e.g., since the computer system was first powered on, the computer system first executes a camera application associated with the camera user interface). In some embodiments, the criteria include criteria that are met when the camera user interface is initially displayed after the operating system of the computer system has been reset. In some implementations, after initially displaying the camera user interface, the computer system displays the tutorial in response to detecting a request to display the tutorial. In some implementations, the set of one or more criteria is not met when the second instance of the camera user interface is displayed after the previous instance of the camera user interface is displayed with the tutorial. In some embodiments, the set of one or more criteria is not met when the computer system redisplays the camera user interface after the computer system initially displays the camera user interface. In some embodiments, after the computer system initially displays the camera user interface, the set of one or more criteria is not met when the computer system first displays the camera user interface during a discrete period of time (e.g., the computer system first displays the camera user interface during a day). In some embodiments, the set of one or more criteria is not met when the computer system first displays the camera user interface after the computer system has been restarted after the computer system initially displays the camera user interface. In some embodiments, the set of one or more criteria is not met when the computer system first displays the camera user interface after a period of time has elapsed (e.g., 3 days, 1 week, 1 month, and/or 1 year) since the computer system previously displayed the camera user interface. Displaying a tutorial when a camera user interface is first displayed provides visual feedback to a user as to whether the camera user interface has been displayed by a computer system in the past, which provides improved visual feedback.
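A minimal sketch of the "tutorial only on first display" criterion might persist a flag the first time the camera user interface is shown. The Swift below assumes a UserDefaults-backed flag and key name purely for illustration; the actual criteria described above may additionally involve first power-on, software updates, or resets.

```swift
import Foundation

// Hypothetical first-run check: the tutorial is shown only the first time the
// camera user interface is displayed; later displays of the interface omit it.
struct TutorialPolicy {
    private let defaults: UserDefaults
    private let key = "cameraTutorialShown"   // assumed storage key

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    func shouldShowTutorial() -> Bool {
        // True only while the flag has never been set.
        !defaults.bool(forKey: key)
    }

    func markTutorialShown() {
        defaults.set(true, forKey: key)
    }
}
```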
Displaying the tutorial (e.g., 1308) includes displaying instructions (e.g., text instructions and/or graphics instructions) for capturing a first media item (e.g., the media items captured in fig. 13E 1-13E 5 and/or 13I) using the one or more cameras (e.g., as discussed above with respect to fig. 13C and 13I) (e.g., still photographs and/or videos). In some embodiments, the tutorial includes instructions on how to operate two or more media capturing functions (e.g., a photo capturing function and a video capturing function) of the computer system. Displaying instructions for capturing media items when the camera user interface is first displayed provides the user with clear information about how to operate the media capturing function of the computer system at a point in time when the user is unfamiliar with the media capturing function of the computer system, which reduces the amount of user error when the user is first using the media capturing function of the computer system.
In some embodiments, the tutorial (e.g., 1308) includes video (e.g., as discussed above with reference to fig. 13C) (e.g., a cyclic video) (e.g., the computer system plays back the tutorial continuously in a repeated cycle). In some embodiments, the computer system plays back the video of the tutorial. In some implementations, the computer system displays one or more controls for controlling playback of the video of the tutorial (e.g., a user may pause, rewind, and/or fast forward playback of the video). In some embodiments, the computer system stops playing back the video of the tutorial in response to detecting a request to stop playing back the video of the tutorial. Displaying a video that includes instructions on how to capture media with the computer system provides the user with clear and explicit guidance on how to use the media capturing function of the computer system at a point in time when the user is unfamiliar with the media capturing function of the computer system, which reduces the amount of user error when the user is first using the media capturing function of the computer system.
In some embodiments, displaying the camera user interface (e.g., 1304) includes displaying a viewfinder virtual object (e.g., 1318) (e.g., the viewfinder virtual object indicates a capture area of the one or more cameras in communication with the computer system), and wherein displaying the tutorial (e.g., 1308) includes displaying the tutorial (e.g., as described above with respect to fig. 13D) such that it covers at least a portion of the viewfinder virtual object (e.g., the tutorial is displayed within the viewfinder (e.g., in a center of the viewfinder)).
In some embodiments, a computer system (e.g., 700) communicates (e.g., wireless communication and/or wired communication) with a hardware input mechanism (e.g., 711a) (e.g., a depressible and/or rotatable hardware input mechanism) (e.g., a side button integrated into the computer system) (e.g., the hardware input mechanism is not visible to a user while the tutorial is displayed) that, when activated (e.g., when the hardware input mechanism is depressed), causes a media capture process (e.g., as shown in figs. 13E1-13E5 and 13I) (e.g., capturing video or static media) to be initiated, wherein the tutorial (e.g., 1308) includes a representation of the hardware input mechanism (e.g., a graphical representation of the hardware input mechanism) (in some embodiments, including a representation (e.g., a graphical representation) of the computer system), and wherein displaying the tutorial includes displaying a representation (e.g., a graphical representation) of an input (e.g., a tap input and/or a press and hold input) corresponding to a selection of the representation of the hardware input mechanism (e.g., as described above with respect to fig. 13C). Displaying a representation of the input corresponding to the selection of the representation of the hardware input mechanism provides the user with clear and explicit guidance on how to use the media capturing function of the computer system at a point in time when the user is unfamiliar with the media capturing function of the computer system, which reduces the amount of user error when the user is first using the media capturing function of the computer system.
In some embodiments, the hardware input mechanism (e.g., 711a) (e.g., the hardware input mechanism that, when activated, causes a media capture process to be initiated) is not visible to the user while the user operates the computer system (e.g., 700) and the computer system displays the tutorial (e.g., when the computer system is an HMD, the hardware input mechanism is not in the user's field of view) (e.g., the computer system obstructs the user's ability to view the hardware input mechanism). Allowing the user to operate the computer system when the hardware input mechanism is not visible to the user prevents the hardware input mechanism from interfering with the user's ability to view information and/or instructions displayed by the computer system (e.g., the user may clearly view a tutorial displayed by the computer system), which provides the computer system with the ability to efficiently communicate highly important information regarding the location and functionality of the hardware input mechanism in the event that the user cannot see the hardware input mechanism, which increases the effectiveness of the computer system's ability to communicate information and/or instructions to the user.
In some implementations, the computer system detects a first activation of the hardware input mechanism (e.g., 1350c, 1350h, or 1350i on 711a) (e.g., the hardware input mechanism is pressed and/or rotated), wherein the first activation of the hardware input mechanism is a first type of input (e.g., an input corresponding to a tap input (e.g., the hardware input mechanism is pressed for less than a threshold amount of time (e.g., 0.1 seconds, 0.3 seconds, 0.5 seconds, 0.7 seconds, 1 second, or 2 seconds)) and/or an input that does not include a persistent input component). In some embodiments, in response to detecting the first activation of the hardware input mechanism, the computer system captures a second media item (e.g., a photograph) using the one or more cameras (e.g., as described above with respect to figs. 13E1-13E5 and 13I) (e.g., content within the reticle virtual object at the time the computer system detects the first activation of the hardware input mechanism is visible in the second media item). In some implementations, the computer system detects the first activation of the hardware input mechanism while the camera user interface is displayed and the tutorial is not displayed. Capturing the second media item in response to detecting activation of the hardware input mechanism allows the user to control a media capturing process of the computer system without displaying additional controls, which provides additional control options without cluttering the user interface.
In some embodiments, the computer system detects a second activation of the hardware input mechanism (e.g., 1350h, 1350i, or 1350c on 711a), wherein the second activation of the hardware input mechanism corresponds to a second type of input (in some embodiments, different from the first type) that includes maintaining the input for a predetermined period of time (e.g., a press and hold (e.g., the hardware input mechanism is pressed for greater than a threshold amount of time (e.g., 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or 8 seconds))) (e.g., as described above with respect to fig. 13I). In some embodiments, in response to detecting the second activation of the hardware input mechanism, the computer system captures a third media item (e.g., as described above with respect to fig. 13I) (e.g., video) (e.g., content within the reticle virtual object is visible in the third media item during a duration of capture of the third media item). Capturing the third media item in response to detecting an input that includes maintaining the input for a predetermined period of time allows the user to control the media capturing process of the computer system without displaying additional controls, which provides additional control options without cluttering the user interface.
In some embodiments, while the camera user interface (e.g., 1304) is displayed with the tutorial (e.g., 1308), the computer system detects a third activation (e.g., a long press (e.g., press and hold) or a short press (e.g., press and release)) of the hardware input mechanism (e.g., 1350c, 1350h, or 1350i) (e.g., the hardware input mechanism is pressed and/or rotated) (e.g., the activation is detected directly (e.g., on a solid state input mechanism) or indirectly (e.g., not on the solid state input mechanism) in response to the computer system detecting pressure). In some embodiments, in response to detecting the third activation of the hardware input mechanism, the computer system stops displaying the tutorial (e.g., as described above with respect to fig. 13D) (e.g., while maintaining the display of the camera user interface and the reticle virtual object). In some embodiments, stopping the display of the tutorial includes changing the appearance of the representation of the physical environment (e.g., from blurred to non-blurred, or vice versa). In some embodiments, the computer system stops displaying the tutorial in response to detecting activation of the hardware input mechanism (e.g., a solid state input mechanism being pressed directly and/or indirectly) without requiring a separately displayed control, which provides additional control options without cluttering the user interface. Stopping the display of the tutorial in response to detecting the third activation of the hardware input mechanism provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has detected the activation of the hardware input mechanism), which provides improved visual feedback.
In some embodiments, in accordance with a determination that a set of criteria is met (e.g., auxiliary function settings of the computer system are enabled), the camera user interface (e.g., 1304) includes a camera shutter virtual object that, when selected, initiates a process for capturing media items (e.g., still photographs and/or video) (e.g., the camera shutter virtual object is displayed within a reticle virtual object) (e.g., as discussed above with respect to fig. 13D), and in accordance with a determination that the set of criteria is not met (e.g., auxiliary function settings of the computer system are not enabled), the camera user interface does not include a camera shutter virtual object (e.g., selectable virtual object) for initiating a process for capturing media items (e.g., still photographs and/or video) (e.g., as discussed above with respect to fig. 13D). In some implementations, the camera shutter virtual object is displayed within the reticle virtual object. In some implementations, the computer system stops displaying the camera shutter virtual object in response to determining that the settings of the computer system are disabled. In some implementations, in response to detecting an input corresponding to selection of the camera shutter virtual object, the computer system initiates a process for capturing the media item (e.g., the computer optionally adds the captured media item to a media library (e.g., on the computer system and/or on a cloud server) as part of capturing the media item) (e.g., the captured media includes content within the reticle virtual object from an augmented reality environment, as discussed below). In some implementations, the camera user interface includes a camera shutter virtual object when the camera user interface is displayed without displaying the tutorial (e.g., and displaying the reticle virtual object). Displaying the camera shutter virtual object when a prescribed set of conditions is met (e.g., a first set of criteria is met (e.g., auxiliary function settings of the computer system are enabled)) automatically allows the computer system to perform display operations that help facilitate the media capturing process, which performs the operations when the set of conditions has been met without further user input.
In some embodiments, the set of criteria includes criteria that are met when a setting (e.g., auxiliary function setting) of the computer system (e.g., 700) is enabled (e.g., enabled by a user) (e.g., as described above with respect to fig. 13D). In some embodiments, the setting is enabled in response to the computer system detecting that the user's gaze is directed to the display of the virtual object corresponding to the setting for more than a predetermined amount of time (e.g., 1 second, 3 seconds, 5 seconds, or 7 seconds). In some embodiments, the set of criteria is not met when the settings of the computer system are not enabled. In some embodiments, the set of criteria is not met when the set of computer systems transitions from enabled to disabled.
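One way to picture the conditional inclusion of the camera shutter virtual object described in the preceding paragraphs is a settings-driven composition of the interface's elements. The Swift sketch below is hypothetical; the setting name and element list are assumptions made for illustration only.

```swift
// Hypothetical composition of the camera user interface: a camera shutter
// virtual object is included only when an accessibility-style setting is
// enabled; otherwise capture is driven by the hardware input mechanism alone.
struct CameraUISettings {
    var onScreenShutterEnabled: Bool   // assumed accessibility setting
}

enum CameraUIElement {
    case reticle, viewfinder, photoPool, exitButton, shutterButton
}

func cameraUIElements(for settings: CameraUISettings) -> [CameraUIElement] {
    var elements: [CameraUIElement] = [.reticle, .viewfinder, .photoPool, .exitButton]
    if settings.onScreenShutterEnabled {
        // Selectable object that, when selected, initiates a media capture.
        elements.append(.shutterButton)
    }
    return elements
}
```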
In some implementations, the camera user interface (e.g., 1304) includes a closed virtual object (e.g., 1316) that, when selected (e.g., selected via detection of an air gesture (e.g., an air tap, an air pinch, an air expand, and/or an air swipe) at a point in space corresponding to display of the closed virtual object) causes display of the camera user interface to cease (e.g., as described above with respect to fig. 13G). In some embodiments, the computer system displays a user interface corresponding to a home screen user interface of the computer system (e.g., a user interface containing a plurality of selectable virtual objects corresponding to various applications installed on the computer system) in response to detecting a selection of a closed virtual object. Displaying the closed virtual object within the camera user interface allows a user to easily view and access the closed virtual object while viewing content that is to be captured as part of the process for capturing media, which provides improved visual feedback and facilitates the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, a camera user interface (e.g., 1304) is displayed within the augmented reality environment (e.g., the camera user interface is displayed overlaid on top of the augmented reality environment), wherein a first portion of the augmented reality environment (e.g., less than the entire augmented reality environment) is displayed within the reticle virtual object (e.g., as shown in figs. 13D-13E5, 13F, 13H, and 13I). In some implementations, the first portion of the augmented reality environment is visible to a second user (e.g., a user of a computer system). In some embodiments, the computer system changes the portion of the augmented reality environment displayed within the reticle virtual object in response to detecting a change in the viewpoint of the user. In some implementations, the first portion of the augmented reality environment is visible to a user in the resulting media item. Displaying the first portion of the augmented reality environment within the reticle virtual object in the event that the computer system detects a request to capture media informs the user what content the computer system will capture, which allows the user to check and potentially change what content will be captured via the media capture process before the computer system performs the media capture process, which provides improved visual feedback and enhances the privacy and security of the computer system by informing the user of what will be visible in the resulting media item. Enhancing the privacy and security of the computer system enhances the operability of the device and makes the user-device interface more efficient (e.g., by helping the user capture content that the user intends to capture but not capture content that the user does not intend to capture), which in turn reduces power usage and extends battery life of the device by enabling the user to use the device more quickly and efficiently.
In some implementations, the computer system detects a request to capture a fourth media item (e.g., 1350h, 1350c, 1350i on 711 a) (e.g., detects activation of a hardware input mechanism in communication with the computer system, detects a voice command, and/or detects an air gesture (e.g., air pinch, air expand, air tap, and/or air swipe)). In some embodiments, in response to detecting a request to capture a fourth media item, the computer system captures a fourth media item (e.g., a media item captured in fig. 13E 1-13E 5 or 13I) (e.g., a photograph or video), wherein the fourth media item is a stereoscopic media item (e.g., as discussed above with respect to fig. 13F) (e.g., the fourth media item is captured from a set of cameras (e.g., two or more cameras) located at a common location in the physical environment, wherein each camera in the set of cameras captures a unique perspective of the physical environment (e.g., a perspective of the physical environment captured by a first camera in the set of cameras is different from a perspective of the physical environment captured by a second camera in the set of cameras)) (e.g., the fourth media item is a stereoscopic still photograph and/or stereoscopic video). In some implementations, the computer system detects a request to capture a fourth media item while the camera user interface is displayed and the tutorial is not displayed. Capturing the stereoscopic media items allows the user to perceive depth between the content included in the fourth media item, which provides the user with a more accurate sense of real world positioning relationships between the content included in the fourth media item, resulting in an improved and more accurate media capturing process.
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, as discussed in method 1000, the tutorial discussed in method 1400 is optionally displayed when the user first views previously captured media. For the sake of brevity, these details are not repeated here.
Fig. 15A-15B are flowcharts of an exemplary method 1500 for displaying media items according to some embodiments. In some embodiments, the method 1500 is performed at a computer system (e.g., 700) (e.g., a smart phone, a tablet, and/or a head-mounted device) in communication with a display generation component (e.g., 702) (e.g., a display controller, a touch-sensitive display system, a display (e.g., integrated and/or connected), a 3D display, a transparent display, a projector, a heads-up display, and/or a head-mounted display), one or more cameras, and one or more input devices (e.g., 711 a) (e.g., a touch-sensitive surface (e.g., a touch-sensitive display), a mouse, a keyboard, a remote control, a visual input device (e.g., a camera), an audio input device (e.g., a microphone), and/or a biometric sensor (e.g., a fingerprint sensor, a facial identification sensor, and/or an iris identification sensor). In some embodiments, the method 1500 is managed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system (such as the one or more processors 202 of the computer system 101) (e.g., the control 110 in fig. 1). Some operations in method 1500 are optionally combined and/or the order of some operations is optionally changed.
The computer system displays (1502) a user interface via a display generation component, the user interface comprising: a representation (e.g., 1306) of the physical environment (1504) (e.g., an optically transparent representation or a virtual representation), wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras (e.g., a portion of the representation of the physical environment that is within 1318) (e.g., a first portion of the representation of the physical environment that is within a viewfinder) (e.g., a first portion of the representation of the physical environment that is visible in a resulting media item captured by the one or more cameras), and a second portion of the representation of the physical environment (e.g., a portion that is different from the first portion) is outside of the capture area of the one or more cameras (e.g., a portion of the representation of the physical environment that is outside of 1318) (e.g., a second portion of the representation of the physical environment that is not visible in the resulting media item captured by the one or more cameras) (e.g., a second portion of the representation of the physical environment that is outside of the viewfinder); and a viewfinder (1506) (e.g., at least the first portion of the representation of the physical environment is within the viewfinder), wherein the viewfinder includes a boundary (e.g., a solid and/or dashed line) that separates the capture area of the one or more cameras from the portion of the representation of the physical environment outside of the capture area and/or surrounds the capture area.
Upon display of the user interface, the computer system detects (1508), via the one or more input devices, a first request to capture media (e.g., capture media using the one or more cameras) (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air expand) and/or activation of a hardware input mechanism in communication (e.g., wireless communication or wired communication) with the computer system).
In response to (1510) detecting the first request to capture media: the computer system captures (1512), using the one or more cameras, a first media item (e.g., a still photograph or video) that includes at least the first portion of the representation of the physical environment; and the computer system changes (1514) an appearance of the viewfinder (e.g., 1318 at figs. 13E1-13E5) (e.g., changes a translucency of the viewfinder and/or obscures an interior of the viewfinder (e.g., a portion of the interior or the entire interior)), wherein changing the appearance of the viewfinder includes changing (1516) an appearance (e.g., blurring) of a first content portion (e.g., a portion of 1306) (e.g., content visible to a user) (e.g., a portion of the representation of the physical environment) (e.g., content within and/or near the viewfinder) that is within a threshold distance (e.g., 0.1 inch, 0.25 inch, 0.5 inch, 1 inch, 3 inches, or 5 inches) of a first side (e.g., an edge) of a boundary of the viewfinder (e.g., a left boundary of 1318), and changing (1518) an appearance of a second content portion (e.g., a portion of 1306) that is within the threshold distance (e.g., 0.1 inch, 0.25 inch, 0.5 inch, 1 inch, 3 inches, or 5 inches) of a second side (e.g., an edge) of the boundary of the viewfinder (e.g., a right boundary of 1318) that is different from the first side of the boundary of the viewfinder (e.g., the representation of the physical environment remains visible to the user when the appearance of the viewfinder and the appearance of the content are changed). In some embodiments, changing the appearance of the viewfinder includes changing the appearance of the first portion of the viewfinder without changing the appearance of the second portion of the viewfinder. In some embodiments, changing the appearance of the viewfinder includes changing a first portion of the viewfinder in a first manner and changing a second portion of the viewfinder in a second manner that is different from the first manner. In some embodiments, the entire viewfinder is changed in a uniform manner. In some embodiments, when the appearance of the viewfinder changes, the content within the viewfinder is not visible. In some embodiments, the appearance of the viewfinder is changed in a different manner than the first and second content portions. In some implementations, the appearance of the first content portion is changed in a different manner than the appearance of the second content portion. In some embodiments, two or more visual characteristics (e.g., color, size, shape, translucency, and/or brightness) of the viewfinder and the content are changed. In some implementations, a computer system captures a media item (e.g., a photograph or video) in response to detecting a request to capture the media item, and a representation of the captured media item is displayed inside the viewfinder. In some implementations, the user interface is an augmented reality user interface. Changing the appearance of the viewfinder in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has detected a request to capture media), which provides improved visual feedback to the user. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media.
Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
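As an illustration of changing the appearance of content within a threshold distance of the sides of the viewfinder boundary, the hypothetical Swift function below dims content near any edge while leaving other content unchanged. The distance threshold, dim factor, and coordinate model are assumptions introduced for this sketch.

```swift
import Foundation

// Hypothetical capture-feedback pass: content within a threshold distance of a
// side of the viewfinder boundary has its appearance changed (here, a simple
// brightness scale), while content elsewhere is left untouched.
struct ViewfinderBounds {
    var minX: Double, minY: Double, maxX: Double, maxY: Double
}

func captureFeedbackBrightness(x: Double, y: Double,
                               bounds: ViewfinderBounds,
                               edgeThreshold: Double = 0.5,
                               dimFactor: Double = 0.6) -> Double {
    let nearLeft   = abs(x - bounds.minX) <= edgeThreshold
    let nearRight  = abs(x - bounds.maxX) <= edgeThreshold
    let nearTop    = abs(y - bounds.maxY) <= edgeThreshold
    let nearBottom = abs(y - bounds.minY) <= edgeThreshold
    // Content near any side of the boundary is dimmed during the feedback.
    return (nearLeft || nearRight || nearTop || nearBottom) ? dimFactor : 1.0
}
```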
In some embodiments, the first media item is a stereoscopic media item (e.g., as described above with respect to fig. 13F) (e.g., a media item comprising two or more images captured from a set of cameras positioned at a common point in the physical environment (e.g., a first respective camera of the set of cameras captures a first image and a second respective camera captures a second image)), wherein each camera of the set of cameras captures a different perspective of the physical environment (e.g., a perspective of the physical environment captured by the first respective camera of the set of cameras is different from a perspective of the physical environment captured by the second respective camera of the set of cameras).
In some implementations, before detecting the first request to capture media (e.g., 1350d, 1350h, or 1350i on 711a), the viewfinder (e.g., 1318) is displayed in a first appearance (e.g., the appearance of 1318 at fig. 13D), and wherein changing the appearance of the viewfinder includes displaying the viewfinder in a second appearance (e.g., the appearance of 1318 at figs. 13E1-13E5) that is different than the first appearance (e.g., the amount of brightness, translucency, contrast, and/or size of the viewfinder is different when the viewfinder is displayed in the second appearance as compared to when the viewfinder is displayed in the first appearance). In some implementations, after the viewfinder has been displayed in the second appearance for a period of time (e.g., 0.1 seconds, 0.3 seconds, 0.5 seconds, 0.7 seconds, 1 second, 1.5 seconds, or 2 seconds) (e.g., the period of time has elapsed since the computer system captured the first media item), the computer system changes (e.g., automatically changes, without further user input) the appearance of the viewfinder from the second appearance to the first appearance (e.g., as explained above in fig. 13F) (e.g., the appearance of the viewfinder reverts to the initial appearance of the viewfinder). In some embodiments, the representation of the physical environment remains visible to the user when the appearance of the viewfinder changes from the second appearance to the first appearance. In some embodiments, changing the appearance of the viewfinder from the second appearance to the first appearance includes changing two or more visual properties (e.g., size, brightness, translucency, and/or contrast) of the viewfinder. In some embodiments, changing the appearance of the viewfinder from the second appearance to the first appearance includes ceasing to display the viewfinder for a period of time (e.g., 0.1 seconds, 0.3 seconds, 0.5 seconds, 0.7 seconds, 1 second, 1.5 seconds, or 2 seconds). Changing the appearance of the viewfinder from the second appearance to the first appearance provides visual feedback to the user that the period of time has elapsed since the first media item was captured by the computer system, which provides improved visual feedback. Providing improved visual feedback that media capturing operations are performed enhances the privacy and security of a computer system by informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture.
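The temporary switch to a second appearance followed by an automatic return to the first appearance can be sketched with a short timer. The Swift below is an assumption-laden illustration; the delay value and type names are invented for the example.

```swift
import Foundation

// Hypothetical timing of the viewfinder feedback: on capture the viewfinder
// switches to a second appearance, then reverts to its first appearance once a
// short interval has elapsed, without further user input.
enum ViewfinderAppearance {
    case normal           // first appearance
    case captureFeedback  // second appearance shown briefly after capture
}

final class ViewfinderAppearanceController {
    private(set) var appearance: ViewfinderAppearance = .normal

    func mediaWasCaptured(revertAfter seconds: TimeInterval = 0.5) {
        appearance = .captureFeedback
        DispatchQueue.main.asyncAfter(deadline: .now() + seconds) { [weak self] in
            self?.appearance = .normal   // automatic return to the initial appearance
        }
    }
}
```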
In some implementations, the appearance of the first content portion (e.g., the first portion of 1306 within 1318) and the appearance of the second content portion (e.g., the second portion of 1306 within 1318) change in the same manner (e.g., one or more optical properties (e.g., brightness, translucence, size, and/or contrast) of both the first content portion and the second content portion change in the same manner). In some embodiments, the appearance of the viewfinder, the second content portion, and the first content portion are changed in the same manner.
In some embodiments, changing the appearance of the viewfinder (e.g., 1318) includes changing the appearance of a third content portion (e.g., a portion of 1306 within 1318) that is within a threshold distance of a third side (e.g., a third side that is different from the first side and the second side of the boundary) of the viewfinder (e.g., a top/bottom boundary of 1318) (e.g., a third content that is different from the second content portion and the first content portion), and wherein the appearances of the first content portion, the second content portion, and the third content portion change in the same manner (e.g., one or more optical properties (e.g., brightness, translucence, size, and/or contrast) of both the first content portion, the second content portion, the third content portion, and the viewfinder change in the same manner).
In some implementations, the boundary of the viewfinder (e.g., 1318) is a reticle virtual object (e.g., 1320) (e.g., a series of contiguous or non-contiguous lines displayed around the perimeter of the viewfinder or at one or more corners of the viewfinder) (e.g., the reticle virtual object indicates a capture area of the one or more cameras in communication with the computer system), and wherein the reticle virtual object is displayed in a first appearance (e.g., 1320 at fig. 13D) before a first request to capture media (e.g., 1350c, 1350d, 1350h, and/or 1350i) is detected. In some implementations, in response to detecting the first request to capture media, the computer system changes the appearance of the reticle virtual object from the first appearance to a second appearance (e.g., 1320 at figs. 13E2, 13E3, and/or 13E4) that is different from the first appearance (e.g., the brightness, translucency, contrast, positioning, and/or size of the reticle virtual object changes). In some embodiments, the appearance of the reticle virtual object changes from the second appearance back to the first appearance after the corresponding media item is captured by the computer system. In some embodiments, changing the appearance of the reticle virtual object from the first appearance to the second appearance includes changing a portion (e.g., less than the entire reticle) of the reticle virtual object. In some embodiments, in response to detecting the request to capture media, the display of the reticle virtual object is stopped for a period of time. Changing the appearance of the reticle from the first appearance to the second appearance in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has detected a request to capture media), which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system. Changing the appearance of the reticle virtual object rather than performing an animation that is unrelated to the display of the reticle virtual object reduces the number of elements displayed on the display of the computer system, which reduces the amount of field of view of the one or more cameras obscured by the display of the virtual element, thereby reducing the power consumption of the computer system.
In some embodiments, displaying the viewfinder (e.g., 1318) includes displaying one or more elements (e.g., 1306, 1316, 1330, and/or 1332) (e.g., content) (e.g., one or more selectable virtual objects within the viewfinder) (e.g., content (e.g., a representation of a physical environment)) within the viewfinder, and wherein changing the appearance of the reticle virtual object (e.g., 1320) includes changing the appearance of the reticle virtual object relative to the one or more elements displayed within the viewfinder (e.g., changing one or more optical properties of the reticle virtual object relative to one or more optical properties of the one or more elements within the viewfinder) (e.g., changing one or more optical properties of the reticle virtual object by a different magnitude than the same one or more optical properties of the one or more elements within the viewfinder). Changing the appearance of the reticle virtual object rather than performing an animation that is unrelated to the display of the reticle virtual object reduces the number of elements displayed on the display of the computer system, which reduces the amount of field of view of the one or more cameras obscured by the display of the virtual element, thereby reducing the power consumption of the computer system.
In some embodiments, displaying the user interface (e.g., 1304) includes displaying one or more corners (e.g., 1318 and/or 1320) (e.g., rounded corners) (e.g., displaying the one or more corners within the viewfinder) (e.g., displaying the corners at one or more corners of the viewfinder) (e.g., the one or more corners are part of the reticle virtual object) (e.g., the one or more corners are independent of both the viewfinder and the reticle virtual object), and wherein changing the appearance of the viewfinder (e.g., 1318) includes changing the appearance of the one or more corners in a first manner (e.g., the color of the one or more corners of the reticle virtual object is changed to a first color (e.g., white, black, yellow, and/or orange)) (e.g., the appearance of 1320 at figs. 13E2 and/or 13E3) and changing the appearance of at least a first side of the boundary of the viewfinder in a second manner different from the first manner (e.g., the appearance of the first side of the boundary of the viewfinder is changed differently from the appearance of the one or more corners). In some embodiments, one or more of the same optical properties (e.g., brightness, translucency, and/or size) of the one or more corners and of the first side of the boundary of the viewfinder are changed. In some embodiments, the appearance of the first side of the boundary of the viewfinder and the appearance of the one or more corners change inversely (e.g., the first side of the boundary of the viewfinder is darkened and the one or more corners are brightened). In some embodiments, the one or more corners are not connected. Changing the appearance of one or more corners differently than the way in which the appearance of the first side of the boundary is changed as part of performing the media capturing process enables the computer system to more effectively alert the user that a media item has been captured, in contrast to the computer system changing the appearance of the reticle as part of performing the media capturing process without changing the appearance of the other parts of the viewfinder, thereby producing improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by more reliably informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some embodiments, changing the appearance of the viewfinder (e.g., 1318) includes changing a first set of one or more optical properties (e.g., a value of the first optical property increases or decreases) of a first portion (e.g., 1306) of the content within the viewfinder (e.g., a majority of the content (e.g., greater than 50%, 70%, 80%, 90%, or 95% of the content within the viewfinder)) (e.g., as explained above with respect to figs. 13E1-13E5). In some implementations, the first content portion changes inversely with respect to how the second content portion changes (e.g., the first content portion darkens and the second content portion lightens). In some embodiments, two or more optical properties of the content within the viewfinder are changed. In some embodiments, two or more optical properties of the content within the viewfinder change in an opposite manner (e.g., the value of the first optical property increases and the value of the second optical property decreases). In some embodiments, the computer system changes the first optical property of the content in the viewfinder before the computer system changes the second optical property of the content in the viewfinder. Changing the optical properties of the content within the viewfinder in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system detecting a request to capture media) and alerts the user that a media item has been captured by the computer system that the user can view to confirm that the captured media does not contain content that the user does not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some implementations, the first set of one or more optical properties includes a contrast (e.g., an amount of contrast between the content in the viewfinder and the representation of the physical environment increases and/or decreases) of the content (e.g., 1306) within the viewfinder (e.g., 1318) (e.g., an amount of contrast between the first portion of the content and the second portion of the content). Changing the contrast of content within the viewfinder in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system detecting a request to capture media) and alerts the user that a media item has been captured by the computer system that the user can view to confirm that the captured media does not contain content that the user does not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some implementations, the first set of one or more optical properties includes a brightness (e.g., an increase and/or decrease in an amount of brightness of content in the viewfinder) of content (e.g., 1306) within the viewfinder (e.g., 1318). In some embodiments, the brightness of a first portion of the content included in the viewfinder is increased and the brightness of a second portion of the content included in the viewfinder is decreased. Changing the brightness of the content within the viewfinder in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system detecting a request to capture media) and alerts the user that a media item has been captured by the computer system that the user can view to confirm that the captured media does not contain content that the user does not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some implementations, the first set of one or more optical properties includes translucency (e.g., an increase and/or decrease in an amount of translucency of content (e.g., 1306) within the viewfinder (e.g., 1318)). In some embodiments, the translucency of the first portion of the content included in the viewfinder is increased and the translucency of the second portion of the content included in the viewfinder is decreased. Changing the translucence of content within the viewfinder in response to detecting a request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system detecting a request to capture media) and alerts the user that a media item has been captured by the computer system that the user can view to confirm that the captured media does not contain content that the user does not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some implementations, the first set of one or more optical properties includes a size of content (e.g., 1306) within the viewfinder (e.g., 1318) (e.g., the size of content within the viewfinder increases and/or decreases). In some embodiments, the size of the content within the viewfinder increases before the size of the content within the viewfinder decreases, and vice versa. In some embodiments, the first portion of the content included in the viewfinder increases in size and the second portion of the content included in the viewfinder decreases in size. Changing the size of the content within the viewfinder in response to detecting the request to capture media provides visual feedback to the user regarding the status of the computer system (e.g., the computer system detecting the request to capture media) and alerts the user that a media item has been captured by the computer system that the user can view to confirm that the captured media does not contain content that the user does not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some implementations, displaying the user interface (e.g., 1304) includes displaying a first set of virtual control objects (e.g., 1316, 1320, 1332, and/or 1330) (e.g., one or more virtual control objects) (e.g., one or more media capturing control virtual objects) within a boundary of the viewfinder (e.g., each virtual control object in the first set of virtual control objects is selectable). Displaying the first set of virtual objects within the boundaries of the viewfinder allows the user to easily view and access the first set of virtual objects while viewing content that is to be captured as part of the process for capturing media, which provides improved visual feedback and helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, the first set of virtual objects includes a first close virtual object (e.g., 1316). In some embodiments, the computer system detects an input (e.g., 1350f, 1350c, and/or 1350d) corresponding to selection of the first close virtual object (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air spread) detected by the one or more cameras (e.g., detected at a point in space corresponding to the display of the first close virtual object), a gaze-based request, and/or activation of a hardware input mechanism in communication (e.g., wired and/or wireless communication) with the computer system). In some embodiments, in response to detecting the input corresponding to selection of the first close virtual object, the computer system ceases to display the user interface (e.g., as described above with respect to fig. 13G). In some embodiments, the user interface corresponds to a first application and, as part of ceasing to display the user interface, the computer system displays a second user interface corresponding to a second application. In some embodiments, as part of ceasing to display the user interface, the computer system displays a main desktop user interface (e.g., a user interface that includes selectable virtual objects corresponding to various applications installed on the computer system). In some embodiments, as part of ceasing to display the user interface, the computer system displays a reduced number of virtual objects (e.g., compared to the number of virtual objects included in the user interface) overlaid on top of the passthrough representation of the physical environment. Displaying the close virtual object within the boundaries of the viewfinder allows a user to easily view and access it while viewing content that is to be captured as part of the process for capturing media, which provides improved visual feedback and helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some implementations, the first set of virtual objects includes a first media review virtual object (e.g., 1330) (e.g., the first media review virtual object includes a representation of the media item most recently captured by the one or more cameras in communication with the computer system) (e.g., a photo pool). In some embodiments, the computer system detects an input (e.g., 1350f, 1350g, 1350d, and/or 1350c) corresponding to selection of the first media review virtual object (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air spread) detected by the one or more cameras (e.g., detected at a point in space corresponding to the display of the first media review virtual object), a gaze-based request, and/or activation of a hardware input mechanism in communication (e.g., wired and/or wireless communication) with the computer system). In some embodiments, in response to detecting the input corresponding to selection of the first media review virtual object, the computer system displays one or more representations of previously captured media items (e.g., as explained above with respect to fig. 13F) (e.g., media previously captured by the one or more cameras in communication with the computer system). In some embodiments, the first media review virtual object is updated to include a representation of the most recently captured media item. Displaying the media review virtual object within the boundaries of the viewfinder allows a user to easily view and access the media review virtual object while viewing content that is to be captured as part of the process for capturing media, which provides improved visual feedback and helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
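The media review behavior described above can be pictured with a small hypothetical SwiftUI sketch: a thumbnail of the most recently captured item that, when selected, presents a grid of previously captured media. All names and layout values are illustrative assumptions, not the patented user interface.

```swift
import SwiftUI

// Hypothetical media review object: tapping the recent-capture thumbnail
// presents previously captured media items in a simple grid.
struct MediaReviewButton: View {
    let recentThumbnail: Image       // most recently captured media item
    let library: [Image]             // previously captured media items
    @State private var isReviewing = false

    var body: some View {
        Button {
            isReviewing = true       // selection opens the review UI
        } label: {
            recentThumbnail
                .resizable()
                .scaledToFill()
                .frame(width: 44, height: 44)
                .clipShape(RoundedRectangle(cornerRadius: 8))
        }
        .sheet(isPresented: $isReviewing) {
            ScrollView {
                LazyVGrid(columns: [GridItem(.adaptive(minimum: 100))]) {
                    ForEach(library.indices, id: \.self) { index in
                        library[index]
                            .resizable()
                            .scaledToFill()
                            .frame(height: 100)
                            .clipped()
                    }
                }
            }
        }
    }
}
```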
In some embodiments, the first set of virtual objects includes a recording time virtual object (e.g., 1332) that indicates an amount of time that has elapsed since the computer system (e.g., 700) initiated the first video capture operation (e.g., displayed while the computer system is performing the video capture operation). In some embodiments, the computer system stops displaying the recording time virtual object in response to the computer system stopping the video recording operation. Displaying a recording time virtual object indicating the amount of time that has elapsed since the computer system initiated a video capture operation provides visual feedback to the user as to how long the computer system has performed the video capture operation and alerts the user that the computer system is capturing a user-viewable video media item (e.g., when the computer system completes performing the video capture operation) to confirm that the captured video media does not contain content that the user did not intend to capture, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
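A hedged sketch of such a recording-time object follows; it simply formats the elapsed time since a start date and hides itself when recording stops. The type name, styling, and one-second update interval are assumptions.

```swift
import SwiftUI
import Combine

// Hypothetical recording-time badge: shows elapsed capture time while
// recording and disappears when the video capture operation stops.
struct RecordingTimeBadge: View {
    let isRecording: Bool
    let startDate: Date
    @State private var elapsed: TimeInterval = 0
    private let timer = Timer.publish(every: 1, on: .main, in: .common).autoconnect()

    var body: some View {
        Group {
            if isRecording {
                Text(formatted(elapsed))
                    .font(.system(.body, design: .monospaced))
                    .padding(.horizontal, 10)
                    .padding(.vertical, 4)
                    .background(Color.red)
                    .clipShape(Capsule())
                    .foregroundColor(.white)
            }
        }
        .onReceive(timer) { now in
            elapsed = now.timeIntervalSince(startDate)
        }
    }

    private func formatted(_ interval: TimeInterval) -> String {
        let seconds = Int(interval)
        return String(format: "%02d:%02d", seconds / 60, seconds % 60)
    }
}
```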
In some embodiments, displaying the user interface includes displaying a second set of virtual control objects (e.g., 1332, 1330, 1316, and/or 1320) (e.g., one or more capture control virtual objects) outside of the boundary of the viewfinder (e.g., 1318). Displaying the second set of virtual objects outside the boundaries of the viewfinder such that the display of the second set of virtual objects does not interfere with the display of content within the viewfinder allows the user to clearly view and focus on content to be captured via the media capturing process, which helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, the second set of virtual control objects includes a set of camera mode virtual objects that each correspond to a respective mode of operation of the one or more cameras (e.g., the set of camera mode virtual objects includes one or more of: a time-lapse virtual object (e.g., the time-lapse virtual object, when selected, causes the one or more cameras of the computer system to be configured to capture a series of images over a period of time); a slow motion virtual object (e.g., the slow motion virtual object, when selected, causes the one or more cameras of the computer system to capture video media having a slow motion effect); a movie virtual object (e.g., the movie virtual object, when selected, causes the one or more cameras of the computer system to capture media having a blurred background); a portrait virtual object (e.g., the portrait virtual object, when selected, causes the one or more cameras of the computer system to capture still media having a depth-of-field effect); and a panoramic virtual object (e.g., the panoramic virtual object, when selected, causes the one or more cameras of the computer system to capture media spanning an increased capture range (e.g., compared to the capture range when the one or more cameras are operating in a normal operating range))), wherein the set of camera mode virtual objects includes a first camera mode virtual object and a second camera mode virtual object (e.g., the panoramic virtual object) that is different from the first camera mode virtual object. In some implementations, the computer system detects an input (e.g., 1350g, 1350c, and/or 1350d) corresponding to selection of a respective camera mode virtual object in the set of camera mode virtual objects (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air spread) detected by the one or more cameras, a gaze-based request, and/or activation of a hardware input mechanism in communication (e.g., wired and/or wireless communication) with the computer system). In some embodiments, in response to detecting the input corresponding to selection of a respective camera mode virtual object of the set of camera mode virtual objects: in accordance with a determination that the input corresponds to selection of the first camera mode virtual object, the computer system configures the one or more cameras to operate in a first mode; and in accordance with a determination that the input corresponds to selection of the second camera mode virtual object, the computer system configures the one or more cameras to operate in a second mode (e.g., different from the first mode) (e.g., as discussed above with respect to fig. 13D). Displaying the set of camera mode virtual objects outside the boundaries of the viewfinder, so that their display does not interfere with the display of content within the viewfinder, allows the user to clearly view and focus on content to be captured via the media capturing process, which helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
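For illustration only, a plain Swift sketch of mode selection is shown below. The mode names follow the paragraph above, while the photo and video base modes, CameraController, and its configuration values are hypothetical stand-ins for whatever capture pipeline is actually used.

```swift
// Hypothetical sketch of selecting a camera-mode virtual object and
// reconfiguring capture accordingly. Not a real capture API.
enum CameraMode: String, CaseIterable {
    case photo, video, timeLapse, slowMotion, movie, portrait, panorama
}

struct CameraController {
    private(set) var activeMode: CameraMode = .photo

    // Called when an input selects one of the camera-mode virtual objects.
    mutating func select(_ mode: CameraMode) {
        guard mode != activeMode else { return }
        activeMode = mode
        switch mode {
        case .timeLapse:  configure(frameRate: 2,   depthEffect: false)  // sparse frames over time
        case .slowMotion: configure(frameRate: 240, depthEffect: false)  // high frame rate
        case .movie:      configure(frameRate: 30,  depthEffect: true)   // blurred background
        case .portrait:   configure(frameRate: 30,  depthEffect: true)   // still with depth effect
        default:          configure(frameRate: 30,  depthEffect: false)
        }
    }

    private func configure(frameRate: Int, depthEffect: Bool) {
        // Placeholder: a real implementation would reconfigure the cameras here.
        print("Configured cameras: \(frameRate) fps, depth effect: \(depthEffect)")
    }
}
```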
In some implementations, the second set of virtual control objects includes a second close virtual object (e.g., 1316). In some embodiments, the computer system detects an input (e.g., 1350g, 1350c, and/or 1350d) corresponding to selection of the second close virtual object (e.g., an air gesture (e.g., an air swipe, an air tap, an air pinch, and/or an air spread) detected by the one or more cameras (e.g., detected at a point in space corresponding to the display of the second close virtual object), a gaze-based request, and/or activation of a hardware input mechanism in communication (e.g., wired and/or wireless communication) with the computer system). In some embodiments, in response to detecting the input corresponding to selection of the second close virtual object, the computer system ceases to display the user interface (e.g., 1304) (e.g., as discussed above with respect to fig. 13G). In some embodiments, as part of ceasing to display the user interface, the computer system displays a main desktop user interface (e.g., a user interface that includes a plurality of selectable virtual objects corresponding to various applications installed on the computer system). In some embodiments, the user interface corresponds to a first application and, as part of ceasing to display the user interface, the computer system displays a second user interface corresponding to a second application. In some embodiments, as part of ceasing to display the user interface, the computer system displays a reduced number of virtual objects (e.g., compared to the number of virtual objects included in the user interface) overlaid on top of the passthrough representation of the physical environment. Displaying the close virtual object outside the boundaries of the viewfinder, so that its display does not obstruct the display of content within the viewfinder, allows the user to clearly view and focus on content to be captured via the media capturing process, which helps the user to properly and quickly build and capture content of interest, which is particularly relevant to transient events and improves the functionality of the computer system.
In some embodiments, after capturing the first media item, the computer system displays (e.g., automatically, without intervening user input) a representation of the first media item (e.g., the media item captured in figs. 13E1-13E5 or fig. 13I) fading into the display of the user interface (e.g., 1304) (e.g., the representation of the first media item is displayed gradually (e.g., over a predetermined amount of time)) (e.g., as discussed above with respect to fig. 13F). In some embodiments, a first portion of the representation of the first media item fades into the display of the user interface before a second portion of the representation of the first media item. In some embodiments, the amount of time over which the representation of the first media item fades into the display of the user interface is directly related to the size of the first media item (e.g., the greater the size of the first media item, the longer the fade-in takes). Displaying the representation of the first media item fading into the display of the user interface provides visual feedback to the user about the state of the computer system (e.g., that the computer system has completed capturing the media item), which provides improved visual feedback to the user. Providing improved visual feedback that a media capturing operation has been performed enhances the privacy and security of a computer system by informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture.
In some implementations, displaying the representation of the first media item (e.g., the media item captured in fig. 13E 1-13E 5 or 13I) fades into the display of the user interface includes displaying the representation of the first media item transitioning from a first size (e.g., the representation of the first media item initially displayed in the first size) to a second size (e.g., the representation of the first media item gradually decreasing in size) (e.g., the computer system displaying the representation of the first media item transitioning from the first size to the second size over a period of time), and displaying the representation of the first media item moving from a first location in the user interface (e.g., a central location within the user interface) to a second location in the user interface (e.g., different from the first location) (e.g., the second location at the corner of the viewfinder), wherein the second location corresponds to the corner (e.g., the location of 1330) of the viewfinder (e.g., 1318) (e.g., as described above with respect to fig. 13F). In some implementations, the computer system displays the representation of the first media item as transitioning from the first size to the second size as the computer system displays the representation of the first media item as moving from the first position to the second position. In some implementations, the computer system displays the representation of the first media item as transitioning from the first size to the second size before/after the computer system displays the representation of the first media item as moving from the first position to the second position. In some implementations, when the computer system displays the representation of the first media item as faded into the display of the user interface, the computer displays the representation of the first media item as transitioning from the first size to the second size and moving from the first position to the second position. In some implementations, the computer displays the representation of the first media item as transitioning from the first size to the second size and moving from the first position to the second position before/after the computer system displays the representation of the first media item as faded into the display of the user interface. Displaying the representation of the first media item as a reduced size after capturing the first media item and moving from a first location on the user interface to a second location on the user interface provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has captured the first media item), which provides improved visual feedback. Providing improved visual feedback that media capturing operations are performed enhances the privacy and security of a computer system by informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture.
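One possible (purely illustrative) SwiftUI rendering of that fade-in, shrink, and move-to-corner animation is sketched below; the sizes, timing, and corner position are assumptions rather than values from the specification.

```swift
import SwiftUI

// Hypothetical sketch: the newly captured item fades in near the center,
// then shrinks and moves toward the corner where the media review object sits.
struct CapturedItemFlyIn: View {
    let capturedImage: Image
    @State private var opacity = 0.0
    @State private var scale = 1.0
    @State private var atCorner = false

    var body: some View {
        GeometryReader { proxy in
            capturedImage
                .resizable()
                .scaledToFill()
                .frame(width: 220, height: 160)            // first (larger) size
                .clipShape(RoundedRectangle(cornerRadius: 12))
                .scaleEffect(scale)
                .opacity(opacity)
                .position(atCorner
                          ? CGPoint(x: proxy.size.width - 60, y: proxy.size.height - 60)
                          : CGPoint(x: proxy.size.width / 2, y: proxy.size.height / 2))
                .onAppear {
                    withAnimation(.easeIn(duration: 0.3)) { opacity = 1.0 }   // fade in
                    withAnimation(.easeInOut(duration: 0.5).delay(0.4)) {
                        scale = 0.2                                            // shrink to second size
                        atCorner = true                                        // move to the corner
                    }
                }
        }
    }
}
```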
In some embodiments, the viewfinder (e.g., 1318) includes a second reticle virtual object (e.g., 1320) (e.g., a set of contiguous or non-contiguous lines) before the representation of the first media item is displayed, and wherein the display of the representation of the first media item replaces the display of the second reticle virtual object (e.g., as discussed above with respect to fig. 13F). In some embodiments, the second reticle virtual object is redisplayed after the display of the representation is moved from the first position to the second position. In some implementations, the second reticle virtual object is redisplayed in response to ceasing to display the representation of the first media item. Replacing the display of the second reticle virtual object after the first media item is captured provides visual feedback to the user regarding the status of the computer system (e.g., the computer system has captured the first media item), which provides improved visual feedback. Providing improved visual feedback that media capturing operations are performed enhances the privacy and security of a computer system by informing a user that captured media can be reviewed and/or edited by the user to ensure that the captured media includes content that the user intended to capture and not content that the user did not intend to capture.
In some implementations, displaying the user interface (e.g., 1304) includes displaying a second media review virtual object (e.g., 1330) at a third location (e.g., the location of 1330) in the user interface (e.g., the media review virtual object includes a representation of the most recently captured media item (e.g., most recently captured by the one or more cameras of the computer system) and, when selected, causes a user interface (e.g., a camera film affordance) for reviewing previously captured media to be displayed) (e.g., the second media review virtual object is displayed at a corner of the viewfinder), wherein the second media review virtual object is displayed at a third size (e.g., the size of 1330), and wherein displaying the representation of the first media item (e.g., the media item captured in figs. 13E1-13E5 or fig. 13I) fading into the user interface includes displaying the representation of the first media item transitioning from a fourth size to a fifth size, wherein the fourth size is greater than the fifth size and greater than the third size, and displaying the representation of the first media item moving from a fourth location in the user interface (e.g., a central location within the user interface) to the third location at which the second media review virtual object is displayed (e.g., as described above with respect to fig. 13F and fig. 13I).
In some implementations, the first media item (e.g., the media item captured in fig. 13E 1-13E 5 or 13I) is a still photograph or video (e.g., as described above in fig. 13E 1-13E 5 and 13I).
In some implementations, after changing the appearance of the viewfinder, the computer system detects a second request to capture media (e.g., 1350c, 1350d, 1350h, and/or 1350i). In some embodiments, in response to detecting the second request to capture media: in accordance with a determination that the second request to capture media corresponds to a request to capture a still photograph (e.g., the request is a tap of a hardware input mechanism in communication with the computer system), the computer system displays a first type of feedback (e.g., the appearance of 1318 at figs. 13E1-13E5) (e.g., displaying the first type of feedback includes changing the appearance of a first portion of content within a threshold distance of a first side of the boundary of the viewfinder and changing the appearance of a second portion of content within a threshold distance of a second side of the boundary of the viewfinder); and in accordance with a determination that the second request to capture media corresponds to a request to capture video (e.g., the request is a tap and hold of a hardware input mechanism in communication with the computer system), the computer system displays a second type of feedback (e.g., the display of 1332 within 1318) that is different from the first type of feedback (e.g., as described above with respect to fig. 13I) (e.g., displaying the second type of feedback includes displaying, within the viewfinder, an indicator of how long video has been captured). Displaying the first type of feedback when the computer system captures still photographs and the second type of feedback when the computer system captures video provides visual feedback to the user as to what type of media capturing operation the computer system is performing, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. In addition, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
In some embodiments, the first request to capture media (e.g., 1350c, 1350d, 1350h, and/or 1350i) corresponds to a first type of input, wherein the first media item is a still photograph (e.g., the media item captured in figs. 13E1-13E5) (e.g., a photograph), and wherein the first type of input corresponds to a short press (e.g., a press and release) of a first hardware input mechanism (e.g., a hardware input mechanism in communication (e.g., wireless communication and/or wired communication) with the computer system) (e.g., the hardware input mechanism is pressed for less than a threshold amount of time) (e.g., the first media item includes the content within the viewfinder when the computer system detects the request to capture media) (e.g., a solid state input mechanism that is activated in response to the computer system detecting pressure (e.g., directly (e.g., on the solid state input mechanism) or indirectly (e.g., not on the solid state input mechanism))). In some embodiments, when the hardware input mechanism is a solid state input mechanism, the first type of media is captured automatically when a set of media capture conditions is satisfied, which performs an operation without requiring further user input when a set of conditions is met. Capturing still photographs in response to activating a hardware input mechanism allows a user to control a media capturing operation of a computer system without displaying additional controls, which provides additional control options without cluttering the user interface.
In some implementations, after changing the appearance of the viewfinder (e.g., 1318 at fig. 13F), the computer system detects a third request to capture media (e.g., 1350c, 1350d, 1350h, and/or 1350i). In some embodiments, in response to detecting the third request to capture media and in accordance with a determination that the third request to capture media corresponds to a second type of input (e.g., the second type of input is different from the first type of input), the computer system captures a video media item (e.g., the video media item captured at fig. 13I), wherein the second type of input corresponds to a long press (e.g., 1350h) (e.g., a press and hold) of a hardware input mechanism (e.g., as described above with respect to fig. 13I) (e.g., a hardware input mechanism in communication (e.g., wireless communication and/or wired communication) with the computer system) (e.g., the video media item includes the content included within the viewfinder at any point in time while the computer system is performing the video capture operation) (e.g., a solid state input mechanism that is activated in response to the computer system detecting pressure (e.g., directly (e.g., on the solid state input mechanism) or indirectly (e.g., not on the solid state input mechanism))). In some embodiments, when the hardware input mechanism is a solid state input mechanism, the video media item is captured automatically when a set of media capture conditions is satisfied, which performs an operation without requiring further user input when a set of conditions is met. Capturing video in response to activating the hardware input mechanism allows a user to control media capturing operations of the computer system without displaying additional controls, which provides additional control options without cluttering the user interface.
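A minimal sketch of that input interpretation, assuming a 0.5 second long-press threshold and hypothetical capture hooks, might look like the following; it is not the actual input-handling code of the computer system.

```swift
import Foundation

// Hypothetical handler for a hardware capture button: a short press captures
// a still photo, while a press held past a threshold starts video capture.
final class CaptureButtonHandler {
    private let longPressThreshold: TimeInterval = 0.5   // assumed threshold
    private var pressStart: Date?
    private var videoTimer: Timer?
    private(set) var isRecordingVideo = false

    func buttonDown() {
        pressStart = Date()
        // If the press is still held when the threshold elapses, treat it as video.
        videoTimer = Timer.scheduledTimer(withTimeInterval: longPressThreshold,
                                          repeats: false) { [weak self] _ in
            self?.isRecordingVideo = true
            print("Long press: start video capture")
        }
    }

    func buttonUp() {
        videoTimer?.invalidate()
        defer { pressStart = nil }
        guard let start = pressStart else { return }
        if isRecordingVideo {
            // What happens on release while recording (stop vs. continue) is a
            // separate policy decision; see the hold-criteria discussion in the text.
            print("Released after \(Date().timeIntervalSince(start))s of video")
        } else {
            print("Short press: capture still photo")
        }
    }
}
```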
In some implementations, the third request to capture media corresponds to the second type of input (e.g., 1350c, 1350d, 1350h, and/or 1350k). In some embodiments, in response to detecting the third request to capture media, the computer system displays an indication (e.g., 1332) of a second video capture operation (e.g., as discussed above with respect to fig. 13I) (e.g., the indication is displayed within the viewfinder) (e.g., the indication is displayed overlaid on top of the representation of the physical environment), wherein displaying the indication includes: in accordance with a determination that a set of criteria is not met (e.g., the computer system has detected the second type of input for less than a predetermined amount of time (e.g., 0.1 seconds, 0.3 seconds, 0.5 seconds, 0.7 seconds, 1 second, 1.5 seconds, or 2 seconds)), displaying the indication with a first appearance (e.g., the appearance of 1332 at fig. 13I); and in accordance with a determination that the set of criteria is met (e.g., the computer system has detected the second type of input for longer than the predetermined amount of time), displaying the indication with a second appearance that is different from the first appearance. In some embodiments, in response to ceasing to detect the second type of input: in accordance with a determination that the set of criteria was not met before the computer system ceased to detect the second type of input (e.g., the indication is displayed in the first appearance), the computer system ceases to display the indication and ceases to perform the second video capture operation; and in accordance with a determination that the set of criteria was met before the computer system ceased to detect the second type of input (e.g., the indication is displayed in the second appearance), the computer system continues to perform the second video capture operation (e.g., as described above with respect to fig. 13I) (e.g., the computer system continues to capture video when the computer system no longer detects the second type of input). In some embodiments, the computer system performs a photo capture operation (e.g., the computer system captures a photo) when the computer system ceases to detect the second type of input while the indication is displayed in the first appearance. In some implementations, the indication includes instructions on how to capture video media when the indication is displayed in the first appearance. Ceasing to perform the second video capture operation in response to ceasing to detect the second type of input corresponding to selection of the hardware input mechanism allows the user to control the media capture operation of the computer system without displaying additional controls, which provides additional control options without cluttering the user interface. Automatically displaying the indication with a particular appearance when prescribed conditions are met allows the computer system to indicate to the user whether the computer system will continue the second video capture operation if the user stops pressing the hardware input mechanism.
In some embodiments, the indication (e.g., 1332) indicates an amount of time that has elapsed since the computer system (e.g., 700) initiated the second video capture operation (e.g., the indication is a timer), wherein displaying the indication includes displaying the indication with the first appearance (e.g., the set of criteria is not met when the indication is initially displayed). In some embodiments, the computer system detects that the set of criteria is met while the indication is displayed with the first appearance (e.g., the computer system has detected the second type of input for longer than a predetermined amount of time). In some embodiments, in response to detecting that the set of criteria is met, the computer system changes the appearance of the indication from the first appearance to the second appearance (e.g., the appearance of 1332 at fig. 13I) (e.g., as discussed above with respect to fig. 13I) (e.g., while the computer system performs the second video capture operation). Changing the appearance of the indication from the first appearance to the second appearance when the set of criteria is met provides visual feedback to the user that the set of criteria is met and that the computer system will continue to perform the second video capture operation if the computer system stops detecting the second type of input, which provides improved visual feedback. Providing improved visual feedback during media capturing operations enhances the privacy and security of a computer system by informing a user that the user can review and/or edit captured media. Furthermore, providing improved visual feedback during media capturing operations allows a user to capture content that the user intended to capture with a smaller number of capture operations that would otherwise use additional power of the computer system, thus saving battery life of the computer system.
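The release behavior tied to that set of criteria can be summarized, under an assumed threshold value and hypothetical names, as a small policy type:

```swift
import Foundation

// Hypothetical policy: if the press ends before the hold threshold, fall back
// to a still capture and stop; if the threshold was met, recording continues.
struct VideoHoldPolicy {
    let holdThreshold: TimeInterval = 0.7   // assumed value, not from the specification

    enum Outcome { case capturePhotoAndStop, continueRecording }

    func outcomeOnRelease(heldFor duration: TimeInterval) -> Outcome {
        duration < holdThreshold ? .capturePhotoAndStop : .continueRecording
    }
}

// Example usage (e.g., in a playground or main.swift):
let policy = VideoHoldPolicy()
assert(policy.outcomeOnRelease(heldFor: 0.3) == .capturePhotoAndStop)
assert(policy.outcomeOnRelease(heldFor: 1.2) == .continueRecording)
```

The two asserts show the outcomes discussed above: a quick release falls back to a still capture and stops, while a hold past the threshold lets recording continue after release.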
In some embodiments, while the computer system (e.g., 700) is capturing a video media item (e.g., while the computer system is performing a video capture operation), the computer system detects an input (e.g., 1350c, 1350d, 1350h, and/or 1350k) corresponding to activation of a third hardware input mechanism (e.g., 711a) (e.g., the user performs a short press (e.g., a press and release) and/or a long press (e.g., a press and hold) of the third hardware input mechanism) (e.g., a solid state input mechanism that is activated in response to the computer system detecting pressure (e.g., directly (e.g., on the solid state input mechanism) or indirectly (e.g., not on the solid state input mechanism))). In some embodiments, in response to detecting the input corresponding to activation of the third hardware input mechanism, the computer system ceases capturing the video media item (e.g., as explained above with respect to fig. 13J). In some embodiments, the input that ceases capture of the video media item is the same type of input (e.g., a press and release) that initiated capture of the video media item; in some embodiments, it is a different type of input. In some embodiments, when capture of the video media item is completed, the captured video media item is optionally added to a media library (e.g., stored on the computer system and/or stored on a cloud server). Stopping the capture of the video media item in response to detecting an input corresponding to activation of the hardware input mechanism allows a user to control a media capture process of the computer system without displaying additional controls, which provides additional control options without cluttering the user interface.
In some embodiments, a first side of the boundary (e.g., the right boundary of 1318) and a second side of the boundary (e.g., the left boundary of 1318) are on opposite sides of the boundary (e.g., the first side of the boundary is on the left side of the boundary and the second side of the boundary is on the right side of the boundary, or the first side is on the top of the boundary and the second side is on the bottom of the boundary).
In some implementations, the first content portion (e.g., 1306) is at a threshold distance from a first side of the boundary of the viewfinder (e.g., right boundary of 1318) (e.g., the first content portion is positioned at/located at a threshold distance from the first side of the boundary of the viewfinder), and wherein the second content portion (e.g., 1306) is at a threshold distance from a second side of the boundary of the viewfinder (e.g., left boundary of 1318) (e.g., the first content portion is positioned at/located at a threshold distance from the second side of the boundary of the viewfinder). In some embodiments, the first content portion and the second content portion are on opposite sides of the viewfinder.
In some implementations, displaying the user interface (e.g., 1304) includes displaying a third reticle virtual object (e.g., 1320) (e.g., the reticle is displayed at one or more corners of the viewfinder or around the perimeter of the viewfinder) (e.g., the reticle is composed of a series of non-contiguous lines or contiguous lines), wherein the third reticle virtual object indicates a capture area of the one or more cameras (e.g., as discussed above with respect to fig. 13D) (e.g., content within the reticle is visible in the resulting media item when the computer system performs a media capture operation (e.g., captures a video or still photograph)). Displaying the reticle virtual object indicating the capture area of the one or more cameras provides a visual aid to the user that helps the user to properly position the one or more cameras of the computer system such that the desired content is within the reticle and the desired content is visible in the resulting media, thereby making the media capture process faster and more efficient.
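As an illustration, a reticle of the kind described (corner marks indicating the capture area) could be drawn in SwiftUI as follows; the arm length, line width, and color are assumptions.

```swift
import SwiftUI

// Hypothetical reticle overlay: four corner brackets marking the capture area.
struct ReticleOverlay: View {
    var armLength: CGFloat = 28
    var lineWidth: CGFloat = 3

    var body: some View {
        GeometryReader { proxy in
            let w = proxy.size.width
            let h = proxy.size.height
            Path { path in
                // Top-left corner
                path.move(to: CGPoint(x: 0, y: armLength))
                path.addLine(to: CGPoint(x: 0, y: 0))
                path.addLine(to: CGPoint(x: armLength, y: 0))
                // Top-right corner
                path.move(to: CGPoint(x: w - armLength, y: 0))
                path.addLine(to: CGPoint(x: w, y: 0))
                path.addLine(to: CGPoint(x: w, y: armLength))
                // Bottom-right corner
                path.move(to: CGPoint(x: w, y: h - armLength))
                path.addLine(to: CGPoint(x: w, y: h))
                path.addLine(to: CGPoint(x: w - armLength, y: h))
                // Bottom-left corner
                path.move(to: CGPoint(x: armLength, y: h))
                path.addLine(to: CGPoint(x: 0, y: h))
                path.addLine(to: CGPoint(x: 0, y: h - armLength))
            }
            .stroke(Color.white, lineWidth: lineWidth)
        }
    }
}
```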
In some embodiments, aspects/operations of methods 800, 900, 1000, 1200, 1400, and 1500 may be interchanged, substituted, and/or added between the methods. For example, altering the appearance of content (e.g., as discussed in method 1500) as part of performing a media capturing process may optionally be applied to the media capturing operation discussed in method 800. For the sake of brevity, these details are not repeated here.
The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention and various described embodiments with various modifications as are suited to the particular use contemplated.
As described above, one aspect of the present technology is to collect and use data available from a variety of sources to improve a user's experience of capturing and viewing media in an XR environment. The present disclosure contemplates that in some instances, such collected data may include personal information data that uniquely identifies or may be used to contact or locate a particular person. Such personal information data may include demographic data, location-based data, telephone numbers, email addresses, tweet IDs, home addresses, data or records related to the user's health or fitness level (e.g., vital sign measurements, medication information, exercise information), date of birth, or any other identification or personal information.
The present disclosure recognizes that the use of such personal information data in the present technology may be used to benefit users. For example, personal information data may be used to improve the user's media capturing process. In addition, the present disclosure contemplates other uses for personal information data that are beneficial to the user. For example, health and fitness data may be used to provide insight into the overall health of a user, or may be used as positive feedback to individuals using technology to pursue health goals.
The present disclosure contemplates that entities responsible for collecting, analyzing, disclosing, transmitting, storing, or otherwise using such personal information data will adhere to established privacy policies and/or privacy practices. In particular, such entities should exercise and adhere to privacy policies and practices that are recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. Such policies should be readily accessible to the user and should be updated as the collection and/or use of the data changes. Personal information from users should be collected for legal and reasonable use by entities and not shared or sold outside of these legal uses. In addition, such collection/sharing should be performed after informed consent is received from the user. In addition, such entities should consider taking any necessary steps to defend and secure access to such personal information data and to ensure that others who have access to personal information data adhere to their privacy policies and procedures. In addition, such entities may subject themselves to third party evaluations to prove compliance with widely accepted privacy policies and practices. In addition, policies and practices should be adjusted to collect and/or access specific types of personal information data and to suit applicable laws and standards including specific considerations of jurisdiction. For example, in the united states, the collection or acquisition of certain health data may be governed by federal and/or state law, such as the health insurance flow and liability act (HIPAA); while health data in other countries may be subject to other regulations and policies and should be processed accordingly. Thus, different privacy practices should be maintained for different personal data types in each country.
In spite of the foregoing, the present disclosure also contemplates embodiments in which a user selectively prevents use or access to personal information data. That is, the present disclosure contemplates that hardware elements and/or software elements may be provided to prevent or block access to such personal information data. For example, in the case of capturing and/or displaying media in various XR environments, the present technology may be configured to allow a user to choose to "opt-in" or "opt-out" to participate in the collection of personal information data at any time during or after registration with a service. As another example, the user may choose not to provide data for the captured and/or displayed media. As another example, the user may choose to limit the length of time that data is maintained, or to completely prohibit development of a profile based on the type of media captured and/or displayed. In addition to providing the "opt-in" and "opt-out" options, the present disclosure also contemplates providing notifications related to accessing or using personal information. For example, the user may be notified that his personal information data will be accessed when the application is downloaded, and then be reminded again just before the personal information data is accessed by the application.
Further, it is an object of the present disclosure that personal information data should be managed and processed so as to minimize the risk of inadvertent or unauthorized access or use. Risk can be minimized by limiting the collection of data and deleting the data once it is no longer needed. In addition, and when applicable, including in certain health-related applications, data de-identification may be used to protect the privacy of the user. De-identification may be facilitated, as appropriate, by removing specific identifiers (e.g., date of birth), controlling the amount or specificity of stored data (e.g., collecting location data at a city level instead of at an address level), controlling how data is stored (e.g., aggregating data among users), and/or other methods.
Thus, while the present disclosure broadly covers the use of personal information data to implement one or more of the various disclosed embodiments, the present disclosure also contemplates that the various embodiments may be implemented without accessing such personal information data. That is, various embodiments of the present technology do not fail to function properly due to the lack of all or a portion of such personal information data. For example, the media item may be displayed by inferring a preference based on non-personal information data or absolute minimum metrics of personal information, such as content requested by a device associated with the user, other non-personal information available to the service, or publicly available information.

Claims (166)

1. A method, comprising:
at a computer system in communication with a display generation component and one or more cameras:
Upon displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and
In response to detecting a request to display the media capture user interface, displaying, via the display generating component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as portions of the physical environment in the portion of the field of view of the one or more cameras change, wherein:
the media capture preview indicates a boundary of media to be captured in response to detecting a media capture input while the media capture user interface is displayed;
The media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment is visible before a request to display the media capture user interface is detected; and
The media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
2. The method of claim 1, wherein the representation of the physical environment is a transparent representation of a real world environment of the computer system.
3. The method of any of claims 1-2, wherein the representation of the physical environment is visible at a first scale and the representation of the portion of the field of view of the one or more cameras included in the media capture preview is displayed at a second scale, and wherein the first scale is greater than the second scale.
4. The method of any of claims 1-3, wherein the representation of the physical environment includes first content that is not included in the representation of the field of view included in the media capture preview.
5. The method of any of claims 1-4, wherein the representation of the physical environment comprises a portion of the physical environment, and wherein the representation of the field of view of the one or more cameras included in the media capture preview comprises the portion of the physical environment.
6. The method of any of claims 1-5, wherein the computer system is in communication with a physical input mechanism, and wherein the request to display the media capturing user interface comprises an activation of the physical input mechanism.
7. The method of any one of claims 1 to 6, further comprising:
detecting an input corresponding to a request to capture media while the media capture preview is displayed; and
In response to detecting an input corresponding to a request to capture media:
in accordance with a determination that the input corresponding to the request to capture media is of a first type, initiating a process of capturing media content of the first type; and
In accordance with a determination that the input corresponding to the request to capture media is of a second type, a process of capturing media content of the second type is initiated.
8. The method of any of claims 1-7, wherein the representation of the portion of the field of view of the one or more cameras included in the media capture preview has a first set of visual disparity attributes, and wherein the representation of the physical environment has a second set of visual disparity attributes that are different from the first set of visual disparity attributes.
9. The method of any of claims 1-8, wherein the representation of the physical environment is an immersive view, and wherein the representation of the portion of the field of view of the one or more cameras included in the media capture preview is a non-immersive view.
10. The method of any of claims 1-9, wherein, prior to detecting the request to display the media capturing user interface, a third portion of the representation of the physical environment has a first visual appearance including visual characteristics having a first magnitude, the method further comprising:
In response to detecting a request to display the media capturing user interface, the third portion of the representation of the physical environment is changed to have a second visual appearance including a visual characteristic having a second magnitude, wherein the second magnitude of the visual characteristic is different from the first magnitude of the visual characteristic.
11. The method of any of claims 1-10, wherein displaying the media capture preview comprises displaying one or more virtual objects having a spatial relationship to a display of the media capture preview, and wherein the media capture preview and the one or more virtual objects are displayed at a first display location, the method further comprising:
Detecting a pose change of the viewpoint of the user when the media capture preview and the one or more virtual objects are displayed at the first display location; and
In response to detecting a pose change of the viewpoint of the user:
displaying the media capture preview and the one or more virtual objects at a second display location different from the first location; and
The spatial relationship between the display of the media capture preview and the one or more virtual objects is maintained.
12. The method of claim 11, wherein the one or more virtual objects comprise an elapsed time virtual object that provides an indication of an amount of time that has elapsed since a process for capturing media was initiated.
13. The method of any of claims 11 to 12, wherein the one or more virtual objects comprise a shutter button virtual object that, when selected, causes a process for capturing media to be initiated.
14. The method of any of claims 11-13, wherein the one or more virtual objects comprise a camera film virtual object that, when selected, causes a previously captured media item to be displayed.
15. The method of claim 14, further comprising:
Detecting a selection of the camera film virtual object while the media capture preview is displayed; and
In response to detecting selection of the camera film virtual object, displaying, via the display generating component, a representation of the previously captured media item, wherein displaying the representation of the previously captured media includes displaying:
a first dismissal virtual object that, when selected, causes the representation of the previously captured media item to cease to be displayed;
a share virtual object that, when selected, causes a process to be initiated for sharing the representation of the previously captured media item;
a media library virtual object that, when selected, causes a plurality of previously captured media items to be displayed; and/or
a resize virtual object indicating that the representation of the previously captured media item is to be resized based on detecting one or more gestures.
16. The method of any of claims 14 to 15, further comprising:
receiving a set of one or more inputs including an input corresponding to the camera film virtual object;
In response to receiving the set of one or more inputs, displaying a representation of a first previously captured media item of a plurality of previously captured media items at a first location;
Receiving a request to navigate to a different previously captured media item of the plurality of previously captured media items while the representation of the first previously captured media item is displayed at the first location; and
In response to receiving a request to navigate to a different one of the plurality of previously captured media items, replacing a display of the representation of the first previously captured media item at the first location with a display of a representation of a second previously captured media item of the plurality of previously captured media items.
17. The method of any of claims 11 to 16, wherein the one or more virtual objects include a second dismissal virtual object that, when selected, causes the media capture preview to cease to be displayed.
18. The method of any of claims 11 to 17, wherein the one or more virtual objects comprise a relocated virtual object, the method further comprising:
Detecting a set of one or more inputs including an input corresponding to the relocated virtual object while the media capture preview is displayed at a first location in the media capture user interface; and
The media capture preview is moved from the first position to a second position in response to detecting the set of one or more inputs including an input corresponding to the relocated virtual object.
19. The method of any of claims 1 to 18, wherein, while the media capture preview is displayed, a media library virtual object is simultaneously displayed via the display generation component, the media library virtual object, when selected, causing a plurality of previously captured media items to be displayed.
20. The method of any one of claims 1 to 19, wherein:
the portion of the field of view of the one or more cameras included in the media capture preview has a first angular range;
the representation of the physical environment represents a second field of view of the one or more cameras having a second angular range; and
the first angular range is narrower than the second angular range.
21. The method of any of claims 1-20, wherein the representation of the portion of the field of view of the one or more cameras included in the media capture preview includes first content in a field of view of a first camera of the one or more cameras and in a field of view of a second camera of the one or more cameras.
22. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 1-21.
23. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-21.
24. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
Means for performing the method according to any one of claims 1 to 21.
25. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generating component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 1-21.
26. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
Upon displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and
In response to detecting a request to display the media capture user interface, displaying, via the display generating component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as portions of the physical environment in the portion of the field of view of the one or more cameras change, wherein:
the media capture preview indicates a boundary of media to be captured in response to detecting a media capture input while the media capture user interface is displayed;
The media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment is visible before a request to display the media capture user interface is detected; and
The media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
27. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
Upon displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and
In response to detecting a request to display the media capture user interface, displaying, via the display generating component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as portions of the physical environment in the portion of the field of view of the one or more cameras change, wherein:
the media capture preview indicates a boundary of media that is to be captured in response to detecting a media capture input while the media capture user interface is displayed;
The media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment is visible before a request to display the media capture user interface is detected; and
The media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
28. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
Means for detecting a request to display a media capturing user interface when a first user interface overlaid on top of a representation of a physical environment is displayed via the display generating means, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and
Means for displaying, via the display generating component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras in response to detecting a request to display the media capture user interface, the media capture preview having content that is updated as portions of the physical environment in the portion of the field of view of the one or more cameras change, wherein:
the media capture preview indicates a boundary of media that is to be captured in response to detecting a media capture input while the media capture user interface is displayed;
The media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment is visible before a request to display the media capture user interface is detected; and
The media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
29. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
Upon displaying a first user interface overlaid on top of a representation of a physical environment via the display generation component, detecting a request to display a media capture user interface, wherein the representation of the physical environment changes as a portion of the physical environment corresponding to the representation of the physical environment changes and/or a viewpoint of a user changes; and
In response to detecting a request to display the media capture user interface, displaying, via the display generating component, a media capture preview comprising a representation of a portion of a field of view of the one or more cameras, the media capture preview having content that is updated as portions of the physical environment in the portion of the field of view of the one or more cameras change, wherein:
the media capture preview indicates a boundary of media that is to be captured in response to detecting a media capture input while the media capture user interface is displayed;
The media capture preview is displayed while a first portion of the representation of the physical environment is visible, wherein the first portion of the representation of the physical environment is visible before a request to display the media capture user interface is detected; and
The media capture preview is displayed in place of a second portion of the representation of the physical environment, wherein the first portion of the representation of the physical environment is updated as a portion of the physical environment corresponding to the first portion of the representation of the physical environment changes and/or the viewpoint of the user changes.
30. A method, comprising:
at a computer system in communication with a display generation component and one or more cameras:
While a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a respective spatial configuration relative to the viewpoint of the user;
Detecting a change in a pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and
In response to detecting a change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the view of the one or more cameras away from the respective spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the view of the one or more cameras occurs at a first speed, wherein the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user when the preview of the view of the one or more cameras is shifting based on the change in the pose of the view of the user.
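For readers tracing claim 30, the following Swift sketch illustrates one way the recited behavior could be modeled: the preview catches up to viewpoint changes at a first speed (here, an exponential-smoothing rate) while the environment representation is assumed to track the viewpoint immediately. The type names, the smoothing model, and the rate value are illustrative assumptions, not the claimed implementation.

import Foundation

// Minimal sketch of the "lazy follow" preview behavior: the preview drifts back
// toward its spatial configuration relative to the viewpoint at a first speed,
// while the environment representation is assumed to follow the viewpoint directly.
struct ViewpointPose {
    var yaw: Double    // radians
    var pitch: Double  // radians
}

struct LazyFollowPreview {
    // Fraction of the remaining offset closed per update step (the "first speed").
    var followRate: Double = 0.15
    // Current angular placement of the preview relative to the viewpoint.
    var previewPose = ViewpointPose(yaw: 0, pitch: 0)

    mutating func update(toward viewpoint: ViewpointPose) {
        previewPose.yaw += (viewpoint.yaw - previewPose.yaw) * followRate
        previewPose.pitch += (viewpoint.pitch - previewPose.pitch) * followRate
    }
}

var preview = LazyFollowPreview()
let target = ViewpointPose(yaw: 0.5, pitch: 0.1)
for _ in 0..<10 { preview.update(toward: target) }
print(preview.previewPose.yaw)   // ~0.40: still lagging behind the 0.5 target

Because the preview closes only a fraction of the remaining offset each step, its position update carries more visual delay than the raw viewpoint tracking, which is the distinction claims 30 and 31 draw between the first and second speeds.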
31. The method according to claim 30, wherein:
the computer system tracks pose changes of the viewpoint of the user with a first amount of tracking lag; and
The first speed introduces an amount of visual delay in updating the position of the preview of the field of view of the one or more cameras that is greater than an amount of visual delay that would be introduced based on the first amount of tracking lag in updating the position of the preview of the field of view of the one or more cameras.
32. The method of any of claims 30 to 31, further comprising:
Detecting that the viewpoint of the user is changing pose less than a first threshold amount while the preview of the field of view of the one or more cameras is being offset at the first speed; and
In response to detecting that the viewpoint of the user is changing pose less than the first threshold amount, the preview of the field of view of the one or more cameras is offset toward the respective spatial configuration at a third speed greater than the first speed.
33. The method of any of claims 30 to 32, further comprising:
Detecting that the viewpoint of the user is changing pose less than a second threshold amount while the preview of the field of view of the one or more cameras is being offset at the first speed; and
In response to detecting that the viewpoint of the user is changing pose less than the second threshold amount:
stopping offsetting the preview of the field of view of the one or more cameras away from the respective spatial configuration; and
Displaying the preview of the field of view of the one or more cameras with the respective spatial configuration relative to the viewpoint of the user.
34. The method of any of claims 30 to 33, wherein a pose change comprises a lateral movement along a plane of the viewpoint of the user.
35. The method of any of claims 30 to 34, wherein a pose change comprises a longitudinal movement along the plane of the viewpoint of the user.
36. The method of any of claims 30 to 35, wherein a pose change comprises a back and forth movement perpendicular to the plane of the viewpoint of the user.
37. The method of any of claims 30-36, wherein the preview of the field of view of the one or more cameras does not cover a second portion of the three-dimensional environment visible from the viewpoint of the user.
38. The method of any of claims 30-37, wherein the representation of the three-dimensional environment included in the preview of the field of view of the one or more cameras changes based on pose changes of the viewpoint of the user, the method further comprising:
Upon detecting a change in the pose of the viewpoint of the user from the first pose to the second pose, a first visual stabilization is performed on the representation of the three-dimensional environment included in the preview of the field of view of the one or more cameras.
39. The method of claim 38, wherein performing the first stabilization includes applying a first amount of visual stabilization to the representation of the three-dimensional environment included in the preview of the field of view of the one or more cameras, the method further comprising:
Upon detecting a change in the pose of the viewpoint of the user from the first pose to the second pose, performing a second visual stabilization on a second portion of the representation of the three-dimensional environment, wherein the second visual stabilization applies a second amount of visual stabilization less than the first amount of visual stabilization to the second portion of the representation of the three-dimensional environment.
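As a concrete reading of claims 38 and 39, the sketch below applies a larger stabilization amount to jitter shown inside the preview than to the surrounding representation. Modeling stabilization as a simple attenuation factor is an assumption made for illustration.

import Foundation

// Sketch: the same per-frame pose jitter is attenuated more strongly inside the
// capture preview (first amount) than in the surrounding environment (second amount).
func stabilizedOffset(rawJitter: Double, stabilizationAmount: Double) -> Double {
    // stabilizationAmount in 0...1; a value of 1 removes the jitter entirely.
    rawJitter * (1.0 - stabilizationAmount)
}

let jitter = 0.02                 // radians of head shake this frame
let previewAmount = 0.8           // first (larger) amount of visual stabilization
let surroundingAmount = 0.3       // second (smaller) amount of visual stabilization

let previewShift = stabilizedOffset(rawJitter: jitter, stabilizationAmount: previewAmount)
let surroundingShift = stabilizedOffset(rawJitter: jitter, stabilizationAmount: surroundingAmount)
print(previewShift, surroundingShift)   // ≈0.004 and ≈0.014: the preview content appears steadier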
40. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 30-39.
41. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 30-39.
42. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
Means for performing the method of any one of claims 30 to 39.
43. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generating component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 30-39.
44. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
While a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a respective spatial configuration relative to the viewpoint of the user;
Detecting a change in a pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and
In response to detecting a change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the view of the one or more cameras away from the respective spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the view of the one or more cameras occurs at a first speed, wherein the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user when the preview of the view of the one or more cameras is shifting based on the change in the pose of the view of the user.
45. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
While a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a respective spatial configuration relative to the viewpoint of the user;
Detecting a change in a pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and
In response to detecting a change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the view of the one or more cameras away from the respective spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the view of the one or more cameras occurs at a first speed, wherein the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user when the preview of the view of the one or more cameras is shifting based on the change in the pose of the view of the user.
46. A computer system in communication with a display generation component and one or more cameras, the computer system comprising:
Means for displaying, via a display generating component, an augmented reality user interface when a viewpoint of a user is in a first pose, the augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a respective spatial configuration relative to the viewpoint of the user;
Means for detecting a change in a pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and
Means for shifting the preview of the field of view of the one or more cameras away from the respective spatial configuration relative to the viewpoint of the user in a direction determined based on the change in the pose of the viewpoint of the user from the first pose to the second pose in response to detecting the change in the pose of the viewpoint of the user from the first pose to the second pose, wherein the shifting of the preview of the field of view of the one or more cameras occurs at a first speed, wherein the representation of the three-dimensional environment changes at a second speed different from the first speed when the preview of the field of view of the one or more cameras is shifting based on the change in the pose of the viewpoint of the user.
47. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
While a viewpoint of a user is in a first pose, displaying, via a display generation component, an augmented reality user interface comprising a preview of a field of view of the one or more cameras, the preview overlaid on a first portion of a three-dimensional environment visible in the viewpoint of the user, wherein the preview comprises a representation of the first portion of the three-dimensional environment and is displayed with a respective spatial configuration relative to the viewpoint of the user;
Detecting a change in a pose of the viewpoint of the user from the first pose to a second pose different from the first pose; and
In response to detecting a change in the pose of the view of the user from the first pose to the second pose, shifting the preview of the view of the one or more cameras away from the respective spatial configuration relative to the view of the user in a direction determined based on the change in the pose of the view of the user from the first pose to the second pose, wherein the shifting of the preview of the view of the one or more cameras occurs at a first speed, wherein the representation of the three-dimensional environment changes at a second speed different from the first speed based on the change in the pose of the view of the user when the preview of the view of the one or more cameras is shifting based on the change in the pose of the view of the user.
48. A method, comprising:
at a computer system in communication with a display generation component:
Upon displaying an augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and
In response to detecting a request to display the captured media, the captured media is displayed as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location in the three-dimensional environment selected by the computer system such that a first viewpoint of the user is outside of a respective range of the one or more viewpoints.
49. The method of claim 48, wherein the location selected by the computer system is a location in a physical environment, and wherein the three-dimensional representation of the captured media is an environment-locked virtual object, the method further comprising:
Detecting that the viewpoint of the user has changed while displaying the three-dimensional representation of the captured media; and
In response to detecting that the viewpoint of the user has changed, a display of the three-dimensional representation of the captured media at a location in the three-dimensional environment selected by the computer system is maintained.
50. The method of any of claims 48 to 49, wherein the first viewpoint of the user corresponds to a first viewpoint location a first distance from a location selected by the computer system, the method further comprising:
Detecting a pose change of the viewpoint of the user to a second viewpoint of the user corresponding to a second viewpoint position when the three-dimensional representation of the captured media is displayed at a position selected by the computer system and when the user is at the first viewpoint position, the second viewpoint position being a second distance from a position selected by the computer system, wherein the second distance is less than the first distance; and
In response to detecting a pose change of the viewpoint of the user to the second viewpoint of the user, displaying the three-dimensional representation with a second set of visual cues that the user is at least partially surrounded by content, wherein the second set of visual cues includes at least a second visual cue that the user is at least partially surrounded by content that is not provided when the three-dimensional representation is displayed when viewed from the first viewpoint of the user.
51. The method of claim 50, further comprising:
Detecting a pose change of the viewpoint of the user to a third viewpoint of the user corresponding to a third viewpoint position when the user is at the second viewpoint position and when the three-dimensional representation of the captured media is displayed; and
In response to detecting a pose change of the viewpoint of the user to the third viewpoint of the user:
in accordance with a determination that a first set of display criteria is met, ceasing to display the three-dimensional representation of the captured media, wherein the first set of display criteria includes a first criterion that is met when a distance between the third viewpoint location and the location selected by the computer system is greater than a first threshold distance.
52. The method of claim 50, further comprising:
Detecting a pose change of the viewpoint of the user to a fourth viewpoint of the user corresponding to a fourth viewpoint position when the user is at the second viewpoint position and when the three-dimensional representation of the captured media is displayed; and
In response to detecting a pose change of the viewpoint of the user to the fourth viewpoint of the user:
In accordance with a determination that a second set of display criteria is satisfied, the three-dimensional representation of the captured media is displayed at a respective location selected by the computer system that is farther from the fourth viewpoint location than the location selected by the computer system, wherein the second set of display criteria includes a second criterion that is satisfied when a distance between the fourth viewpoint location and the location selected by the computer system is less than a second threshold distance.
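The distance-dependent rules of claims 51 and 52 can be summarized as a small decision function, sketched below in Swift. The threshold values and the enum are illustrative assumptions; the claims only require that some first and second threshold distances exist.

import Foundation

// Sketch: hide the immersive representation when the viewer backs far away, and
// push it to a farther location when the viewer comes too close.
enum ImmersiveMediaAction {
    case keepDisplaying
    case ceaseDisplaying          // claim 51: beyond the first threshold distance
    case moveFartherFromViewer    // claim 52: within the second threshold distance
}

func actionForViewerDistance(_ distance: Double,
                             nearThreshold: Double = 0.5,   // meters, assumed
                             farThreshold: Double = 6.0) -> ImmersiveMediaAction {
    if distance > farThreshold { return .ceaseDisplaying }
    if distance < nearThreshold { return .moveFartherFromViewer }
    return .keepDisplaying
}

print(actionForViewerDistance(7.2))   // ceaseDisplaying
print(actionForViewerDistance(0.3))   // moveFartherFromViewer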
53. The method of any of claims 48 to 52, wherein the first viewpoint of the user corresponds to a fifth viewpoint location a fifth distance from a location selected by the computer system, the method further comprising:
Detecting a pose change of the viewpoint of the user to a sixth viewpoint of the user corresponding to a sixth viewpoint location, the sixth viewpoint location being a sixth distance from a location selected by the computer system, when the three-dimensional representation of the captured media is displayed at the location selected by the computer system and the user is at the fifth viewpoint location; and
In response to detecting a pose change of the viewpoint of the user to the sixth viewpoint of the user, displaying the three-dimensional representation with a third set of visual cues that the user is at least partially surrounded by content.
54. The method of any of claims 48-53, wherein the three-dimensional representation of the captured media includes a plurality of virtual objects including a first virtual object and a second virtual object, the method further comprising:
detecting a pose change from the viewpoint of the user to a seventh viewpoint of the user; and
In response to detecting a pose change of the viewpoint of the user to the seventh viewpoint of the user, the first virtual object that moves relative to the second virtual object based on the pose change of the user is displayed via the display generating component.
55. The method of any of claims 48-54, wherein the three-dimensional representation of the captured media is displayed as a first type of projection, the method further comprising:
Detecting a request to display the three-dimensional representation of the captured media as a projection of a second type different from the first type of projection while the three-dimensional representation is displayed as the first type of projection; and
In response to detecting a request to display the three-dimensional representation as a projection of the second type, the three-dimensional representation is displayed as a projection of the second type.
56. The method of claim 55, wherein the first type of projection and the second type of projection are independently selected from the group consisting of:
Spherical stereoscopic projection; and
Flat stereoscopic projection.
57. The method of any of claims 48-56, wherein the three-dimensional representation of the captured media is displayed at a first size, the method further comprising:
Detecting a set of one or more gestures while displaying the three-dimensional representation at the first size; and
In response to detecting the set of one or more gestures, a display of the three-dimensional representation of the captured media is expanded to a second size that is larger than the first size.
58. The method of claim 57, wherein:
prior to detecting the set of one or more gestures, the augmented reality environment user interface includes a first portion of a representation of a physical environment; and
Expanding the display of the three-dimensional representation of the captured media to the second size that is larger than the first size includes displaying the three-dimensional representation of the captured media as the second size in place of the first portion of the representation of the physical environment.
59. The method of any one of claims 48 to 58, further comprising:
Detecting a second set of one or more gestures comprising a movement component while the three-dimensional representation of the captured media is displayed; and
In response to detecting the second set of one or more gestures:
Ceasing to display the three-dimensional representation of the captured media; and
A second three-dimensional representation of a second captured media is displayed at a location selected by the computer system.
60. The method of any one of claims 48 to 59, further comprising:
Receiving a request to play back captured media while the captured media is displayed as a three-dimensional representation of the captured media; and
In response to receiving a request to play back the captured media and in accordance with a determination that the captured media includes audio data, playing back the captured media, wherein playback of the captured media includes outputting spatial audio corresponding to the audio data.
61. The method of any of claims 48 to 60, wherein the computer system communicates with an external device, and wherein the captured media comprises depth data, the method further comprising:
receiving a request to play back the captured media on the external device while the captured media is displayed as a three-dimensional representation of the captured media; and
In response to receiving a request to play back the captured media on the external device and in accordance with a determination that the external device is unable to display the depth data included in the captured media, playback of the captured media is initiated on the external device without a stereoscopic depth effect.
62. The method of claim 61, wherein playing back the captured media on the external device comprises outputting spatial audio corresponding to the captured media.
63. The method of any one of claims 48 to 62, wherein the computer system has a default pupillary distance value setting, the method further comprising:
detecting a request to play back the captured media; and
In response to detecting a request to play back the captured media and in accordance with a determination that the user's eyes have a pupillary distance value different from the default pupillary distance value, playback of the captured media is initiated with a first visual offset based on the user's pupillary distance.
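Claim 63 ties playback to the user's pupillary distance. The Swift sketch below shows one plausible way such an offset could be derived; splitting the difference symmetrically between the two eye views is an assumption, as are the names and the default value.

import Foundation

// Sketch: derive a per-eye visual offset from the difference between the measured
// interpupillary distance (IPD) and the system's default IPD setting.
struct StereoPlaybackConfig {
    var leftEyeOffset: Double    // millimeters
    var rightEyeOffset: Double   // millimeters
}

func playbackConfig(userIPD: Double, defaultIPD: Double = 63.0) -> StereoPlaybackConfig {
    let delta = userIPD - defaultIPD
    // Split the difference symmetrically between the two rendered eye views.
    return StereoPlaybackConfig(leftEyeOffset: -delta / 2, rightEyeOffset: delta / 2)
}

let config = playbackConfig(userIPD: 60.5)
print(config.leftEyeOffset, config.rightEyeOffset)   // 1.25 -1.25 for a 2.5 mm narrower IPD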
64. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, the one or more programs comprising instructions for performing the method of any of claims 48-63.
65. A computer system in communication with a display generation component, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 48-63.
66. A computer system in communication with a display generation component, the computer system comprising:
Means for performing the method of any one of claims 48 to 63.
67. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, the one or more programs comprising instructions for performing the method of any of claims 48-63.
68. A non-transitory computer-readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, the one or more programs comprising instructions for:
Upon displaying an augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and
In response to detecting a request to display the captured media, the captured media is displayed as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that a first viewpoint of the user is outside of a respective range of the one or more viewpoints.
69. A computer system in communication with a display generation component, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
Upon displaying an augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and
In response to detecting a request to display the captured media, the captured media is displayed as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that a first viewpoint of the user is outside of a respective range of the one or more viewpoints.
70. A computer system in communication with a display generation component, the computer system comprising:
Means for detecting, while displaying the augmented reality environment user interface, a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and
Means for displaying the captured media as a three-dimensional representation of the captured media in response to detecting a request to display the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that a first viewpoint of the user is outside of a respective range of the one or more viewpoints.
71. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, the one or more programs comprising instructions for:
Upon displaying an augmented reality environment user interface, detecting a request to display captured media comprising immersive content that, when viewed from a respective range of one or more viewpoints, provides a first set of visual cues that the user is at least partially surrounded by content; and
In response to detecting a request to display the captured media, the captured media is displayed as a three-dimensional representation of the captured media, the three-dimensional representation being displayed at a location selected by the computer system such that a first viewpoint of the user is outside of a respective range of the one or more viewpoints.
72. A method, comprising:
at a computer system in communication with a display generation component and one or more cameras:
Displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising:
A representation of the physical environment; and
A recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases across a visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter decreases progressively with increasing distance from the first edge area of the recording indicator.
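One way to realize the feathered edge recited in claim 72 is a monotonically decreasing falloff of a visual parameter (opacity is assumed here) with distance from the edge area, sketched below; the linear curve and the numeric values are illustrative assumptions.

import Foundation

// Sketch: the visual parameter takes progressively smaller values as distance from
// the indicator's edge grows, producing a feathered rather than hard-edged boundary.
func edgeParameterValue(distanceFromEdge: Double,
                        featherWidth: Double = 24.0,   // points, assumed
                        peakValue: Double = 1.0) -> Double {
    guard distanceFromEdge < featherWidth else { return 0 }
    // Linear falloff; any monotonically decreasing curve matches the description.
    return peakValue * (1.0 - distanceFromEdge / featherWidth)
}

// Sampling a few distances shows the plurality of different values along the edge.
let samples = stride(from: 0.0, through: 24.0, by: 6.0).map { edgeParameterValue(distanceFromEdge: $0) }
print(samples)   // [1.0, 0.75, 0.5, 0.25, 0.0]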
73. The method of claim 72, further comprising:
Detecting an input corresponding to a request to capture media while displaying the augmented reality camera user interface; and
In response to detecting an input corresponding to a request to capture media, media is captured that includes a representation of at least a portion of the physical environment within the recording area.
74. The method of claim 73, wherein the captured media is static media.
75. The method of claim 73, wherein the captured media is animated media.
76. The method of any of claims 73-75, wherein the captured media comprises a representation of the field of view of the one or more cameras that is different from a representation of the physical environment within the recording area.
77. The method of any one of claims 72 to 76, wherein the visual parameter is a hue gradient.
78. The method of claim 77, wherein the recording indicator includes a second edge area farther from a center of the recording area than the first edge area, and wherein the second edge area is larger than the first edge area.
79. The method of any of claims 72-78, wherein the one or more cameras in communication with a computer system have an optimal capture distance for capturing depth data, and wherein the size and/or shape of the recording indicator facilitates positioning the computer system at the optimal capture distance of the one or more cameras relative to one or more objects in the field of view of the one or more cameras.
80. The method of any of claims 72-79, wherein the recording indicator is displayed at a fixed analog depth within the representation of the physical environment.
81. The method of any of claims 72-80, wherein the recording indicator includes one or more corners, the method further comprising:
Upon displaying the recording indicator, and in accordance with a determination that a set of display criteria is met, an auxiliary recording indicator is displayed at the one or more corners of the recording indicator via the display generating component.
82. The method of claim 81, wherein the recording indicator is displayed in a first plane, and wherein the auxiliary recording indicator is displayed in the first plane.
83. The method of any of claims 81-82 wherein displaying the auxiliary recording indicator comprises:
in accordance with a determination that the physical environment has a first amount of brightness, displaying the auxiliary recording indicator with a first amount of visual salience relative to the recording indicator; and
In accordance with a determination that the physical environment has a second amount of brightness that is less than the first amount of brightness, the auxiliary recording indicator is displayed with a second amount of visual salience relative to the recording indicator, wherein the second amount of visual salience is greater than the first amount of visual salience.
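Claim 83's brightness-dependent salience reduces to a simple conditional, sketched below. Treating salience as opacity, and the particular threshold and values, are assumptions for illustration.

import Foundation

// Sketch: render the auxiliary corner indicators more prominently in dim scenes and
// less prominently in bright scenes.
func auxiliaryIndicatorOpacity(environmentBrightness: Double,   // normalized 0...1 luminance estimate
                               brightOpacity: Double = 0.4,
                               dimOpacity: Double = 0.9,
                               brightnessThreshold: Double = 0.5) -> Double {
    environmentBrightness >= brightnessThreshold ? brightOpacity : dimOpacity
}

print(auxiliaryIndicatorOpacity(environmentBrightness: 0.8))   // bright room -> 0.4
print(auxiliaryIndicatorOpacity(environmentBrightness: 0.2))   // dim room    -> 0.9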
84. The method of any of claims 81 to 83, wherein the set of display criteria comprises criteria met when conditions of one or more objects are suitable for depth capture.
85. The method of any one of claims 81 to 84, further comprising:
Displaying the recording indicator around the representation of the second object while the auxiliary recording indicator is displayed with a first visual appearance indicating that the current condition is not suitable for depth capture; and
In accordance with a determination that a set of one or more depth capture criteria is met, the visual appearance of the auxiliary recording indicator is changed from the first visual appearance to a second visual appearance indicating that the current condition is suitable for depth capture.
86. The method of any one of claims 72-85, further comprising:
Detecting an input corresponding to a request to display the augmented reality camera user interface prior to displaying the augmented reality camera user interface; and
In response to detecting an input corresponding to a request to display the augmented reality camera user interface, displaying the augmented reality camera user interface, wherein displaying the augmented reality camera user interface includes displaying an animation of the recording indicator fading in.
87. The method of any of claims 72-86, wherein the representation of the physical environment includes a third portion surrounding the recording indicator, and wherein the third portion of the representation of the physical environment and the recording area have substantially the same amount of brightness modification due to a displayed user interface element.
88. The method of claim 87, wherein the recording indicator has a third edge area, and wherein the first edge area is darker than the third edge area.
89. The method of claim 87, wherein the recording indicator has a fourth edge area, and wherein the fourth edge area is darker than the first edge area.
90. The method of any one of claims 72-89, further comprising:
A capture virtual object is displayed within the recording indicator that, when selected, causes a process to be initiated for capturing media.
91. The method of any one of claims 72-90, further comprising:
a camera filmstrip virtual object is displayed within the recording indicator that, when selected, causes a process to be initiated for displaying previously captured media.
92. The method of any of claims 72-91, wherein the recording indicator includes one or more rounded corners.
93. The method of any one of claims 72-92, further comprising:
detecting a change in pose of the one or more cameras while the recording indicator is displayed around a fourth portion of the representation of the physical environment;
in response to detecting a change in the pose of the one or more cameras:
the recording indicator is displayed around a fifth portion of the representation of the physical environment and the recording indicator is not displayed around the fourth portion of the representation of the physical environment.
94. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 72-93.
95. A computer system configured to communicate with a display generation component and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 72-93.
96. A computer system configured to communicate with a display generation component and one or more cameras, comprising:
Means for performing the method of any one of claims 72 to 93.
97. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 72-93.
98. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
Displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising:
A representation of the physical environment; and
A recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases across a visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter decreases progressively with increasing distance from the first edge area of the recording indicator.
99. A computer system configured to communicate with a display generation component and one or more cameras, comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
Displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising:
A representation of the physical environment; and
A recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases across a visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter decreases progressively with increasing distance from the first edge area of the recording indicator.
100. A computer system configured to communicate with a display generation component and one or more cameras, comprising:
means for displaying an augmented reality camera user interface via the display generating component, the augmented reality camera user interface comprising:
A representation of the physical environment; and
A recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases across a visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter decreases progressively with increasing distance from the first edge area of the recording indicator.
101. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more cameras, the one or more programs comprising instructions for:
Displaying, via the display generating component, an augmented reality camera user interface, the augmented reality camera user interface comprising:
A representation of the physical environment; and
A recording indicator indicating a recording area within a field of view of the one or more cameras, wherein the recording indicator includes at least a first edge area having a visual parameter that decreases across a visible portion of the recording indicator through a plurality of different values of the visual parameter, wherein the value of the parameter decreases progressively with increasing distance from the first edge area of the recording indicator.
102. A method, comprising:
at a computer system in communication with a display generation component, one or more input devices, and one or more cameras:
Detecting, via the one or more input devices, a request to display a camera user interface; and
In response to detecting a request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises:
in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how to capture media with the computer system while the camera user interface is displayed; and
In accordance with a determination that the set of one or more criteria is not met, the camera user interface is displayed without displaying the tutorial.
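Claims 102 and 103 describe showing the tutorial only when a set of criteria is met, such as the camera user interface being displayed for the first time. The sketch below models that single criterion; the state struct and field names are assumptions.

import Foundation

// Sketch: show the tutorial only on the initial display of the camera user interface.
struct CameraUIState {
    var hasShownCameraUIBefore: Bool
    var showTutorial: Bool = false
}

func presentCameraUserInterface(state: inout CameraUIState) {
    // Criterion (per claim 103): met when the camera user interface is initially displayed.
    state.showTutorial = !state.hasShownCameraUIBefore
    state.hasShownCameraUIBefore = true
}

var state = CameraUIState(hasShownCameraUIBefore: false)
presentCameraUserInterface(state: &state)   // tutorial displayed
presentCameraUserInterface(state: &state)   // tutorial suppressed on subsequent displays
print(state.showTutorial)                   // false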
103. The method of claim 102, wherein the set of one or more criteria includes a criterion that is met when the camera user interface is initially displayed.
104. The method of any of claims 102-103, wherein displaying the tutorial includes displaying instructions for capturing a first media item using the one or more cameras.
105. The method of any of claims 102 to 104, wherein the tutorial includes video.
106. The method of any of claims 102-105, wherein displaying the camera user interface includes displaying a viewfinder virtual object, and wherein displaying the tutorial includes displaying the tutorial overlaying at least a portion of the viewfinder virtual object.
107. The method of any of claims 102-106, wherein the computer system is in communication with a hardware input mechanism that, when activated, causes a media capture process to be initiated, wherein the tutorial includes a representation of the hardware input mechanism, and wherein displaying the tutorial includes displaying a representation of an input corresponding to activation of the hardware input mechanism.
108. The method of claim 107, wherein the hardware input mechanism is not visible to a user while the user is operating the computer system.
109. The method of any one of claims 107-108, further comprising:
detecting a first activation of the hardware input mechanism, wherein the first activation of the hardware input mechanism is a first type of input; and
In response to detecting the first activation of the hardware input mechanism, a second media item is captured using the one or more cameras.
110. The method of any one of claims 107-109, further comprising:
Detecting a second activation of the hardware input mechanism, wherein the second activation of the hardware input mechanism corresponds to a second type of input, the second type of input comprising maintaining input for a predetermined period of time; and
In response to detecting the second activation of the hardware input mechanism, a third media item is captured.
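Claims 109 and 110 distinguish a brief activation of the hardware input mechanism from one maintained for a predetermined period. The sketch below separates the two input types by press duration; mapping them to still versus video capture, and the threshold value, are assumptions.

import Foundation

// Sketch: classify a hardware-button activation by how long it was held.
enum CaptureKind { case still, video }

func captureKind(forPressDuration duration: TimeInterval,
                 holdThreshold: TimeInterval = 0.5) -> CaptureKind {
    // A press maintained for at least the predetermined period is the second type of input.
    duration >= holdThreshold ? .video : .still
}

print(captureKind(forPressDuration: 0.12))   // still
print(captureKind(forPressDuration: 1.40))   // video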
111. The method of any one of claims 107-110, further comprising:
detecting a third activation of the hardware input mechanism while the camera user interface is displayed with the tutorial; and
In response to detecting the third activation of the hardware input mechanism, display of the tutorial is stopped.
112. The method of any one of claims 102-111, wherein:
in accordance with a determination that a set of criteria is met, the camera user interface includes a camera shutter virtual object that, when selected, initiates a process for capturing a media item; and
In accordance with a determination that the set of criteria is not met, the camera user interface does not include the camera shutter virtual object for initiating a process of capturing a media item.
113. The method of claim 112, wherein the set of criteria includes criteria that are met when a setting of the computer system is enabled.
114. The method of any of claims 102-113, wherein the camera user interface includes a close virtual object that, when selected, causes the camera user interface to cease to be displayed.
115. The method of any of claims 102-114, wherein the camera user interface is displayed within an augmented reality environment, wherein a first portion of the augmented reality environment is displayed within the reticle virtual object.
116. The method of any one of claims 102 to 115, further comprising:
Detecting a request to capture a fourth media item; and
In response to detecting the request to capture the fourth media item, the fourth media item is captured, wherein the fourth media item is a stereoscopic media item.
117. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 102-116.
118. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 102-116.
119. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
means for performing the method of any one of claims 102-116.
120. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 102-116.
121. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for:
Detecting, via the one or more input devices, a request to display a camera user interface; and
In response to detecting a request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises:
in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how to capture media with the computer system while the camera user interface is displayed; and
In accordance with a determination that the set of one or more criteria is not met, the camera user interface is displayed without displaying the tutorial.
122. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
Detecting, via the one or more input devices, a request to display a camera user interface; and
In response to detecting a request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises:
in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how to capture media with the computer system while the camera user interface is displayed; and
In accordance with a determination that the set of one or more criteria is not met, the camera user interface is displayed without displaying the tutorial.
123. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
means for detecting, via the one or more input devices, a request to display a camera user interface; and
Means for displaying the camera user interface in response to detecting a request to display the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises:
in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how to capture media with the computer system while the camera user interface is displayed; and
In accordance with a determination that the set of one or more criteria is not met, the camera user interface is displayed without displaying the tutorial.
124. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for:
Detecting, via the one or more input devices, a request to display a camera user interface; and
In response to detecting a request to display the camera user interface, displaying the camera user interface, wherein the camera user interface includes a reticle virtual object indicating a capture area of the one or more cameras, wherein displaying the camera user interface comprises:
in accordance with a determination that a set of one or more criteria is met, displaying the camera user interface with a tutorial within the camera user interface, wherein the tutorial provides information about how to capture media with the computer system while the camera user interface is displayed; and
In accordance with a determination that the set of one or more criteria is not met, the camera user interface is displayed without displaying the tutorial.
125. A method, comprising:
At a computer system in communication with a display generation component, one or more cameras, and one or more input devices:
Displaying, via the display generating component, a user interface, the user interface comprising:
a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and
A viewfinder, wherein the viewfinder comprises a boundary;
While displaying the user interface, detecting a first request to capture media via the one or more input devices; and
In response to detecting the first request to capture media:
capturing a first media item using the one or more cameras, the first media item comprising at least the first portion of the representation of the physical environment; and
Changing the appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises:
changing the appearance of a first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and
Changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side of the boundary of the viewfinder being different from the first side of the boundary of the viewfinder.
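Claim 125 changes the appearance only of content lying within a threshold distance of the viewfinder boundary when a capture is requested. The Swift sketch below tests that proximity for a rectangular viewfinder; the brightness boost, the threshold value, and the flat geometry model are assumptions.

import Foundation

// Sketch: on a capture request, content near any side of the viewfinder boundary gets
// a brief appearance change, while content away from the boundary is left alone.
struct Viewfinder {
    var width: Double
    var height: Double
    var edgeThreshold: Double = 20.0   // points, assumed

    // True when a point lies within the threshold distance of any side of the boundary.
    func isNearBoundary(x: Double, y: Double) -> Bool {
        x < edgeThreshold || y < edgeThreshold ||
        (width - x) < edgeThreshold || (height - y) < edgeThreshold
    }
}

func captureAppearanceBoost(for viewfinder: Viewfinder, x: Double, y: Double) -> Double {
    viewfinder.isNearBoundary(x: x, y: y) ? 0.25 : 0.0
}

let vf = Viewfinder(width: 400, height: 300)
print(captureAppearanceBoost(for: vf, x: 5, y: 150))     // near the left side -> 0.25
print(captureAppearanceBoost(for: vf, x: 200, y: 150))   // center -> 0.0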
126. The method of claim 125, wherein the first media item is a stereoscopic media item.
127. The method of any of claims 125-126, wherein prior to detecting the first request to capture media, the viewfinder is displayed in a first appearance, and wherein changing the appearance of the viewfinder includes displaying the viewfinder in a second appearance that is different from the first appearance, the method further comprising:
Changing the appearance of the viewfinder from the second appearance to the first appearance after the viewfinder has been displayed in the second appearance for a period of time.
128. The method of any of claims 125-127, wherein an appearance of the first content portion and an appearance of the second content portion are changed in the same manner.
129. The method of claim 128, wherein changing the appearance of the viewfinder comprises changing the appearance of a third content portion that is within the threshold distance of a third side of the boundary of the viewfinder, and wherein the appearances of the first content portion, the second content portion, and the third content portion are changed in the same manner.
130. The method of any of claims 125-129, wherein the boundary of the viewfinder is a reticle virtual object, and wherein the reticle virtual object is displayed in a first appearance before the first request to capture media is detected, the method further comprising:
In response to detecting the first request to capture media, changing an appearance of the reticle virtual object from the first appearance to a second appearance different from the first appearance.
131. The method of claim 130, wherein displaying the viewfinder comprises displaying one or more elements within the viewfinder, and wherein changing the appearance of the reticle virtual object comprises changing the appearance of the reticle virtual object relative to the one or more elements displayed within the viewfinder.
132. The method of any one of claims 130-131, wherein displaying the user interface includes displaying one or more corners, and wherein changing the appearance of the viewfinder includes:
changing the appearance of the one or more corners in a first manner; and
changing the appearance of at least the first side of the boundary of the viewfinder in a second manner that is different from the first manner.
133. The method of any of claims 125-132, wherein changing an appearance of the viewfinder comprises changing a first set of one or more optical properties of the first content portion within the viewfinder.
134. The method of claim 133, wherein the first set of one or more optical properties includes a contrast of the content within the viewfinder.
135. The method of any of claims 133-134, wherein the first set of one or more optical properties includes brightness of the content within the viewfinder.
136. The method of any of claims 133-135, wherein the first set of one or more optical properties includes translucency of the content within the viewfinder.
137. The method of any of claims 133-136, wherein the first set of one or more optical properties includes a size of the content within the viewfinder.
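Claims 133-137 enumerate optical properties (contrast, brightness, translucency, size) that may change for content within the viewfinder; the sketch below bundles them into one value type. The concrete adjustment factors are arbitrary and only illustrate the idea.

```swift
/// A set of optical properties for content within the viewfinder.
struct OpticalProperties {
    var contrast: Double = 1.0
    var brightness: Double = 0.0
    var opacity: Double = 1.0   // inverse of translucency
    var scale: Double = 1.0     // relative size
}

/// Returns the changed properties used while capture feedback is shown.
func captureFeedback(from normal: OpticalProperties) -> OpticalProperties {
    var changed = normal
    changed.contrast *= 0.8     // reduce contrast
    changed.brightness -= 0.1   // darken slightly
    changed.opacity *= 0.7      // increase translucency
    changed.scale *= 0.98       // shrink slightly
    return changed
}
```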
138. The method of any of claims 125-137, wherein displaying the user interface includes displaying a first set of virtual control objects within the boundary of the viewfinder.
139. The method of claim 138, wherein the first set of virtual objects includes a first close virtual object, the method further comprising:
Detecting an input corresponding to a selection of the first close virtual object; and
In response to detecting an input corresponding to selection of the first close virtual object, ceasing to display the user interface.
140. The method of any of claims 138-139, wherein the first set of virtual objects includes a first media review virtual object, the method further comprising:
Detecting an input corresponding to a selection of the first media review virtual object; and
In response to detecting an input corresponding to selection of the first media review virtual object, displaying one or more representations of previously captured media items.
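Claims 139 and 140 describe two controls displayed within the viewfinder: a close virtual object that dismisses the user interface and a media review virtual object that surfaces previously captured media. A hypothetical state holder for those two selections might look like the following; the names are illustrative.

```swift
/// Controls displayed within the boundary of the viewfinder (claims 138-140).
enum ViewfinderControl {
    case close
    case mediaReview
}

final class CameraUIState {
    private(set) var isUserInterfaceDisplayed = true
    private(set) var isShowingPreviouslyCapturedMedia = false

    func handleSelection(of control: ViewfinderControl) {
        switch control {
        case .close:
            isUserInterfaceDisplayed = false            // cease displaying the user interface
        case .mediaReview:
            isShowingPreviouslyCapturedMedia = true     // show representations of previously captured media
        }
    }
}
```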
141. The method of any of claims 138-140, wherein the first set of virtual objects includes a recording time virtual object indicating an amount of time that has elapsed since the computer system initiated a first video capture operation.
142. The method of any of claims 125-141, wherein displaying the user interface includes displaying a second set of virtual control objects outside the boundary of the viewfinder.
143. The method of claim 142, wherein the second set of virtual control objects comprises a set of camera mode virtual objects, each camera mode virtual object corresponding to a respective mode of operation of the one or more cameras, wherein the set of camera mode virtual objects comprises a first camera mode virtual object and a second camera mode virtual object, the method further comprising:
detecting an input corresponding to a selection of a respective camera mode virtual object of the set of camera mode virtual objects; and
In response to detecting an input corresponding to a selection of a respective camera mode virtual object of the set of camera mode virtual objects:
in accordance with a determination that the input corresponds to a selection of the first camera mode virtual object, configuring the one or more cameras to operate in a first mode; and
In accordance with a determination that the input corresponds to a selection of the second camera mode virtual object, configuring the one or more cameras to operate in a second mode.
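Claim 143 configures the one or more cameras according to which camera mode virtual object is selected; a plain Swift sketch of that dispatch, with hypothetical mode names, follows.

```swift
/// Hypothetical camera modes corresponding to the first and second camera
/// mode virtual objects of claim 143.
enum CameraMode {
    case photo   // first mode
    case video   // second mode
}

final class CameraModeController {
    private(set) var activeMode: CameraMode = .photo

    /// Called in response to an input selecting a camera mode virtual object.
    func handleSelection(of mode: CameraMode) {
        configureCameras(for: mode)
    }

    private func configureCameras(for mode: CameraMode) {
        activeMode = mode
        // Configuration of the one or more cameras would happen here.
    }
}
```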
144. The method of any of claims 142-143, wherein the second set of virtual control objects comprises a second close virtual object, the method further comprising:
Detecting an input corresponding to a selection of the second close virtual object; and
In response to detecting an input corresponding to selection of the second close virtual object, ceasing to display the user interface.
145. The method of any one of claims 125-144, further comprising:
after capturing the first media item, displaying a representation of the first media item fading into the display of the user interface.
146. The method of claim 145, wherein displaying the representation of the first media item fading into the display of the user interface comprises:
Displaying the representation of the first media item transitioning from a first size to a second size, wherein the first size is greater than the second size; and
Displaying the representation of the first media item moving from a first position in the user interface to a second position in the user interface, wherein the second position corresponds to a corner of the viewfinder.
147. The method of any of claims 145-146, wherein the viewfinder includes a second reticle virtual object prior to displaying the representation of the first media item, and wherein the display of the representation of the first media item replaces the display of the second reticle virtual object.
148. The method of any of claims 145-147, wherein displaying the user interface includes displaying a second media review virtual object at a third location in the user interface, wherein the second media review virtual object is displayed at a third size, and wherein displaying the representation of the first media item fading into the user interface includes:
Displaying the representation of the first media item transitioning from a fourth size to a fifth size, wherein the fourth size is greater than the fifth size, and wherein the fourth size is greater than the third size; and
Displaying the representation of the first media item moving from a fourth location in the user interface to the third location in the user interface.
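Claims 145-148 describe a captured-media representation that shrinks and moves toward the media review object near a corner of the viewfinder as it fades into the user interface. The sketch below linearly interpolates that frame change; the start and end frames, and the linear easing, are illustrative assumptions.

```swift
import CoreGraphics

/// Interpolates the frame of the media representation between its initial
/// (larger) frame and its final (smaller) frame at the media review object.
struct ThumbnailTransition {
    let startFrame: CGRect   // larger, e.g. near the center of the viewfinder
    let endFrame: CGRect     // smaller, at the media review object's location

    /// Frame at animation progress `t`, where `t` runs from 0 to 1.
    func frame(at t: CGFloat) -> CGRect {
        CGRect(
            x: startFrame.origin.x + (endFrame.origin.x - startFrame.origin.x) * t,
            y: startFrame.origin.y + (endFrame.origin.y - startFrame.origin.y) * t,
            width: startFrame.width + (endFrame.width - startFrame.width) * t,
            height: startFrame.height + (endFrame.height - startFrame.height) * t
        )
    }
}
```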
149. The method of any of claims 125-148, wherein the first media item is a still photograph or video.
150. The method of claim 149, further comprising:
detecting a second request to capture media after changing the appearance of the viewfinder; and
In response to detecting the second request to capture media:
In accordance with a determination that the second request to capture media corresponds to a request to capture still photographs, displaying a first type of feedback; and
In accordance with a determination that the second request to capture media corresponds to a request to capture video, displaying a second type of feedback that is different from the first type of feedback.
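Claim 150 distinguishes the feedback shown for a still-photograph request from the feedback shown for a video request; the enumeration below is one hypothetical way to model that distinction, with illustrative feedback styles.

```swift
/// The kind of capture requested and the feedback displayed for it (claim 150).
enum RequestedCapture {
    case stillPhotograph
    case video
}

enum CaptureFeedbackStyle {
    case photoFlash          // first type of feedback
    case recordingIndicator  // second type of feedback
}

func feedbackStyle(for request: RequestedCapture) -> CaptureFeedbackStyle {
    switch request {
    case .stillPhotograph: return .photoFlash
    case .video:           return .recordingIndicator
    }
}
```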
151. The method of any one of claims 125-150, wherein:
The first request to capture media corresponds to a first type of input, wherein the first media item is a still photograph, and wherein the first type of input corresponds to a short press on a first hardware input mechanism.
152. The method of any one of claims 125-151, further comprising:
Detecting a third request to capture media after changing the appearance of the viewfinder; and
In response to detecting the third request to capture media and in accordance with a determination that the third request to capture media corresponds to a second type of input, capturing a video media item, wherein the second type of input corresponds to a long press on a second hardware input mechanism.
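Claims 151 and 152 tie a short press on a hardware input mechanism to still-photograph capture and a long press to video capture; a sketch of that classification, assuming an arbitrary 0.5-second long-press threshold, follows.

```swift
import Foundation

enum HardwarePressCapture {
    case stillPhotograph   // short press (claim 151)
    case videoMediaItem    // long press (claim 152)
}

/// Classifies a press on the hardware input mechanism by its duration.
func capture(forPressDuration duration: TimeInterval,
             longPressThreshold: TimeInterval = 0.5) -> HardwarePressCapture {
    duration < longPressThreshold ? .stillPhotograph : .videoMediaItem
}
```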
153. The method of claim 152, wherein the third request to capture media corresponds to the second type of input, the method further comprising:
In response to detecting the third request to capture media, displaying an indication that capture of the video media item corresponds to a second video capture operation, wherein displaying the indication comprises:
Displaying the indication in a first appearance in accordance with a determination that a set of criteria is not met;
In accordance with a determination that the set of criteria is met, displaying the indication in a second appearance;
ceasing to detect the second type of input while the indication is displayed; and
In response to ceasing to detect the second type of input:
In accordance with a determination that the set of criteria is not met before ceasing to detect the second type of input, ceasing to perform the second video capture operation; and
In accordance with a determination that the set of criteria is met before ceasing to detect the second type of input, continuing to perform the second video capture operation.
154. The method of claim 153, wherein the indication indicates an amount of time that has elapsed since the computer system has initiated the second video capture operation, wherein displaying the indication comprises:
Displaying the indication with the first appearance;
detecting that the set of criteria is met while the indication is displayed in the first appearance; and
In response to detecting that the set of criteria is met, changing an appearance of the indication from the first appearance to the second appearance.
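Claims 153 and 154 describe video capture driven by a sustained press: an indication changes appearance once a set of criteria is met, and releasing the press before the criteria are met stops the capture, while releasing afterwards lets it continue. The sketch below models the criteria as a minimum elapsed recording time; that interpretation and the 1-second threshold are assumptions.

```swift
import Foundation

final class LongPressVideoCapture {
    enum IndicationAppearance { case first, second }

    private let criteriaThreshold: TimeInterval = 1.0   // illustrative
    private var pressStart: Date?
    private(set) var isCapturing = false

    func pressBegan(at now: Date = Date()) {
        pressStart = now
        isCapturing = true   // second video capture operation begins
    }

    /// The indication's appearance changes once the criteria are met.
    func indicationAppearance(at now: Date = Date()) -> IndicationAppearance {
        guard let start = pressStart else { return .first }
        return now.timeIntervalSince(start) >= criteriaThreshold ? .second : .first
    }

    /// On release, continue only if the criteria were met before the press ended.
    func pressEnded(at now: Date = Date()) {
        guard let start = pressStart else { return }
        isCapturing = now.timeIntervalSince(start) >= criteriaThreshold
    }
}
```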
155. The method of any of claims 152-154, further comprising:
Detecting an input corresponding to activation of a third hardware input mechanism while the computer system is capturing the video media item; and
In response to detecting an input corresponding to activation of the third hardware input mechanism, ceasing to capture the video media item.
156. The method of any of claims 125-155, wherein the first side of the boundary and the second side of the boundary are on opposite sides of the boundary.
157. The method of any of claims 125-156, wherein the first content portion is at the threshold distance from the first side of the boundary of the viewfinder, and wherein the second content portion is at the threshold distance from the second side of the boundary of the viewfinder.
158. The method of any of claims 125-157, wherein displaying the user interface comprises displaying a third reticle virtual object, wherein the third reticle virtual object indicates the capture area of the one or more cameras.
159. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 125-158.
160. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, the computer system comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 125-158.
161. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
means for performing the method of any one of claims 125-158.
162. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for performing the method of any of claims 125-158.
163. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for:
Displaying, via the display generation component, a user interface, the user interface comprising:
a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and
A viewfinder, wherein the viewfinder comprises a boundary;
While displaying the user interface, detecting a first request to capture media via the one or more input devices; and
In response to detecting the first request to capture media:
capturing a first media item using the one or more cameras, the first media item comprising at least the first portion of the representation of the physical environment; and
Changing the appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises:
changing the appearance of a first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and
Changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side of the boundary of the viewfinder being different from the first side of the boundary of the viewfinder.
164. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
One or more processors; and
A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for:
Displaying, via the display generation component, a user interface, the user interface comprising:
a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and
A viewfinder, wherein the viewfinder comprises a boundary;
While displaying the user interface, detecting a first request to capture media via the one or more input devices; and
In response to detecting the first request to capture media:
capturing a first media item using the one or more cameras, the first media item comprising at least the first portion of the representation of the physical environment; and
Changing the appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises:
changing the appearance of a first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and
Changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side of the boundary of the viewfinder being different from the first side of the boundary of the viewfinder.
165. A computer system configured to communicate with a display generation component, one or more input devices, and one or more cameras, comprising:
means for displaying a user interface via the display generation component, the user interface comprising:
a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and
A viewfinder, wherein the viewfinder comprises a boundary;
Means for detecting a first request to capture media via the one or more input devices while the user interface is displayed; and
Means for, in response to detecting the first request to capture media:
capturing a first media item using the one or more cameras, the first media item comprising at least the first portion of the representation of the physical environment; and
Changing the appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises:
changing the appearance of a first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and
Changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side of the boundary of the viewfinder being different from the first side of the boundary of the viewfinder.
166. A computer program product comprising one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component, one or more input devices, and one or more cameras, the one or more programs comprising instructions for:
Displaying, via the display generation component, a user interface, the user interface comprising:
a representation of a physical environment, wherein a first portion of the representation of the physical environment is within a capture area of the one or more cameras and a second portion of the representation of the physical environment is outside the capture area of the one or more cameras; and
A viewfinder, wherein the viewfinder comprises a boundary;
While displaying the user interface, detecting a first request to capture media via the one or more input devices; and
In response to detecting the first request to capture media:
capturing a first media item using the one or more cameras, the first media item comprising at least the first portion of the representation of the physical environment; and
Changing the appearance of the viewfinder, wherein changing the appearance of the viewfinder comprises:
changing the appearance of a first content portion that is within a threshold distance of a first side of the boundary of the viewfinder; and
Changing the appearance of a second content portion that is within the threshold distance of a second side of the boundary of the viewfinder, the second side of the boundary of the viewfinder being different from the first side of the boundary of the viewfinder.