
CN107491295B - Application integration with digital assistant - Google Patents


Info

Publication number
CN107491295B
Authority
CN
China
Prior art keywords
user
intent
application
module
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710386931.3A
Other languages
Chinese (zh)
Other versions
CN107491295A (en)
Inventor
R. A. Walker II
B. J. Newendorp
R. Dasari
R. D. Giuli
T. R. Gruber
C. E. Radebaugh
A. Garg
V. Khosla
J. H. Russell
C. Peterson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from DKPA201670540A (external priority, published as DK201670540A1)
Application filed by Apple Inc
Priority to CN202010814076.3A (CN111913778B)
Publication of CN107491295A
Application granted
Publication of CN107491295B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G06F 16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F 3/0412 - Digitisers structurally integrated in a display
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/451 - Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Stored Programmes (AREA)

Abstract

Systems and processes for application integration with a digital assistant are provided. According to one embodiment, a method includes, at an electronic device having one or more processors and memory, receiving natural language user input and identifying, with the one or more processors, an intent object of a set of intent objects and a parameter associated with the intent object, wherein the intent object and the parameter are obtained from the natural language user input. The method also includes identifying a software application associated with the intent object of the set of intent objects and providing the intent object and the parameter to the software application.

Description

Application integration with digital assistant
Technical Field
The present disclosure relates generally to interacting with applications, and more particularly to techniques for application integration with digital assistants.
Background
A digital assistant may facilitate a user performing various functions on a user device. For example, the digital assistant may set an alarm clock, provide weather updates, and perform searches both locally and on the internet, while providing a natural language interface for the user. However, existing digital assistants cannot be effectively integrated with applications, such as those stored locally on user devices, particularly third party applications. Thus, existing digital assistants fail to provide a natural language interface for such applications.
Disclosure of Invention
Exemplary methods are disclosed herein. An example method includes, at an electronic device having one or more processors, receiving a natural language user input, and identifying, with the one or more processors, an intent of a set of intents and a parameter associated with the intent, wherein the intent and the parameter are obtained from the natural language user input. The method also includes identifying a software application associated with the intent of the set of intents and providing the intent and the parameter to the software application.
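The following minimal Swift sketch illustrates the flow just described: deriving an intent and a parameter from natural language input, identifying a registered software application associated with that intent, and providing the intent and parameter to it. The types and names (Intent, IntentHandlingApplication, DigitalAssistant, and the sample "com.example.sendMessage" identifier) are hypothetical illustrations, not part of the disclosed system or of any actual platform API.

```swift
// Hypothetical sketch: an intent and a parameter are derived from natural
// language input, an application registered for that intent is looked up,
// and both are delivered to it. All names are illustrative.

struct Intent {
    let identifier: String            // e.g. "com.example.sendMessage" (hypothetical)
    let parameters: [String: String]  // e.g. ["recipient": "Tom", "content": "Hello"]
}

protocol IntentHandlingApplication {
    var handledIntents: Set<String> { get }
    func handle(_ intent: Intent)
}

struct MessagingApp: IntentHandlingApplication {
    let handledIntents: Set<String> = ["com.example.sendMessage"]
    func handle(_ intent: Intent) {
        print("MessagingApp received intent \(intent.identifier) with \(intent.parameters)")
    }
}

final class DigitalAssistant {
    private var registry: [any IntentHandlingApplication] = []

    func register(_ app: any IntentHandlingApplication) { registry.append(app) }

    /// Stand-in for the natural language processing step: a real assistant
    /// would infer the intent and parameter from speech or text.
    func identifyIntent(from utterance: String) -> Intent? {
        guard utterance.lowercased().contains("message") else { return nil }
        return Intent(identifier: "com.example.sendMessage",
                      parameters: ["recipient": "Tom", "content": "Hello"])
    }

    /// Identify an application associated with the intent and provide the
    /// intent and its parameter(s) to it.
    func dispatch(_ utterance: String) {
        guard let intent = identifyIntent(from: utterance),
              let app = registry.first(where: { $0.handledIntents.contains(intent.identifier) })
        else { return }
        app.handle(intent)
    }
}

let assistant = DigitalAssistant()
assistant.register(MessagingApp())
assistant.dispatch("Send a message to Tom saying hello")
```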
An exemplary method includes, at one or more electronic devices each having one or more processors, receiving natural language user input; determining an intent of a set of intents and a parameter associated with the intent based on the natural language user input; identifying a software application based on at least one of the intent or parameter; and providing the intent and parameters to the software application.
An exemplary method includes, at one or more electronic devices each having one or more processors, receiving natural language user input; identifying, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; and determining whether a task corresponding to the intent can be satisfied based on at least one of the intent or the parameter. The method also includes, in accordance with a determination that a task corresponding to the intent can be satisfied, providing the intent and parameters to a software application associated with the intent, and in accordance with a determination that a task corresponding to the intent cannot be satisfied, providing a list of one or more software applications associated with the intent.
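A small illustrative sketch of the satisfiability check described above follows, assuming a hypothetical registry entry that records an installed handler and a set of required parameters: if the task can be satisfied the intent is delivered, otherwise a list of candidate applications associated with the intent is returned. All names are hypothetical, not an actual API.

```swift
// Hypothetical sketch of the satisfiability check: deliver the intent when an
// installed handler exists and the required parameters are present; otherwise
// return candidate applications associated with the intent.

struct Intent {
    let identifier: String
    let parameters: [String: String]
}

enum DispatchResult {
    case delivered(to: String)     // bundle identifier of the handling app
    case candidates([String])      // apps associated with the intent
}

struct IntentRecord {
    let identifier: String
    let requiredParameters: Set<String>
    let installedHandler: String?  // bundle id of an installed app, if any
    let associatedApps: [String]   // apps known to support this intent
}

func dispatch(intent: Intent, record: IntentRecord) -> DispatchResult {
    let parametersSatisfied = record.requiredParameters.isSubset(of: Set(intent.parameters.keys))
    if parametersSatisfied, let handler = record.installedHandler {
        // Task can be satisfied: provide the intent and parameters to the app.
        return .delivered(to: handler)
    }
    // Task cannot be satisfied: provide a list of applications associated with the intent.
    return .candidates(record.associatedApps)
}

let rideIntent = Intent(identifier: "com.example.bookRide",
                        parameters: ["pickup": "Home"])   // "destination" is missing
let record = IntentRecord(identifier: "com.example.bookRide",
                          requiredParameters: ["pickup", "destination"],
                          installedHandler: nil,
                          associatedApps: ["RideApp A", "RideApp B"])
print(dispatch(intent: rideIntent, record: record))        // candidates(["RideApp A", "RideApp B"])
```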
An example method includes, at a first electronic device having one or more processors, receiving a natural language user input, wherein the natural language user input indicates an intent of a set of intents; providing the natural language user input to a second electronic device; and receiving, from the second electronic device, an indication that the software application associated with the intent is not located on the first electronic device. The method also includes, in response to the indication, obtaining a list of applications associated with the intent; displaying, with a touch-sensitive display of the first electronic device, the list of applications associated with the intent in a user interface; receiving a user input indicating a selection of an application in the list of applications; and providing the intent of the set of intents to the application.
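The two-device flow above can be sketched as follows, with the network call, the application-list lookup, and the user-interface selection abstracted as hypothetical closures; this is only an illustration of the described sequence, not the disclosed implementation.

```swift
// Hypothetical sketch of the two-device flow: forward the natural language
// input, learn that no local application handles the intent, fetch candidate
// applications, let the user pick one, and provide the intent to it.

struct RemoteResponse {
    let intentIdentifier: String
    let appInstalledLocally: Bool
}

func handleUtterance(_ utterance: String,
                     sendToSecondDevice: (String) -> RemoteResponse,
                     fetchAssociatedApps: (String) -> [String],
                     presentChoices: ([String]) -> String?,
                     provideIntent: (String, String) -> Void) {
    // Provide the natural language user input to the second electronic device.
    let response = sendToSecondDevice(utterance)

    guard !response.appInstalledLocally else { return }

    // The software application associated with the intent is not on this device:
    // obtain and display a list of applications associated with the intent.
    let apps = fetchAssociatedApps(response.intentIdentifier)
    guard let chosenApp = presentChoices(apps) else { return }

    // Provide the intent to the application the user selected.
    provideIntent(response.intentIdentifier, chosenApp)
}

handleUtterance("Book a table for two tonight",
                sendToSecondDevice: { _ in RemoteResponse(intentIdentifier: "com.example.bookTable",
                                                          appInstalledLocally: false) },
                fetchAssociatedApps: { _ in ["TableApp", "DinnerApp"] },
                presentChoices: { apps in apps.first },            // stand-in for the user's selection
                provideIntent: { intent, app in print("Providing \(intent) to \(app)") })
```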
Example non-transitory computer readable media are disclosed herein. An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to receive natural language user input; identify, with the one or more processors, an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; identify a software application associated with the intent of the set of intents; and provide the intent and parameters to the software application.
An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of one or more electronic devices, cause the one or more electronic devices to receive natural language user input; determine an intent of a set of intents and a parameter associated with the intent based on the natural language user input; identify a software application based on at least one of the intent or parameter; and provide the intent and parameters to the software application.
An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of one or more electronic devices, cause the one or more electronic devices to receive natural language user input; identify, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; determine, based on at least one of the intent or parameters, whether a task corresponding to the intent can be satisfied; in accordance with a determination that a task corresponding to the intent can be satisfied, provide the intent and parameters to a software application associated with the intent; and in accordance with a determination that the task corresponding to the intent cannot be satisfied, provide a list of one or more software applications associated with the intent.
An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of a first electronic device, cause the first electronic device to receive a natural language user input, wherein the natural language user input indicates an intent of a set of intents; provide the natural language user input to a second electronic device; receive, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; obtain, in response to the indication, a list of applications associated with the intent; display, with a touch-sensitive display of the first electronic device, the list of applications associated with the intent in a user interface; receive a user input indicating a selection of an application in the list of applications; and provide the intent of the set of intents to the application.
Example electronic devices and systems are disclosed herein. An exemplary electronic device includes: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; identifying, with the one or more processors, an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; identifying a software application associated with the intent of the set of intents; and providing the intent and parameters to the software application.
An exemplary system includes one or more processors of one or more electronic devices; one or more memories of one or more electronic devices; and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; determining an intent of a set of intents and a parameter associated with the intent based on the natural language user input; identifying a software application based on at least one of the intent or parameter; and providing the intent and parameters to the software application.
An exemplary system includes one or more processors of one or more electronic devices; one or more memories of one or more electronic devices; and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; identifying, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; determining, based on at least one of the intent or parameters, whether a task corresponding to the intent can be satisfied; in accordance with a determination that a task corresponding to the intent can be satisfied, providing the intent and parameters to a software application associated with the intent; and in accordance with a determination that the task corresponding to the intent cannot be satisfied, providing a list of one or more software applications associated with the intent.
An exemplary first electronic device includes one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input, wherein the natural language user input indicates an intent of a set of intents; providing the natural language user input to a second electronic device; receiving, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; obtaining a list of applications associated with the intent in response to the indication; displaying, with a touch-sensitive display of the first electronic device, the list of applications associated with the intent in a user interface; receiving a user input indicating a selection of an application in the list of applications; and providing the intent of the set of intents to the application.
An exemplary electronic device comprises means for receiving a natural language user input; means for identifying an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; means for identifying a software application associated with the intent of the set of intents; and means for providing the intent and parameters to the software application.
An exemplary system comprises means for receiving a natural language user input; means for determining an intent of a set of intents and a parameter associated with the intent based on the natural language user input; means for identifying a software application based on at least one of the intent or parameter; and means for providing the intent and parameters to the software application.
An exemplary system comprises means for receiving a natural language user input; means for identifying an intent of a set of intents and a parameter associated with the intent based on the natural language user input; means for determining whether a task corresponding to the intent can be satisfied based on at least one of the intent or a parameter; means for providing the intent and parameters to a software application associated with the intent in accordance with a determination that a task corresponding to the intent can be satisfied; and means for providing a list of one or more software applications associated with the intent in accordance with the determination that the task corresponding to the intent cannot be satisfied.
An exemplary first electronic device comprises means for receiving a natural language user input, wherein the natural language user input indicates an intent of a set of intents; means for providing the natural language user input to a second electronic device; means for receiving, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; means for obtaining a list of applications associated with the intent in response to the indication; means for displaying, with a touch-sensitive display of the first electronic device, the list of applications associated with the intent in a user interface; means for receiving a user input indicating a selection of an application in the list of applications; and means for providing the intent of the set of intents to the application.
Drawings
For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings in which like reference numerals indicate corresponding parts throughout the figures.
FIG. 1 is a block diagram illustrating a system and environment for implementing a digital assistant in accordance with various embodiments.
Figure 2A is a block diagram illustrating a portable multifunction device implementing a client-side portion of a digital assistant, according to some embodiments.
Fig. 2B is a block diagram illustrating exemplary components for event processing, in accordance with various embodiments.
Figure 3 illustrates a portable multifunction device implementing a client-side portion of a digital assistant, in accordance with various embodiments.
FIG. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with various embodiments.
FIG. 5A illustrates an exemplary user interface of an application menu on a portable multifunction device in accordance with various embodiments.
FIG. 5B illustrates an exemplary user interface of a multifunction device with a touch-sensitive surface separate from a display in accordance with various embodiments.
FIG. 6A illustrates a personal electronic device, in accordance with various embodiments.
Fig. 6B is a block diagram illustrating a personal electronic device, in accordance with various embodiments.
Fig. 7A is a block diagram illustrating a digital assistant system or server portion thereof in accordance with various embodiments.
Fig. 7B illustrates functionality of the digital assistant illustrated in fig. 7A in accordance with various embodiments.
FIG. 7C illustrates a portion of an ontology in accordance with various embodiments.
Fig. 8 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments.
Fig. 9 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments.
Fig. 10A-10C illustrate exemplary user interfaces of an electronic device, according to some embodiments.
Fig. 10D-10E illustrate exemplary data flows of a digital assistant system according to some embodiments.
Fig. 11-14 illustrate functional block diagrams of electronic devices according to some embodiments.
Detailed Description
In the following description of the present disclosure and embodiments, reference is made to the accompanying drawings, in which are shown by way of illustration specific embodiments that may be practiced. It is to be understood that other embodiments and examples may be practiced and that changes may be made without departing from the scope of the present disclosure.
Although the following description uses the terms "first," "second," etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first input may be named a second input, and similarly a second input may be named a first input, without departing from the scope of the various described embodiments. The first input and the second input may both be outputs and, in some cases, may be separate and different inputs.
The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Depending on the context, the term "if" may be interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]", depending on the context.
1. System and environment
Fig. 1 illustrates a block diagram of a system 100, in accordance with various embodiments. In some embodiments, system 100 may implement a digital assistant. The terms "digital assistant," "virtual assistant," "intelligent automated assistant," or "automatic digital assistant" may refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions (e.g., tasks) based on the inferred user intent. For example, to act on the inferred user intent, the system may perform one or more of the following: identifying a task flow with steps and parameters designed to achieve the inferred user intent; entering specific requirements from the inferred user intent into the task flow; executing the task flow by calling a program, method, service, API, or the like; and generating an output response to the user in audible (e.g., speech) and/or visual form.
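As an illustration of the steps just listed (identifying a task flow, entering the specific requirements, executing the flow, and generating a response), a hedged Swift sketch follows; the InferredIntent and TaskFlow types and the weather example are hypothetical, not the disclosed implementation.

```swift
// Hypothetical sketch of acting on an inferred user intent: pick a task flow,
// fill in its parameters from the inferred intent, execute it by calling a
// service, and generate a spoken/visual response. Names are illustrative.

struct InferredIntent {
    let name: String                      // e.g. "weather.lookup"
    let requirements: [String: String]    // e.g. ["city": "Cupertino"]
}

struct TaskFlow {
    let intentName: String
    // Executes the flow by invoking a program, method, service, API, etc.
    let execute: ([String: String]) -> String
}

let weatherFlow = TaskFlow(intentName: "weather.lookup") { parameters in
    // A real implementation would call a weather service here.
    "It is sunny in \(parameters["city"] ?? "your area")."
}

func respond(to intent: InferredIntent, flows: [TaskFlow]) -> String {
    // 1. Identify a task flow designed to achieve the inferred intent.
    guard let flow = flows.first(where: { $0.intentName == intent.name }) else {
        return "Sorry, I can't help with that yet."
    }
    // 2. Enter the specific requirements from the inferred intent into the flow,
    // 3. execute it, and generate an audible and/or visual response.
    return flow.execute(intent.requirements)
}

print(respond(to: InferredIntent(name: "weather.lookup",
                                 requirements: ["city": "Cupertino"]),
              flows: [weatherFlow]))
```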
In particular, the digital assistant is capable of accepting user requests at least partially in the form of natural language commands, requests, statements, narratives, and/or inquiries. Typically, the user request seeks either an informational answer or performance of a task by the digital assistant. A satisfactory response to a user request may be to provide the requested informational answer, to perform the requested task, or a combination of both. For example, a user may ask the digital assistant a question such as "Where am I right now?" Based on the user's current location, the digital assistant may answer "You are in Central Park near the west gate." The user may also request the performance of a task, for example by saying "Please invite my friends to my girlfriend's birthday party next week." In response, the digital assistant can acknowledge the request by saying "OK, right away," and then send an appropriate calendar invitation on behalf of the user to each of the user's friends listed in the user's electronic address book. During the performance of a requested task, the digital assistant can sometimes interact with the user over an extended period of time in a continuous conversation involving multiple exchanges of information. There are many other ways to interact with a digital assistant to request information or the performance of various tasks. In addition to providing verbal responses and taking programmed actions, the digital assistant can also provide responses in other visual or audio forms, e.g., as text, alerts, music, videos, animations, etc.
As shown in fig. 1, in some embodiments, the digital assistant may be implemented according to a client-server model. The digital assistant may include a client-side portion 102 (hereinafter "DA client 102") executing on a user device 104, and a server-side portion 106 (hereinafter "DA server 106") executing on a server system 108. The DA client 102 may communicate with the DA server 106 over one or more networks 110. The DA client 102 may provide client-side functionality such as user-oriented input and output processing and communicate with the DA server 106. The DA server 106 may provide server-side functionality for any number of DA clients 102, each of the number of DA clients 102 located on a respective user device 104.
In some embodiments, DA server 106 may include a client-facing I/O interface 112, one or more processing modules 114, data and models 116, and an I/O interface 118 to external services. The client-facing I/O interface 112 may facilitate client-facing input and output processing of the DA server 106. The one or more processing modules 114 may utilize the data and models 116 to process the speech input and determine the user's intent based on the natural language input. The one or more processing modules 114 also perform task execution based on the inferred user intent. In some embodiments, DA server 106 may communicate with external services 120 over one or more networks 110 to accomplish tasks or gather information. The I/O interface 118 to external services may facilitate such communication.
The user device 104 may be any suitable electronic device. For example, the user device may be a portable multifunction device (e.g., device 200 described below with reference to fig. 2A), a multifunction device (e.g., device 400 described below with reference to fig. 4), or a personal electronic device (e.g., device 600 described below with reference to figs. 6A-6B). A portable multifunction device may be, for example, a mobile phone that also contains other functions, such as PDA and/or music player functions. Particular embodiments of portable multifunction devices include the iPhone®, iPod touch®, and iPad® devices from Apple Inc. Other embodiments of the portable multifunction device may include, but are not limited to, a laptop or tablet computer. Also, in some embodiments, the user device 104 may be a non-portable multifunction device. In particular, the user device 104 may be a desktop computer, a game console, a television, or a television set-top box. In some embodiments, the user device 104 may include a touch-sensitive surface (e.g., a touch screen display and/or a touchpad). Further, the user device 104 optionally may include one or more other physical user interface devices, such as a physical keyboard, a mouse, and/or a joystick. Various embodiments of electronic devices, such as multifunction devices, are described in more detail below.
Embodiments of one or more communication networks 110 may include a Local Area Network (LAN) and a Wide Area Network (WAN), such as the internet. The one or more communication networks 110 may be implemented using any known network protocol, including various wired or wireless protocols, such as, for example, Ethernet, Universal Serial Bus (USB), FIREWIRE (FIREWIRE), Global System for Mobile communications (GSM), Enhanced Data GSM Environment (EDGE), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi, Voice over Internet protocol (VoIP), Wi-MAX, or any other suitable communication protocol.
The server system 108 may be implemented on one or more standalone data processing devices or a distributed network of computers. In some embodiments, the server system 108 may also employ various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of the server system 108.
In some embodiments, user device 104 may communicate with DA server 106 via second user device 122. The second user device 122 may be similar or identical to the user device 104. For example, the second user device 122 may be similar to the devices 200, 400, or 600 described below with reference to figs. 2A, 4, and 6A-6B. The user device 104 may be configured to communicatively couple to the second user device 122 via a direct communication connection, such as Bluetooth, NFC, BTLE, etc., or via a wired or wireless network, such as a local Wi-Fi network. In some embodiments, second user device 122 may be configured to act as a proxy between user device 104 and DA server 106. For example, DA client 102 of user device 104 may be configured to transmit information (e.g., a user request received at user device 104) to DA server 106 via second user device 122. DA server 106 may process the information and return relevant data (e.g., data content in response to the user request) to user device 104 via second user device 122.
In some embodiments, the user device 104 may be configured to transmit abbreviated requests for data to the second user device 122 to reduce the amount of information transmitted from the user device 104. The second user device 122 may be configured to determine supplemental information to add to the abbreviated request in order to generate a complete request to transmit to the DA server 106. This system architecture may advantageously allow a user device 104 (e.g., a watch or similar compact electronic device) with limited communication capabilities and/or limited battery power to access services provided by DA server 106 by using a second user device 122 (such as a mobile phone, laptop, tablet, etc.) with greater communication capabilities and/or battery power as a proxy to DA server 106. Although only two user devices 104 and 122 are shown in fig. 1, it should be understood that system 100 may include any number and type of user devices configured in this proxy configuration to communicate with DA server 106.
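The proxy arrangement described above can be sketched as follows, with hypothetical AbbreviatedRequest, CompleteRequest, and ProxyDevice types standing in for the actual devices; the supplemental fields (locale, device model, authentication token) are assumptions chosen only to illustrate how an abbreviated request might be completed, and are not taken from the disclosure.

```swift
// Hypothetical sketch of the proxy flow: a constrained device sends an
// abbreviated request, and the companion device adds the supplemental
// information needed to form a complete request before forwarding it.

struct AbbreviatedRequest {
    let utterance: String                 // only the essential payload is sent
}

struct CompleteRequest {
    let utterance: String
    let locale: String
    let deviceModel: String
    let authToken: String
}

struct ProxyDevice {
    let locale: String
    let deviceModel: String
    let authToken: String

    // Add the supplemental information required to generate a complete request.
    func complete(_ request: AbbreviatedRequest) -> CompleteRequest {
        CompleteRequest(utterance: request.utterance,
                        locale: locale,
                        deviceModel: deviceModel,
                        authToken: authToken)
    }

    // Forward the completed request to the server and relay the reply back.
    func relay(_ request: AbbreviatedRequest,
               toServer send: (CompleteRequest) -> String) -> String {
        send(complete(request))
    }
}

let phone = ProxyDevice(locale: "en_US", deviceModel: "companion-phone", authToken: "token-placeholder")
let reply = phone.relay(AbbreviatedRequest(utterance: "What's the weather?")) { completed in
    "server response for \(completed.utterance)"   // stand-in for the DA server round trip
}
print(reply)
```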
Although the digital assistant shown in fig. 1 may include both a client-side portion (e.g., DA client 102) and a server-side portion (e.g., DA server 106), in some embodiments, the functionality of the digital assistant may be implemented as a standalone application installed on a user device. Moreover, the division of functionality between the client portion and the server portion of the digital assistant may vary in different implementations. For example, in some embodiments, the DA client may be a thin client that provides only user-oriented input and output processing functions, and delegates all other functions of the digital assistant to a backend server.
2. Electronic device
Attention is now directed to embodiments of an electronic device for implementing a client-side portion of a digital assistant. FIG. 2A is a block diagram illustrating a portable multifunction device 200 with a touch-sensitive display system 212 in accordance with some embodiments. The touch sensitive display 212 is sometimes referred to as a "touch screen" for convenience, and may sometimes be referred to or called a "touch sensitive display system". Device 200 includes memory 202 (which optionally includes one or more computer-readable storage media), a memory controller 222, one or more processing units (CPUs) 220, a peripheral interface 218, RF circuitry 208, audio circuitry 210, a speaker 211, a microphone 213, an input/output (I/O) subsystem 206, other input control devices 216, and an external port 224. The device 200 optionally includes one or more optical sensors 264. Device 200 optionally includes one or more contact intensity sensors 265 for detecting the intensity of contacts on device 200 (e.g., a touch-sensitive surface, such as touch-sensitive display system 212 of device 200). Device 200 optionally includes one or more tactile output generators 267 for generating tactile outputs on device 200 (e.g., generating tactile outputs on a touch-sensitive surface such as touch-sensitive display system 212 of device 200 or touch panel 455 of device 400). These components optionally communicate over one or more communication buses or signal lines 203.
As used in this specification and claims, the term "intensity" of a contact on a touch-sensitive surface refers to the force or pressure (force per unit area) of a contact (e.g., a finger contact) on the touch-sensitive surface, or to a substitute (surrogate) for the force or pressure of a contact on the touch-sensitive surface. The intensity of the contact has a range of values that includes at least four different values and more typically includes hundreds of different values (e.g., at least 256). The intensity of the contact is optionally determined (or measured) using various methods and various sensors or combinations of sensors. For example, one or more force sensors below or adjacent to the touch-sensitive surface are optionally used to measure forces at different points on the touch-sensitive surface. In some implementations, force measurements from multiple force sensors are combined (e.g., a weighted average) to determine an estimated contact force. Similarly, the pressure sensitive tip of the stylus is optionally used to determine the pressure of the stylus on the touch-sensitive surface. Alternatively, the size of the contact area detected on the touch-sensitive surface and/or changes thereof, the capacitance of the touch-sensitive surface proximate to the contact and/or changes thereof, and/or the resistance of the touch-sensitive surface proximate to the contact and/or changes thereof, are optionally used as a substitute for the force or pressure of the contact on the touch-sensitive surface. In some implementations, the surrogate measurement of contact force or pressure is used directly to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is described in units corresponding to the surrogate measurement). In some implementations, the surrogate measurement of contact force or pressure is converted into an estimated force or pressure, and the estimated force or pressure is used to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is a pressure threshold measured in units of pressure). Using the intensity of the contact as an attribute of the user input allows the user to access additional device functionality that the user may not have access to on a smaller sized device with limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input (e.g., via a touch-sensitive display, a touch-sensitive surface, or physical/mechanical controls, such as knobs or buttons).
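A simplified numerical sketch of the surrogate-measurement idea described above follows: weighted force-sensor readings are combined, converted to an estimated pressure, and compared with an intensity threshold. The weighting scheme, conversion factor, and threshold values are arbitrary illustrative assumptions, not calibration data from the disclosed device.

```swift
// Hypothetical sketch: combine surrogate force measurements into a weighted
// average, convert to an estimated pressure, and compare with a threshold.

struct ForceSample {
    let reading: Double   // raw sensor value near the contact
    let weight: Double    // closer sensors get larger weights
}

func estimatedPressure(from samples: [ForceSample],
                       unitsPerReading: Double) -> Double {
    let totalWeight = samples.reduce(0) { $0 + $1.weight }
    guard totalWeight > 0 else { return 0 }
    // Weighted average of the surrogate measurements...
    let weightedAverage = samples.reduce(0) { $0 + $1.reading * $1.weight } / totalWeight
    // ...converted into an estimated pressure.
    return weightedAverage * unitsPerReading
}

func exceedsIntensityThreshold(samples: [ForceSample],
                               threshold: Double) -> Bool {
    estimatedPressure(from: samples, unitsPerReading: 0.01) > threshold
}

let samples = [ForceSample(reading: 180, weight: 0.7),
               ForceSample(reading: 140, weight: 0.3)]
print(exceedsIntensityThreshold(samples: samples, threshold: 1.2))   // true in this toy example
```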
As used in this specification and claims, the term "haptic output" refers to a physical displacement of a device relative to a previous position of the device, a physical displacement of a component of the device (e.g., a touch-sensitive surface) relative to another component of the device (e.g., a housing), or a displacement of a component relative to a center of mass of the device that is to be detected by a user with the user's sense of touch. For example, where the device or component of the device is in contact with a surface of the user that is sensitive to touch (e.g., a finger, palm, or other portion of the user's hand), the haptic output generated by the physical displacement will be interpreted by the user as a haptic sensation corresponding to a perceived change in a physical characteristic of the device or component of the device. For example, movement of the touch-sensitive surface (e.g., a touch-sensitive display or trackpad) is optionally interpreted by the user as a "down click" or "up click" of a physical actuation button. In some cases, the user will feel a tactile sensation, such as a "press click" or "release click," even when the physical actuation button associated with the touch-sensitive surface that is physically pressed (e.g., displaced) by the user's movement is not moving. As another example, movement of the touch sensitive surface may optionally be interpreted or sensed by the user as "roughness" of the touch sensitive surface even when there is no change in the smoothness of the touch sensitive surface. While such interpretation of touch by a user will be limited by the user's individualized sensory perception, many sensory perceptions of the presence of touch are common to most users. Thus, when a haptic output is described as corresponding to a particular sensory perception of a user (e.g., "click down," "click up," "roughness"), unless otherwise stated, the generated haptic output corresponds to a physical displacement of the device or a component thereof that would generate a sensory perception of a typical (or ordinary) user.
It should be understood that device 200 is only one embodiment of a portable multifunction device, and that device 200 optionally has more or fewer components than shown, optionally combines two or more components, or optionally has a different configuration or arrangement of these components. The various components shown in fig. 2A are implemented in hardware, software, or a combination of both hardware and software, including one or more signal processing circuits and/or application specific integrated circuits.
Memory 202 may include one or more computer-readable storage media. The computer-readable storage medium may be tangible and non-transitory. The memory 202 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Memory controller 222 may control other components of device 200 to access memory 202.
In some embodiments, a non-transitory computer-readable storage medium of memory 202 may be used to store instructions (e.g., for performing aspects of process 1100 described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other embodiments, the instructions (e.g., for performing aspects of the process 1100 described below) may be stored on a non-transitory computer-readable storage medium (not shown) of the server system 108 or may be divided between the non-transitory computer-readable storage medium of the memory 202 and the non-transitory computer-readable storage medium of the server system 108. In the context of this document, a "non-transitory computer-readable storage medium" can be any medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
Peripheral interface 218 may be used to couple the input and output peripherals of the device to CPU 220 and memory 202. The one or more processors 220 run or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions of device 200 and to process data. In some embodiments, peripherals interface 218, CPU 220, and memory controller 222 may be implemented on a single chip, such as chip 204. In some other embodiments, they may be implemented on separate chips.
RF (radio frequency) circuitry 208 receives and transmits RF signals, also known as electromagnetic signals. The RF circuitry 208 converts electrical signals to/from electromagnetic signals and communicates with communication networks and other communication devices via electromagnetic signals. RF circuitry 208 optionally includes well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, a Subscriber Identity Module (SIM) card, memory, and so forth. RF circuitry 208 optionally communicates with networks, such as the internet, also known as the World Wide Web (WWW), intranets, and/or wireless networks, such as cellular telephone networks, wireless Local Area Networks (LANs), and/or Metropolitan Area Networks (MANs), as well as other devices via wireless communications. The RF circuitry 208 optionally includes well-known circuitry for detecting Near Field Communication (NFC) fields, such as by a short-range communication radio. The wireless communication optionally uses any of a number of communication standards, protocols, and technologies, including but not limited to Global System for Mobile communications (GSM), Enhanced Data GSM Environment (EDGE), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), Evolution, Data-Only (EV-DO), HSPA+, Dual-Cell HSPA (DC-HSDPA), Long Term Evolution (LTE), Near Field Communication (NFC), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Bluetooth Low Energy (BTLE), Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, and/or IEEE 802.11ac), Voice over Internet Protocol (VoIP), Wi-MAX, email protocols (e.g., Internet Message Access Protocol (IMAP) and/or Post Office Protocol (POP)), instant messaging (e.g., Extensible Messaging and Presence Protocol (XMPP), Session Initiation Protocol with extensions for Instant Messaging and Presence (SIMPLE), Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
Audio circuitry 210, speaker 211, and microphone 213 provide an audio interface between a user and device 200. The audio circuit 210 receives audio data from the peripheral interface 218, converts the audio data to an electrical signal, and transmits the electrical signal to the speaker 211. The speaker 211 converts the electrical signals into human-audible sound waves. The audio circuit 210 also receives electrical signals converted from sound waves by the microphone 213. The audio circuit 210 converts the electrical signals to audio data and transmits the audio data to the peripheral interface 218 for processing. The audio data may be retrieved from and/or transmitted to memory 202 and/or RF circuitry 208 by the peripheral interface 218. In some embodiments, the audio circuit 210 also includes a headset jack (e.g., 312 in fig. 3). The headset jack provides an interface between the audio circuitry 210 and a removable audio input/output peripheral such as an output-only headset or a headset having both an output (e.g., a monaural headset or a binaural headset) and an input (e.g., a microphone).
The I/O subsystem 206 couples input/output peripheral devices on the device 200, such as the touch screen 212 and other input control devices 216, to the peripheral interface 218. The I/O subsystem 206 optionally includes a display controller 256, an optical sensor controller 258, an intensity sensor controller 259, a haptic feedback controller 261, and one or more input controllers 260 for other input or control devices. The one or more input controllers 260 receive/transmit electrical signals from/to the other input control devices 216. The other input control devices 216 optionally include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels, and the like. In some alternative embodiments, the one or more input controllers 260 are optionally coupled to (or not coupled to) any of the following: a keyboard, an infrared port, a USB port, and a pointing device such as a mouse. The one or more buttons (e.g., 308 in fig. 3) optionally include an up/down button for volume control of the speaker 211 and/or microphone 213. The one or more buttons optionally include a push button (e.g., 306 in fig. 3).
A quick press of the push button unlocks the touch screen 212 or initiates a process of unlocking the device using gestures on the touch screen, as described in U.S. patent application 11/322,549, "Unlocking a Device by Performing Gestures on an Unlock Image," filed December 23, 2005, now U.S. Patent No. 7,657,849. The above-mentioned U.S. patent application is hereby incorporated by reference in its entirety. A longer press of the push button (e.g., 306) may turn the device 200 on or off. The user can customize the functionality of one or more of the buttons. The touch screen 212 is used to implement virtual or soft buttons and one or more soft keyboards.
The touch-sensitive display 212 provides an input interface and an output interface between the device and the user. Display controller 256 receives electrical signals from touch screen 212 and/or transmits electrical signals to touch screen 212. Touch screen 212 displays visual output to a user. The visual output may include graphics, text, icons, video, and any combination thereof (collectively "graphics"). In some implementations, some or all of the visual output may correspond to user interface objects.
Touch screen 212 has a touch-sensitive surface, sensor, or group of sensors that accept input from a user based on tactile sensation and/or tactile contact. Touch screen 212 and display controller 256 (along with any associated modules and/or sets of instructions in memory 202) detect contact (and any movement or breaking of the contact) on touch screen 212 and convert the detected contact into interaction with user interface objects (e.g., one or more soft keys, icons, web pages, or images) displayed on touch screen 212. In an exemplary embodiment, the point of contact between the touch screen 212 and the user corresponds to a finger of the user.
The touch screen 212 may use LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology, although other display technologies may be used in other embodiments. The touch screen 212 and display controller 256 may detect contact and any movement or breaking thereof using any of a variety of touch sensing technologies now known or later developed, including but not limited to capacitive technologies, resistive technologies, infrared technologies, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with the touch screen 212. In one exemplary embodiment, projected mutual capacitance sensing technology is used, such as that found in the iPhone® and iPod touch® from Apple Inc. (Cupertino, California).
The touch sensitive display in some embodiments of the touch screen 212 may be similar to the multi-touch sensitive touchpad described in the following U.S. patents: 6,323,846(Westerman et al), 6,570,557 (Westerman et al) and/or 6,677,932 (Westerman); and/or U.S. patent publication 2002/0015024a1, each of which is hereby incorporated by reference in its entirety. However, touch screen 212 displays visual output from device 200, while touch sensitive touchpads do not provide visual output.
Touch-sensitive displays in some embodiments of touch screen 212 may be as described in the following patent applications: (1) U.S. patent application 11/381,313, "Multipoint Touch Surface Controller," filed May 2, 2006; (2) U.S. patent application 10/840,862, "Multipoint Touchscreen," filed May 6, 2004; (3) U.S. patent application 10/903,964, "Gestures For Touch Sensitive Input Devices," filed July 30, 2004; (4) U.S. patent application 11/048,264, "Gestures For Touch Sensitive Input Devices," filed January 31, 2005; (5) U.S. patent application 11/038,590, "Mode-Based Graphical User Interfaces For Touch Sensitive Input Devices," filed January 18, 2005; (6) U.S. patent application 11/228,758, "Virtual Input Device Placement On A Touch Screen User Interface," filed September 16, 2005; (7) U.S. patent application 11/228,700, "Operation Of A Computer With A Touch Screen Interface," filed September 16, 2005; (8) U.S. patent application 11/228,737, "Activating Virtual Keys Of A Touch-Screen Virtual Keyboard," filed September 16, 2005; and (9) U.S. patent application 11/367,749, "Multi-Functional Hand-Held Device," filed March 3, 2006. All of these patent applications are incorporated herein by reference in their entirety.
The touch screen 212 may have a video resolution in excess of 100 dpi. In some embodiments, the touch screen has a video resolution of about 160 dpi. The user may make contact with touch screen 212 using any suitable object or appendage, such as a stylus, a finger, and so forth. In some embodiments, the user interface is designed to work primarily with finger-based contacts and gestures, which may not be as accurate as stylus-based input due to the larger contact area of the finger on the touch screen. In some embodiments, the device translates the rough finger-based input into a precise pointer/cursor position or command for performing the action desired by the user.
In some embodiments, in addition to a touch screen, device 200 may include a touch pad (not shown) for activating or deactivating particular functions. In some embodiments, the touchpad is a touch-sensitive area of the device that, unlike a touch screen, does not display visual output. The touchpad may be a touch-sensitive surface separate from the touch screen 212 or an extension of the touch-sensitive surface formed by the touch screen.
The device 200 also includes a power system 262 for powering the various components. Power system 262 may include a power management system, one or more power sources (e.g., battery, Alternating Current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a Light Emitting Diode (LED)), and any other components associated with the generation, management, and distribution of power in a portable device.
The device 200 may also include one or more optical sensors 264. Fig. 2A shows an optical sensor coupled to optical sensor controller 258 in I/O subsystem 206. The optical sensor 264 may include a Charge Coupled Device (CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The optical sensor 264 receives light projected through one or more lenses from the environment and converts the light into data representing an image. In conjunction with the imaging module 243 (also referred to as a camera module), the optical sensor 264 may capture still images or video. In some embodiments, the optical sensor is located on the back of device 200 opposite touch screen display 212 on the front of the device so that the touch screen display can be used as a viewfinder for still and/or video image acquisition. In some embodiments, the optical sensor is located in the front of the device so that images of the user may be acquired for the video conference while the user views other video conference participants on the touch screen display. In some implementations, the position of the optical sensor 264 can be changed by the user (e.g., by rotating a lens and sensor in the device housing) so that a single optical sensor 264 can be used with a touch screen display for both video conferencing and still image and/or video image capture.
Device 200 optionally further comprises one or more contact intensity sensors 265. FIG. 2A shows a contact intensity sensor coupled to intensity sensor controller 259 in I/O subsystem 206. Contact intensity sensor 265 optionally includes one or more piezoresistive strain gauges, capacitive force sensors, electrical force sensors, piezoelectric force sensors, optical force sensors, capacitive touch-sensitive surfaces, or other intensity sensors (e.g., sensors for measuring the force (or pressure) of a contact on a touch-sensitive surface). Contact intensity sensor 265 receives contact intensity information (e.g., pressure information or a surrogate for pressure information) from the environment. In some embodiments, at least one contact intensity sensor is juxtaposed or adjacent to the touch-sensitive surface (e.g., touch-sensitive display system 212). In some embodiments, at least one contact intensity sensor is located on the back of device 200 opposite touch screen display 212, which is located on the front of device 200.
The device 200 may also include one or more proximity sensors 266. Fig. 2A shows a proximity sensor 266 coupled to the peripheral interface 218. Alternatively, the proximity sensor 266 may be coupled to the input controller 260 in the I/O subsystem 206. The proximity sensor 266 may be implemented as described in the following U.S. patent applications: 11/241,839, "Proximity Detector In Handheld Device"; 11/240,788, "Proximity Detector In Handheld Device"; 11/620,702, "Using Ambient Light Sensor To Augment Proximity Sensor Output"; 11/586,862, "Automated Response To And Sensing Of User Activity In Portable Devices"; and 11/638,251, "Methods And Systems For Automatic Configuration Of Peripherals," which are hereby incorporated by reference in their entirety. In some embodiments, the proximity sensor turns off and disables the touch screen 212 when the multifunction device is placed near the user's ear (e.g., when the user is making a phone call).
Device 200 optionally further comprises one or more tactile output generators 267. Fig. 2A shows a tactile output generator coupled to the haptic feedback controller 261 in the I/O subsystem 206. Tactile output generator 267 optionally includes one or more electroacoustic devices such as speakers or other audio components; and/or an electromechanical device such as a motor, solenoid, electroactive polymer, piezoelectric actuator, electrostatic actuator, or other tactile output generating component for converting energy into linear motion (e.g., a component for converting an electrical signal into a tactile output on the device). Contact intensity sensor 265 receives haptic feedback generation instructions from haptic feedback module 233 and generates tactile output on device 200 that can be felt by a user of device 200. In some embodiments, at least one tactile output generator is juxtaposed or adjacent to a touch-sensitive surface (e.g., touch-sensitive display system 212), and optionally generates tactile output by moving the touch-sensitive surface vertically (e.g., into/out of the surface of device 200) or laterally (e.g., back and forth in the same plane as the surface of device 200). In some embodiments, at least one tactile output generator sensor is located on the back of device 200, opposite touch screen display 212, which is located on the front of device 200.
Device 200 may also include one or more accelerometers 268. Fig. 2A shows accelerometer 268 coupled to peripheral interface 218. Alternatively, accelerometer 268 may be coupled to input controller 260 in I/O subsystem 206. Accelerometer 268 may be implemented as described in the following U.S. patent publications: 20050190059 entitled "Acceleration-Based Theft Detection System For Portable Electronic Devices" and 20060017692 entitled "Methods And Apparatuses For Operating A Portable Device Based On An Accelerometer," both of which are hereby incorporated by reference in their entirety. In some embodiments, information is displayed in a portrait view or a landscape view on the touch screen display based on analysis of data received from the one or more accelerometers. Device 200 optionally includes a magnetometer (not shown) and a GPS (or GLONASS or other global navigation system) receiver (not shown) in addition to the one or more accelerometers 268 for obtaining information about the position and orientation (e.g., portrait or landscape) of device 200.
In some embodiments, the software components stored in memory 202 include an operating system 226, a communication module (or set of instructions) 228, a contact/motion module (or set of instructions) 230, a graphics module (or set of instructions) 232, a text input module (or set of instructions) 234, a Global Positioning System (GPS) module (or set of instructions) 235, a digital assistant client module 229, and application programs (or sets of instructions) 236. Moreover, memory 202 may store data and models, such as user data and models 231. Further, in some embodiments, memory 202 (fig. 2A) or 470 (fig. 4) stores device/global internal state 257, as shown in fig. 2A and fig. 4. Device/global internal state 257 includes one or more of: an active application state indicating which applications (if any) are currently active; display state indicating what applications, views, or other information occupy various areas of the touch screen display 212; sensor states including information obtained from the various sensors of the device and the input control device 216; and location information regarding the device's location and/or attitude.
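As an informal illustration of the state categories just listed, the following Swift sketch models a device/global internal state record. The type and field names, and the example values, are assumptions introduced here for clarity and do not describe the actual data layout of the embodiments.

```swift
// Hypothetical record mirroring the device/global internal state categories above.
struct DeviceGlobalInternalState {
    // Active application state: which applications, if any, are currently active.
    var activeApplications: [String]

    // Display state: which application or view occupies each named screen region.
    var displayState: [String: String]

    // Sensor state: latest readings gathered from the device's sensors and input controls.
    var sensorState: [String: Double]

    // Location information about the device's position and attitude.
    var latitude: Double?
    var longitude: Double?
    var orientation: Orientation

    enum Orientation { case portrait, landscape }
}

// Example: a state snapshot while a mail application is frontmost.
let snapshot = DeviceGlobalInternalState(
    activeApplications: ["com.example.mail"],
    displayState: ["main": "mail.inbox"],
    sensorState: ["ambientLight": 0.42, "proximity": 0.0],
    latitude: 37.33, longitude: -122.01,
    orientation: .portrait
)
print(snapshot.activeApplications)
```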
The operating system 226 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, iOS, WINDOWS, or embedded operating systems such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components.
The communication module 228 facilitates communication with other devices through one or more external ports 224 and also includes various software components for handling data received by the RF circuitry 208 and/or the external port 224. External port 224 (e.g., Universal Serial Bus (USB), firewire, etc.) is adapted to couple directly to other devices or indirectly through a network (e.g., the internet, wireless LAN, etc.). In some embodiments, the external port is a multi-pin (e.g., 30-pin) connector that is the same as, or similar to and/or compatible with, the 30-pin connector used on iPod (trademark of Apple inc.) devices.
The contact/motion module 230 optionally detects contact with the touch screen 212 (in conjunction with the display controller 256) and other touch sensitive devices (e.g., a touchpad or a physical click wheel). The contact/motion module 230 includes various software components for performing various operations related to contact detection, such as determining whether contact has occurred (e.g., detecting a finger-down event), determining contact intensity (e.g., force or pressure of contact, or a substitute for force or pressure of contact), determining whether there is movement of contact and tracking movement across the touch-sensitive surface (e.g., detecting one or more finger-dragging events), and determining whether contact has ceased (e.g., detecting a finger-up event or a break in contact). The contact/motion module 230 receives contact data from the touch-sensitive surface. Determining movement of the point of contact, which is represented by a series of contact data, optionally includes determining speed (magnitude), velocity (magnitude and direction), and/or acceleration (change in magnitude and/or direction) of the point of contact. These operations are optionally applied to single point contacts (e.g., single finger contacts) or multiple point simultaneous contacts (e.g., "multi-touch"/multiple finger contacts). In some embodiments, the contact/motion module 230 and the display controller 256 detect contact on the touch pad.
In some embodiments, the contact/motion module 230 uses a set of one or more intensity thresholds to determine whether an operation has been performed by the user (e.g., determine whether the user has "clicked" on an icon). In some embodiments, at least a subset of the intensity thresholds are determined as a function of software parameters (e.g., the intensity thresholds are not determined by the activation thresholds of particular physical actuators and may be adjusted without changing the physical hardware of device 200). For example, the mouse "click" threshold of the trackpad or touchscreen can be set to any one of a wide range of predefined thresholds without changing the trackpad or touchscreen display hardware. Additionally, in some implementations, a user of the device is provided with software settings for adjusting one or more intensity thresholds of a set of intensity thresholds (e.g., by adjusting individual intensity thresholds and/or by adjusting multiple intensity thresholds at once with a system-level click on an "intensity" parameter).
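A minimal sketch of the idea that intensity thresholds are software parameters rather than hardware constants is given below. The threshold values, type names, and adjustment scheme are illustrative assumptions, not the thresholds used by the described device.

```swift
// Illustrative only: a user-adjustable set of intensity thresholds used to
// decide whether a contact counts as a "click". Values are hypothetical.
struct IntensityThresholds {
    var lightPress: Double = 0.3   // normalized 0...1 contact intensity
    var deepPress: Double = 0.7

    // Scale every threshold at once, e.g. from a system-level "intensity" setting.
    mutating func adjustAll(by factor: Double) {
        lightPress = min(1.0, lightPress * factor)
        deepPress = min(1.0, deepPress * factor)
    }
}

enum PressKind { case none, lightPress, deepPress }

func classify(intensity: Double, using t: IntensityThresholds) -> PressKind {
    if intensity >= t.deepPress { return .deepPress }
    if intensity >= t.lightPress { return .lightPress }
    return .none
}

// Example: relax the thresholds without touching any hardware parameter.
var thresholds = IntensityThresholds()
thresholds.adjustAll(by: 0.8)
print(classify(intensity: 0.5, using: thresholds))   // lightPress
```

Because the thresholds live entirely in software, they can be tuned per user or per setting without any change to the underlying touch hardware, which is the point the paragraph above makes.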
The contact/motion module 230 optionally detects gesture input by the user. Different gestures on the touch-sensitive surface have different contact patterns (e.g., different motions, timings, and/or intensities of detected contacts). Thus, the gesture is optionally detected by detecting a specific contact pattern. For example, detecting a finger tap gesture includes detecting a finger-down event, and then detecting a finger-up (lift-off) event at the same location (or substantially the same location) as the finger-down event (e.g., at the location of the icon). As another example, detecting a finger swipe gesture on the touch-sensitive surface includes detecting a finger-down event, then detecting one or more finger-dragging events, and then subsequently detecting a finger-up (lift-off) event.
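The contact-pattern idea can be pictured as a classifier over a recorded sequence of finger-down, finger-drag, and finger-up sub-events. The sketch below is illustrative only; the event encoding and the movement tolerance ("slop") are assumptions.

```swift
// Hypothetical sub-events of a touch-based gesture.
enum FingerEvent {
    case down(x: Double, y: Double)
    case drag(x: Double, y: Double)
    case up(x: Double, y: Double)
}

enum Gesture { case tap, swipe, unknown }

// Classify a completed contact as a tap (finger-up at roughly the finger-down
// location) or a swipe (finger-down, one or more drags, then finger-up elsewhere).
func classify(_ events: [FingerEvent], slop: Double = 10.0) -> Gesture {
    guard case let .down(x0, y0)? = events.first,
          case let .up(x1, y1)? = events.last else { return .unknown }
    let distance = ((x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0)).squareRoot()
    let dragged = events.dropFirst().dropLast().contains {
        if case .drag = $0 { return true } else { return false }
    }
    if distance <= slop && !dragged { return .tap }
    if dragged || distance > slop { return .swipe }
    return .unknown
}

print(classify([.down(x: 5, y: 5), .up(x: 6, y: 5)]))                        // tap
print(classify([.down(x: 5, y: 5), .drag(x: 60, y: 5), .up(x: 120, y: 5)]))  // swipe
```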
Graphics module 232 includes various known software components for rendering and displaying graphics on touch screen 212 or other display, including components for changing the visual impact (e.g., brightness, transparency, saturation, contrast, or other visual characteristics) of the displayed graphics. As used herein, the term "graphic" includes any object that may be displayed to a user, including without limitation text, web pages, icons (such as user interface objects including soft keys), digital images, videos, animations and the like.
In some embodiments, graphics module 232 stores data to be used to represent graphics. Each graphic is optionally assigned a corresponding code. The graphics module 232 receives one or more codes specifying the graphics to be displayed from an application program or the like, together with coordinate data and other graphic attribute data if necessary, and then generates screen image data to output to the display controller 256.
Haptic feedback module 233 includes various software components for generating instructions for use by haptic output generator 267 to produce haptic outputs at one or more locations on device 200 in response to user interaction with device 200.
Text input module 234, which may be a component of graphics module 232, provides a soft keyboard for entering text in a variety of applications, such as contacts 237, email 240, instant message 241, browser 247, and any other application that requires text input.
The GPS module 235 determines the location of the device and provides this information for use in various applications (e.g., to the phone 238 for use in location-based dialing, to the camera 243 as picture/video metadata, and to applications that provide location-based services, such as weather desktop applets, local yellow pages desktop applets, and map/navigation desktop applets).
The digital assistant client module 229 may include various client-side digital assistant instructions that provide client-side functionality of a digital assistant. For example, the digital assistant client module 229 can accept voice input (e.g., speech input), text input, touch input, and/or gesture input through various user interfaces of the portable multifunction device 200 (e.g., the microphone 213, the accelerometer 268, the touch-sensitive display system 212, the optical sensor 264, the other input control device 216, etc.). The digital assistant client module 229 can also provide output in audio (e.g., speech output), visual, and/or tactile forms through various output interfaces of the portable multifunction device 200 (e.g., speaker 211, touch-sensitive display system 212, tactile output generator 267, etc.). For example, the output may be provided as voice, sound, alarm, text message, menu, graphics, video, animation, vibration, and/or a combination of two or more of the foregoing. During operation, digital assistant client module 229 may communicate with DA server 106 using RF circuitry 208.
The user data and models 231 can include various data associated with the user (e.g., user-specific vocabulary data, user preference data, user-specified name pronunciations, data from the user's electronic address book, to-do lists, shopping lists, etc.) to provide client-side functionality of the digital assistant. Moreover, the user data and models 231 may include various models (e.g., speech recognition models, statistical language models, natural language processing models, ontologies, task flow models, service models, etc.) for processing user input and determining user intent.
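For illustration, the client-side user data and model handles might be grouped roughly as in the following sketch. All names and fields are hypothetical and are not drawn from the described implementation.

```swift
// Hypothetical container for client-side user data and models.
struct UserDataAndModels {
    // User-specific data used to personalize the assistant.
    var userVocabulary: Set<String>             // user-specific vocabulary
    var namePronunciations: [String: String]    // name -> user-specified pronunciation
    var toDoItems: [String]
    var shoppingList: [String]

    // Opaque identifiers for models used to process input and infer intent.
    var speechRecognitionModel: String
    var statisticalLanguageModel: String
    var ontologyVersion: String
}

// Example: seed the structure for a new user.
let models = UserDataAndModels(
    userVocabulary: ["Cupertino", "Anaheim"],
    namePronunciations: ["Siobhan": "shi-VAWN"],
    toDoItems: ["Book flight"],
    shoppingList: ["Milk"],
    speechRecognitionModel: "asr-v1",
    statisticalLanguageModel: "slm-v1",
    ontologyVersion: "2017.05"
)
print(models.userVocabulary.count)
```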
In some embodiments, the digital assistant client module 229 may utilize various sensors, subsystems, and peripherals of the portable multifunction device 200 to gather additional information from the surroundings of the portable multifunction device 200 to establish a context associated with the user, the current user interaction, and/or the current user input. In some embodiments, the digital assistant client module 229 may provide the contextual information, or a subset thereof, along with the user input to the DA server 106 to help infer the user's intent. In some embodiments, the digital assistant may also use the contextual information to determine how to prepare and deliver outputs to the user. Contextual information may also be referred to as context data.
In some embodiments, the contextual information accompanying the user input may include sensor information, such as lighting, ambient noise, ambient temperature, images or video of the surrounding environment, and the like. In some embodiments, the contextual information may also include the physical state of the device, such as device orientation, device location, device temperature, power level, velocity, acceleration, motion pattern, cellular signal strength, and the like. In some embodiments, information related to the software state of the portable multifunction device 200, such as running processes, installed programs, past and current network activities, background services, error logs, resource usage, etc., may also be provided to the DA server 106 as contextual information associated with the user input.
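The contextual information enumerated above could, for example, be bundled with each user request roughly as follows. The payload structure, field names, and JSON encoding are assumptions made for this sketch only; they are not the protocol used between the client module and the DA server.

```swift
import Foundation

// Hypothetical context payload accompanying a user request to the DA server.
struct ContextInformation: Codable {
    // Sensor information about the surroundings.
    var ambientNoiseLevel: Double?
    var ambientTemperature: Double?
    var lightingLevel: Double?

    // Physical state of the device.
    var orientation: String?          // "portrait" or "landscape"
    var batteryLevel: Double?
    var cellularSignalStrength: Int?

    // Software state of the device.
    var runningProcesses: [String]?
    var installedPrograms: [String]?
    var errorLogCount: Int?
}

struct AssistantRequest: Codable {
    var userInput: String
    var context: ContextInformation
}

// Example: serialize a request for transmission alongside the user's input.
let request = AssistantRequest(
    userInput: "What's the weather like here?",
    context: ContextInformation(ambientNoiseLevel: 0.2,
                                ambientTemperature: 21.5,
                                lightingLevel: 0.8,
                                orientation: "portrait",
                                batteryLevel: 0.65,
                                cellularSignalStrength: 3,
                                runningProcesses: ["mail", "browser"],
                                installedPrograms: nil,
                                errorLogCount: 0)
)
let encoded = try? JSONEncoder().encode(request)
print(encoded?.count ?? 0)   // size of the serialized request, in bytes
```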
In some embodiments, the digital assistant client module 229 may selectively provide information (e.g., user data 231) stored on the portable multifunction device 200 in response to a request from the DA server 106. In some embodiments, the digital assistant client module 229 may also elicit additional input from the user via a natural language dialog or other user interface upon request by the DA server 106. The digital assistant client module 229 may transmit the additional input to the DA server 106 to assist the DA server 106 in intent inference and/or to satisfy the user intent expressed in the user request.
A more detailed description of the digital assistant is described below with reference to fig. 7A-7C. It should be appreciated that the digital assistant client module 229 may include any number of the sub-modules of the digital assistant module 726 described below.
The application programs 236 may include the following modules (or sets of instructions), or a subset or superset thereof:
a contacts module 237 (sometimes also referred to as a contact list or contact list);
a phone module 238;
a video conferencing module 239;
an email client module 240;
an Instant Messaging (IM) module 241;
fitness support module 242;
a camera module 243 for still and/or video images;
an image management module 244;
a video player module;
a music player module;
a browser module 247;
a calendar module 248;
desktop applet modules 249 that may include one or more of the following: a weather desktop applet 249-1, a stock market desktop applet 249-2, a calculator desktop applet 249-3, an alarm desktop applet 249-4, a dictionary desktop applet 249-5, other desktop applets acquired by the user, and a user-created desktop applet 249-6;
a desktop applet creator module 250 for generating a user-created desktop applet 249-6;
a search module 251;
a video and music player module 252 that incorporates a video player module and a music player module;
a notepad module 253;
a map module 254; and/or
Online video module 255.
Examples of other application programs 236 that may be stored in memory 202 include other word processing applications, other image editing applications, drawing applications, rendering applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.
In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the contacts module 237 may be used to manage an address book or contact list (e.g., stored in the application internal state 292 of the contacts module 237 in the memory 202 or 470) including: adding one or more names to an address book; deleting one or more names from the address book; associating one or more telephone numbers, one or more email addresses, one or more physical addresses, or other information with a name; associating the image with a name; sorting and ordering names; providing a telephone number or email address to initiate and/or facilitate communication via telephone 238, video conference module 239, email 240, or IM 241; and so on.
In conjunction with the RF circuitry 208, the audio circuitry 210, the speaker 211, the microphone 213, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the phone module 238 may be used to enter a sequence of characters corresponding to a phone number, access one or more phone numbers in the contacts module 237, modify the entered phone number, dial a corresponding phone number, conduct a conversation, and disconnect or hang up when the conversation is completed. As described above, wireless communication may use any of a number of communication standards, protocols, and technologies.
In conjunction with the RF circuitry 208, audio circuitry 210, speaker 211, microphone 213, touch screen 212, display controller 256, optical sensor 264, optical sensor controller 258, contact/motion module 230, graphics module 232, text input module 234, contacts module 237, and phone module 238, the video conference module 239 includes executable instructions to initiate, conduct, and terminate video conferences between the user and one or more other participants according to user instructions.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, email client module 240 includes executable instructions to create, send, receive, and manage emails in response to user instructions. In conjunction with the image management module 244, the e-mail client module 240 makes it very easy to create and send an e-mail having a still image or a video image photographed by the camera module 243.
In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the instant message module 241 includes executable instructions for: inputting a sequence of characters corresponding to an instant message, modifying previously input characters, transmitting a corresponding instant message (e.g., using a Short Message Service (SMS) or Multimedia Messaging Service (MMS) protocol for a phone-based instant message or using XMPP, SIMPLE, or IMPS for an internet-based instant message), receiving an instant message, and viewing the received instant message. In some embodiments, the transmitted and/or received instant messages may include graphics, photos, audio files, video files, and/or other attachments supported in MMS and/or Enhanced Messaging Service (EMS). As used herein, "instant message" refers to both telephony-based messages (e.g., messages transmitted using SMS or MMS) and internet-based messages (e.g., messages transmitted using XMPP, SIMPLE, or IMPS).
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, map module 254, and music player module, fitness support module 242 includes executable instructions for: creating workouts (e.g., with time, distance, and/or calorie-burning goals); communicating with fitness sensors (sports devices); receiving fitness sensor data; calibrating sensors used to monitor fitness; selecting and playing music for a workout; and displaying, storing, and transmitting fitness data.
In conjunction with the touch screen 212, the display controller 256, the one or more optical sensors 264, the optical sensor controller 258, the contact/motion module 230, the graphics module 232, and the image management module 244, the camera module 243 includes executable instructions for: capturing still images or video (including video streams) and storing them in the memory 202, modifying features of the still images or video, or deleting the still images or video from the memory 202.
In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, and the camera module 243, the image management module 244 includes executable instructions for arranging, modifying (e.g., editing), or otherwise manipulating, labeling, deleting, presenting (e.g., in a digital slide show or album), and storing still and/or video images.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, browser module 247 includes executable instructions for browsing the internet (including searching for, linking to, receiving, and displaying web pages or portions thereof, and attachments and other files linked to web pages) according to user instructions.
In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, the email client module 240, and the browser module 247, the calendar module 248 includes executable instructions for creating, displaying, modifying, and storing a calendar and data associated with the calendar (e.g., calendar entries, to-do, etc.) according to user instructions.
In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, and the browser module 247, the desktop applet module 249 is a mini-application (e.g., a weather desktop applet 249-1, a stock market desktop applet 249-2, a calculator desktop applet 249-3, an alarm desktop applet 249-4, and a dictionary desktop applet 249-5) or a mini-application created by a user (e.g., a user-created desktop applet 249-6) that may be downloaded and used by the user. In some embodiments, the desktop applet includes an HTML (hypertext markup language) file, a CSS (cascading style sheet) file, and a JavaScript file. In some embodiments, the desktop applet includes an XML (extensible markup language) file and a JavaScript file (e.g., Yahoo! desktop applet).
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, and browser module 247, desktop applet creator module 250 may be used by a user to create a desktop applet (e.g., to transfer a user-specified portion of a web page into the desktop applet).
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, search module 251 includes executable instructions for searching memory 202 for text, music, sound, images, videos, and/or other files that match one or more search criteria (e.g., one or more user-specified search terms) according to user instructions.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuitry 210, speakers 211, RF circuitry 208, and browser module 247, video and music player module 252 includes executable instructions that allow a user to download and playback recorded music and other sound files stored in one or more file formats, such as MP3 or AAC files, as well as executable instructions for displaying, rendering, or otherwise playing back video (e.g., on touch screen 212 or on an external display connected via external port 224). In some embodiments, the device 200 optionally includes the functionality of an MP3 player, such as an iPod (trademark of Apple inc.).
In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the notepad module 253 includes executable instructions to create and manage notepads, to-do lists, and the like according to user instructions.
In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, and browser module 247, map module 254 may be used to receive, display, modify, and store maps and data associated with maps (e.g., driving directions, data related to stores and other points of interest at or near a particular location, and other location-based data) according to user instructions.
In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuit 210, speaker 211, RF circuit 208, text input module 234, email client module 240, and browser module 247, online video module 255 includes instructions that allow a user to access, browse, receive (e.g., by streaming and/or downloading), play back (e.g., on the touch screen or on an external display connected via external port 224), send emails with links to particular online videos, and otherwise manage online videos in one or more file formats, such as H.264. In some embodiments, a link to a particular online video is sent using instant messaging module 241 rather than email client module 240. Additional descriptions of online video applications may be found in U.S. provisional patent application No.60/936,562 entitled "Portable Multifunction Device, Method, and Graphical User Interface for Playing Online Videos," filed June 20, 2007, and U.S. patent application No.11/968,067 entitled "Portable Multifunction Device, Method, and Graphical User Interface for Playing Online Videos," filed December 31, 2007, the contents of which are hereby incorporated by reference in their entirety.
Each of the modules and applications described above corresponds to a set of executable instructions for performing one or more of the functions described above as well as the methods described in this patent application (e.g., the computer-implemented methods and other information processing methods described herein). These modules (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. For example, a video player module may be combined with a music player module into a single module (e.g., video and music player module 252 in fig. 2A). In some embodiments, memory 202 may store a subset of the modules and data structures described above. Further, memory 202 may store additional modules and data structures not described above.
In some embodiments, device 200 is a device on which the operation of a predefined set of functions is performed exclusively through a touch screen and/or a touchpad. By using a touch screen and/or touch pad as the primary input control device for operation of device 200, the number of physical input control devices (such as push buttons, dials, and the like) on device 200 may be reduced.
The predefined set of functions performed exclusively by the touchscreen and/or touchpad optionally include navigating between user interfaces. In some embodiments, the touchpad, when touched by a user, navigates device 200 from any user interface displayed on device 200 to a main, home, or root menu. In such embodiments, a touchpad is used to implement a "menu button". In some other embodiments, the menu button is a physical push button or other physical input control device, rather than a touchpad.
Fig. 2B is a block diagram illustrating exemplary components for event processing, according to some embodiments. In some embodiments, memory 202 (FIG. 2A) or memory 470 (FIG. 4) includes event classifier 270 (e.g., in operating system 226) and a corresponding application 236-1 (e.g., any of the aforementioned applications 237-251, 255, 480-490).
The event sorter 270 receives the event information and determines the application 236-1 to which the event information is to be delivered and the application view 291 of the application 236-1. The event sorter 270 includes an event monitor 271 and an event dispatcher module 274. In some embodiments, the application 236-1 includes an application internal state 292 that indicates one or more current application views that are displayed on the touch-sensitive display 212 when the application is active or executing. In some embodiments, device/global internal state 257 is used by event classifier 270 to determine which application(s) are currently active, and application internal state 292 is used by event classifier 270 to determine the application view 291 to which to deliver event information.
In some embodiments, the application internal state 292 includes additional information, such as one or more of the following: resume information to be used when the application 236-1 resumes execution, user interface state information indicating information being displayed by the application 236-1 or information that is ready for display by the application 236-1, a state queue for enabling a user to return to a previous state or view of the application 236-1, and a repeat/undo queue of previous actions taken by the user.
The event monitor 271 receives event information from the peripheral interface 218. The event information includes information about a sub-event (e.g., a user touch on the touch-sensitive display 212 as part of a multi-touch gesture). Peripherals interface 218 transmits information it receives from I/O subsystem 206 or sensors (such as proximity sensor 266), one or more accelerometers 268, and/or microphone 213 (through audio circuitry 210). Information received by peripheral interface 218 from I/O subsystem 206 includes information from touch-sensitive display 212 or a touch-sensitive surface.
In some embodiments, event monitor 271 sends requests to peripheral interface 218 at predetermined intervals. In response, peripheral interface 218 transmits event information. In other embodiments, peripheral interface 218 transmits event information only when there is a significant event (e.g., receiving input above a predetermined noise threshold and/or receiving input for more than a predetermined duration).
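One way to picture this behavior is as a filter that forwards only "significant" input samples, e.g., those above a noise threshold or lasting longer than a minimum duration. The sketch below is illustrative; the thresholds and type names are assumptions.

```swift
import Foundation

// Hypothetical raw input sample delivered by a peripherals interface.
struct InputSample {
    var magnitude: Double        // e.g. normalized touch intensity
    var duration: TimeInterval
}

// Forward an event only when it is "significant": above a noise threshold
// and/or lasting longer than a minimum duration.
func significantEvents(in samples: [InputSample],
                       noiseThreshold: Double = 0.05,
                       minimumDuration: TimeInterval = 0.02) -> [InputSample] {
    samples.filter { $0.magnitude > noiseThreshold || $0.duration > minimumDuration }
}

let raw = [InputSample(magnitude: 0.01, duration: 0.005),   // noise: dropped
           InputSample(magnitude: 0.40, duration: 0.100)]   // real touch: forwarded
print(significantEvents(in: raw).count)                     // 1
```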
In some embodiments, event classifier 270 also includes hit view determination module 272 and/or activity event recognizer determination module 273.
When the touch-sensitive display 212 displays more than one view, the hit view determination module 272 provides a software process for determining where within one or more views a sub-event has occurred. The view consists of controls and other elements that the user can see on the display.
Another aspect of the user interface associated with an application is a set of views, sometimes referred to herein as application views or user interface windows, in which information is displayed and touch-based gestures occur. The application view (of the respective application) in which the touch is detected may correspond to a programmatic level within a programmatic or view hierarchy of applications. For example, the lowest level view in which a touch is detected may be referred to as a hit view, and the set of events identified as correct inputs may be determined based at least in part on the hit view of the initial touch that began the touch-based gesture.
Hit view determination module 272 receives information related to sub-events of the touch-based gesture. When the application has multiple views organized in a hierarchy, hit view determination module 272 identifies the hit view as the lowest view in the hierarchy that should handle the sub-event. In most cases, the hit view is the lowest level view in which the initiating sub-event (e.g., the first sub-event in the sequence of sub-events that form an event or potential event) occurs. Once the hit view is identified by hit view determination module 272, the hit view typically receives all sub-events related to the same touch or input source for which it was identified as the hit view.
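The hit-view rule, choosing the lowest view in the hierarchy that contains the initiating sub-event, can be sketched as a recursive search over a view tree. The view type, coordinate handling, and traversal order below are assumptions made for illustration and are not the described device's implementation.

```swift
// Hypothetical view node with a frame in screen coordinates and subviews.
final class ViewNode {
    let name: String
    let x, y, width, height: Double
    var subviews: [ViewNode] = []

    init(name: String, x: Double, y: Double, width: Double, height: Double) {
        self.name = name; self.x = x; self.y = y
        self.width = width; self.height = height
    }

    func contains(px: Double, py: Double) -> Bool {
        px >= x && px < x + width && py >= y && py < y + height
    }
}

// Return the deepest (lowest-level) view containing the initial touch location:
// the "hit view" that should handle the sub-event sequence.
func hitView(in root: ViewNode, px: Double, py: Double) -> ViewNode? {
    guard root.contains(px: px, py: py) else { return nil }
    for sub in root.subviews.reversed() {            // front-most subview first
        if let deeper = hitView(in: sub, px: px, py: py) { return deeper }
    }
    return root
}

// Example: a window containing a button; a touch inside the button hits the button.
let window = ViewNode(name: "window", x: 0, y: 0, width: 320, height: 480)
let button = ViewNode(name: "button", x: 20, y: 40, width: 100, height: 44)
window.subviews.append(button)
print(hitView(in: window, px: 30, py: 50)?.name ?? "none")   // "button"
```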
The activity event identifier determination module 273 determines which view or views within the view hierarchy should receive a particular sequence of sub-events. In some implementations, the activity event recognizer determination module 273 determines that only the hit view should receive a particular sequence of sub-events. In other embodiments, the active event recognizer determination module 273 determines that all views that include the physical location of the sub-event are actively participating views, and thus determines that all actively participating views should receive a particular sequence of sub-events. In other embodiments, even if the touch sub-event is completely confined to the area associated with one particular view, the higher views in the hierarchy will remain as actively participating views.
Event dispatcher module 274 dispatches event information to event recognizers (e.g., event recognizer 280). In embodiments that include the activity event recognizer determination module 273, the event dispatcher module 274 delivers the event information to the event recognizer determined by the activity event recognizer determination module 273. In some embodiments, the event dispatcher module 274 stores event information in an event queue, which is retrieved by the respective event receiver 282.
In some embodiments, the operating system 226 includes an event classifier 270. Alternatively, the application 236-1 includes an event classifier 270. In further embodiments, the event classifier 270 is a separate module or is part of another module stored in the memory 202 (such as the contact/motion module 230).
In some embodiments, the application 236-1 includes a plurality of event handlers 290 and one or more application views 291, where each application view includes instructions for handling touch events that occur within a respective view of the application's user interface. Each application view 291 of the application 236-1 includes one or more event recognizers 280. Typically, the respective application view 291 includes a plurality of event recognizers 280. In other embodiments, one or more of the event recognizers 280 are part of a separate module, such as a user interface toolkit (not shown) or a higher level object from which the application 236-1 inherits methods and other properties. In some embodiments, the respective event handlers 290 include one or more of: data updater 276, object updater 277, GUI updater 278, and/or event data 279 received from event classifier 270. Event handler 290 may utilize or call data updater 276, object updater 277 or GUI updater 278 to update application internal state 292. Alternatively, one or more of the application views 291 include one or more respective event handlers 290. Additionally, in some embodiments, one or more of the data updater 276, the object updater 277, and the GUI updater 278 are included in respective application views 291.
The corresponding event identifier 280 receives event information (e.g., event data 279) from the event classifier 270 and identifies events from the event information. Event recognizer 280 includes an event receiver 282 and an event comparator 284. In some embodiments, event recognizer 280 also includes at least a subset of: metadata 283, and event delivery instructions 288 (which may include sub-event delivery instructions).
Event receiver 282 receives event information from event sorter 270. The event information includes information about a sub-event (e.g., a touch or touch movement). According to the sub-event, the event information further includes additional information, such as the location of the sub-event. When the sub-event relates to motion of a touch, the event information may also include the velocity and direction of the sub-event. In some embodiments, the event comprises rotation of the device from one orientation to another (e.g., from a portrait orientation to a landscape orientation, or vice versa), and the event information comprises corresponding information about the current orientation of the device (also referred to as the device pose).
Event comparator 284 compares the event information to predefined event or sub-event definitions and determines an event or sub-event, or determines or updates the state of an event or sub-event, based on the comparison. In some embodiments, event comparator 284 includes an event definition 286. The event definition 286 contains definitions of events (e.g., predefined sub-event sequences), such as event 1 (287-1), event 2 (287-2), and other events. In some embodiments, sub-events in event (287) include, for example, touch start, touch end, touch move, touch cancel, and multi-touch. In one embodiment, the definition of event 1 (287-1) is a double tap on a displayed object. For example, the double tap includes a first touch (touch start) on the displayed object for a predetermined length of time, a first lift-off (touch end) for a predetermined length of time, a second touch (touch start) on the displayed object for a predetermined length of time, and a second lift-off (touch end) for a predetermined length of time. In another example, the definition of event 2 (287-2) is a drag on a displayed object. For example, the drag includes a touch (or contact) on the displayed object for a predetermined length of time, movement of the touch across the touch-sensitive display 212, and lift-off of the touch (touch end). In some embodiments, the event also includes information for one or more associated event handlers 290.
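Following the double-tap example, an event definition can be modeled as an expected sub-event sequence plus a limit on the time between consecutive sub-events, with the comparator checking an observed sequence against it. The structure and timing values below are illustrative assumptions.

```swift
import Foundation

// Hypothetical sub-events delivered to an event recognizer.
enum SubEvent: Equatable { case touchBegin, touchEnd, touchMove, touchCancel }

// An event definition: a predefined sub-event sequence plus a limit on the
// time allowed between consecutive sub-events.
struct EventDefinition {
    let name: String
    let sequence: [SubEvent]
    let maxInterval: TimeInterval
}

// Definition of a double tap: touch, lift, touch, lift, each within the limit.
let doubleTap = EventDefinition(name: "double tap",
                                sequence: [.touchBegin, .touchEnd, .touchBegin, .touchEnd],
                                maxInterval: 0.3)

// Compare observed (sub-event, timestamp) pairs against a definition.
func matches(_ observed: [(SubEvent, TimeInterval)], _ definition: EventDefinition) -> Bool {
    guard observed.map({ $0.0 }) == definition.sequence else { return false }
    for i in observed.indices.dropFirst()
        where observed[i].1 - observed[i - 1].1 > definition.maxInterval {
        return false
    }
    return true
}

let taps: [(SubEvent, TimeInterval)] =
    [(.touchBegin, 0.00), (.touchEnd, 0.08), (.touchBegin, 0.20), (.touchEnd, 0.27)]
print(matches(taps, doubleTap))   // true
```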
In some embodiments, the event definitions 287 include definitions of events for respective user interface objects. In some embodiments, event comparator 284 performs a hit test to determine which user interface object is associated with a sub-event. For example, in an application view that displays three user interface objects on the touch-sensitive display 212, when a touch is detected on the touch-sensitive display 212, the event comparator 284 performs a hit test to determine which of the three user interface objects is associated with the touch (sub-event). If each displayed object is associated with a corresponding event handler 290, the event comparator uses the results of the hit test to determine which event handler 290 should be activated. For example, the event comparator 284 selects the event handler associated with the sub-event and the object that triggered the hit test.
In some embodiments, the definition of the respective event (287) further comprises a delay action that delays the delivery of the event information until it has been determined whether the sequence of sub-events does or does not correspond to the event type of the event recognizer.
When the respective event recognizer 280 determines that the sequence of sub-events does not match any event in the event definition 286, the respective event recognizer 280 enters an event impossible, event failed, or event ended state, after which subsequent sub-events of the touch-based gesture are ignored. In this case, other event recognizers (if any) that remain active for the hit view continue to track and process sub-events of the ongoing touch-based gesture.
In some embodiments, the respective event recognizer 280 includes metadata 283 with configurable attributes, tags, and/or lists for indicating how the event delivery system should perform sub-event delivery to actively participating event recognizers. In some embodiments, metadata 283 includes configurable attributes, flags, and/or lists that indicate how event recognizers may interact with each other or be enabled to interact with each other. In some embodiments, metadata 283 includes configurable attributes, tags, and/or lists that indicate whether a sub-event is delivered to a different level in the view or programmatic hierarchy.
In some embodiments, when one or more particular sub-events of an event are identified, the respective event identifier 280 activates the event handler 290 associated with the event. In some embodiments, the respective event identifier 280 delivers event information associated with the event to the event handler 290. Activating the event handler 290 is distinct from sending (and deferred sending of) sub-events to the corresponding hit view. In some embodiments, the event recognizer 280 throws a flag associated with the recognized event, and the event handler 290 associated with the flag catches the flag and performs a predefined process.
In some embodiments, the event delivery instructions 288 include sub-event delivery instructions that deliver event information about sub-events without activating an event handler. Instead, the sub-event delivery instructions deliver event information to event handlers associated with the sequence of sub-events or to actively participating views. Event handlers associated with the sequence of sub-events or with actively participating views receive the event information and perform a predetermined process.
In some embodiments, the data updater 276 creates and updates data used in the application 236-1. For example, the data updater 276 updates a phone number used in the contacts module 237 or stores a video file used in the video player module. In some embodiments, the object updater 277 creates and updates objects used in the application 236-1. For example, object updater 277 creates a new user interface object or updates the location of a user interface object. The GUI updater 278 updates the GUI. For example, GUI updater 278 prepares display information and sends it to graphics module 232 for display on a touch-sensitive display.
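A compact way to picture the division of labor among the three updaters is sketched below: a recognized event first updates application data, then user interface objects, and finally asks for the GUI to be redrawn. The types and method names are hypothetical.

```swift
// Hypothetical split of responsibilities invoked by an event handler.
struct DataUpdater   { func update(_ key: String, to value: String) { print("data:", key, "=", value) } }
struct ObjectUpdater { func move(_ id: String, toX x: Double, y: Double) { print("object:", id, x, y) } }
struct GUIUpdater    { func redraw(_ region: String) { print("redraw:", region) } }

struct EventHandler {
    let data = DataUpdater(), objects = ObjectUpdater(), gui = GUIUpdater()

    // A recognized event first updates application data, then the affected
    // user interface objects, and finally requests a redraw of the GUI.
    func handle() {
        data.update("contact.phone", to: "+1 555 0100")
        objects.move("contactCard", toX: 0, y: 120)
        gui.redraw("main")
    }
}

EventHandler().handle()
```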
In some embodiments, one or more event handlers 290 include data updater 276, object updater 277, and GUI updater 278 or have access to data updater 276, object updater 277, and GUI updater 278. In some embodiments, the data updater 276, the object updater 277, and the GUI updater 278 are included in a single module of the respective application 236-1 or application view 291. In other embodiments, they are included in two or more software modules.
It should be understood that the above discussion of event processing with respect to user touches on a touch sensitive display also applies to other forms of user input utilizing an input device to operate multifunction device 200, not all of which are initiated on a touch screen. For example, mouse movements and mouse button presses, optionally in combination with single or multiple keyboard presses or holds; contact movements on the touchpad, such as taps, drags, scrolls, and the like; stylus inputs; movement of the device; verbal instructions; detected eye movements; biometric inputs; and/or any combination thereof, are optionally used as inputs corresponding to the sub-events defining the event to be identified.
Fig. 3 illustrates a portable multifunction device 200 with a touch screen 212 in accordance with some embodiments. The touch screen optionally displays one or more graphics within a User Interface (UI) 300. In this embodiment, as well as other embodiments described below, a user can select one or more of these graphics by making gestures on the graphics, for example, with one or more fingers 302 (not drawn to scale in the figures) or with one or more styluses 303 (not drawn to scale in the figures). In some embodiments, selection of one or more graphics will occur when the user breaks contact with the one or more graphics. In some embodiments, the gesture optionally includes one or more taps, one or more swipes (left to right, right to left, up, and/or down), and/or a rolling of a finger (right to left, left to right, up, and/or down) that has made contact with device 200. In some implementations, or in some cases, inadvertent contact with a graphic does not select the graphic. For example, when the gesture corresponding to the selection is a tap, a swipe gesture that swipes over the application icon optionally does not select the respective application.
Device 200 may also include one or more physical buttons, such as a "home" button or menu button 304. As previously described, the menu button 304 may be used to navigate to any application 236 in a set of applications that may be executed on the device 200. Alternatively, in some embodiments, the menu buttons are implemented as soft keys in a GUI displayed on touch screen 212.
In some embodiments, device 200 includes a touch screen 212, menu buttons 304, a push button 306 for powering the device on/off and for locking the device, one or more volume adjustment buttons 308, a Subscriber Identity Module (SIM) card slot 310, a headset jack 312, and a docking/charging external port 224. The push button 306 is optionally used to: powering on/off the device by pressing and maintaining the button in a depressed state for a predetermined time interval; locking the device by pressing the button and releasing the button before a predetermined time interval has elapsed; and/or unlocking the device or initiating an unlocking process. In an alternative embodiment, device 200 also accepts voice input through microphone 213 for activating or deactivating certain functions. Device 200 also optionally includes one or more contact intensity sensors 265 for detecting the intensity of contacts on touch screen 212, and/or one or more tactile output generators 267 for generating tactile outputs for a user of device 200.
Fig. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with some embodiments. The device 400 need not be portable. In some embodiments, the device 400 is a laptop, desktop, tablet, multimedia player device, navigation device, educational device (such as a child learning toy), gaming system, or control device (e.g., a home controller or industrial controller). Device 400 typically includes one or more processing units (CPUs) 410, one or more network or other communication interfaces 460, memory 470, and one or more communication buses 420 for interconnecting these components. The communication bus 420 optionally includes circuitry (sometimes called a chipset) that interconnects and controls communication between system components. Device 400 includes an input/output (I/O) interface 430 with a display 440, which is typically a touch screen display. The I/O interface 430 also optionally includes a keyboard and/or mouse (or other pointing device) 450 and a touchpad 455, a tactile output generator 457 for generating tactile outputs on the device 400 (e.g., similar to one or more tactile output generators 267 described above with reference to fig. 2A), and a sensor 459 (e.g., an optical sensor, an acceleration sensor, a proximity sensor, a touch-sensitive sensor, and/or a contact intensity sensor similar to one or more contact intensity sensors 265 described above with reference to fig. 2A). Memory 470 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and optionally includes non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices or other non-volatile solid state storage devices. Memory 470 optionally includes one or more storage devices located remotely from CPU 410. In some embodiments, memory 470 stores programs, modules, and data structures similar to or a subset of the programs, modules, and data structures stored in memory 202 of portable multifunction device 200 (fig. 2A). In addition, memory 470 optionally stores additional programs, modules, and data structures not present in memory 202 of portable multifunction device 200. For example, memory 470 of device 400 optionally stores drawing module 480, presentation module 482, word processing module 484, website creation module 486, disk editing module 488, and/or spreadsheet module 490, while memory 202 of portable multifunction device 200 (FIG. 2A) optionally does not store these modules.
Each of the above-described elements in fig. 4 may be stored in one or more of the aforementioned memory devices. Each of the above modules corresponds to a set of instructions for performing a function described above. The modules or programs (e.g., sets of instructions) described above need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, memory 470 may store a subset of the modules and data structures described above. In addition, memory 470 may store additional modules and data structures not described above.
Attention is now directed to embodiments of user interfaces that may be implemented on, for example, portable multifunction device 200.
Fig. 5A illustrates an exemplary user interface of an application menu on a portable multifunction device 200 according to some embodiments. A similar user interface may be implemented on device 400. In some embodiments, the user interface 500 includes the following elements, or a subset or superset thereof:
signal strength indicators 502 for wireless communications (such as cellular signals and Wi-Fi signals);
time 504;
a bluetooth indicator 505;
a battery status indicator 506;
tray 508 with common application icons such as:
an icon 516 of the phone module 238 labeled "phone", the icon 516 optionally including an indicator 514 of the number of missed calls or voice messages;
an icon 518 for the email client module 240, labeled "mail", optionally including an indicator 510 of the number of unread emails;
icon 520 of browser module 247 labeled "browser"; and
an icon 522 labeled "iPod" for the video and music player module 252 (also known as iPod (trademark of Apple inc.) module 252); and
icons for other applications, such as:
icon 524 of IM module 241 labeled "message";
icon 526 of calendar module 248 labeled "calendar";
icon 528 of image management module 244 labeled "photo";
icon 530 for camera module 243 labeled "camera";
icon 532 for online video module 255 labeled "online video";
an icon 534 labeled "stock market" for the stock market desktop applet 249-2;
icon 536 for the map module 254 labeled "map";
icon 538 for weather desktop applet 249-1 labeled "weather";
icon 540 labeled "clock" for alarm clock desktop applet 249-4;
icon 542 labeled "fitness support" for fitness support module 242;
icon 544 labeled "notepad" for notepad module 253; and
icon 546 labeled "settings" for a settings application or module, this icon 546 providing access to settings of device 200 and its various applications 236.
It should be noted that the icon labels shown in fig. 5A are merely exemplary. For example, icon 522 of video and music player module 252 may optionally be labeled "music" or "music player". Other tabs are optionally used for various application icons. In some embodiments, the label of the respective application icon includes a name of the application corresponding to the respective application icon. In some embodiments, the label of a particular application icon is different from the name of the application corresponding to the particular application icon.
Fig. 5B illustrates an exemplary user interface on a device (e.g., device 400 of fig. 4) having a touch-sensitive surface 551 (e.g., tablet or touchpad 455 of fig. 4) separate from a display 550 (e.g., touchscreen display 212). The device 400 also optionally includes one or more contact intensity sensors (e.g., one or more of the sensors 457) for detecting the intensity of contacts on the touch-sensitive surface 551 and/or one or more tactile output generators 459 for generating tactile outputs for a user of the device 400.
Although some of the examples that follow will be given with reference to input on the touch screen display 212 (where the touch-sensitive surface and the display are combined), in some embodiments, the device detects input on a touch-sensitive surface that is separate from the display, as shown in fig. 5B. In some implementations, the touch-sensitive surface (e.g., 551 in fig. 5B) has a major axis (e.g., 552 in fig. 5B) that corresponds to a major axis (e.g., 553 in fig. 5B) on the display (e.g., 550). According to these embodiments, the device detects contacts (e.g., 560 and 562 in fig. 5B) with the touch-sensitive surface 551 at locations corresponding to respective locations on the display (e.g., 560 corresponds to 568 and 562 corresponds to 570 in fig. 5B). As such, when the touch-sensitive surface (e.g., 551 in fig. 5B) is separated from the display (550 in fig. 5B) of the multifunction device, user inputs (e.g., contacts 560 and 562 and their movements) detected by the device on the touch-sensitive surface are used by the device to manipulate the user interface on the display. It should be understood that similar methods are optionally used for the other user interfaces described herein.
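In the separate-surface case, the mapping amounts to scaling a location on the touch-sensitive surface onto the corresponding location on the display along matching primary axes. A minimal sketch follows, with example dimensions that are assumptions.

```swift
// Illustrative mapping from a touch-sensitive surface to a separate display,
// assuming both are rectangles with aligned primary axes. Sizes are examples.
struct Surface { let width: Double; let height: Double }

func mapPoint(x: Double, y: Double, from touchSurface: Surface, to display: Surface)
    -> (x: Double, y: Double) {
    // Scale each coordinate proportionally along the corresponding axis.
    return (x * display.width / touchSurface.width,
            y * display.height / touchSurface.height)
}

let touchpad = Surface(width: 100, height: 60)
let screen   = Surface(width: 1000, height: 600)
print(mapPoint(x: 25, y: 30, from: touchpad, to: screen))   // (250.0, 300.0)
```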
Additionally, while the following examples are given primarily with reference to finger inputs (e.g., finger contact, single-finger tap gesture, finger swipe gesture), it should be understood that in some embodiments one or more of these finger inputs are replaced by inputs from another input device (e.g., mouse-based inputs or stylus inputs). For example, the swipe gesture is optionally replaced by a mouse click (e.g., rather than a contact) followed by movement of the cursor along the path of the swipe (e.g., rather than movement of the contact). As another example, a flick gesture is optionally replaced by a mouse click (e.g., rather than detection of a contact followed by termination of detection of the contact) while the cursor is over the location of the flick gesture. Similarly, when multiple user inputs are detected simultaneously, it should be understood that multiple computer mice are optionally used simultaneously, or mouse and finger contacts are optionally used simultaneously.
Fig. 6A illustrates an exemplary personal electronic device 600. The device 600 includes a body 602. In some embodiments, device 600 may include some or all of the features described for devices 200 and 400 (e.g., fig. 2A-4). In some embodiments, device 600 has a touch-sensitive display screen 604, hereinafter referred to as touch screen 604. Instead of or in addition to the touch screen 604, the device 600 has a display and a touch-sensitive surface. As with devices 200 and 400, in some embodiments, touch screen 604 (or touch-sensitive surface) may have one or more intensity sensors for detecting the intensity of an applied contact (e.g., touch). One or more intensity sensors of touch screen 604 (or touch-sensitive surface) may provide output data representing the intensity of a touch. The user interface of device 600 may respond to the touch based on the strength of the touch, meaning that different strengths of the touch may invoke different user interface operations on device 600.
Techniques for detecting and processing touch intensity can be found, for example, in the following related patent applications: International patent application Ser. No. PCT/US2013/040061, entitled "Device, Method, and Graphical User Interface for Displaying User Interface Objects Corresponding to an Application," filed May 8, 2013, and International patent application Ser. No. PCT/US2013/069483, entitled "Device, Method, and Graphical User Interface for Transitioning Between Touch Input to Display Output Relationships," filed November 11, 2013, each of which is hereby incorporated by reference in its entirety.
In some embodiments, device 600 has one or more input mechanisms 606 and 608. Input mechanisms 606 and 608 (if included) may be in physical form. Examples of physical input mechanisms include push buttons and rotatable mechanisms. In some embodiments, device 600 has one or more attachment mechanisms. Such attachment mechanisms, if included, may allow device 600 to be attached with, for example, a hat, glasses, earrings, necklace, shirt, jacket, bracelet, watchband, bracelet, pants, belt, shoe, purse, backpack, and the like. These attachment mechanisms may allow the user to wear the device 600.
Fig. 6B illustrates an exemplary personal electronic device 600. In some embodiments, the apparatus 600 may include some or all of the components described with reference to fig. 2A, 2B, and 4. The device 600 has a bus 612 that operatively couples an I/O portion 614 with one or more computer processors 616 and a memory 618. I/O portion 614 may be connected to display 604, which may have touch sensitive component 622 and optionally also touch intensity sensitive component 624. Further, I/O portion 614 may connect with communications unit 630 for receiving applications and operating system data using Wi-Fi, bluetooth, Near Field Communication (NFC), cellular, and/or other wireless communication technologies. Device 600 may include input mechanisms 606 and/or 608. For example, input mechanism 606 may be a rotatable input device or a depressible input device as well as a rotatable input device. In some examples, input mechanism 608 may be a button.
In some examples, input mechanism 608 may be a microphone. The personal electronic device 600 may include various sensors, such as a GPS sensor 632, an accelerometer 634, an orientation sensor 640 (e.g., a compass), a gyroscope 636, a motion sensor 638, and/or combinations thereof, all of which may be operatively connected to the I/O section 614.
The memory 618 of the personal electronic device 600 may be a non-transitory computer-readable storage medium for storing computer-executable instructions that, when executed by one or more computer processors 616, may, for example, cause the computer processors to perform the techniques described below, including processes 800 and 900 (fig. 8-9). The computer-executable instructions may also be stored and/or transmitted within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. The personal electronic device 600 is not limited to the components and configuration of fig. 6B, but may include other components or additional components in a variety of configurations.
As used herein, the term "affordance" refers to a user-interactive graphical user interface object that may be displayed on a display screen of device 200,400, and/or 600 (FIGS. 2, 4, and 6). For example, images (e.g., icons), buttons, and text (e.g., links) can each constitute an affordance.
As used herein, the term "focus selector" refers to an input element that is used to indicate the current portion of the user interface with which the user is interacting. In some implementations that include a cursor or other position marker, the cursor acts as a "focus selector" such that when an input (e.g., a press input) is detected on a touch-sensitive surface (e.g., touchpad 455 in fig. 4 or touch-sensitive surface 551 in fig. 5B) while the cursor is over a particular user interface element (e.g., a button, window, slider, or other user interface element), the particular user interface element is adjusted according to the detected input. In some implementations that include a touch screen display (e.g., touch-sensitive display system 212 in fig. 2A or touch screen 212 in fig. 5A) that enables direct interaction with user interface elements on the touch screen display, a detected contact on the touch screen acts as a "focus selector" such that when an input (e.g., a press input by the contact) is detected at a location of a particular user interface element (e.g., a button, window, slider, or other user interface element) on the touch screen display, the particular user interface element is adjusted in accordance with the detected input. In some implementations, the focus is moved from one area of the user interface to another area of the user interface without corresponding movement of a cursor or movement of a contact on the touch screen display (e.g., by moving the focus from one button to another using tab or arrow keys); in these implementations, the focus selector moves according to focus movement between different regions of the user interface. Regardless of the particular form taken by the focus selector, the focus selector is typically a user interface element (or contact on a touch screen display) that is controlled by the user to deliver the user's intended interaction with the user interface (e.g., by indicating to the device the element with which the user of the user interface desires to interact). For example, upon detection of a press input on a touch-sensitive surface (e.g., a touchpad or touchscreen), the location of a focus selector (e.g., a cursor, contact, or selection box) over a respective button will indicate that the user desires to activate the respective button (as opposed to other user interface elements shown on the device display).
As used in the specification and in the claims, the term "characteristic intensity" of a contact refers to a characteristic of the contact based on one or more intensities of the contact. In some embodiments, the characteristic intensity is based on a plurality of intensity samples. The characteristic intensity is optionally based on a predefined number of intensity samples or a set of intensity samples acquired during a predetermined time period (e.g., 0.05, 0.1, 0.2, 0.5, 1, 2, 5, or 10 seconds) relative to a predefined event (e.g., after detecting the contact, before detecting liftoff of the contact, before or after detecting a start of movement of the contact, before or after detecting an end of the contact, before or after detecting an increase in intensity of the contact, and/or before or after detecting a decrease in intensity of the contact). The characteristic intensity of the contact is optionally based on one or more of: a maximum value of the intensities of the contact, a mean value of the intensities of the contact, an average value of the intensities of the contact, a top 10% value of the intensities of the contact, a value of half of the maximum of the intensities of the contact, a value of 90% of the maximum of the intensities of the contact, and the like. In some embodiments, the duration of the contact is used in determining the characteristic intensity (e.g., when the characteristic intensity is an average of the intensity of the contact over time). In some embodiments, the characteristic intensity is compared to a set of one or more intensity thresholds to determine whether the user has performed an operation. For example, the set of one or more intensity thresholds may include a first intensity threshold and a second intensity threshold. In this example, a contact whose characteristic intensity does not exceed the first threshold results in a first operation, a contact whose characteristic intensity exceeds the first intensity threshold but does not exceed the second intensity threshold results in a second operation, and a contact whose characteristic intensity exceeds the second threshold results in a third operation. In some embodiments, the comparison between the characteristic intensity and the one or more thresholds is used to determine whether to perform one or more operations (e.g., whether to perform a respective operation or forgo performing the respective operation), rather than to determine whether to perform a first operation or a second operation.
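By way of illustration only, the following Swift sketch reduces a set of intensity samples to a characteristic intensity and maps it onto one of three operations using two thresholds. The enum cases, threshold values, and function names are assumptions for this sketch and are not taken from the embodiments above.

import Foundation

// Illustrative reduction strategies for a set of intensity samples.
enum IntensityCharacteristic {
    case maximum, mean, topTenPercentMean
}

// Computes a single characteristic intensity from raw samples.
func characteristicIntensity(of samples: [Double],
                             using characteristic: IntensityCharacteristic) -> Double {
    guard !samples.isEmpty else { return 0 }
    switch characteristic {
    case .maximum:
        return samples.max()!
    case .mean:
        return samples.reduce(0, +) / Double(samples.count)
    case .topTenPercentMean:
        // Average of the highest 10% of samples (at least one sample).
        let sorted = samples.sorted(by: >)
        let count = max(1, samples.count / 10)
        return sorted.prefix(count).reduce(0, +) / Double(count)
    }
}

// Maps a characteristic intensity onto one of three operations using two thresholds.
func operationName(forCharacteristicIntensity intensity: Double,
                   firstThreshold: Double, secondThreshold: Double) -> String {
    if intensity > secondThreshold { return "third operation" }
    if intensity > firstThreshold { return "second operation" }
    return "first operation"
}

// Example: the mean of these samples falls between the two thresholds.
let samples = [0.1, 0.4, 0.9, 1.3, 1.1, 0.6]
let intensity = characteristicIntensity(of: samples, using: .mean)
print(operationName(forCharacteristicIntensity: intensity,
                    firstThreshold: 0.5, secondThreshold: 1.0))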
In some implementations, a portion of the gesture is identified for purposes of determining the characteristic intensity. For example, the touch-sensitive surface may receive a continuous swipe contact that transitions from a start location and reaches an end location, at which point the intensity of the contact increases. In this example, the characteristic intensity of the contact at the end location may be based on only a portion of the continuous swipe contact, and not the entire swipe contact (e.g., only the portion of the swipe contact at the end location). In some implementations, a smoothing algorithm may be applied to the intensities of the swipe contact before determining the characteristic intensity of the contact. For example, the smoothing algorithm optionally includes one or more of: an unweighted sliding-average smoothing algorithm, a triangular smoothing algorithm, a median filter smoothing algorithm, and/or an exponential smoothing algorithm. In some circumstances, these smoothing algorithms eliminate narrow spikes or dips in the intensities of the swipe contact for the purpose of determining the characteristic intensity.
The intensity of a contact on the touch-sensitive surface may be characterized relative to one or more intensity thresholds, such as a contact detection intensity threshold, a light press intensity threshold, a deep press intensity threshold, and/or one or more other intensity thresholds. In some embodiments, the light press intensity threshold corresponds to an intensity at which the device will perform the operations typically associated with clicking a button of a physical mouse or a trackpad. In some embodiments, the deep press intensity threshold corresponds to an intensity at which the device will perform operations different from those typically associated with clicking a button of a physical mouse or a trackpad. In some embodiments, when a contact is detected whose characteristic intensity is below the light press intensity threshold (e.g., and above a nominal contact detection intensity threshold, below which the contact is no longer detected), the device will move the focus selector in accordance with movement of the contact on the touch-sensitive surface without performing an operation associated with the light press intensity threshold or the deep press intensity threshold. Generally, unless otherwise stated, these intensity thresholds are consistent between different sets of user interface figures.
Increasing the contact characteristic intensity from an intensity below the light press intensity threshold to an intensity between the light press intensity threshold and the deep press intensity threshold is sometimes referred to as a "light press" input. Increasing the contact characteristic intensity from an intensity below the deep press intensity threshold to an intensity above the deep press intensity threshold is sometimes referred to as a "deep press" input. Increasing the contact characteristic intensity from an intensity below the contact detection intensity threshold to an intensity between the contact detection intensity threshold and the light press intensity threshold is sometimes referred to as detecting a contact on the touch surface. The decrease in contact characteristic intensity from an intensity above the contact detection intensity threshold to an intensity below the contact detection intensity threshold is sometimes referred to as detecting a lift of the contact from the touch surface. In some embodiments, the contact detection intensity threshold is zero. In some embodiments, the contact detection intensity threshold is greater than zero.
In some embodiments described herein, one or more operations are performed in response to detecting a gesture that includes a respective press input or in response to detecting a respective press input performed with a respective contact (or contacts), wherein the respective press input is detected based at least in part on detecting an increase in intensity of the contact (or contacts) above a press input intensity threshold. In some embodiments, the respective operation is performed in response to detecting an increase in intensity of the respective contact above a press input intensity threshold (e.g., a "down stroke" of the respective press input). In some embodiments, the press input includes an increase in intensity of the respective contact above a press input intensity threshold and a subsequent decrease in intensity of the contact below the press input intensity threshold, and the respective operation is performed in response to detecting a subsequent decrease in intensity of the respective contact below the press input threshold (e.g., an "up stroke" of the respective press input).
In some embodiments, the device employs intensity hysteresis to avoid accidental input sometimes referred to as "jitter," where the device defines or selects a hysteresis intensity threshold having a predefined relationship to the press input intensity threshold (e.g., the hysteresis intensity threshold is X intensity units lower than the press input intensity threshold, or the hysteresis intensity threshold is 75%, 90%, or some reasonable proportion of the press input intensity threshold). Thus, in some embodiments, the press input includes an increase in intensity of the respective contact above a press input intensity threshold and a subsequent decrease in intensity of the contact below a hysteresis intensity threshold corresponding to the press input intensity threshold, and the respective operation is performed in response to detecting a subsequent decrease in intensity of the respective contact below the hysteresis intensity threshold (e.g., an "upstroke" of the respective press input). Similarly, in some embodiments, a press input is detected only when the device detects an increase in intensity of the contact from an intensity at or below the hysteresis intensity threshold to an intensity at or above the press input intensity threshold and optionally a subsequent decrease in intensity of the contact to an intensity at or below the hysteresis intensity, and a corresponding operation is performed in response to detecting the press input (e.g., an increase in intensity of the contact or a decrease in intensity of the contact, depending on the circumstances).
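A minimal Swift sketch of press detection with a hysteresis threshold follows; the state machine, threshold values, and event strings are illustrative assumptions, not the embodiments above.

import Foundation

// Detects a press input using a press threshold on the down-stroke and a lower
// hysteresis threshold on the up-stroke, which suppresses "jitter" near the threshold.
struct PressDetector {
    let pressThreshold: Double       // e.g. 1.0 (arbitrary units)
    let hysteresisThreshold: Double  // e.g. 75% of the press threshold
    var isPressed = false

    // Feed successive intensity samples; returns an event when a press starts or ends.
    mutating func process(intensity: Double) -> String? {
        if !isPressed && intensity >= pressThreshold {
            isPressed = true
            return "down-stroke: perform press operation"
        }
        if isPressed && intensity <= hysteresisThreshold {
            isPressed = false
            return "up-stroke: press released"
        }
        return nil
    }
}

// Example: intensity dips to 0.9 after the press but stays above the hysteresis
// threshold, so no spurious release and re-press are reported.
var detector = PressDetector(pressThreshold: 1.0, hysteresisThreshold: 0.75)
for sample in [0.2, 0.6, 1.1, 0.9, 1.05, 0.5] {
    if let event = detector.process(intensity: sample) { print(event) }
}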
For ease of explanation, optionally, a description of an operation performed in response to a press input associated with a press input intensity threshold or in response to a gesture that includes a press input is triggered in response to detection of any of the following: the contact intensity increases above the press input intensity threshold, the contact intensity increases from an intensity below the hysteresis intensity threshold to an intensity above the press input intensity threshold, the contact intensity decreases below the press input intensity threshold, and/or the contact intensity decreases below the hysteresis intensity threshold corresponding to the press input intensity threshold. Additionally, in examples in which operations are described as being performed in response to detecting that the intensity of the contact decreases below the press input intensity threshold, the operations are optionally performed in response to detecting that the intensity of the contact decreases below a hysteresis intensity threshold that corresponds to and is less than the press input intensity threshold.
3. Digital assistant system
Fig. 7A is a block diagram of a digital assistant system 700 according to various embodiments. In some embodiments, the digital assistant system 700 may be implemented on a stand-alone computer system. In some embodiments, the digital assistant system 700 may be distributed across multiple computers. In some embodiments, some of the modules and functionality of the digital assistant may be divided into a server portion and a client portion, where the client portion is located on one or more user devices (e.g., devices 104,122,200,400 or 600) and communicates with the server portion (e.g., server system 108) over one or more networks, for example as shown in fig. 1. In some embodiments, digital assistant system 700 may be a specific implementation of server system 108 (and/or DA server 106) shown in fig. 1. It should be noted that the digital assistant system 700 is only one example of a digital assistant system, and that the digital assistant system 700 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or layout of components. The various components shown in fig. 7A may be implemented in hardware, software instructions for execution by one or more processors, firmware (including one or more signal processing integrated circuits and/or application specific integrated circuits), or a combination thereof.
The digital assistant system 700 can include a memory 702, one or more processors 704, input/output (I/O) interfaces 706, and a network communication interface 708. These components may communicate with each other via one or more communication buses or signal lines 710.
In some embodiments, the memory 702 may include a non-transitory computer-readable medium, such as high-speed random access memory and/or a non-volatile computer-readable storage medium (e.g., one or more disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).
In some embodiments, the I/O interface 706 may couple input/output devices 716, such as a display, a keyboard, a touch screen, and a microphone, of the digital assistant system 700 to the user interface module 722. I/O interface 706, in conjunction with user interface module 722, may receive user inputs (e.g., voice inputs, keyboard inputs, touch inputs, etc.) and process those inputs accordingly. In some embodiments, such as when the digital assistant is implemented on a standalone user device, the digital assistant system 700 may include any of the components and I/O communication interfaces described with respect to the devices 200,400, or 600 in fig. 2A, 4, 6A-6B, respectively. In some embodiments, the digital assistant system 700 may represent a server portion of a digital assistant implementation and may interact with a user through a client-side portion located on a user device (e.g., device 104,200,400 or 600).
In some embodiments, the network communication interface 708 may include wireless transmit and receive circuitry 714 and/or one or more wired communication ports 712. The one or more wired communication ports may receive and transmit communication signals via one or more wired interfaces, such as ethernet, Universal Serial Bus (USB), firewire, and the like. The wireless circuitry 714 may receive and transmit RF and/or optical signals to and from communication networks and other communication devices. The wireless communication may use any of a variety of communication standards, protocols, and technologies, such as GSM, EDGE, CDMA, TDMA, Bluetooth, Wi-Fi, VoIP, Wi-MAX, or any other suitable communication protocol. Network communication interface 708 may enable communication between digital assistant system 700 and other devices via a network, such as the internet, an intranet, and/or a wireless network, such as a cellular telephone network, a wireless Local Area Network (LAN), and/or a Metropolitan Area Network (MAN).
In some embodiments, memory 702, or a computer-readable storage medium of memory 702, may store programs, modules, instructions, and data structures, including all or a subset of the following: an operating system 718, a communications module 720, a user interface module 722, one or more application programs 724, and a digital assistant module 726. In particular, memory 702 or the computer-readable storage medium of memory 702 may store instructions for performing processes 800,900 described below. The one or more processors 704 may execute the programs, modules, and instructions and read data from, or write data to, the data structures.
The operating system 718 (e.g., Darwin, RTXC, LINUX, UNIX, iOS, OS X, WINDOWS, or an embedded operating system such as VxWorks) may include various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and may facilitate communication between various hardware, firmware, and software components.
The communication module 720 may facilitate communications between the digital assistant system 700 and other devices via the network communication interface 708. For example, the communication module 720 may communicate with the RF circuitry 208 of an electronic device, such as the devices 200,400, and 600 shown in fig. 2A, 4, and 6A-6B, respectively. The communications module 720 may also include various components for processing data received by the wireless circuitry 714 and/or the wired communications port 712.
User interface module 722 may receive commands and/or input from a user (e.g., from a keyboard, touch screen, pointing device, controller, and/or microphone) via I/O interface 706 and generate user interface objects on the display. User interface module 722 may also prepare and communicate output (e.g., voice, sound, animation, text, icons, vibration, haptic feedback, lighting, etc.) to the user via I/O interface 706 (e.g., through a display, audio channel, speaker, touch pad, etc.).
The application programs 724 may include programs and/or modules configured to be executed by the one or more processors 704. For example, if the digital assistant system is implemented on a standalone user device, the applications 724 may include user applications such as games, calendar applications, navigation applications, or mail applications. If the digital assistant system 700 is implemented on a server, the application 724 may include, for example, an asset management application, a diagnostic application, or a scheduling application.
The memory 702 may also store a digital assistant module 726 (or a server portion of a digital assistant). In some embodiments, digital assistant module 726 may include the following sub-modules, or a subset or superset thereof: an input/output processing module 728, a speech-to-text (STT) processing module 730, a natural language processing module 732, a dialog flow processing module 734, a task flow processing module 736, a service processing module 738, and a speech synthesis module 740. Each of these modules may have access to one or more of the following systems or data and models of digital assistant module 726, or a subset or superset thereof: ontology 760, vocabulary index 744, user data 748, task flow model 754, service models 756, and ASR systems.
In some embodiments, using the processing modules, data, and models implemented in digital assistant module 726, the digital assistant can perform at least some of the following: converting speech input to text; identifying a user's intent expressed in natural language input received from the user; actively eliciting and obtaining the information needed to fully infer the user's intent (e.g., by disambiguating words, names, intentions, etc.); determining a task flow for fulfilling the inferred intent; and executing the task flow to fulfill the inferred intent.
In some embodiments, as shown in fig. 7B, I/O processing module 728 may interact with a user through I/O devices 716 in fig. 7A or with a user device (e.g., device 104,200,400, or 600) through network communication interface 708 in fig. 7A to obtain user input (e.g., voice input) and provide a response to the user input (e.g., as speech output). The I/O processing module 728 may optionally obtain contextual information associated with the user input from the user device along with or shortly after receiving the user input. The contextual information may include user-specific data, vocabulary, and/or preferences relevant to the user input. In some embodiments, the contextual information also includes software and hardware states of the user device at the time the user request is received, and/or information relating to the user's surroundings at the time the user request is received. In some embodiments, the I/O processing module 728 may also send follow-up questions to the user regarding the user request and receive answers from the user. When a user request is received by I/O processing module 728 and the user request includes speech input, I/O processing module 728 may forward the speech input to STT processing module 730 (or a speech recognizer) for speech-to-text conversion.
STT processing module 730 may include one or more ASR systems. The one or more ASR systems may process the speech input received through the I/O processing module 728 to generate a recognition result. Each ASR system may include a front-end speech preprocessor. The front-end speech preprocessor may extract representative features from the speech input. For example, the front-end speech preprocessor may perform a Fourier transform on the speech input to extract spectral features that characterize the speech input as a sequence of representative multi-dimensional vectors. Moreover, each ASR system may include one or more speech recognition models (e.g., acoustic models and/or language models) and may implement one or more speech recognition engines. Examples of speech recognition models include hidden Markov models, Gaussian mixture models, deep neural network models, n-gram language models, and other statistical models. Examples of speech recognition engines include dynamic time warping based engines and weighted finite-state transducer (WFST) based engines. The one or more speech recognition models and the one or more speech recognition engines may be used to process the representative features extracted by the front-end speech preprocessor to produce intermediate recognition results (e.g., phonemes, phonemic strings, and sub-words) and ultimately text recognition results (e.g., words, word strings, or a sequence of symbols). In some embodiments, the speech input may be processed at least in part by a third-party service or on the user device (e.g., device 104,200,400, or 600) to produce the recognition result. Once STT processing module 730 generates a recognition result that includes a text string (e.g., a word, a string of words, or a sequence of symbols), the recognition result may be passed to natural language processing module 732 for intent inference.
More details regarding the speech-to-text processing are described in U.S. Utility Patent Application Serial No. 13/236,942, entitled "Consolidating Speech Recognition Results," filed September 20, 2011, the entire disclosure of which is incorporated herein by reference.
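The following Swift sketch illustrates, at a very high level, the stages described above (front-end feature extraction followed by acoustic and language models). The protocol and type names are assumptions made for illustration; concrete model implementations are omitted.

import Foundation

// Extracts representative feature vectors (e.g., spectral features) from raw audio samples.
protocol SpeechPreprocessor {
    func extractFeatures(from samples: [Float]) -> [[Float]]
}

// Maps feature vectors to intermediate results such as phoneme sequences.
protocol AcousticModel {
    func phonemes(for features: [[Float]]) -> [String]
}

// Maps phoneme sequences to a final text recognition result with a confidence score.
protocol LanguageModel {
    func words(for phonemes: [String]) -> (text: String, confidence: Double)
}

// One ASR system composed of the three stages above.
struct ASRSystem {
    let preprocessor: SpeechPreprocessor
    let acousticModel: AcousticModel
    let languageModel: LanguageModel

    func recognize(_ samples: [Float]) -> (text: String, confidence: Double) {
        let features = preprocessor.extractFeatures(from: samples)
        let phonemes = acousticModel.phonemes(for: features)
        return languageModel.words(for: phonemes)
    }
}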
In some embodiments, STT processing module 730 may include and/or access a vocabulary of recognizable words via a phonetic alphabet conversion module 731. Each vocabulary word may be associated with one or more candidate pronunciations of the word represented in a speech recognition phonetic alphabet. In particular, the vocabulary of recognizable words may include a word that is associated with multiple candidate pronunciations. For example, the vocabulary may include the word "tomato" associated with the candidate pronunciations /tə'meɪroʊ/ and /tə'mɑtoʊ/. Also, vocabulary words may be associated with customized candidate pronunciations based on previous speech input from the user. Such customized candidate pronunciations can be stored in STT processing module 730 and can be associated with a particular user via the user's profile on the device. In some embodiments, candidate pronunciations for words may be determined based on the spelling of the word and one or more linguistic and/or phonetic rules. In some embodiments, candidate pronunciations may be manually generated, for example, based on known canonical pronunciations.
In some embodiments, candidate pronunciations may be ranked based on how common they are. For example, the candidate pronunciation /tə'meɪroʊ/ may be ranked higher than /tə'mɑtoʊ/ because the former is a more common pronunciation (e.g., among all users, for users in a particular geographic area, or for any other suitable subset of users). In some embodiments, candidate pronunciations may be ranked based on whether a candidate pronunciation is a customized candidate pronunciation associated with the user. For example, a customized candidate pronunciation may be ranked higher than a canonical candidate pronunciation. This is useful for recognizing proper nouns with unique pronunciations that deviate from the canonical pronunciation. In some embodiments, a candidate pronunciation may be associated with one or more speech characteristics, such as geographic origin, nationality, or ethnicity. For example, the candidate pronunciation /tə'meɪroʊ/ may be associated with the United States, whereas the candidate pronunciation /tə'mɑtoʊ/ may be associated with Great Britain. Moreover, the ranking of the candidate pronunciations may be based on one or more characteristics of the user (e.g., geographic origin, nationality, ethnicity, etc.) stored in the user's profile on the device. For example, it may be determined from the user's profile that the user is associated with the United States. Based on the user being associated with the United States, the candidate pronunciation /tə'meɪroʊ/ (associated with the United States) may be ranked higher than the candidate pronunciation /tə'mɑtoʊ/ (associated with Great Britain). In some embodiments, one of the ranked candidate pronunciations may be selected as the predicted pronunciation (e.g., the most likely pronunciation).
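A hedged Swift sketch of such a ranking follows. The struct fields, the phoneme notation, and the ordering rules (user-specific first, then profile region, then overall commonness) are illustrative assumptions.

import Foundation

// A candidate pronunciation for a vocabulary word, as sketched above.
struct CandidatePronunciation {
    let phonemes: String        // e.g. "t ax m ey t ow"
    let commonness: Double      // how common the pronunciation is overall
    let isCustom: Bool          // learned from this user's previous speech input
    let region: String?         // e.g. "US" or "GB"
}

// Ranks candidates: user-specific pronunciations first, then pronunciations matching
// the user's profile region, then overall commonness.
func rank(_ candidates: [CandidatePronunciation],
          userRegion: String?) -> [CandidatePronunciation] {
    candidates.sorted { a, b in
        if a.isCustom != b.isCustom { return a.isCustom }
        let aMatches = (a.region != nil && a.region == userRegion)
        let bMatches = (b.region != nil && b.region == userRegion)
        if aMatches != bMatches { return aMatches }
        return a.commonness > b.commonness
    }
}

// Example: for a user associated with the US, the US pronunciation ranks first.
let ranked = rank([
    CandidatePronunciation(phonemes: "t ax m ey t ow", commonness: 0.8, isCustom: false, region: "US"),
    CandidatePronunciation(phonemes: "t ax m aa t ow", commonness: 0.2, isCustom: false, region: "GB")
], userRegion: "US")
print(ranked.first?.phonemes ?? "")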
When a speech input is received, STT processing module 730 may be used to determine phonemes corresponding to the speech input (e.g., using an acoustic model) and then attempt to determine words that match the phonemes (e.g., using a language model). For example, if STT processing module 730 first identifies the phoneme sequence /tə'meɪroʊ/ corresponding to a portion of the speech input, it may then determine, based on vocabulary index 744, that this sequence corresponds to the word "tomato."
In some embodiments, STT processing module 730 may use approximate matching techniques to determine words in an utterance. Thus, for example, STT processing module 730 may determine that the phoneme sequence /tə'meɪroʊ/ corresponds to the word "tomato," even if that particular phoneme sequence is not one of the candidate phoneme sequences for that word.
The natural language processing module 732 ("natural language processor") of the digital assistant may take the sequence of words or symbols ("symbol sequence") generated by the STT processing module 730 and attempt to associate the symbol sequence with one or more "actionable intents" identified by the digital assistant. An "actionable intent" may represent a task that may be performed by a digital assistant and that may have an associated task flow implemented in task flow model 754. The associated task stream may be a series of programmed actions and steps taken by the digital assistant to perform the task. The capability scope of the digital assistant may depend on the number and variety of task flows that have been implemented and stored in task flow model 754, or in other words, on the number and variety of "actionable intents" that the digital assistant recognizes. However, the effectiveness of a digital assistant may also depend on the assistant's ability to infer the correct "executable intent or intents" from a user request expressed in natural language.
In some embodiments, in addition to the sequence of words or symbols obtained from STT processing module 730, natural language processing module 732 may also receive context information associated with the user request, such as from I/O processing module 728. The natural language processing module 732 may optionally use the context information to clarify, supplement, and/or further define information contained in the sequence of symbols received from the STT processing module 730. The contextual information may include, for example, user preferences, hardware and/or software states of the user device, sensor information collected before, during, or shortly after the user request, previous interactions (e.g., conversations) between the digital assistant and the user, and so forth. As described herein, contextual information may be dynamic and may vary with time, location, content of a conversation, and other factors.
In some embodiments, natural language processing may be based on, for example, ontology 760. Ontology 760 may be a hierarchical structure that includes a number of nodes, each node representing either an "actionable intent" or an "attribute" related to one or more of the "actionable intents" or to other "attributes." As described above, an "actionable intent" may represent a task that the digital assistant is capable of performing, i.e., a task that is "actionable" or can be acted upon. An "attribute" may represent a parameter associated with an actionable intent or a sub-aspect of another attribute. A connection between an actionable intent node and an attribute node in ontology 760 may define how the parameter represented by the attribute node pertains to the task represented by the actionable intent node.
In some embodiments, ontology 760 may be composed of actionable intent nodes and attribute nodes. Within ontology 760, each actionable intent node may be connected to one or more property nodes directly or through one or more intermediate property nodes. Similarly, each property node may be connected directly to one or more actionable intent nodes or through one or more intermediate property nodes. For example, as shown in FIG. 7C, ontology 760 can include a "restaurant reservation" node (i.e., an actionable intent node). The property nodes "restaurant," "date/time" (for reservation), and "party size" may all be directly connected to the actionable intent node (i.e., "restaurant reservation" node).
Further, the property nodes "cuisine," price interval, "" phone number, "and" location "may be child nodes of the property node" restaurant, "and may each be connected to the" restaurant reservation "node (i.e., actionable intent node) through an intermediate property node" restaurant. As another example, as shown in fig. 7C, ontology 760 may also include a "set reminder" node (i.e., another actionable intent node). The property node "date/time" (for set reminders) and "subject" (for reminders) may both be connected to the "set reminders" node. Since the attribute "date/time" may be related to both the task of making a restaurant reservation and the task of setting a reminder, the attribute node "date/time" may be connected to both the "restaurant reservation" node and the "set reminder" node in ontology 760.
The actionable intent node, along with the concept nodes to which it is connected, may be described as a "domain." In the present discussion, each domain may be associated with a respective actionable intent and refers to the set of nodes (and the relationships between those nodes) associated with that particular actionable intent. For example, ontology 760 shown in fig. 7C may include an example of a restaurant reservation domain 762 and an example of a reminder domain 764 within ontology 760. The restaurant reservation domain includes the actionable intent node "restaurant reservation," the property nodes "restaurant," "date/time," and "party size," and the child property nodes "cuisine," "price interval," "phone number," and "location." The reminder domain 764 may include the actionable intent node "set reminder" and the property nodes "subject" and "date/time." In some embodiments, ontology 760 may be composed of multiple domains. Each domain may share one or more attribute nodes with one or more other domains. For example, in addition to the restaurant reservation domain 762 and the reminder domain 764, the "date/time" property node may be associated with many other domains (e.g., a scheduling domain, a travel reservation domain, a movie ticket domain, etc.).
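A minimal Swift data-structure sketch of the ontology nodes and domains described above follows; the class and field names are illustrative assumptions, not the ontology's actual representation.

import Foundation

// A node in the ontology: either an actionable intent or a property.
final class OntologyNode {
    enum Kind { case actionableIntent, property }
    let name: String
    let kind: Kind
    var linkedNodes: [OntologyNode] = []

    init(name: String, kind: Kind) {
        self.name = name
        self.kind = kind
    }
}

// A domain groups one actionable intent with the property nodes linked to it.
struct Domain {
    let actionableIntent: OntologyNode
    let properties: [OntologyNode]
}

// Example: the "restaurant reservation" domain sketched in fig. 7C.
let reserve = OntologyNode(name: "restaurant reservation", kind: .actionableIntent)
let restaurant = OntologyNode(name: "restaurant", kind: .property)
let dateTime = OntologyNode(name: "date/time", kind: .property)
let partySize = OntologyNode(name: "party size", kind: .property)
reserve.linkedNodes = [restaurant, dateTime, partySize]
let restaurantReservationDomain = Domain(actionableIntent: reserve,
                                         properties: [restaurant, dateTime, partySize])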
Although fig. 7C shows two exemplary domains within ontology 760, other domains may include, for example, "find a movie," "initiate a phone call," "find directions," "arrange a meeting," "send a message," "provide an answer to a question," "read a list," "provide navigation instructions," "provide instructions for a task," and so on. A "send a message" domain may be associated with a "send a message" actionable intent node, and may also include attribute nodes such as "one or more recipients," "message type," and "message body." The attribute node "recipient" may be further defined, for example, by child attribute nodes such as "recipient name" and "message address."
In some embodiments, ontology 760 may include all domains (and thus actionable intents) that a digital assistant is able to understand and act upon. In some embodiments, ontology 760 may be modified, such as by adding or removing entire domains or nodes, or by modifying relationships between nodes within ontology 760.
In some embodiments, nodes associated with multiple related actionable intents may be clustered under a "super domain" in ontology 760. For example, a "travel" super domain may include a cluster of attribute nodes and actionable intent nodes related to travel. The actionable intent nodes related to travel may include "airline reservation," "hotel reservation," "car rental," "route planning," "finding points of interest," and so forth. Actionable intent nodes under the same super domain (e.g., the "travel" super domain) may have multiple attribute nodes in common. For example, the actionable intent nodes for "airline reservation," "hotel reservation," "car rental," "route planning," and "finding points of interest" may share one or more of the attribute nodes "starting location," "destination," "departure date/time," "arrival date/time," and "party size."
In some embodiments, each node in ontology 760 may be associated with a set of words and/or phrases that are related to the property or actionable intent represented by the node. The respective set of words and/or phrases associated with each node may be the so-called "vocabulary" associated with the node. The respective set of words and/or phrases associated with each node may be stored in the vocabulary index 744 in association with the property or actionable intent represented by the node. For example, returning to fig. 7B, the vocabulary associated with the node for the "restaurant" property may include words such as "food," "drinks," "cuisine," "hungry," "eat," "pizza," "fast food," "meal," and so on. As another example, the vocabulary associated with the node for the actionable intent "initiate a phone call" may include words and phrases such as "call," "make a phone call," "dial," "call this number," "make a call to," and so on. The vocabulary index 744 may optionally include words and phrases in different languages.
Natural language processing module 732 may receive a sequence of symbols (e.g., a text string) from STT processing module 730 and determine which nodes are involved in words in the sequence of symbols. In some embodiments, if a word or phrase in the sequence of symbols is found to be associated with one or more nodes in ontology 760 (via lexical index 744), the word or phrase may "trigger" or "activate" those nodes. Based on the number and/or relative importance of the activated nodes, natural language processing module 732 may select one of the actionable intents as a task that the user intends for the digital assistant to perform. In some embodiments, the domain with the most "triggered" nodes may be selected. In some embodiments, the domain with the highest confidence (e.g., based on the relative importance of its respective triggered node) may be selected. In some embodiments, the domain may be selected based on a combination of the number and importance of triggered nodes. In some embodiments, additional factors are also considered in selecting a node, such as whether the digital assistant has previously correctly interpreted a similar request from the user.
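A hedged Swift sketch of this kind of domain selection is shown below; the weighting scheme, vocabulary contents, and scores are illustrative assumptions rather than the module's actual scoring.

import Foundation

// Scores each domain by the weighted number of its vocabulary terms that appear in the
// token sequence; the highest-scoring domain is selected.
func selectDomain(tokens: [String],
                  vocabulary: [String: [String: Double]]) -> String? {
    let tokenSet = Set(tokens.map { $0.lowercased() })
    var best: (domain: String, score: Double)?
    for (domain, terms) in vocabulary {
        let score = terms.reduce(0.0) { partial, term in
            tokenSet.contains(term.key) ? partial + term.value : partial
        }
        if score > 0, best == nil || score > best!.score {
            best = (domain, score)
        }
    }
    return best?.domain
}

// Example: "restaurant" and "table" trigger the restaurant-reservation domain.
let domains = [
    "restaurant reservation": ["restaurant": 1.0, "table": 1.0, "cuisine": 0.5],
    "set reminder": ["remind": 1.0, "reminder": 1.0]
]
print(selectDomain(tokens: ["book", "a", "table", "at", "a", "restaurant"],
                   vocabulary: domains) ?? "no domain")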
The user data 748 may include user-specific information such as user-specific vocabulary, user preferences, user addresses, the user's default and second languages, the user's contact list, and other short-term or long-term information for each user. In some embodiments, the natural language processing module 732 may use user-specific information to supplement information contained in the user input to further define the user intent. For example, for a user request "invite my friend to my birthday party," natural language processing module 732 can access user data 748 to determine which people "friends" are and where and when the "birthday party" will be held without the user explicitly providing such information in their request.
Additional details of searching an ontology based on a symbol string are described in U.S. Utility Patent Application Serial No. 12/341,743, entitled "Method and Apparatus for Searching Using An Active Ontology," filed December 22, 2008, the entire disclosure of which is incorporated herein by reference.
In some embodiments, once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 may generate a structured query to represent the identified actionable intent. In some embodiments, the structured query may include parameters for one or more nodes within the domain of the actionable intent, with at least some of the parameters populated with the specific information and requirements specified in the user request. For example, the user may say "Help me reserve a seat at 7 pm at a sushi place." In this case, natural language processing module 732 can correctly identify the actionable intent as "restaurant reservation" based on the user input. According to the ontology, the structured query for the "restaurant reservation" domain may include parameters such as {cuisine}, {time}, {date}, {party size}, and the like. In some embodiments, based on the speech input and the text obtained from the speech input using STT processing module 730, natural language processing module 732 may generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {cuisine = "sushi"} and {time = "7 pm"}. However, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain. Thus, based on the currently available information, other necessary parameters such as {party size} and {date} may not be specified in the structured query. In some embodiments, natural language processing module 732 may populate some parameters of the structured query with the received contextual information. For example, in some embodiments, if the user requests a sushi place that is "nearby," natural language processing module 732 may populate the {location} parameter in the structured query with GPS coordinates from the user device.
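A minimal Swift sketch of a partial structured query, under the assumption that parameters can be represented as a simple string map, is given below; the type and method names are illustrative.

import Foundation

// A structured query: an identified actionable intent (domain) plus whatever
// parameters could be filled from the utterance and from context.
struct StructuredQuery {
    let domain: String
    var parameters: [String: String]

    // Parameters the domain requires that are still missing from the query.
    func missingParameters(required: [String]) -> [String] {
        required.filter { parameters[$0] == nil }
    }
}

// Example: "Help me reserve a seat at 7 pm at a sushi place" yields a partial query;
// the date and party size remain to be requested from the user.
var query = StructuredQuery(domain: "restaurant reservation",
                            parameters: ["cuisine": "sushi", "time": "7 pm"])
print(query.missingParameters(required: ["cuisine", "time", "date", "party size"]))
// ["date", "party size"]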
In some embodiments, natural language processing module 732 may pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"). Task flow processing module 736 may be configured to receive the structured query from natural language processing module 732, complete the structured query (if necessary), and perform the actions required to "complete" the user's final request. In some embodiments, the various procedures necessary to complete these tasks may be provided in task flow model 754. In some embodiments, task flow model 754 may include procedures for obtaining additional information from the user, as well as task flows for performing the actions associated with the actionable intent.
As described above, to complete the structured query, task flow processing module 736 may need to initiate an additional dialog with the user in order to obtain additional information and/or to clarify potentially ambiguous utterances. When such interaction is necessary, task flow processing module 736 may invoke dialog flow processing module 734 to engage in a dialog with the user. In some embodiments, dialog flow processing module 734 may determine how (and/or when) to request additional information from the user, and may receive and process the user's responses. Questions may be provided to the user, and answers received from the user, through I/O processing module 728. In some embodiments, dialog flow processing module 734 may present dialog output to the user via audio and/or visual output and may receive input from the user via spoken or physical (e.g., click) responses. Continuing the example above, when task flow processing module 736 invokes dialog flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialog flow processing module 734 may generate questions such as "For how many people?" and "On which day?" and pass them to the user. Upon receiving answers from the user, dialog flow processing module 734 may populate the structured query with the missing information or pass the information to task flow processing module 736 to complete the missing information in the structured query.
Once task flow processing module 736 has completed the structured query for the actionable intent, task flow processing module 736 may proceed to perform the final task associated with the actionable intent. Thus, task flow processing module 736 may execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query. For example, the task flow model for the actionable intent "restaurant reservation" may include steps and instructions for contacting a restaurant and actually requesting a reservation for a particular party size at a particular time. For example, using a structured query such as {restaurant reservation, restaurant = ABC Cafe, date = 3/12/2012, time = 7 pm, party size = 5}, task flow processing module 736 may perform the following steps: (1) logging onto a server of ABC Cafe or an online restaurant reservation system, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar.
In some embodiments, the task flow processing module 736 may complete the tasks requested in the user input or provide the informational answers requested in the user input with the assistance of the service processing module 738 ("service processing module"). For example, the service processing module 738 may initiate phone calls, set calendar entries, invoke map searches, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., restaurant reservation portals, social networking sites, bank portals, etc.) on behalf of the task flow processing module 736. In some embodiments, the protocols and Application Programming Interfaces (APIs) required for each service may be specified by respective ones of service models 756. The service handling module 738 may access the appropriate service model for the service and generate a request for the service according to the service model according to the protocols and APIs required by the service.
For example, if a restaurant has enabled an online reservation service, the restaurant may submit a service model that specifies the necessary parameters for making a reservation and an API for communicating the values of the necessary parameters to the online reservation service. When requested by task flow processing module 736, service processing module 738 may use the web address stored in the service model to establish a network connection with the online reservation service and send the necessary parameters for the reservation (e.g., time, date, party size) to the online reservation interface in a format according to the API of the online reservation service.
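As a hedged illustration, the following Swift sketch shows how a registered service model might be used to build a request; the endpoint URL, the parameter names, and the query-string format are assumptions and do not describe a real reservation API.

import Foundation

// A registered service model: the endpoint and the parameters the service's API requires.
struct ServiceModel {
    let serviceName: String
    let endpoint: URL
    let requiredParameters: [String]
}

// Builds a request to the service by mapping structured-query parameters onto the
// API's required parameters; returns nil if any required parameter is missing.
func makeServiceRequest(model: ServiceModel,
                        parameters: [String: String]) -> URLRequest? {
    guard var components = URLComponents(url: model.endpoint,
                                         resolvingAgainstBaseURL: false) else { return nil }
    var items: [URLQueryItem] = []
    for name in model.requiredParameters {
        guard let value = parameters[name] else { return nil }  // required parameter missing
        items.append(URLQueryItem(name: name, value: value))
    }
    components.queryItems = items
    guard let url = components.url else { return nil }
    return URLRequest(url: url)
}

// Example with a hypothetical reservation endpoint.
let model = ServiceModel(serviceName: "reservations",
                         endpoint: URL(string: "https://example.com/reserve")!,
                         requiredParameters: ["date", "time", "partySize"])
let request = makeServiceRequest(model: model,
                                 parameters: ["date": "2012-03-12",
                                              "time": "19:00",
                                              "partySize": "5"])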
In some embodiments, natural language processing module 732, dialog flow processing module 734, and task flow processing module 736 may be used jointly and iteratively to infer and define the user's intent, to obtain information to further clarify and refine the user's intent, and to ultimately generate a response (i.e., an output to the user, or the completion of a task) to satisfy the user's intent. The generated response may be a dialog response to the speech input that at least partially satisfies the user's intent. Also, in some embodiments, the generated response may be output as speech output. In these embodiments, the generated response may be sent to speech synthesis module 740 (e.g., a speech synthesizer), where it may be processed to synthesize the dialog response in speech form. In other embodiments, the generated response may be data content relevant to satisfying the user request in the speech input.
The speech synthesis module 740 may be configured to synthesize speech output for presentation to the user. The speech synthesis module 740 synthesizes speech output based on text provided by the digital assistant. For example, the generated dialog response may be in the form of a text string. The speech synthesis module 740 may convert the text string into audible speech output. The speech synthesis module 740 may use any suitable speech synthesis technique to generate speech output from text, including, but not limited to, concatenative synthesis, unit selection synthesis, diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, hidden Markov model (HMM) based synthesis, and sinewave synthesis. In some embodiments, speech synthesis module 740 may be configured to synthesize individual words based on phoneme strings corresponding to those words. For example, a phoneme string may be associated with a word in the generated dialog response. The phoneme string may be stored in metadata associated with the word. Speech synthesis module 740 may be configured to directly process the phoneme string in the metadata to synthesize the word in speech form.
In some embodiments, rather than (or in addition to) using speech synthesis module 740, speech synthesis may be performed on a remote device (e.g., server system 108), and the synthesized speech may be sent to the user device for output to the user. This may occur, for example, in some implementations in which the output of the digital assistant is generated at the server system. Because server systems generally have more processing power or resources than a user device, it may be possible to obtain higher quality speech output than would be practical with client-side synthesis.
Additional details regarding digital assistants can be found in U.S. Utility Patent Application Serial No. 12/987,982, entitled "Intelligent Automated Assistant," filed January 10, 2011, and U.S. Utility Patent Application Serial No. 13/251,088, entitled "Generating and Processing Task Items That Represent Tasks to Perform," filed September 30, 2011, the entire disclosures of which are incorporated herein by reference.
4. Process for operating a digital assistant
Fig. 8 illustrates a flow diagram of a process 800 for operating a digital assistant, according to some embodiments. Process 800 is performed, for example, using one or more electronic devices (e.g., devices 104,108,200,400, or 600) implementing a digital assistant. In some embodiments, process 800 is performed using a client-server system (e.g., system 100), and the blocks of process 800 may be divided in any manner between the server (e.g., DA server 106) and the client device. Thus, although portions of process 800 are described herein as being performed by a particular device of a client-server system, it should be understood that process 800 is not so limited. In other embodiments, process 800 is performed using only a client device (e.g., user device 104). In process 800, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally omitted. In some embodiments, additional steps may be performed in conjunction with process 800.
At block 805, natural language user input is received by a user device, such as user device 104 of fig. 1. The natural language input is, for example, a speech input or a text input. In some embodiments, the natural language input may include a request for the user device and/or another device to perform a task. For example, in the example "dispatch a vehicle to 1200 Park Avenue," the natural language input may include a request that the user device arrange for a car using a ride reservation service. In some embodiments, the natural language input may also specify one or more parameters of the requested task. "1200 Park Avenue," for example, specifies the pick-up location for the requested car. In the example "order my usual from Domino's," the natural language input may include a request that the user device place an order with the pizza chain Domino's. "My usual" may further specify, from context, that a particular, previously ordered set of food items is desired.
At block 810, an intent and optionally one or more parameters associated with the intent are identified. The intent and parameters may be obtained, for example, from natural language user input. As noted, the intent may correspond to a task requested by the user. Thus, identifying (e.g., determining) the intent may include identifying a task specified in the natural language user input and/or inferring an intent corresponding to the requested task based on the language and/or context of the natural language user input. The intent may correspond to any kind of task performed by the user device, and in particular may correspond to a task performed by one or more applications of the user device, as described in more detail below.
In some embodiments, the intent is associated with (e.g., included in) one or more domains (e.g., an intent category, a set of intents). Each domain may include a particular kind of intent, allowing intuitive grouping of intents. For example, an intent to reserve a car, an intent to cancel a car reservation, and/or any other intent related to a task commonly associated with car reservations may be included in a car reservation domain. As another example, an intent to book a flight, cancel a flight, reschedule a flight, or obtain flight information, and/or any other intent related to a task commonly associated with air travel may be included in an air travel domain. As another example, an intent to provide directions, an intent to obtain road condition information, and/or any other intent related to a task commonly associated with navigation may be included in a navigation domain. As another example, an intent to send a payment, receive a payment, and/or any other intent related to a task commonly associated with financial transactions may be included in a financial transaction domain.
Identifying parameters may include identifying portions of the natural language input that specify a manner in which the task corresponding to the intent is to be performed. The parameters may specify, for example, a location (e.g., an address or point of interest), a time, a date, a contact, a type, text (e.g., to be inserted into an email or message), a quantity (e.g., a distance, an amount of money), and, in some cases, the name of the software application that is to perform the task. The parameters may also specify other conditions for the task, examples of which are described herein.
The parameters may be identified using one or more detectors, for example. Each of the detectors may be configured to analyze natural language user input (e.g., a textual representation of the natural language user input) and identify one or more corresponding data types. For example, a first detector may be configured to identify a user contact and a second detector may be configured to identify an address. Other detectors may identify data types including, but not limited to, phone number, name, person of interest, place of interest, URL, time, flight number, package tracking number, and date.
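A minimal Swift sketch of such per-data-type detectors is shown below. The protocol name is an assumption; the example detector is backed by Foundation's NSDataDetector for illustration only, which is not necessarily how the described detectors are implemented.

import Foundation

// A detector recognizes one data type (addresses, phone numbers, dates, ...) in the
// text of a natural language input.
protocol ParameterDetector {
    var dataType: String { get }
    func detect(in text: String) -> [String]
}

// Example: a phone-number detector backed by NSDataDetector.
struct PhoneNumberDetector: ParameterDetector {
    let dataType = "phone number"

    func detect(in text: String) -> [String] {
        guard let detector = try? NSDataDetector(
            types: NSTextCheckingResult.CheckingType.phoneNumber.rawValue) else { return [] }
        let range = NSRange(text.startIndex..., in: text)
        return detector.matches(in: text, options: [], range: range)
            .compactMap { $0.phoneNumber }
    }
}

// Running all registered detectors over the input yields candidate parameters.
let detectors: [ParameterDetector] = [PhoneNumberDetector()]
let input = "Call 555-0123 and order a pizza"
for detector in detectors {
    print(detector.dataType, detector.detect(in: input))
}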
In some embodiments, words of the customized vocabulary may be identified as parameters. For example, one or more detectors may be configured to identify customized vocabulary for one or more applications, respectively. The customized vocabulary of the application may include the name of the application (e.g., Uber, Lyft, Instagram, Flickr, WeChat, WhatsApp, LINE, Viber) and/or may include other terms uniquely associated with the application (e.g., UberX, DM, Lyftline, ZipCar).
In some embodiments, one or more applications may register with the application registration service. The services may be hosted by the server 108 and/or the user device 104 or otherwise accessible by the server 108 and/or the user device 104. Registering in this manner may include specifying one or more customized vocabulary terms associated with the application and, optionally, one or more language models of the customized vocabulary terms. The language model may, for example, provide one or more pronunciations for each of the custom lexical terms. The language model provided in this manner may then be used to help identify the use of such custom terms during analysis of natural language user input. In some embodiments, the customized vocabulary may be included in the vocabulary index 744 (FIG. 7B).
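A hedged Swift sketch of such a registration service follows; the type names, the pronunciation string format, and the registration API are assumptions for illustration.

import Foundation

// A custom vocabulary term supplied by an application, with candidate pronunciations
// for the language model.
struct VocabularyTerm {
    let text: String              // e.g. "UberX"
    let pronunciations: [String]  // e.g. phoneme strings for the term
}

final class ApplicationRegistrationService {
    private(set) var vocabulary: [String: [VocabularyTerm]] = [:]  // app name -> terms

    func register(appName: String, terms: [VocabularyTerm]) {
        vocabulary[appName, default: []].append(contentsOf: terms)
    }

    // All custom terms across registered applications, e.g. for merging into the
    // vocabulary index used during analysis of natural language user input.
    var allTerms: [VocabularyTerm] { vocabulary.values.flatMap { $0 } }
}

// Example registration for a hypothetical ride application.
let registry = ApplicationRegistrationService()
registry.register(appName: "RideApp",
                  terms: [VocabularyTerm(text: "RideX",
                                         pronunciations: ["r ay d eh k s"])])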
In some embodiments, one or more parameters may be inferred from the natural language user input. For example, in the example "please drive me to the stadium," one parameter associated with the intent may be inferred to be the user's current location. In another example, "pay John back for the meal," one parameter that may be inferred to be associated with the intent is the amount of money.
In some embodiments, an intent is identified based on the natural language user input, and then one or more parameters associated with the intent are identified. Also, in some embodiments, parameters not associated with the intent are not identified. For example, an intent corresponding to directions (e.g., driving directions) may be associated with parameters that specify one or more locations (e.g., an origin and/or a destination) and/or a transit mode. Consider, for example, the input "please give me real-time driving directions to 1200 Park Avenue." Here, the identified intent corresponds to the task of providing directions, "driving" is a parameter specifying the transit mode, and "1200 Park Avenue" is a parameter specifying the location. The portion of the user input "real-time" is not a parameter associated with the identified intent. Thus, although "real-time" may be a valid parameter for some other intent, it is not identified as a parameter during this operation.
In other embodiments, one or more parameters may be identified first and an intent may be identified based on the identified one or more parameters. In other embodiments, the intent and the parameters associated with the intent may be identified simultaneously.
In some embodiments, the intent and parameters of the natural language user input are recognized by a user device, such as user device 104 of fig. 1. In other embodiments, the user device provides the natural language user input (or a representation thereof) to a server, such as server 108 of fig. 1, and the server identifies (e.g., determines) the intent and parameters of the natural language user input, as described. The server then provides (transmits) the identified intent and parameters to the user device.
Optionally, once the intent and any parameters are identified, the user device confirms the identified intent and/or parameters, and in some cases inferred parameters, with the user of the user device. Confirming in this manner may include prompting the user, with a natural language query, to confirm the identified intent and all identified parameters associated with the intent. For example, in response to a user input of "please drive me to the airport," the user device may provide a natural language query of "Do you want a car from your current location to the airport?" The natural language query provided by the user device may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device (e.g., speaker 211 of FIG. 2). The user may respond to the natural language query, for example, by providing natural language user input to the user device.
Optionally, the user device confirms respective parameters. In some embodiments, this may include prompting the user to confirm one or more parameters. For example, in response to a user input of "please drive me to the station," the user device may provide a natural language query such as "Did you mean the train station?" As another example, in response to a user input of "pay John $5," the user device may provide the natural language query "Did you mean John Smith?" The natural language query provided by the user device may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device (e.g., speaker 211 of FIG. 2). The user may respond to the natural language query, for example, by providing natural language user input to the user device.
In some embodiments, one or more parameters are contextual. Accordingly, the user device may determine (e.g., resolve) the one or more parameters based on context information. The context information may be context information of the user device (or any data stored therein) and/or context information of a user of the user device. For example, the natural language user input may say "please send a vehicle to my home." Because "my home" is a contextual parameter and does not specify an actual location, the user device may determine the corresponding location and identify the determined location as the parameter (i.e., instead of "my home"). As another example, the natural language user input may say "call him back." Because "him" is a contextual parameter and does not specify a particular contact, the user device may determine the contact to which "him" refers and pass that contact as the parameter (i.e., instead of "him").
In some embodiments, the identified intent and parameters are implemented as an intent object. When so implemented, each intent object is an object (e.g., a data structure, a programming object) and corresponds to a respective intent. Each intent object may include one or more fields (e.g., instance variables) that respectively correspond to one or more parameters. For example, an intent object corresponding to a ride booking intent may be generated (e.g., instantiated) as in the following pseudocode:
[Pseudocode figure BDA0001306601960000561 from the original publication; not reproduced in this text.]
As will be appreciated by those skilled in the art, the above pseudocode is exemplary, and the intent object may be implemented in other ways.
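Because the pseudocode of the figure is not reproduced in this text, the following is a minimal Swift sketch of how a ride booking intent object might be structured; the type and field names are illustrative assumptions rather than the figure's pseudocode.

// Illustrative intent object for a ride booking intent.
// Each stored property corresponds to one parameter of the intent.
struct RideBookingIntent {
    var pickupLocation: String?     // e.g. resolved from "my location"
    var dropOffLocation: String?    // e.g. "1200 Park Avenue"
    var rideType: String?           // e.g. "uberX", "black car"
    var partySize: Int?             // may be inferred or left nil
}

// Example instantiation from a parsed user request.
let intent = RideBookingIntent(pickupLocation: nil,
                               dropOffLocation: "1200 Park Avenue",
                               rideType: "uberX",
                               partySize: nil)

Because the object carries only structured fields, equivalent inputs in different spoken languages can produce the same intent object, as discussed below.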
By implementing the intent as an intent object, the intent can be agnostic to the language of the user input. As described, the intent may be obtained from natural language user input. Thus, the same intent can be obtained from natural language input provided in any number of spoken languages. For example, an English natural language user input of "find me an Uber to 1200 Park Avenue" and an equivalent German natural language user input each result in the same identified intent (and the same intent object).
At block 815, a software application associated with the intent may be identified (e.g., selected). In general, this can include identifying one or more software applications configured to perform a task corresponding to the intent.
In some embodiments, identifying the software application may include determining one or more domains corresponding to the intent and identifying the application corresponding to the domain. As described, one or more software applications may register with the application registration service. Registering in this manner may include specifying which domains (e.g., ride booking domain, flight travel domain, navigation domain) correspond to the software application. The application corresponding to the domain may support each of the domain-specified intents or may support only some of the domain-specified intents. In some embodiments, the applications may be registered with respective intents, and identifying the applications may include identifying the applications that correspond to the identified intents.
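One way to express the registration-based lookup described above is sketched below in Swift; the domain names, registry shape, and intent identifiers are assumptions for illustration only.

// Hypothetical registry mapping domains and intents to registered applications.
enum Domain { case rideBooking, navigation, messaging, payments }

struct RegisteredApp {
    let name: String
    let domains: Set<Domain>
    let supportedIntents: Set<String>    // e.g. ["BookRideIntent"]
}

// Identify candidate applications for an intent by matching its domain,
// then keeping only apps that actually support that specific intent.
func candidates(for intentName: String, in domain: Domain,
                registry: [RegisteredApp]) -> [RegisteredApp] {
    registry.filter { app in
        app.domains.contains(domain) && app.supportedIntents.contains(intentName)
    }
}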
In some embodiments, the application is identified based on the identified parameters. For example, based on a user input of "send a black car to the airport," an intent to order a car using a ride reservation service may be identified along with a parameter specifying the type of vehicle (i.e., "black car"). While several available applications may generally be configured to reserve a vehicle, those applications configured to reserve a "black car" may be identified. As another example, consider the user input "send a message to Sam saying hello." Although several applications may be configured to send messages, those applications having contact information for the contact Sam may be identified.
In some embodiments, only applications that are installed on and/or accessible to the user device are identified. For example, while several available applications may be configured to perform the task, only those applications accessible to the user device are identified. Accessible applications include applications that are resident and/or installed on the user device and also include applications that are remotely accessible by the user device, for example, on one or more other devices.
Thus, in at least some embodiments, the identified application is an application that is configured to perform the task according to the identified parameters and that is accessible to the user device. In some embodiments, multiple applications may satisfy these criteria, yet the user device may wish to identify fewer applications, or a single application. Accordingly, the application may also be identified based on previous usage of applications by the user device. In some embodiments, for a given intent, the application most recently used to perform the task corresponding to the intent is identified. Consider a user input "call Rob," for which an intent (i.e., placing a call) and a parameter (i.e., "Rob") may be identified. In this example, the application last used to place a call is identified. In other embodiments, the application most commonly used to perform the task corresponding to the intent is identified. For the same example, the application most commonly used to place a call is identified. In some embodiments, the application is further selected based on one or more parameters. For example, the application most recently used to place a call to the contact Rob is identified, or the application most commonly used to place a call to the contact Rob is identified. In some embodiments, a default application may be specified for one or more particular tasks and/or parameters, for example, by a user or a digital assistant. The user may specify, for example, that a first application is used when calling a first contact and a second application is used when calling a second contact.
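The narrowing described in this paragraph could be expressed roughly as follows; the usage-record shape and the selection order (explicit default, then most recent, then most frequent) are illustrative assumptions.

import Foundation

// Hypothetical usage record for past task executions.
struct UsageRecord {
    let appName: String
    let intentName: String
    let parameterSummary: String    // e.g. the contact that was called
    let date: Date
}

// Prefer an explicit default; otherwise the most recently used app for this intent
// (optionally restricted to the same parameter, e.g. the same contact);
// otherwise the most frequently used app for this intent.
func selectApp(for intentName: String, parameter: String?,
               history: [UsageRecord], defaultApp: String?) -> String? {
    if let defaultApp = defaultApp { return defaultApp }

    let relevant = history.filter { record in
        record.intentName == intentName &&
        (parameter == nil || record.parameterSummary == parameter)
    }
    if let mostRecent = relevant.max(by: { $0.date < $1.date }) {
        return mostRecent.appName
    }

    let counts = Dictionary(grouping: history.filter { $0.intentName == intentName },
                            by: \.appName).mapValues(\.count)
    return counts.max(by: { $0.value < $1.value })?.key
}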
As described, in some cases, the natural language user input may include customized vocabulary that is identifiable as one or more parameters. In some embodiments, such customized vocabulary includes an application name, and thus the application may be identified based on the presence of the customized vocabulary in the input. For example, a natural language user input may say "call Rob using Skype." In response, the software application Skype may be the identified software application. As another example, the natural language input may request playback of music "on Spotify," and in response, the software application Spotify may be the identified software application.
The customized vocabulary may also include terms uniquely associated with the application. Thus, such terms may be identified as parameters and used to identify the application. In the example "get me an UberX," "UberX" is a term of the custom vocabulary of the software application Uber, and as a result Uber is identified as the software application. In the example "Tweet that I hope the Sharks win the cup," "Tweet" is a term of the custom vocabulary of the software application Twitter, and as a result Twitter is identified as the software application.
In some embodiments, no application configured to perform the task according to the identified parameters is accessible to the user device. As a result, the user device may access (e.g., download and/or install) an application configured to perform the task in accordance with the identified parameters. In some embodiments, the user device may identify a plurality of software applications and provide a list of the software applications to the user. The user may select one or more applications, and the user device may access the one or more selected applications.
At block 820, the intent and parameters are provided to the identified software application. In some embodiments, the intent and parameters are provided to the software application as intent objects.
In some embodiments, the intent and parameters may be selectively provided to the software application based on a state of the user device. As an example, the intent and parameters may be selectively provided to the software application based on whether the user device is in a locked state. In some embodiments, the application may be allowed to receive particular intents and parameters while the device is in a locked state. In other embodiments, the application may be allowed to receive particular intents and parameters only when the user device is not in a locked state. Whether an application can receive a particular intent for a particular state of the user device can be specified by the software application, for example, during the registration process with the application registration service.
At block 825, the user device may receive one or more responses from the software application. In some embodiments, the user device receives a response for each parameter provided to the software application. Each response may indicate, for example, whether the parameter is valid or whether additional user input is required. If a response indicates that the parameter is valid, no further action is taken with respect to that parameter.
If the response provided by the software application does not indicate that the parameter is valid, the response may indicate that clarification of the parameter is required. For example, the parameter may not be appropriate (i.e., may be invalid), and the software application may request additional input from the user. As an example, consider a user input of "send a blue car to 1200 Park Avenue." Although the user requests a blue car, the ride reservation application (e.g., Uber, Lyft) may not allow selection of a blue car (e.g., a blue car may not be a supported parameter, or the application may determine that no blue car is currently available). Thus, in the event that a parameter is not appropriate (e.g., the user specifies an invalid type of car), the application may request that an appropriate (e.g., valid) value be provided for the parameter (e.g., car type). For example, referring to FIG. 10A, based on a response from the software application indicating that the parameter is inappropriate, the user device may provide a natural language query 1002 prompting the user to select a valid parameter. In this example, the natural language query provided by the user device asks the user to select a valid car type. The natural language query 1002 may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device. As shown, in some embodiments, the user device may provide (e.g., display) one or more candidate parameters 1004 to the user for selection. In this example, the candidate parameters 1004 include "budget," "black car," "SUV," and "share." In some embodiments, the candidate parameters 1004 provided in this manner may be provided by the software application. The user may select one of the candidate parameters by providing a touch input and/or by providing natural language user input to the user device, and in response, the user device may provide the selected candidate parameter to the software application.
If the response provided by the software application does not indicate that the parameter is valid, the response may instead indicate that disambiguation of the parameter is required. As a result, the user device may request input from the user to disambiguate the parameter. In an example where the user input is "call Tom," a parameter for the contact Tom may be provided to a software application configured to place calls. If, when determining contact information (e.g., a phone number) for the contact Tom, the software application determines that there are multiple contacts named "Tom," the software application may request that the user specify which "Tom" is intended. As part of the request, the software application includes in its response a disambiguation list having a plurality of candidate parameters. The user device may provide a natural language query asking the user to select a candidate parameter. Also, the candidate parameters of the disambiguation list may be displayed to allow user selection. The user may select one of the candidate parameters, for example, by providing a touch input and/or by providing natural language user input, and the user device may provide the selected candidate parameter to the software application.
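The per-parameter responses discussed above (valid, needs a value, needs disambiguation) can be modeled compactly; the enum and function below are an illustrative assumption rather than a defined interface.

// Illustrative per-parameter response an application might return to the digital assistant.
enum ParameterResponse {
    case valid                                      // no further action needed
    case needsValue(prompt: String)                 // a required parameter was missing or invalid
    case needsDisambiguation(candidates: [String])  // e.g. several contacts named "Tom"
}

// The assistant turns each non-valid response into a natural language query.
func followUpQuery(for response: ParameterResponse) -> String? {
    switch response {
    case .valid:
        return nil
    case .needsValue(let prompt):
        return prompt                               // e.g. "Which car type would you like?"
    case .needsDisambiguation(let candidates):
        return "Which one did you mean: " + candidates.joined(separator: ", ") + "?"
    }
}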
Performing a task may require the user to specify one or more parameters of a particular type. In some embodiments, the natural language user input may omit one or more required parameters. Accordingly, the software application may optionally indicate which required parameters, if any, were not specified. Consider the user input "dispatch a car to 1200 Park Avenue." Although the user input generally requests a car, a ride reservation application identified based on the user input may require selection of a particular type of car. Thus, in the event that a parameter is missing (e.g., the user does not specify the type of car), the application may request that an appropriate value be provided for the parameter (e.g., the type of car). Referring again to FIG. 10A, the user device may provide a natural language query 1002 prompting the user to specify a valid parameter. Thereafter, the user may select a candidate parameter, for example, from a list of candidate parameters, and the selected parameter may be provided to the software application, as described above.
Once the software application indicates that each parameter is valid and no additional information is requested, the user device may confirm the intent with the software application at block 830. In particular, the user device may request a notification that, given the intent and the parameters associated with the intent, the software application can successfully perform the task corresponding to the intent.
Once the software application provides the notification indicating that it can perform the task, the user device optionally confirms the intent with the user. For example, the user device may provide the natural language query "I can get you an uberX at your location. Do you want me to request it?" The user may confirm or reject the intent by touch input or natural language input. In some embodiments, the notification provided by the software application may include information to be provided to the user. The information may, for example, allow the user to make a more informed decision when prompted for confirmation. For example, the user device may provide the natural language query "I can get you an uberX at your location in 9 minutes. Do you want me to request it?"
Thereafter, the user device causes (e.g., instructs) the software application to perform the task corresponding to the intent in accordance with the parameters.
At block 835, the user device receives a result response from the software application indicating whether the software application successfully performed the task. A result response indicating that the task was not performed may also indicate one or more reasons for the failure. In some embodiments, the user device may provide an output, such as a natural language output, to the user indicating the one or more reasons for the failure.
A result response indicating successful execution of the task may include one or more response items. Each response item may be a result (e.g., received or generated) determined by the software application when performing the task. For example, the response items corresponding to a car booked with a ride reservation application may include the car type, license plate number, driver name, arrival time, current car location, pickup location, destination, estimated trip time, estimated cost, estimated trip route, and service type (e.g., uberPOOL vs. uberX). As another example, the response items corresponding to initiating an exercise session with a fitness application may include a confirmation that the session has been initiated, an exercise duration, an activity type, and one or more goals.
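An illustrative shape for such a result response is sketched below; the struct name, fields, and sample values are assumptions chosen to mirror the ride reservation example.

import Foundation

// Illustrative result response for a booked ride (names assumed for illustration).
struct RideBookingResultResponse {
    let taskPerformed: Bool
    let failureReasons: [String]         // populated only when taskPerformed == false

    // Response items determined by the application while performing the task:
    let carType: String?
    let licensePlate: String?
    let driverName: String?
    let arrivalTime: Date?
    let pickupLocation: String?
    let destination: String?
    let estimatedCost: Decimal?
}

// Example of a successful result response.
let response = RideBookingResultResponse(
    taskPerformed: true,
    failureReasons: [],
    carType: "uberX",
    licensePlate: "7ABC123",
    driverName: "Alex",
    arrivalTime: Date().addingTimeInterval(9 * 60),
    pickupLocation: "Current location",
    destination: "1200 Park Avenue",
    estimatedCost: 18
)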
In some embodiments, one or more response items may be provided to the user. Referring to FIG. 10B, for example, one or more response items may be provided to the user as natural language output 1012, as text output, and/or as audio output. A response item may also be provided visually. For example, a map 1014 of the estimated trip route may be provided to the user. It should be appreciated that the response items may be provided to the user in any desired manner.
In some embodiments, the software application may specify the manner in which one or more response items are provided to the user. That is, the software application may determine the manner in which the response items are displayed and/or spoken to the user and the digital assistant may provide each response item accordingly.
In some embodiments, the software application may specify a manner in which to provide the response item using the UI extension of the digital assistant. The user device may, for example, provide a software application with a set of view controller parameters (e.g., fields that may be provided to the view controller for display), and in response, the software application may provide a set of view controller parameter values. The set of view controller parameter values may indicate which answer items are to be displayed in the various fields of the view controller and/or the manner in which the answer items are displayed in each field.
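A sketch of that exchange, with hypothetical names for the view controller parameters and values (the actual fields offered by the digital assistant's UI extension are not specified here):

// Illustrative exchange for the digital assistant's UI extension: the assistant offers
// displayable fields, and the application says how to fill them.
struct ViewControllerParameters {
    let availableFields: [String]            // fields the assistant's view controller can display
}

struct ViewControllerParameterValues {
    let fieldValues: [String: String]        // which response item goes in which field
    let showsMap: Bool                       // e.g. whether to display the estimated trip route
}

// The application decides how its response items map onto the offered fields.
func layout(for params: ViewControllerParameters,
            driverName: String?) -> ViewControllerParameterValues {
    var values: [String: String] = [:]
    if params.availableFields.contains("title") {
        values["title"] = "Your ride is booked"
    }
    if params.availableFields.contains("subtitle"), let driverName = driverName {
        values["subtitle"] = "Driver: \(driverName)"
    }
    return ViewControllerParameterValues(fieldValues: values, showsMap: true)
}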
In other embodiments, the digital assistant can determine how to provide the response items. In still other embodiments, the software application is invoked such that the user can interact directly with the software application. In some embodiments, invoking the application in this manner may terminate the session with the digital assistant.
In some embodiments, permissions of the software application are verified, e.g., before the intent and parameters are provided to the software application. The user device may, for example, determine whether the software application is allowed to access data associated with a particular intent. The determination may be made based on permissions configured on the user device. In the example "send a black car to my location," "location" may be a contextual parameter that requires context information (e.g., location data) of the user device. Thus, before resolving the location of the user device and providing the location as a parameter to the software application, the user device may first determine whether the software application is allowed to access that information. If the software application is allowed to access the data, operation proceeds as described. If the software application is not allowed to access the data, the intent and parameters are not provided to the software application and no task is performed.
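A minimal sketch of the permission gate described above, assuming a hypothetical PermissionStore interface; only the contextual location parameter is shown.

// Illustrative permission gate: the contextual "location" parameter is resolved and
// forwarded only if the identified application is allowed to access location data.
enum DataClass { case location, contacts, calendar }

protocol PermissionStore {
    func isAllowed(app bundleIdentifier: String, toAccess data: DataClass) -> Bool
}

func resolveLocationParameter(for app: String,
                              permissions: PermissionStore,
                              currentLocation: () -> String) -> String? {
    guard permissions.isAllowed(app: app, toAccess: .location) else {
        return nil    // intent and parameters are not provided; no task is performed
    }
    return currentLocation()   // e.g. replaces "my location" in the user input
}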
In some embodiments, the natural language input may include a plurality of task requests. Accordingly, based on the natural language input, multiple intents and/or multiple applications may be identified. Optionally, parameters associated with each intent are also identified. The natural language input "please drive me to the airport and tell me the status of the flight" may include, for example, both an intent to book a car and an intent to acquire the status of a user's flight. In some embodiments, the tasks corresponding to each intent may be performed sequentially or simultaneously.
In some embodiments, the natural language input may include a plurality of related task requests. For example, in some embodiments, one task requested by the natural language user input may depend on completion of another task requested by the same input. In the example "email me directions to the airport," two tasks are requested: the first task provides directions and the second task sends an email. The parameter "me" specifies a particular contact and is a parameter of the intent to send an email, and the parameter "airport" is a parameter specifying a destination for the intent to provide directions. The second task (sending the email) depends on the first task (providing directions), since sending directions by email requires that the directions first be obtained. Therefore, the task of obtaining directions is performed first.
In some embodiments, intents may be provided between applications. For example, an application may provide an intent object to another application to cause that application to perform a task. In this example, both intents (providing directions and sending an email) may be provided to the maps application to provide the requested directions. The second intent, the intent to send the email, may be provided as a parameter to the maps application, for example. The maps application may provide the requested directions according to the first intent, the intent to provide directions. The maps application may then provide the second intent to the email application, e.g., including the directions in the intent object as a parameter. In response, the email application may send the directions by email as requested.
As another example, a user may provide the user input "please drive me to the Sharks game" while browsing a sports application (e.g., an ESPN application). In response, the sports application, which has information about the game, may communicate an intent (e.g., booking a car) and parameters (e.g., the address of the game) to a ride reservation application. In some embodiments, the intent and parameters may be provided as an intent object.
As another example, the user may provide the user input "pay my brother $5" while using a ride reservation application. In response, the ride reservation application may pass the intent (e.g., making a payment) and parameter (e.g., $5) to a payment application (e.g., PayPal, Venmo, an online payment service). As described, in some embodiments, the intent and parameters may be provided as an intent object.
Fig. 9 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments. Process 900 may be used, for example, to implement at least a portion of process 800 of FIG. 8, including but not limited to block 815 and/or block 820 of FIG. 8. Process 900 is performed, for example, using one or more electronic devices (e.g., devices 104,108,200,400 or 600) implementing a digital assistant. In some embodiments, process 900 is performed using a client-server system (e.g., system 100), and the blocks of process 900 may be divided in any manner between the server (e.g., DA server 106) and the client devices. Thus, although portions of process 900 are described herein as being performed by a particular device of a client-server system, it should be understood that process 900 is not so limited. In other embodiments, process 900 is performed using only a client device (e.g., user device 104). In process 900, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally omitted. In some embodiments, additional steps may be performed in conjunction with process 900.
At block 905, natural language user input is received by a user device, such as user device 104 of FIG. 1. As described, the natural language input may include a request to the user device and/or another device to perform a task, and may also specify one or more parameters of the requested task.
At block 910, an intent and, optionally, one or more parameters associated with the intent are identified. The intent and parameters may be obtained from the natural language user input. As previously mentioned, the intent may correspond to any kind of task performed by the user device, and in particular may correspond to a task performed by one or more applications of the user device. The parameters associated with the intent may identify portions of the natural language input that specify a manner in which the task corresponding to the intent is to be performed. In the example "please drive me to the airport," the intent corresponds to a task of booking a car, and "airport" is a parameter that specifies the destination. Given that the user's intent is to book a car, the user's location may also be a parameter (e.g., an inferred parameter).
At block 915, it is determined whether the task corresponding to the intent can be satisfied. In some embodiments, this may include determining whether an application configured to perform the task according to the parameters is accessible from the user device. In this example, it is determined whether the user device has access to an application configured to book a car at the location of the user device. As noted, accessible applications are those that are stored locally on the user device or are remotely accessible by the user device.
In accordance with a determination that the task corresponding to the intent can be satisfied, at block 920, the intent and parameters are provided to the software application. For example, if it is determined at block 915 that the user device has access to a software application configured to perform the requested task according to any identified parameters, the intent and parameters are provided to that application to perform the task. In this example, this includes determining whether a software application configured to book a car at the location of the user device is accessible from the user device. For example, the ride reservation application Lyft may be installed on the user device and may be used to book a car according to embodiments described herein.
In accordance with a determination that the task corresponding to the intent cannot be satisfied, at block 925, a list of one or more software applications is provided. The list can include, for example, one or more software applications configured to perform the task associated with the intent according to any identified parameters. In some embodiments, the one or more software applications of the list may be identified, for example, based on one or more domains associated with the intent (with which the applications may be registered). Referring to FIG. 10C, once the list of software applications (e.g., ride reservation applications) is determined, the user device may provide the list to the user. As shown, providing the list can include providing a natural language output 1022 requesting that the user select an application from the list of one or more software applications. In some embodiments, the list is generated by the user device. In other embodiments, the list of software applications is generated by the server and provided to the user device, which in turn may provide the list to the user, as described.
At block 930, the user device receives user input indicating a selection of one or more software applications from the list of one or more software applications. The user input may be a touch input on a touch-sensitive display of the user device and/or may be natural language user input.
At block 935, the intent and parameters are provided to the software application selected by the user. In some embodiments, providing the intent and the parameter includes downloading and/or installing the software application such that the software application is locally accessible to the user device. In other embodiments, this includes remotely accessing the selected application.
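Taken together, blocks 915-935 can be sketched as a single dispatch decision; the types below are assumptions for illustration, not part of process 900 as described.

// Compact sketch of blocks 915-935 (hypothetical types): either hand the intent to an
// accessible application, or offer a list of candidate applications for the user to pick.
struct Intent {
    let name: String
    let parameters: [String: String]
}

protocol Application {
    var name: String { get }
    func canPerform(_ intent: Intent) -> Bool
}

enum DispatchOutcome {
    case provided(to: String)            // block 920: intent and parameters delivered
    case needsSelection(from: [String])  // block 925: list of candidate applications
}

func dispatch(_ intent: Intent,
              accessibleApps: [Application],
              candidateApps: [String]) -> DispatchOutcome {
    if let app = accessibleApps.first(where: { $0.canPerform(intent) }) {
        return .provided(to: app.name)
    }
    return .needsSelection(from: candidateApps)   // user selection handled at blocks 930-935
}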
As described, the software application may provide one or more responses in response to the intent and the parameters. Once the parameters are verified, the user device may confirm the intent with the software application and cause the software application to perform the task. The user device may thereafter receive the result response and, optionally, provide one or more response items of the result response to the user.
Reference is made herein to providing natural language output and/or natural language queries to a user of a user device. In some embodiments, the manner in which natural language output and queries are provided to the user may depend on the type or state of the user device. If, for example, the user device is a mobile phone, the user device may provide a query using both text and audio. If, on the other hand, the user device is a speaker, the user device may provide the query using audio only. As another example, if the user device is a mobile phone that is not paired with a headset, the user device may provide queries using text and/or relatively short natural language queries. If the user device is paired with a headset, the user device may provide queries using relatively long natural language queries.
FIG. 10D illustrates an exemplary data flow of a digital assistant system 1030 according to some embodiments. In some embodiments, the data flow of FIG. 10D may be implemented using one or more of processes 800 and 900. Specifically, FIG. 10D shows a data flow for an application registration process and a data flow for performing a task. The respective data flows are described in turn below.
Typically, the data flow associated with the application registration process involves registration of an application with an application registration service (e.g., a verification service), whereby the application and the customized vocabulary corresponding to it may be accessed and/or utilized by a digital assistant performing a task.
In operation, in data flow 1031, an application is submitted to the application browsing module 1032. A language model corresponding to the application and the intents of the application may also be submitted. The language model may include the customized vocabulary of the application. In turn, in data flow 1033, the application browsing module 1032 can provide the application, the customized vocabulary, and/or the intents of the application to the verification service 1034. The verification service 1034 may determine whether to validate the application, e.g., based on whether the application is operable with the digital assistant. This may include, for example, ensuring that any intent of the application corresponds to one or more domains of the application. For example, the verification service may reject an instant messaging application associated with an intent to book a car because the domain and intent are mismatched. In data flow 1035, the verification service 1034 may provide a verification response indicating whether the application is valid.
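The domain/intent consistency check performed by the verification service could be sketched as follows; the domain names and submission shape are illustrative assumptions.

// Illustrative verification check: an application is rejected if any intent it declares
// falls outside the domains it registered for (e.g., a messaging app declaring a
// ride-booking intent).
enum AppDomain { case messaging, rideBooking, navigation }

struct SubmittedApp {
    let name: String
    let domains: Set<AppDomain>
    let declaredIntents: [(name: String, domain: AppDomain)]
}

func verify(_ app: SubmittedApp) -> Bool {
    app.declaredIntents.allSatisfy { app.domains.contains($0.domain) }
}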
If the verification service 1034 indicates that the application is valid, the application browsing module 1032 provides the application (as verified) to the application store 1036. Typically, applications are downloaded by user device 1040 via DA server 1038 and/or accessed at the application store 1036, as indicated by data flow 1039. In some embodiments, user device 1040 may be user device 104 of FIG. 1, and DA server 1038 may be DA server 106 of FIG. 1. Downloading an application may, for example, result in an application list (e.g., info.plist) of user device 1040 being updated and/or synchronized with DA server 1038. In data flow 1041, the verification service can provide the customized vocabulary (e.g., runtime vocabulary) of the application to DA server 1038 to facilitate parsing of natural language input, as described.
In general, the data flow associated with performing a task involves providing an intent, and optionally one or more parameters, to an application for performing the task corresponding to the intent.
In operation, in data flow 1043, user device 1040 can provide natural language input to DA server 1038. In some embodiments, the natural language input may be provided by the digital assistant 1042 of user device 1040. Based on the natural language input, DA server 1038 can identify one or more intents requested in the natural language input and one or more parameters associated with each intent. Further, DA server 1038 can identify an application for performing the task associated with the intent. In some embodiments, the name (or another form of identifier) of the identified application 1044 may be a parameter of the intent. DA server 1038 thereafter provides the intent, the parameters, and an identification of the identified application 1044 to user device 1040 (e.g., to the digital assistant 1042 of user device 1040) in data flow 1045. In some embodiments, the intent and parameters may be provided to user device 1040 as an intent object.
In response, the digital assistant determines whether to allow the identified application 1044 to access information associated with the identified parameter. For example, if the parameter is the location of user device 1040, the digital assistant queries data permissions 1046 to determine whether to allow application 1044 access to the location data.
Where the application is allowed to access the data for each parameter, the user device (e.g., the user device's digital assistant 1042) provides the intent to the application. As noted, the application may reside on user device 1040. In other embodiments, the application may reside on one or more other devices, and the intent may be transmitted to the application over one or more networks. As noted, if the application determines that one or more parameters are missing, inappropriate, and/or ambiguous, the application 1044 may thereafter request input from the user of user device 1040. In some embodiments, the query for user input may be provided as a natural language query generated by DA server 1038. Thus, in data flow 1051, user device 1040 can request, and subsequently receive, one or more natural language queries. Once all parameters are resolved, the application 1044 can execute the task corresponding to the intent and provide a result response indicating whether the task was successfully performed.
One or more of the data flows of FIG. 10D are performed (e.g., generated), for example, using one or more electronic devices (e.g., devices 104, 108, 200, 400, or 600) implementing a digital assistant. In particular, the data flow between DA server 1038 and digital assistant 1042 of user device 1040 is shown according to a client-server architecture. In other embodiments, DA server 1038 may be implemented as a process and/or service on user device 1040. Thus, in some embodiments, the data flows exchanged between DA server 1038 and digital assistant 1042 may be exchanged entirely on user device 1040.
FIG. 10E illustrates an exemplary data flow of the digital assistant system 1060, according to some embodiments. In particular, FIG. 10E illustrates an exemplary data flow of an application registration process and may be used to implement the application registration process discussed with respect to FIG. 10D. Also, several components of FIG. 10E correspond to components of FIG. 10D, respectively, and are provided with the same reference numerals. For the sake of brevity, the description of their function and operation is not repeated.
In data flow 1065, the verified vocabulary is provided from the verification service 1034 to the global application vocabulary store 1060. In general, the global application vocabulary store may store language models and/or vocabularies for any number and/or version of software applications. In data flows 1061 and 1063, a speech training module 1062 and a natural language training module 1064 are trained to recognize natural language and to process the application-specific vocabulary that accompanies a verified application. Based on this data, the global application vocabulary store may generate and/or train one or more language models that allow the digital assistant to recognize and process utterances that include application-specific vocabulary.
During operation of a user device, such as user device 1040 of FIG. 10D, a runtime-specific application vocabulary store may receive vocabulary and/or language models for one or more applications of the user device from the global application vocabulary store 1060. The vocabulary may be specific to a user ID of a user of the user device and/or may be specific to a version of an application and/or operating system of the user device. Based on the vocabulary, one or more terms of the natural language input may be identified as parameters, for example.
According to some embodiments, fig. 11 illustrates a functional block diagram of an electronic device 1100 configured according to the principles of various described embodiments, including those described with reference to fig. 8. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 11 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.
As shown in fig. 11, electronic device 1100 includes a touch-sensitive display unit 1102 and a processing unit 1108 that is optionally coupled to touch-sensitive display unit 1102. In some embodiments, the processing unit 1108 includes a receiving unit 1110, an identifying unit 1112, a providing unit 1114, and optionally, an output unit 1116, a speaker unit 1118, a display enabling unit 1120, a determining unit 1122, a requesting unit 1124, an accessing unit 1126, and a facilitating unit 1128.
In some embodiments, the processing unit 1108 is configured to receive (e.g., with the receiving unit 1110) natural language user input (e.g., block 805 of fig. 8); identifying (e.g., with the identification unit 1112) an intent object in a set of intent objects and parameters associated with the intent object, wherein the intent object and parameters are obtained from a natural language user input (e.g., block 810 of FIG. 8); identifying (e.g., with the identifying unit 1112) a software application associated with an intent object in the set of intent objects (e.g., block 815 of FIG. 8); and providing (e.g., with providing unit 1114) the intent object and the parameters to the software application (e.g., block 820 of fig. 8).
In some embodiments, processing unit 1108 is further configured to receive (e.g., with receiving unit 1110) an acknowledgement from the software application, where the acknowledgement indicates whether the parameter is valid (e.g., block 825 of fig. 8).
In some embodiments, receiving (e.g., with receiving unit 1110) the reply from the software application includes receiving (e.g., with receiving unit 1110) a disambiguation list associated with the parameter from the software application, where the disambiguation list includes a plurality of candidate parameters.
In some embodiments, processing unit 1108 is further configured to output (e.g., with output unit 1116) the disambiguation list; receiving (e.g., with receiving unit 1110) a user input indicating a selection of a candidate parameter value of a disambiguation list; the candidate parameter values of the selected disambiguation list are provided (e.g., with providing unit 1114) to the software application.
In some embodiments, the natural language user input is a first natural language input, the parameter is a first parameter, the processing unit 1108 is further configured to receive (e.g., with the receiving unit 1110) a request from a software application for a second parameter associated with the intent object; providing (e.g., with the providing unit 1114) a natural language query based on the request; receiving (e.g., with receiving unit 1110) a second natural language user input; the second parameters are identified (e.g., with identification unit 1112), where the second parameters are obtained from the second natural language user input, and the second parameters are provided (e.g., with providing unit 1114) to the software application.
In some embodiments, the electronic device further includes an audio output component, and providing the natural language query includes speaking (e.g., with the speaker unit 1118) the natural language query through the audio output component via speech synthesis.
In some embodiments, providing the natural language query includes providing (e.g., with the providing unit 1114) the natural language query based on the type of electronic device, the state of the electronic device, or a combination thereof.
In some embodiments, the processing unit 1108 is further configured to, after providing the intent object and parameters to the software application, receive (e.g., with the receiving unit 1110) a resulting reply associated with the intent object from the software application (e.g., block 835 of fig. 8).
In some embodiments, receiving the resulting response includes receiving (e.g., with receiving unit 1110) a set of response items associated with the intent object from the software application and outputting (e.g., with outputting unit 1116) the set of response items.
In some embodiments, the processing unit 1108 is further configured to provide (e.g., with the providing unit 1114) a set of view controller parameters to the software application; and receiving (e.g., with receiving unit 1110) a set of view controller parameter values corresponding to the set of view controller parameters from the software application, wherein outputting the set of response items comprises enabling display of the set of response items in the user interface (e.g., with display enabling unit 1120) based on the received view controller parameter values.
In some embodiments, each intent object in the set of intent objects is associated with a software application.
In some embodiments, the intent object is a first intent object and wherein the parameter is associated with the first intent object and is not associated with the second intent object.
In some embodiments, the processing unit 1108 is further configured to determine (e.g., with the determining unit 1122) whether the electronic device is in a locked state; and determine (e.g., with the determining unit 1122) whether the intent object is allowed to be provided to the software application while the electronic device is in the locked state, wherein the intent object and the parameter are provided to the software application only in accordance with a determination that the intent object is allowed to be provided to the software application while the electronic device is in the locked state.
In some embodiments, the parameters are obtained from the natural language user input by analyzing the natural language user input with each of a plurality of detectors.
In some embodiments, the processing unit 1108 is further configured to determine (e.g., with the determining unit 1122) a user context of the electronic device, wherein the parameter is based at least in part on the user context.
In some embodiments, the processing unit 1108 is further configured to request (e.g., with the requesting unit 1124) an acknowledgement of the parameter; and receiving (e.g., with receiving unit 1110) user input corresponding to the confirmation of the parameter, wherein the parameter is provided to the software application in response to receiving the user input.
In some embodiments, identifying the software application includes determining (e.g., using determining unit 1122) whether the software application is resident on the electronic device; and in accordance with a determination that the software application is not resident on the electronic device, determining (e.g., with determining unit 1122) whether the software application is resident on an external device in communication with the electronic device. In some embodiments, providing the intent object and parameters to the software application includes, in accordance with a determination that the software application is resident on an external device in communication with the electronic device, providing (e.g., with the providing unit 1114) the intent object and parameters to the software application by providing the intent object and parameters to the external device.
In some embodiments, identifying the software application includes determining (e.g., using determining unit 1122) whether the software application is resident on the electronic device; in accordance with a determination that the software application is not resident on the electronic device, identifying (e.g., with identification unit 1112) a set of software applications associated with the intent object; enabling display (e.g., with display enabling unit 1120) of the set of software applications associated with the intent object on the user interface; receiving (e.g., with receiving unit 1110) a user input indicating a selection of one or more software applications of the set of software applications associated with the intent object; and accessing (e.g., with accessing unit 1126) the selected one or more of the set of software applications associated with the intent object.
In some embodiments, identifying the software application associated with the intent object of the set of intent objects includes identifying (e.g., with the identification unit 1112) the software application based on the parameters.
In some embodiments, the processing unit 1108 is further configured to identify (e.g., with the identifying unit 1112) a third intent object from the set of intent objects based on the natural language user input; identifying (e.g., with identification unit 1112) a second software application based on the identified third intent object; providing (e.g., with providing unit 1114) the third intent object and the at least one response item to the second software application; and receiving (e.g., with receiving unit 1110) a second answer item associated with the third intent object from the second software application.
In some embodiments, the software application is a first software application, and providing the intent object and parameters to the software application includes causing (e.g., with the facilitating unit 1128) a third software application to provide the intent object and parameters to the first software application.
In some embodiments, receiving the natural language user input includes receiving (e.g., with receiving unit 1110) a natural language user input including application-specific terms; and wherein identifying the software application associated with the intent object of the set of intent objects comprises identifying (e.g., with identification unit 1112) the software application based on the application-specific term.
In some embodiments, the processing unit 1108 is further configured to receive (e.g., with the receiving unit 1110) a command comprising an identification of the software application; and in response to the command, determine (e.g., with determining unit 1122) whether the software application is allowed to access the data associated with the intent object.
In some embodiments, the processing unit 1108 is further configured to cause (e.g., with the facilitating unit 1128) the software application to perform a task associated with the intent object based on the parameter.
The operations described above with respect to fig. 8 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 11. For example, receive operations 805,825 and 835; identifying operations 810 and 815; providing operation 820 and confirming operation 830 are optionally implemented by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 11.
Those skilled in the art will appreciate that the functional blocks described in fig. 11 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, processing unit 1108 may have an associated "controller" unit operatively coupled to processing unit 1108 to initiate operations. The controller unit is not separately shown in fig. 11, but should be understood to be within the grasp of those skilled in the art of designing devices, such as device 1100, having a processing unit 1108. In some embodiments, as another example, one or more units, such as the receiving unit 1110, may be a hardware unit other than the processing unit 1108. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.
According to some embodiments, fig. 12 illustrates a functional block diagram of an electronic device 1200, including those described with reference to fig. 8, configured according to the principles of various described embodiments. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will understand that the functional blocks described in fig. 12 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.
As shown in fig. 12, one or more electronic devices 1200 include one or more processing units 1208. In some embodiments, the one or more processing units 1208 include a receiving unit 1210, a determining unit 1212, an identifying unit 1214, and a providing unit 1216.
In some embodiments, the one or more processing units 1208 are configured to receive (e.g., with the receiving unit 1210) natural language user input (e.g., block 805 of fig. 8); determining (e.g., with determining unit 1212) an intent object of a set of intent objects and a parameter associated with the intent object (e.g., block 810 of FIG. 8) based on the natural language user input; identifying (e.g., with the identifying unit 1214) the software application (e.g., block 815 of fig. 8) based on at least one of the intent object or the parameter; and providing (e.g., with providing unit 1216) the intent object and parameters to the software application (e.g., block 820 of fig. 8).
In some embodiments, determining the intent object of the set of intent objects and the parameter associated with the intent object based on the natural language user input includes determining (e.g., with the determining unit 1212), with a first electronic device of the one or more electronic devices 1200, the intent object of the set of intent objects and the parameter associated with the intent object; and wherein providing the intent object and parameters to the software application comprises providing (e.g., with the providing unit 1216) the intent object and parameters to the software application with a second electronic device of the one or more electronic devices 1200.
In some embodiments, the one or more processing units 1208 are configured to provide (e.g., with the providing unit 1216) a command with a first electronic device of the one or more electronic devices 1200 to a second electronic device of the one or more electronic devices 1200; and determining (e.g., with determining unit 1212), with the second electronic device, whether the software application is allowed to access the data associated with the intent object in response to the command.
In some embodiments, the one or more processing units 1208 are configured to receive (e.g., with the receiving unit 1210) an acknowledgement from the software application, where the acknowledgement indicates whether the parameter is valid (e.g., block 825 of fig. 8).
In some embodiments, the parameter indicates a software application.
In some embodiments, the one or more processing units 1208 are configured to identify (e.g., with the identification unit 1214) the software application based on the natural language user input; and determining (e.g., with determining unit 1212) whether an intent object in the set of intent objects corresponds to a registered intent object of the software application; wherein the intent object and the parameter are determined only from determining that the intent object corresponds to a registered intent object of the software application.
In some embodiments, the parameter is a first parameter, and the one or more processing units are configured to determine (e.g., with determining unit 1212) a second parameter based on the natural language user input; and providing (e.g., with providing unit 1216) the second parameter to the software application.
In some embodiments, the acknowledgement is a first acknowledgement, and the one or more processing units are configured to receive (e.g., with receiving unit 1210) a second acknowledgement from the software application, wherein the second acknowledgement indicates whether the second parameter is valid.
The operations described above with respect to fig. 8 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 12. For example, receive operations 805,825 and 835; identifying operations 810 and 815; providing operation 820 and confirming operation 830 are optionally implemented by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 12.
Those skilled in the art will understand that the functional blocks described in fig. 12 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, the one or more processing units 1208 can have an associated "controller" unit operatively coupled to at least one of the one or more processing units 1208 to initiate operations. The controller unit is not separately shown in fig. 12, but should be understood to be within the grasp of those skilled in the art of designing an apparatus, such as apparatus 1200, having one or more processing units 1208. In some embodiments, as another example, one or more units, such as receiving unit 1210, may be hardware units other than one or more processing units 1208. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.
According to some embodiments, fig. 13 illustrates a functional block diagram of an electronic device 1300 configured in accordance with the principles of the various described embodiments, including those described with reference to fig. 9. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software that embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 13 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.
As shown in fig. 13, one or more electronic devices 1300 include one or more processing units 1308. In some embodiments, the one or more processing units 1308 include a receiving unit 1310, an identifying unit 1312, a determining unit 1314, a providing unit 1316, and optionally a facilitating unit 1318.
In some embodiments, the one or more processing units 1308 are configured to receive (e.g., with the receiving unit 1310) natural language user input (e.g., block 905 of fig. 9); identifying (e.g., with identification unit 1312) an intent object of a set of intent objects and parameters associated with the intent object based on the natural language user input (e.g., block 910 of fig. 9); determining (e.g., with determining unit 1314) whether the task corresponding to the intent object can be satisfied based on at least one of the intent object or the parameter (e.g., block 915 of fig. 9); in accordance with a determination that the task corresponding to the intent object can be satisfied, providing (e.g., with the providing unit 1316) the intent object and parameters to the software application associated with the intent object (e.g., block 920 of fig. 9); and in accordance with a determination that the task corresponding to the intent object cannot be satisfied, providing (e.g., with providing unit 1316) a list of one or more software applications associated with the intent object (e.g., block 925 of fig. 9).
In some embodiments, the one or more processing units 1308 are configured to, after providing the list of one or more software applications associated with the intent object, receive (e.g., with the receiving unit 1310) user input indicating a selection of a software application from the list of one or more software applications (e.g., block 930 of fig. 9); and, in response to the user input, providing (e.g., with the providing unit 1316) the intent object of the set of intent objects to the selected software application (e.g., block 935 of fig. 9).
In some embodiments, the one or more processing units 1308 are configured to provide (e.g., with the providing unit 1316) the parameters to the selected software application in response to user input.
In some embodiments, the one or more processing units 1308 are configured to receive (e.g., with the receiving unit 1310) an acknowledgement from the selected software application, and the acknowledgement indicates whether the parameter is valid.
In some embodiments, the one or more processing units 1308 are configured to cause (e.g., with the facilitating unit 1318) the selected software application to perform a task corresponding to the intent object, and to receive (e.g., with the receiving unit 1310) a result response associated with the intent object from the selected software application after providing the intent object to the selected software application.
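The branch described above (provide the intent to the associated application when the task can be satisfied; otherwise surface a list of candidate applications and wait for the user's selection) can be sketched in Swift as follows. The types and the dictionary keyed by intent identifier are assumptions made for illustration only, not structures required by the disclosure.

    // Sketch of the "can the task be satisfied?" branch described above.
    // All names are illustrative assumptions only.

    /// An intent object together with an optional parameter derived from the user input.
    struct Intent {
        let identifier: String
        let parameter: String?
    }

    /// A candidate application associated with an intent; `perform` returns a result response.
    struct CandidateApp {
        let name: String
        let perform: (Intent) -> String
    }

    /// Decide whether the task corresponding to the intent can be satisfied, then either
    /// provide the intent to the associated application or surface a list of candidates.
    func handle(_ intent: Intent,
                installedApps: [String: CandidateApp],           // keyed by registered intent identifier
                candidates: [CandidateApp],                       // apps associated with the intent
                askUserToChoose: ([CandidateApp]) -> CandidateApp) -> String {
        if let app = installedApps[intent.identifier] {
            // Task can be satisfied: provide the intent object (and parameter) to the app.
            return app.perform(intent)
        } else {
            // Task cannot be satisfied: provide the list of applications associated with
            // the intent, receive the user's selection, then provide the intent to it.
            let chosen = askUserToChoose(candidates)
            return chosen.perform(intent)
        }
    }
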
The operations described above with respect to fig. 9 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 13. For example, determining operation 905, providing operations 910, 915, and 925, and receiving operation 920 are optionally performed by processor 120. It will be clear to those of ordinary skill in the art how other processes can be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 13.
Those skilled in the art will appreciate that the functional blocks described in fig. 13 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, the one or more processing units 1308 can have an associated "controller" unit operatively coupled to at least one of the one or more processing units 1308 to initiate operations. The controller unit is not separately shown in fig. 13, but should be understood to be well within the capability of those skilled in the art of designing a device, such as device 1300, having one or more processing units 1308. In some embodiments, as another example, one or more units, such as the receiving unit 1310, may be hardware units other than the one or more processing units 1308. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.
According to some embodiments, fig. 14 illustrates a functional block diagram of an electronic device 1400 configured in accordance with the principles of the various described embodiments, including those described with reference to fig. 9. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software that embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 14 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.
As shown in fig. 14, the electronic device 1400 includes a touch-sensitive display unit 1402 and a processing unit 1408 optionally coupled to the touch-sensitive display unit 1402. In some embodiments, the processing unit 1408 includes a receiving unit 1410, a providing unit 1412, an obtaining unit 1414, a display enabling unit 1416, and optionally an identifying unit 1418 and a facilitating unit 1420.
In some embodiments, the processing unit 1408 is configured to receive (e.g., with the receiving unit 1410) a natural language user input, wherein the natural language user input indicates an intent object of a set of intent objects (e.g., block 905 of fig. 9); providing (e.g., with providing unit 1412) the natural language user input to a second electronic device; receiving (e.g., with receiving unit 1410) an indication from the second electronic device that the software application associated with the intent object is not on the first electronic device (e.g., block 925 of fig. 9); in response to the indication, obtaining (e.g., with obtaining unit 1414) a list of applications associated with the intent object; enabling display (e.g., with display enabling unit 1416), with the touch-sensitive display of the first electronic device, of the list of applications associated with the intent object in a user interface; receiving (e.g., with receiving unit 1410) a user input indicating a selection of an application in the application list (e.g., block 930 of fig. 9); and providing (e.g., with providing unit 1412) the intent object of the set of intent objects to the application (e.g., block 935 of fig. 9).
In some embodiments, the processing unit 1408 is further configured to identify (e.g., with the identifying unit 1418) an intent object in the set of intent objects.
In some embodiments, the processing unit 1408 is further configured to identify (e.g., with the identifying unit 1418) parameters associated with the intent objects in the set of intent objects; and provide (e.g., with providing unit 1412) the parameters to the application.
In some embodiments, the processing unit 1408 is further configured to receive (e.g., with the receiving unit 1410) an acknowledgement from the application, and the acknowledgement indicates whether the parameter is valid.
In some embodiments, the processing unit 1408 is configured to cause (e.g., with the facilitating unit 1420) the application to perform a task associated with the intent object; and receiving (e.g., with receiving unit 1410) a result response associated with the intent object from the application after providing the intent object to the application.
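As a rough illustration of the two-device flow described above, the Swift sketch below has the first electronic device forward the natural language input to a second electronic device and, on receiving an indication that the associated application is not installed, obtain a list of applications, display it for selection, and provide the intent object to the chosen application. The protocol, type names, and callback signatures are illustrative assumptions only, not structures defined by the disclosure.

    // Sketch of the first-device/second-device interaction described above.
    // All names and the callback-based protocol are illustrative assumptions.

    /// Identifies an intent object of the set of intent objects.
    struct IntentDescriptor {
        let identifier: String
    }

    /// Responses returned by the second electronic device after processing the input.
    enum AssistantResponse {
        case applicationAvailable(IntentDescriptor, appName: String)
        /// Indication that the application associated with the intent is not on the first device.
        case applicationNotInstalled(IntentDescriptor)
    }

    protocol SecondElectronicDevice {
        func process(naturalLanguageInput: String) -> AssistantResponse
    }

    /// The first electronic device: forwards the input and, on an "application not
    /// installed" indication, obtains a list of applications, displays it, and provides
    /// the intent object to the application the user selects.
    struct FirstElectronicDevice {
        let assistant: SecondElectronicDevice
        let obtainAppList: (IntentDescriptor) -> [String]        // e.g. from an application store
        let displayAndSelect: ([String]) -> String                // show the list, return the choice
        let provideIntent: (IntentDescriptor, String) -> Void     // may install or access remotely

        func handle(_ input: String) {
            switch assistant.process(naturalLanguageInput: input) {
            case let .applicationAvailable(intent, appName):
                provideIntent(intent, appName)
            case let .applicationNotInstalled(intent):
                let candidates = obtainAppList(intent)
                let choice = displayAndSelect(candidates)
                provideIntent(intent, choice)
            }
        }
    }
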
The operations described above with respect to fig. 9 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 14. For example, determining operation 905, providing operations 910, 915, and 925, and receiving operation 920 are optionally performed by processor 120. It will be clear to those of ordinary skill in the art how other processes can be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 14.
Those skilled in the art will appreciate that the functional blocks described in fig. 14 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, the processing unit 1408 may have an associated "controller" unit operatively coupled to the processing unit 1408 to initiate operations. The controller unit is not separately shown in fig. 14, but should be understood to be well within the capability of those skilled in the art of designing a device, such as device 1400, having a processing unit 1408. In some embodiments, as another example, one or more units, such as receiving unit 1410, may be hardware units other than the processing unit 1408. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.
The foregoing description, for purposes of explanation, has been presented with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the technology and its practical applications, thereby enabling those skilled in the art to best utilize the technology and the various embodiments, with various modifications as are suited to the particular use contemplated.
Although the present disclosure and examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. It is to be understood that such changes and modifications are to be considered as included within the scope of the disclosure and examples as defined by the following claims.

Claims (7)

1. A method, comprising:
at a first electronic device having one or more processors:
receiving a natural language user input, wherein the natural language user input indicates an intent object in a set of intent objects;
providing the natural language user input to a second electronic device;
in response to determining, by the second electronic device, that the software application associated with the intent object is not located on the first electronic device, receiving an indication of the determination from the second electronic device;
obtaining, in response to the indication, a list of applications associated with the intent object;
displaying, with a touch-sensitive display of the first electronic device, a list of applications associated with the intent object on a user interface;
receiving a user input indicating a selection of an application in the application list; and
providing the intent object of the set of intent objects to the application, wherein providing the intent object includes downloading and/or installing the application such that the software application is locally accessible to the first electronic device, or wherein providing the intent object includes remotely accessing the application.
2. The method of claim 1, further comprising:
identifying the intent object in the set of intent objects.
3. The method of any of claims 1-2, further comprising:
identifying a parameter associated with the intent object in the set of intent objects; and
providing the parameters to the application.
4. The method of claim 3, further comprising:
receiving an acknowledgement from the application, wherein the acknowledgement indicates whether the parameter is valid.
5. The method of any of claims 1-2, further comprising:
causing the application to perform a task associated with the intent object; and
after providing the intent object to the application, receiving a resulting reply associated with the intent object from the application.
6. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-5.
7. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform the method of any of claims 1-5.
CN201710386931.3A 2016-06-11 2017-05-26 Application integration with digital assistant Active CN107491295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010814076.3A CN111913778B (en) 2016-06-11 2017-05-26 Application integration with digital assistant

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US201662348929P 2016-06-11 2016-06-11
US62/348,929 2016-06-11
DKPA201670540A DK201670540A1 (en) 2016-06-11 2016-07-19 Application integration with a digital assistant
DKPA201670540 2016-07-19
DKPA201670562 2016-07-28
DKPA201670563A DK201670563A1 (en) 2016-06-11 2016-07-28 Application integration with a digital assistant
DKPA201670562A DK201670562A1 (en) 2016-06-11 2016-07-28 Application integration with a digital assistant
DKPA201670564A DK179301B1 (en) 2016-06-11 2016-07-28 Application integration with a digital assistant
DKPA201670564 2016-07-28
DKPA201670563 2016-07-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010814076.3A Division CN111913778B (en) 2016-06-11 2017-05-26 Application integration with digital assistant

Publications (2)

Publication Number Publication Date
CN107491295A CN107491295A (en) 2017-12-19
CN107491295B true CN107491295B (en) 2020-08-18

Family

ID=60642137

Family Applications (5)

Application Number Title Priority Date Filing Date
CN201710386931.3A Active CN107491295B (en) 2016-06-11 2017-05-26 Application integration with digital assistant
CN202010814076.3A Active CN111913778B (en) 2016-06-11 2017-05-26 Application integration with digital assistant
CN201710395240.XA Active CN107493374B (en) 2016-06-11 2017-05-26 Application integration device with digital assistant and method
CN201710386355.2A Active CN107491468B (en) 2016-06-11 2017-05-26 Application integration with digital assistant
CN202110515238.8A Pending CN113238707A (en) 2016-06-11 2017-05-26 Application integration with digital assistant

Family Applications After (4)

Application Number Title Priority Date Filing Date
CN202010814076.3A Active CN111913778B (en) 2016-06-11 2017-05-26 Application integration with digital assistant
CN201710395240.XA Active CN107493374B (en) 2016-06-11 2017-05-26 Application integration device with digital assistant and method
CN201710386355.2A Active CN107491468B (en) 2016-06-11 2017-05-26 Application integration with digital assistant
CN202110515238.8A Pending CN113238707A (en) 2016-06-11 2017-05-26 Application integration with digital assistant

Country Status (1)

Country Link
CN (5) CN107491295B (en)

Families Citing this family (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
KR102380145B1 (en) 2013-02-07 2022-03-29 애플 인크. Voice trigger for a digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
CN110442699A (en) 2013-06-09 2019-11-12 苹果公司 Operate method, computer-readable medium, electronic equipment and the system of digital assistants
CN105453026A (en) 2013-08-06 2016-03-30 苹果公司 Auto-activating smart responses based on activities from remote devices
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770427A1 (en) 2017-05-12 2018-12-20 Apple Inc. Low-latency intelligent automated assistant
DK201770411A1 (en) 2017-05-15 2018-12-20 Apple Inc. Multi-modal interfaces
US20180336275A1 (en) 2017-05-16 2018-11-22 Apple Inc. Intelligent automated assistant for media exploration
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10902211B2 (en) * 2018-04-25 2021-01-26 Samsung Electronics Co., Ltd. Multi-models that understand natural language phrases
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
CN110874201B (en) * 2018-08-29 2023-06-23 斑马智行网络(香港)有限公司 Interactive method, device, storage medium and operating system
CN109065047B (en) * 2018-09-04 2021-05-04 出门问问信息科技有限公司 Method and device for awakening application service
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
CN109857487A (en) * 2019-02-02 2019-06-07 上海奔影网络科技有限公司 Data processing method and device for task execution
KR20200099380A (en) * 2019-02-14 2020-08-24 삼성전자주식회사 Method for providing speech recognition serivce and electronic device thereof
WO2020176353A1 (en) * 2019-02-25 2020-09-03 Liveperson, Inc. Intent-driven contact center
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
CN111399714A (en) * 2019-05-31 2020-07-10 苹果公司 User activity shortcut suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11183193B1 (en) 2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5946647A (en) * 1996-02-01 1999-08-31 Apple Computer, Inc. System and method for performing an action on a structure in computer-generated data
CN103135916A (en) * 2011-11-30 2013-06-05 英特尔公司 Intelligent graphical interface in handheld wireless device
CN103744761A (en) * 2014-01-22 2014-04-23 广东欧珀移动通信有限公司 Method and system for controlling multiple mobile terminals to automatically execute tasks
CN105264524A (en) * 2013-06-09 2016-01-20 苹果公司 Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6839896B2 (en) * 2001-06-29 2005-01-04 International Business Machines Corporation System and method for providing dialog management and arbitration in a multi-modal environment
US8326630B2 (en) * 2008-08-18 2012-12-04 Microsoft Corporation Context based online advertising
US10276170B2 (en) * 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US8626511B2 (en) * 2010-01-22 2014-01-07 Google Inc. Multi-dimensional disambiguation of voice commands
US8731939B1 (en) * 2010-08-06 2014-05-20 Google Inc. Routing queries based on carrier phrase registration
US8589911B1 (en) * 2012-07-26 2013-11-19 Google Inc. Intent fulfillment
US9547647B2 (en) * 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
CN103077165A (en) * 2012-12-31 2013-05-01 威盛电子股份有限公司 Natural language dialogue method and system thereof
AU2014233517B2 (en) * 2013-03-15 2017-05-25 Apple Inc. Training an at least partial voice command system
CN113032084A (en) * 2013-06-07 2021-06-25 苹果公司 Intelligent automated assistant
CN105379234B (en) * 2013-06-08 2019-04-19 苹果公司 For providing the application gateway for being directed to the different user interface of limited dispersion attention scene and untethered dispersion attention scene
KR102223278B1 (en) * 2014-05-22 2021-03-05 엘지전자 주식회사 Glass type terminal and control method thereof
US9966065B2 (en) * 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
TWI647608B (en) * 2014-07-21 2019-01-11 美商蘋果公司 Remote user interface
CN104867492B (en) * 2015-05-07 2019-09-03 科大讯飞股份有限公司 Intelligent interactive system and method

Also Published As

Publication number Publication date
CN111913778B (en) 2024-09-27
CN107491468A (en) 2017-12-19
CN107491295A (en) 2017-12-19
CN111913778A (en) 2020-11-10
CN107491468B (en) 2021-06-01
CN113238707A (en) 2021-08-10
CN107493374B (en) 2020-06-19
CN107493374A (en) 2017-12-19

Similar Documents

Publication Publication Date Title
CN107491295B (en) Application integration with digital assistant
CN107608998B (en) Application integration with digital assistant
CN108733438B (en) Application integration with digital assistant
CN111418007B (en) Multi-round prefabricated dialogue
CN110019752B (en) Multi-directional dialog
CN110288994B (en) Detecting triggering of a digital assistant
CN110058834B (en) Intelligent device arbitration and control
CN107491469B (en) Intelligent task discovery
CN107195306B (en) Recognizing credential-providing speech input
CN112767929A (en) Privacy maintenance of personal information
CN112286428A (en) Virtual assistant continuity
CN110637339A (en) Optimizing dialog policy decisions for a digital assistant using implicit feedback
CN111524506B (en) Client server processing of natural language input to maintain privacy of personal information
AU2018100403A4 (en) Application integration with a digital assistant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant