US20170118344A1 - Attentive assistant - Google Patents
- Publication number
- US20170118344A1 (application US 15/298,475)
- Authority
- US
- United States
- Prior art keywords
- link
- audio
- server
- user device
- audio stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/527—Centralised call answering arrangements not requiring operator intervention
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1069—Session establishment or de-establishment
- H04L65/608
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
Definitions
- This invention relates to a communication assistant, and in particular to an automated assistant for use by an operator of a motor vehicle, or of other equipment, in performing communication related tasks.
- Mobile devices today may include voice-based interfaces, for instance, the Siri™ interface provided by Apple Inc., which may allow users to interface with their mobile devices using hands-free voice-based interactions. For example, a user may place a telephone call or dictate a text message by voice. Speech-recognition based telephone assistants have been attempted but are not ubiquitous. For example, a system developed by Wildfire Communication over twenty years ago attempted to provide telephone-based assistance, but did not relieve the user of having to use a conventional telephone to interact with the system. However, drivers may be distracted using such interfaces even if a hands-free telephone is used.
- an approach to providing communication assistance to an operator of a vehicle makes use of software having a first component executing on a personal device of the operator as well as a second component executing on a server in communication with the personal device.
- a method for assisting communication via a user device includes receiving at a server a voice-based call from a calling device for the user device, the voice-based call having been made to an address associated with the user device.
- a first two-way audio link between the server and the calling device is established.
- a second two-way audio link is also established between the server and the user device.
- the server responds to the call by sending a first audio stream over the first link to the calling device.
- the first audio stream includes a spoken message for alerting a calling party to the involvement of an automated assistant.
- the server receives a second audio stream over the first link from the calling device, and sends a third audio stream over the second link to the user device, where the third audio stream includes a portion of the second audio stream.
- Audio received over at least one of the first link and the second link is processed at the server. This processing includes waiting to receive a first voice response of a first predetermined type over the second link, and if the first voice response is received, causing the calling device and the user device to be joined by a two-way audio link.
- aspects may include one or more of the following features.
- the sending of the third audio stream is performed at least in part during receiving of the second audio stream.
- the third audio stream is a delay of the second audio stream.
- the voice response from the user device is not sent to the calling device.
- the first voice response consists of no spoken response (i.e., the user does not speak, for example, for a prescribed amount of time).
- Processing the audio further includes waiting to receive a second voice response of a second predetermined type over the second link, and if the second voice response is received, causing the calling device and a voice messaging server to be joined by a two-way audio link.
- Establishing the second link is performed prior to receiving the voice-based call.
- the second link comprises a packet-based link (e.g., a WebRTC based link).
- Causing the calling device and the user device to be joined by a two-way audio link comprises bridging the first link and the second link, or redirecting the voice-based call to the user device.
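- Taken together, the claimed flow can be sketched end to end as a small server-side handler. This is a minimal illustration only; the callback names and the silence timeout are assumptions, not taken from the patent:

```python
SILENCE_TIMEOUT_S = 4.0  # assumed "prescribed amount of time" of silence

def handle_inbound_call(send_to_caller, relay_to_user,
                        wait_for_user_speech, bridge, divert_to_voicemail):
    """Announce the assistant to the caller, relay the caller's audio
    to the user, then join the parties or divert to voice messaging."""
    # First audio stream: alert the caller to the automated assistant.
    send_to_caller("This is an automated assistant.")
    # Third audio stream: a portion of the caller's (second) stream,
    # forwarded over the second link to the user device.
    relay_to_user("<caller audio>")
    # Wait for a first voice response of the first predetermined type.
    response = wait_for_user_speech(SILENCE_TIMEOUT_S)
    if response is not None:
        bridge()                 # join caller and user on a two-way link
        return "bridged"
    divert_to_voicemail()        # silence diverts the caller to voicemail
    return "voicemail"

# Example: a silent user sends the caller to voice messaging.
events = []
outcome = handle_inbound_call(
    send_to_caller=events.append,
    relay_to_user=events.append,
    wait_for_user_speech=lambda timeout: None,   # user stays silent
    bridge=lambda: events.append("bridge"),
    divert_to_voicemail=lambda: events.append("voicemail"))
# outcome == "voicemail"
```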
- in another aspect, in general, a method for assisting communication via a user device includes establishing a second two-way audio link between a server and a user device. A call made to the user device (e.g., from a calling device to a number for the user device) is announced at the user device, including by receiving a third audio stream over the second link, where the third audio stream includes a portion of a second audio stream received from the calling device at the server. Audio received at the user device from a user is processed, including receiving a first voice response of a first predetermined type, wherein the first voice response causes the calling device and the user device to be joined by a two-way audio link.
- aspects may include one or more of the following features.
- the receiving of the third audio stream is performed at least in part during receiving of the second audio stream at the server.
- the third audio stream is a delay of the second audio stream.
- Establishing the second link is performed prior to the server receiving the second audio stream.
- An advantage of one or more embodiments is that there is little if any distraction to the user in causing a call to be either completed from a calling device to the user device or directed to a voice messaging system.
- In a particularly simple embodiment, in response to "eavesdropping" on an interaction between the assistant and the caller, the user merely remains silent to cause the call to be redirected, or utters a simple command to complete the call; this provides a high degree of functionality with minimal distraction. More complex command input by the user can provide increased functionality without increasing distraction significantly.
- FIG. 1 is a block diagram of a communication assistance system
- FIG. 2 is a block diagram of components of the system of FIG. 1 .
- FIG. 1 shows a schematic block diagram of a communication assistance system 100 .
- a representative vehicle 120 is illustrated in FIG. 1 , as are a set of representative remote telephones 175 (or other communication devices), but it should be understood that the system described herein is intended to support a large population of users.
- a user 110 generally an operator of a vehicle 120 , makes use of a personal device 125 , such as a “smartphone”.
- the device 125 includes a processor that can execute applications, and in particular, executes a client application 127 , which is used in providing communication assistance to the user.
- the vehicle 120 may optionally include a built-in station 130 , which communicates with the personal device 125 (e.g., via a Bluetooth radio frequency communication link 126 ) and extends interface functions of the personal device via a speaker 134 , microphone 133 , and/or touchscreen 132 .
- the personal device 125 is linked to a telephone and data network 140 that includes, for example, a cellular based "3G" or "4G"/"LTE" network providing communication services to the device, including call-based voice communication (i.e., a dedicated channel for voice data) and/or packet or message based communication.
- the system 100 makes use of one or more server computers 150 , which execute a server application 155 .
- the client application 127 executing on the user's personal device 125 is in data and/or voice based communication with the server application 155 during the providing of communication assistance to the user.
- the user's device is associated with a conventional telephone number and/or other destination address (e.g., email address, Session Initiation Protocol (SIP) Uniform Resource Identifier (URI), etc.) based on which other devices, such as remote telephone 175 , can initiate communication to the user's personal device 125 . Communication based on a conventional telephone number is described as a typical example.
- in general, inbound communication, for example from a remote telephone 175 , is redirected to the server application 155 at the server 150 .
- redirection is selected by the user 110 when the user is operating the vehicle 120 , or in some examples, redirection is initiated automatically when the personal device is used in the vehicle (e.g., paired with the built-in station 130 ).
- One way that this redirection is accomplished is for the client application 127 , executing on the personal device 125 , to communicate with a component 145 (e.g., a switch, signaling node, gateway, etc.) of the telephone network to cause the redirection of inbound communication to the personal device.
- the redirection may be turned on and off using dialing codes, such as “*72” to turn on forwarding and “*73” to turn it off.
- the user may use built-in capabilities of the personal device 125 to cause the redirection, for example, using a “Settings>Phone>Call Forwarding” setting of a smartphone.
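- On carriers that use North American vertical service codes, toggling the redirection reduces to dialing a short code, with activation typically followed by the forwarding destination. A small sketch; the codes and number format vary by carrier and are shown here only as an assumption:

```python
FORWARD_ON = "*72"   # common "activate call forwarding" service code
FORWARD_OFF = "*73"  # common "deactivate call forwarding" service code

def forwarding_dial_string(enable, server_number=""):
    """Build the string the client application would dial to redirect
    inbound calls to the assistant's server, or to cancel redirection."""
    if enable:
        return FORWARD_ON + server_number  # e.g. "*72" + "5551234567"
    return FORWARD_OFF

# forwarding_dial_string(True, "5551234567") -> "*725551234567"
# forwarding_dial_string(False)              -> "*73"
```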
- calls and optionally text messages are directed to the server application 155 as a result.
- the server application 155 does not necessarily have a separate physical telephone line for each user 110 .
- dialed number information may be provided by the telephone network 140 when delivering a call for the user to the server application 155 in order to identify the destination (i.e., the user) for the call.
- inbound communication may pass through a Voice-over-IP (VoIP) gateway in or at the edge of the network 140 , and call setup as well as voice data may be provided to the server application 155 over a data network connection (e.g., as Internet Protocol communication).
- Prior to receiving communication at the server application 155 for the user 110 , a persistent data connection is established between the server application 155 and the client application 127 , or alternatively, the client application 127 can accept new data connections that are initiated on demand by the server application 155 over a data network linking the server 150 and the personal device 125 (e.g., over a data network service of the mobile network 140 ).
- When a voice call is received at the server application 155 for a particular user 110 , the server accepts the call and establishes a voice communication channel between the server application and the remote telephone 175 , making use of speech synthesis (either from recorded utterances, or using computer-implemented text-to-speech (TTS)) and speech recognition and/or telephone tone (DTMF) decoding capabilities at the server application 155 .
- Handling of a received voice call by the server application generally involves audio communication between the server application and the calling telephone 175 on a first communication link, as well as audio communication between the user 110 and the server application 155 on a second communication link.
- audio communication between the server application 155 and the user 110 makes use of a peer-to-peer audio protocol (e.g., WebRTC and/or RTP) to pass audio between the server application 155 and the client application 127 .
- the client application 127 interacts with the user via a microphone and speaker of the device 125 and/or the station 130 .
- the calling telephone 175 and the personal device 125 may at some point in the flow be linked by a bidirectional voice channel, for example, with the channel being bridged at the server application 155 , or bridged or redirected via capabilities provided by the telephone network 140 .
- handling of an inbound telephone call involves the server application 155 performing steps including: (1) answering the call; (2) communicating with the caller, advising the caller of its assistant nature; (3) announcing the call to the user 110 , generally including forwarding of at least some audio of the communication with the caller to the user; and (4) causing the caller and the user to be in direct audio communication (e.g., bridging the call to include the caller, the server, and the in-vehicle user) or forwarding to a voicemail repository, depending on the actions of the driver.
- a call made to the user's telephone number while the user is using the system in the user's vehicle is delivered to the server application 155 .
- the server application implements the assistant function, and upon answering the call, the assistant announces itself, for instance, by saying “this is the assistant for [driver's ID]. May I help you?”
- the caller may respond by saying “I'd like to speak with [driver's ID]”, whereupon the assistant generates an audio response that says “He is driving. I'll see if he can take your call”.
- the server application forwards the audio to the client application 127 in the vehicle, and the client application plays the audio (e.g., both the server application synthesized prompts as well as the caller's audio answers).
- the assistant waits a few seconds for the driver to speak.
- This functionality may be implemented at the client application 127 , or alternatively, the monitored audio from within the vehicle may be passed to the server application 155 , which makes this determination. In any case, this audio from the vehicle is not generally passed back to the caller.
- Not hearing any response from the driver, the assistant then generates another audio response that says "[driver ID] is busy; may I forward your call to his voicemail?" If the caller speaks, the assistant detects the caller's verbal response and processes the response. If the driver speaks in response to the assistant's prompt, indicating that the call should be completed, then the assistant connects the device 125 to the call, and the phone call proceeds normally. If the driver does not speak, or indicates that he cannot accept the call, the call is directed to voicemail.
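- The step of waiting a few seconds for the driver to speak amounts to simple voice-activity detection over the monitored cabin audio. A minimal energy-threshold sketch; the threshold and frame-count values are illustrative assumptions:

```python
def driver_spoke(frames, energy_threshold=500.0, min_voiced_frames=3):
    """Return True if enough audio frames exceed an energy threshold,
    i.e. the driver said something during the listening window.

    `frames` is an iterable of per-frame RMS energy values, as might be
    computed by the client application from microphone input.
    """
    voiced = sum(1 for energy in frames if energy > energy_threshold)
    return voiced >= min_voiced_frames

# A quiet cabin (road noise only) counts as "no spoken response":
# driver_spoke([120.0, 90.0, 150.0, 80.0])      -> False
# A short utterance such as "put them through" is detected:
# driver_spoke([120.0, 2400.0, 3100.0, 2800.0]) -> True
```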
- connection of the call to the user may be performed in a variety of ways, including making a voice link using an Internet Protocol (e.g., SIP, WebRTC, etc.) connection, or using a cellular voice connection, for instance, with the personal device initiating a call to the server or the server initiating a voice call to the personal device (in a manner that is not subject to the forwarding setting for other calls made to the device), or using a call transfer function of the telephone network, thereby removing the server application from the call.
- a typical interaction might involve the following exchange:
- a remote calling device 175 makes a call via the Public Switched Telephone Network (PSTN) 240 to a Voice-over-IP (VoIP) gateway 245 .
- the user has previously redirected the telephone number of the user's personal device so that calls to it are redirected, in this case to the VoIP gateway.
- Prior to the call being made, the server application 155 has registered with the VoIP gateway to be notified of calls made to the user's number.
- the VoIP gateway uses the Session Initiation Protocol (SIP) to interact with the server application 155 over the public Internet 250 .
- the server application 155 accepts the call, at which point a Real-time Transport Protocol (RTP) audio connection is made between the VoIP gateway 245 and the server application 155 for the call.
- the client application 127 has registered with the server application 155 using a WebRTC protocol over a mobile IP network 260 (e.g., a 4G cellular network) and over the public Internet 250 , and upon receiving the call for the user, the server application initiates WebRTC audio communication with the client application (e.g., using a Secure RTP (SRTP) protocol set up as part of the WebRTC interaction between the server application and the client application).
- When the server application "transfers" the call to the client, it either stays in the audio path (e.g., bridging the SIP-RTP connection and the WebRTC-SRTP connection), or alternatively, the server application sends a SIP command (e.g., REFER) to the VoIP gateway causing a redirection of the audio connection to pass directly between the VoIP gateway and the user's device 125 .
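- The redirect alternative corresponds to a standard SIP REFER transaction. A sketch of the kind of request the server application might compose for the VoIP gateway; the addresses, Call-ID, and branch value are hypothetical, and a complete REFER would carry additional headers (From, To, Max-Forwards):

```python
def build_refer(gateway_uri, device_uri, call_id, branch="z9hG4bK-1"):
    """Compose a minimal SIP REFER asking the gateway to redirect the
    audio connection directly to the user's device (hypothetical URIs)."""
    return "\r\n".join([
        f"REFER {gateway_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP assistant.example.com;branch={branch}",
        f"Call-ID: {call_id}",
        "CSeq: 2 REFER",
        f"Refer-To: <{device_uri}>",  # where the call should go next
        "Content-Length: 0",
        "",
        "",
    ])

message = build_refer("sip:gw.example.net",
                      "sip:user@device.example.org", "abc123@assistant")
# message.startswith("REFER sip:gw.example.net SIP/2.0")
```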
- the user interacts with the system (i.e., implemented at the client application 127 and/or the server application 155 ), generally using recognized speech input (or in some embodiments, a limited number of manual inputs, for example, using predefined buttons). For example, in response to hearing the initial exchange with the caller, the user may provide a command that causes one of a number of different actions to be taken.
- Such actions may include, for example, completing the call (e.g., in a response such as “please put her through”), providing the caller with a predefined synthesized response, or a text message (i.e., a Short Message Service (SMS) message), providing a recorded response, forwarding the call to a predefined or selected alternate destination (e.g., to the user's secretary), etc.
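- Dispatching from a recognized utterance to one of these actions can be as simple as a phrase table. A sketch; the phrases and action names are illustrative assumptions:

```python
ACTIONS = {
    "put her through": "complete_call",
    "put him through": "complete_call",
    "put them through": "complete_call",
    "send a text": "send_sms_response",
    "send to my secretary": "forward_alternate",
}

def action_for(utterance, default="divert_to_voicemail"):
    """Map a recognized driver utterance to a call-handling action."""
    text = utterance.lower().strip()
    for phrase, action in ACTIONS.items():
        if phrase in text:
            return action
    return default

# action_for("Please put her through") -> "complete_call"
```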
- the system also accepts text messages (e.g., SMS messages, email, etc.) at the server on behalf of the user, and announces their arrival in a similar manner as with incoming voice calls. For instance, the arrival of the text message is announced audibly to the user, and optionally (e.g., according to input from the user) the full content of the message is read to the user, and a response may be sent in return (either by default, such as "Dan is driving and can't answer right now", or by voice input, using speech-to-text or selection of predefined responses).
- When a text message is received for the user at the server, the server causes audio to be played to the user: "You have a text message from ZZZ. Shall I read it to you?" where ZZZ is the identity of the sender of the text message.
- the assistant listens for a reply from the driver, and if the reply is not heard, the assistant leaves the message in the message queue on the cell phone. However, if the driver says something (“play me the message”, for instance), then the assistant reads the message to the driver using a text-to-speech system, while marking the message in the message queue as “read”.
- If the message is played to the driver, the assistant then asks "would you like me to send a delivery receipt?". Upon hearing a response from the driver, the assistant returns a text message to the sender saying "This message was delivered by [driver ID]'s voice assistant". If the driver does not respond, then the assistant simply terminates the transaction, leaving the message in the message inbox for later retrieval.
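- The text-message flow above (announce, optionally read, optionally acknowledge) can be sketched as a short dialogue routine. The callback names are assumptions; silence at any prompt ends the transaction as described:

```python
def handle_text_message(sender, body, ask, say, send_reply):
    """Announce an inbound text message; read it aloud and offer a
    delivery receipt only if the driver responds at each prompt.

    `ask(prompt)` plays a prompt and returns the driver's reply, or
    None if the driver stays silent; `say(text)` plays audio via
    text-to-speech; `send_reply(text)` returns a text to the sender.
    Returns the resulting state of the message in the inbox.
    """
    reply = ask(f"You have a text message from {sender}. "
                "Shall I read it to you?")
    if reply is None:
        return "unread"  # silence: leave the message queued for later
    say(body)            # read the message with text-to-speech
    if ask("Would you like me to send a delivery receipt?") is not None:
        send_reply("This message was delivered by the driver's "
                   "voice assistant.")
    return "read"        # mark the message as read in the queue

# Example: a silent driver leaves the message unread in the queue.
state = handle_text_message("ZZZ", "Running late.",
                            ask=lambda prompt: None,
                            say=lambda text: None,
                            send_reply=lambda text: None)
# state == "unread"
```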
- the assistant may be configured for more detailed replies, as described below.
- the assistant can market itself to the caller as well.
- the assistant announces itself to the caller and opens the channel to the user.
- the assistant could also announce to the caller: “I am an automated assistant, freely available at YYYY.com”.
- the assistant could say: “I'm an automated assistant. Stay on the line after the call and I can tell you about myself and send a link to download me to your phone for free.” or “This automated assistant is available—press 1 for more information”.
- the assistant could provide some basic information on how the assistant works and, if the caller agrees, send an SMS with a WWW link to download the app.
- the notifications are returned to the sender in text form.
- the assistant may modify its actions based on the history of a particular user and on a record of past interactions. For instance, if a particular caller is always shunted to voicemail, the assistant may "learn" to recognize this situation, and when this caller calls it can automatically pass the call to voicemail (possibly subject to override by the driver). It may learn this circumstance using standard machine learning techniques, such as a neural network.
- Although buttons are not ordinarily used in user interactions involving the attentive assistant, they may provide "emergency" services. For instance, a call that has been connected through inadvertent miscommunication between the driver and the assistant may be terminated using the "hang up" button on the driver's steering wheel (as he might do after a standard Bluetooth enabled phone call). On the other hand, if the driver did not respond verbally to an offer to connect a call, but wanted the call connected, a push of the "call" button on the steering wheel could be interpreted as a signal to the application that the driver wanted to take the call. Other uses of the steering wheel buttons may further enhance the use of this attentive assistant.
- the assistant also uses machine learning to better handle calls. It starts by creating a profile for each caller based on the incoming phone number.
- All available metadata is gathered: contacts in the user's address book, information in the user's social graph, lookups of where the phone is based on its exchange, etc.
- This information along with any context about the current call (date, time, location, how fast the user is driving, etc.), is used to predict the way a new call should be handled, using machine learning models.
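- As one illustration of such prediction, a caller's past outcomes and the current context can be scored without any particular learning library; the features, weights, and thresholds below are illustrative assumptions, not the patent's model:

```python
def predict_handling(history, context):
    """Predict how to handle a call from a caller's past outcomes and
    the current context. `history` is a list of past outcomes for this
    caller ("connected" or "voicemail"); `context` carries features
    such as the vehicle speed. Returns "announce", "connect",
    or "voicemail".
    """
    if not history:
        return "announce"        # unknown caller: introduce and inquire
    p_connect = history.count("connected") / len(history)
    if context.get("speed_mph", 0) > 70:
        p_connect -= 0.2         # penalize connecting at high speed
    if p_connect >= 0.8:
        return "connect"         # caller the user always takes
    if p_connect <= 0.2:
        return "voicemail"       # caller always shunted to voicemail
    return "announce"

# predict_handling(["voicemail"] * 5, {"speed_mph": 30}) -> "voicemail"
```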
- the assistant detects that the caller is calling from an unrecognized number, and introduces herself and explains how she works ("Hi. Dan is currently driving. I'm his AI assistant and help him answer his calls and take messages. Can you let me know what this is regarding?").
- the assistant identifies the caller and recognizes that in a similar situation the user wanted to speak immediately, so it does not ask what the call is regarding: "Hi, Steve. It's nice to talk to you again. Let me see if Dan's able to talk."
- the AI assistant becomes better at predicting what the appropriate action is and simply does it automatically.
- In some alternative embodiments, no software is required in the vehicle: the user's phone is set to automatically answer calls from the server, with the audio link between the server and the user device being formed over a cellular telephone connection rather than, for example, over the WebRTC connection described above.
- certain communication functions are described as using the Public Switched Telephone Network or the public Internet.
- Alternative implementations may use different communication infrastructure, for example, with the system being entirely hosted within a cellular telephone/communication infrastructure (e.g., within an LTE based infrastructure).
- the software may include instructions for causing a processor at the user device or server computer to perform functions described above, with the software being stored on a non-transitory machine-readable medium, or transmitted from a storage to the user device or server computer over a communication network (e.g., downloading an application ("app") to the user's smartphone).
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Application No. 62/244,417, filed Oct. 21, 2015, titled "THE ATTENTIVE ASSISTANT," which is incorporated herein by reference.
- Mobile devices are ubiquitous in today's connected environment. There are more cell phones in the United States than there are people. Drivers often use mobile communications to transact business, to access social media, or for other personal communication tasks. Some states have legislated that only hands-free communication devices may be used in cars, but scientific studies of distracted driving suggest that this constraint does not free the driver of substantial distraction. The rise of text communication among younger people has further exacerbated the problem, with findings that as many as 30% of traffic accidents are caused by texting-while-driving users.
- Other features and advantages of the invention are apparent from the following description, and from the claims.
- FIG. 1 is a block diagram of a communication assistance system;
- FIG. 2 is a block diagram of components of the system of FIG. 1.
- FIG. 1 shows a schematic block diagram of a communication assistance system 100. A representative vehicle 120 is illustrated in FIG. 1, as are a set of representative remote telephones 175 (or other communication devices), but it should be understood that the system described herein is intended to support a large population of users. Generally, a user 110, typically an operator of a vehicle 120, makes use of a personal device 125, such as a "smartphone". The device 125 includes a processor that can execute applications, and in particular, executes a client application 127, which is used in providing communication assistance to the user. The vehicle 120 may optionally include a built-in station 130, which communicates with the personal device 125 (e.g., via a Bluetooth radio frequency communication link 126) and extends interface functions of the personal device via a speaker 134, microphone 133, and/or touchscreen 132. - The
personal device 125 is linked to a telephone and data network 140, for example, that includes a cellular based "3G" or "4G"/"LTE" network that provides communication services to the device, including call-based voice communication (i.e., a dedicated channel for voice data) and/or packet or message based communication. - The
system 100 makes use of one or more server computers 150, which execute a server application 155. In general, the client application 127 executing on the user's personal device 125 is in data and/or voice based communication with the server application 155 during the providing of communication assistance to the user. - The user's device is associated with a conventional telephone number and/or other destination address (e.g., email address, Session Initiation Protocol (SIP) Uniform Resource Identifier (URI), etc.) based on which other devices, such as
remote telephone 175 can initiate communication to the user's personal device 125. Communication based on a conventional telephone number is described as a typical example. - In general, inbound communication, for example, from a
remote telephone 175 is redirected to the server application 155 at the server 150. In one approach, such redirection is selected by the user 110 when the user is operating the vehicle 120, or in some examples, redirection is initiated automatically when the personal device is used in the vehicle (e.g., paired with the built-in station 130). One way that this redirection is accomplished is for the client application 127, executing on the personal device 125, to communicate with a component 145 (e.g., a switch, signaling node, gateway, etc.) of the telephone network to cause the redirection of inbound communication to the personal device. Various approaches to causing this redirection may be used, at least in part dependent on the capabilities of the telephone network 140. For example, in certain networks, the redirection may be turned on and off using dialing codes, such as "*72" to turn on forwarding and "*73" to turn it off. In some embodiments, rather than the client application 127 causing the redirection, the user may use built-in capabilities of the personal device 125 to cause the redirection, for example, using a "Settings>Phone>Call Forwarding" setting of a smartphone. In any case, calls and optionally text messages are directed to the server application 155 as a result. The server application 155 does not necessarily have a separate physical telephone line for each user 110. For example, dialed number information (DNIS) or other signaling information may be provided by the telephone network 140 when delivering a call for the user to the server application 155 in order to identify the destination (i.e., the user) for the call. In some implementations (not shown in FIG. 1), inbound communication may pass through a Voice-over-IP (VoIP) gateway in or at the edge of the network 140, and call setup as well as voice data may be provided to the server application 155 over a data network connection (e.g., as Internet Protocol communication). - Prior to receiving communication at the
server application 155 for the user 110, a persistent data connection is established between the server application 155 and the client application 127, or alternatively, the client application 127 can accept new data connections that are initiated on demand by the server application 155 over a data network linking the server 150 and the personal device 125 (e.g., over a data network service of the mobile network 140). - When a voice call is received at the
server application 155 for a particular user 110, the server accepts the call and establishes a voice communication channel between the server application and the remote telephone 175, making use of speech synthesis (either from recorded utterances, or using computer-implemented text-to-speech (TTS)) and speech recognition and/or telephone tone (DTMF) decoding capabilities at the server application 155. Handling of a received voice call by the server application generally involves audio communication between the server application and the calling telephone 175 on a first communication link, as well as audio communication between the user 110 and the server application 155 on a second communication link. In one implementation, audio communication between the server application 155 and the user 110 makes use of a peer-to-peer audio protocol (e.g., WebRTC and/or RTP) to pass audio between the server application 155 and the client application 127. The client application 127 interacts with the user via a microphone and speaker of the device 125 and/or the station 130. Depending on the flow of call handling, as described more fully below, the calling telephone 175 and the personal device 125 may at some point in the flow be linked by a bidirectional voice channel, for example, with the channel being bridged at the server application 155, or bridged or redirected via capabilities provided by the telephone network 140. - In general, handling of an inbound telephone call involves the
server application 155 performing steps including: (1) answering the call; (2) communicating with the caller, advising the caller of its assistant nature; (3) announcing the call to the user 110, generally including forwarding of at least some audio of the communication with the caller to the user; and (4) causing the caller and the user to be in direct audio communication (e.g., bridging the call to include the caller, the server, and the in-vehicle user) or forwarding to a voicemail repository, depending on the actions of the driver. - In an example of handling of an inbound call, a call made to the user's telephone number while the user is using the system in the user's vehicle is delivered to the
server application 155. The server application implements the assistant function, and upon answering the call, the assistant announces itself, for instance, by saying "this is the assistant for [driver's ID]. May I help you?" The caller may respond by saying "I'd like to speak with [driver's ID]", whereupon the assistant generates an audio response that says "He is driving. I'll see if he can take your call". During this exchange with the caller (or optionally with a delay or after the completion of the interaction), the server application forwards the audio to the client application 127 in the vehicle, and the client application plays the audio (e.g., both the server application's synthesized prompts as well as the caller's audio answers). After this initial exchange, the assistant waits a few seconds for the driver to speak. This functionality may be implemented at the client application 127, or alternatively, the monitored audio from within the vehicle may be passed to the server application 155, which makes this determination. In any case, this audio from the vehicle is not generally passed back to the caller. Not hearing any response from the driver, the assistant then generates another audio response that says "[driver ID] is busy; may I forward your call to his voicemail?" If the caller speaks, the assistant detects the caller's verbal response and processes the response. If the driver speaks in response to the assistant's prompt indicating that the call should be completed, then the assistant connects the device 125 to the call, and the phone call proceeds normally. If the driver does not speak, or indicates that he cannot accept the call, the call is directed to voicemail. As introduced above, the connection of the call to the user may be performed in a variety of ways, including making a voice link using an Internet Protocol (e.g., SIP, WebRTC, etc.)
connection, or using a cellular voice connection, for instance, with the personal device initiating a call to the server or the server initiating a voice call to the personal device (in a manner that is not subject to the forwarding setting for other calls made to the device), or using a call transfer function of the telephone network, thereby removing the server application from the call. A typical interaction might involve the following exchange:
- [Assistant]: Hi. I'm Dan's assistant Samantha.
- [Caller]: This is Cora. I wanted to talk to Dan about the press release we're working on.
- [Assistant]: He's currently in his car. Would you like me to see if he's available to speak with you?
- [Caller]: That would be great.
- [Assistant]: ok. Hold on a second and I'll see.
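The disposition logic running through the example above (the driver staying silent or declining routes the caller to voicemail; a spoken acceptance completes the call) can be sketched as a small function. This is an illustrative simplification, not the disclosed implementation: a deployed system would apply speech recognition and intent matching rather than the substring tests used here, and the acceptance phrases are invented for the example.

```python
# Hypothetical acceptance phrases; a real system would use a speech
# recognizer and trained intent matching instead of substring tests.
ACCEPT_PHRASES = ("put her through", "put him through", "take the call",
                  "i'll take it", "connect")

def route_inbound_call(driver_utterance):
    """Decide the disposition of an announced call.

    `driver_utterance` is the recognized text of the driver's reply,
    or None if the driver stayed silent during the waiting period.
    """
    if driver_utterance is None:
        return "voicemail"               # silence: do not disturb the driver
    text = driver_utterance.lower()
    if any(phrase in text for phrase in ACCEPT_PHRASES):
        return "connect"                 # join caller and driver
    return "voicemail"                   # spoke, but declined the call
```

In the exchange above, the assistant would call something like this after the waiting period, with `None` passed when no driver speech was detected.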
- Referring to
FIG. 2, in an embodiment of the system 100 described above, a remote calling device 175 makes a call via the Public Switched Telephone Network (PSTN) 240 to a Voice-over-IP (VoIP) gateway 245. As discussed above, the user has previously redirected the telephone number of the user's personal device so that calls to it are redirected, in this case to the VoIP gateway. Prior to the call being made, the server application 155 has registered with the VoIP gateway to be notified of calls made to the user's number. When the call comes in, in this example, the VoIP gateway uses the Session Initiation Protocol (SIP) to interact with the server application 155 over the public Internet 250. The server application 155 accepts the call, at which point a Real-time Transport Protocol (RTP) audio connection is made between the VoIP gateway 245 and the server application 155 for the call. Previously, the client application 127 has registered with the server application 155 using a WebRTC protocol over a mobile IP network 260 (e.g., a 4G cellular network) and over the public Internet 250, and upon receiving the call for the user, the server application initiates WebRTC audio communication with the client application (e.g., using a Secure RTP (SRTP) protocol set up as part of the WebRTC interaction between the server application and the client application). At this point the server application passes audio data between the caller and the client application. When the server application "transfers" the call to the client, it either stays in the audio path (e.g., bridging the SIP-RTP connection and the WebRTC-SRTP connection), or alternatively, the server application sends a SIP command (e.g., REFER) to the VoIP gateway causing a redirection of the audio connection to pass directly between the VoIP gateway and the user's device 125. - In other somewhat more complex call handling, the user interacts with the system (i.e., implemented at the
client application 127 and/or the server application 155), generally using recognized speech input (or in some embodiments, a limited number of manual inputs, for example, using predefined buttons). For example, in response to hearing the initial exchange with the caller, the user may provide a command that causes one of a number of different actions to be taken. Such actions may include, for example, completing the call (e.g., in response to a command such as "please put her through"), providing the caller with a predefined synthesized response or a text message (i.e., a Short Message Service (SMS) message), providing a recorded response, forwarding the call to a predefined or selected alternate destination (e.g., to the user's secretary), etc. - The system also accepts text messages (e.g., SMS messages, email, etc.) at the server on behalf of the user, and announces the arrival in a similar manner as with incoming voice calls. For instance, the arrival of the text message is announced audibly to the user, and optionally (e.g., according to input from the user) the full content of the message is read to the user, and a response may be sent in return (either by default, such as "Dan is driving and can't answer right now", or by voice input (by speech-to-text or selection of predefined responses)).
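The "stay in the audio path" option described above with reference to FIG. 2, in which the server bridges the SIP-RTP leg to the caller and the WebRTC-SRTP leg to the user device, reduces to shuttling audio frames between two links. The toy classes below show only that data flow; codec handling, jitter buffering, and SRTP encryption, which a real bridge needs, are deliberately omitted, and the class and method names are invented for illustration.

```python
from collections import deque

class AudioLeg:
    """Toy stand-in for one leg of the call (e.g., SIP/RTP or WebRTC/SRTP)."""
    def __init__(self):
        self.inbound = deque()   # frames received from the far end
        self.sent = []           # frames we have transmitted to the far end

    def transmit(self, frame):
        self.sent.append(frame)

def bridge_once(caller_leg, device_leg):
    """One pass of the server-side bridge: forward any pending frames
    from each leg to the other. A real bridge runs this continuously
    for the life of the call."""
    while caller_leg.inbound:
        device_leg.transmit(caller_leg.inbound.popleft())
    while device_leg.inbound:
        caller_leg.transmit(device_leg.inbound.popleft())
```

The alternative path in the description, a SIP REFER to the VoIP gateway, removes this loop entirely by rewiring the media to flow directly between the gateway and the device.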
- As an example interaction, when a text message is received for the user at the server, the server causes audio to be played to the user: “You have a text message from ZZZ. Shall I read it to you?” where ZZZ is the identity of the sender of the text message. The assistant then listens for a reply from the driver, and if the reply is not heard, the assistant leaves the message in the message queue on the cell phone. However, if the driver says something (“play me the message”, for instance), then the assistant reads the message to the driver using a text-to-speech system, while marking the message in the message queue as “read”.
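The message-handling turn above (announce the sender, wait for the driver, then either read the message aloud and mark it read, or leave it queued) can be sketched as follows. The inbox representation and the phrase matching are simplifications invented for the example; a real system would use the device's actual message store and a speech recognizer.

```python
def handle_inbound_text(sender, body, driver_reply, inbox):
    """One turn of the text-message flow.

    Appends the message to `inbox` (a list of dicts standing in for the
    device's message queue), returns the spoken announcement and, if the
    driver asked to hear it, the message body for text-to-speech.
    An unheard message stays queued and unread.
    """
    message = {"from": sender, "body": body, "read": False}
    inbox.append(message)
    announcement = (f"You have a text message from {sender}. "
                    "Shall I read it to you?")
    wants_read = bool(driver_reply) and any(
        word in driver_reply.lower() for word in ("play", "read"))
    if wants_read:
        message["read"] = True           # mark read in the queue
        return announcement, body        # body is handed to TTS
    return announcement, None            # driver silent or declined
```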
- If the message is played to the driver, the assistant then asks “would you like me to send a delivery receipt?”. Upon hearing a response from the driver, the assistant returns a text message to the sender saying “This message was delivered by [driver ID]'s voice assistant”. If the driver does not respond, then the assistant simply terminates the transaction, leaving the message in the message inbox for later retrieval. The assistant may be configured for more detailed replies, as described below.
- The assistant can market itself to the caller as well. When a call or message is handled, the assistant announces itself to the caller and opens the channel to the user. Optionally, while waiting for the driver to respond, the assistant could also announce to the caller: “I am an automated assistant, freely available at YYYY.com”. Alternatively, it might say: “I'm an automated assistant. Stay on the line after the call and I can tell you about myself and send a link to download me to your phone for free.” or “This automated assistant is available—press 1 for more information”. At the end of the call, the assistant could provide some basic information on how the assistant works and, if the caller agrees, send an SMS with a WWW link to download the app. Of course, for the messaging application, the notifications are returned to the sender in text form.
- The assistant may modify its actions based on the history of a particular user and on a record of past interactions. For instance, if a particular caller is always shunted to voicemail, the assistant may "learn" to recognize this situation, and when this caller calls, it can automatically pass the call to voicemail (possibly subject to override by the driver). It may learn this circumstance using standard machine learning techniques, or with a neural network system.
- While buttons are not ordinarily used in user interactions involving the attentive assistant, they may provide "emergency" services. For instance, a call that has been connected through inadvertent miscommunication between the driver and the assistant may be terminated using the "hang up" button on the driver's steering wheel (as he might do after a standard Bluetooth enabled phone call). On the other hand, if the driver did not respond verbally to an offer to connect a call, but wanted the call connected, a push of the "call" button on the steering wheel could be interpreted as a signal to the application that the driver wanted to take the call. Other uses of the steering wheel buttons may enhance the non-standard use of this attentive assistant.
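The button fallbacks above amount to a small mapping from button events to assistant actions. The event and action names below are invented for this sketch, and the rule that the "call" button only counts as acceptance while an offer to connect is pending is an assumption, not something the description specifies.

```python
# Illustrative mapping of steering-wheel button events to assistant
# actions; names are hypothetical.
BUTTON_ACTIONS = {
    "hang_up": "terminate_call",  # escape hatch for a mistaken connection
    "call": "accept_call",        # non-verbal acceptance of an offered call
}

def handle_button(event, offer_pending=False):
    """Interpret a steering-wheel button press.

    The "call" button is treated as acceptance only while the
    assistant's offer to connect is still pending (`offer_pending`);
    otherwise the press is ignored rather than placing a new call.
    """
    if event == "call" and not offer_pending:
        return None
    return BUTTON_ACTIONS.get(event)
```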
- The assistant also uses machine learning to better handle calls. It starts by creating a profile for each caller based on the incoming phone number.
- All available metadata (contacts in the user's address book, information in the user's social graph, lookups of where the phone is based on its exchange, etc.) and the responses the user gives are associated with this profile. This information, along with any context about the current call (date, time, location, how fast the user is driving, etc.), is used to predict the way a new call should be handled, using machine learning models.
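A minimal sketch of the prediction step is below, with the caller's history of past outcomes standing in for a trained model. Everything here — the 0.8 confidence threshold, the action names, the shape of `history` — is illustrative; the description contemplates machine learning models over the profile metadata and call context listed above, which this majority-vote fallback only gestures at.

```python
from collections import Counter

def predict_call_action(history, min_confidence=0.8):
    """Predict how to handle a new call from a known caller.

    `history` is a list of past (features, action) pairs for this
    caller. If one action dominates the history, take it automatically;
    otherwise fall back to asking the driver. A production system would
    train a classifier over profile metadata (address book, social
    graph, number exchange) and call context (time, location, driving
    speed) rather than ignore the feature vectors as done here.
    """
    actions = [action for _features, action in history]
    if actions:
        top_action, count = Counter(actions).most_common(1)[0]
        if count / len(actions) >= min_confidence:
            return top_action            # confident, repeated pattern
    return "ask_driver"                  # no clear precedent
```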
- For example, the first time Steve calls into the system, the assistant detects that the caller is from an unrecognized number, introduces herself, and explains how she works ("Hi. Dan is currently driving. I'm his AI assistant and help him answer his calls and take messages. Can you let me know what this is regarding?"). The next time Steve calls, the assistant identifies the caller and recognizes that in a similar situation the user wanted to speak immediately, so it does not ask what the call is in regard to: "Hi, Steve. It's nice to talk to you again. Let me see if Dan's able to talk."
- Over time, as more data is fed into the system to create better models, the AI assistant becomes better at predicting what the appropriate action is and simply does it automatically.
- It should be understood that various alternative implementations can provide the functionality described above. For example, some or all of the functions described above as being implemented at the server may be hosted in the vehicle, for example, on the user's communication device. Therefore, there may not be separate client and server software. An example of some but not all of the functionality described above for the server being hosted in the vehicle involves speech synthesis to the user and speech recognition of speech of the user being performed in the vehicle, and encoded information (e.g., text rather than audio) being passed between the client and the server. In some implementations, no software is required in the vehicle, with the user's phone being set to automatically answer calls from the server, and with the audio link between the server and the user device being formed over a cellular telephone connection rather than being formed, for example, over the WebRTC connection described above. Furthermore, certain communication functions are described as using the Public Switched Telephone Network or the public Internet. Alternative implementations may use different communication infrastructure, for example, with the system being entirely hosted within a cellular telephone/communication infrastructure (e.g., within an LTE based infrastructure).
- As described above, many features of the system are implemented in software that executes at a user device and/or at a server computer. The software may include instructions for causing a processor at the user device or server computer to perform functions described above, with the software being stored on a non-transitory machine-readable medium, or transmitted from storage to the user device or server computer over a communication network (e.g., downloading an application ("app") to the user's smartphone).
- It is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. Other embodiments are within the scope of the following claims.
Claims (17)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/298,475 US20170118344A1 (en) | 2015-10-21 | 2016-10-20 | Attentive assistant |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562244417P | 2015-10-21 | 2015-10-21 | |
US15/298,475 US20170118344A1 (en) | 2015-10-21 | 2016-10-20 | Attentive assistant |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170118344A1 true US20170118344A1 (en) | 2017-04-27 |
Family
ID=57233882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/298,475 Abandoned US20170118344A1 (en) | 2015-10-21 | 2016-10-20 | Attentive assistant |
Country Status (2)
Country | Link |
---|---|
US (1) | US20170118344A1 (en) |
WO (1) | WO2017070323A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023030619A1 (en) | 2021-09-01 | 2023-03-09 | Cariad Se | Telephone service device for requesting services, vehicle and method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2800198T3 (en) * | 2017-12-22 | 2020-12-28 | Corevas Gmbh & Co Kg | Apparatus, method and system for obtaining information on an emergency situation |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080181373A1 (en) * | 2007-01-31 | 2008-07-31 | Brown Jr Thomas W | Call Messaging System |
US20090104898A1 (en) * | 2001-02-09 | 2009-04-23 | Harris Scott C | A telephone using a connection network for processing data remotely from the telephone |
US9412394B1 (en) * | 2015-03-09 | 2016-08-09 | Jigen Labs, LLC | Interactive audio communication system |
US20160255200A1 (en) * | 2008-12-18 | 2016-09-01 | At&T Intellectual Property I, L.P. | Personalized interactive voice response system |
US20160323326A1 (en) * | 2015-04-29 | 2016-11-03 | Yallo Technologies (Israel) Ltd. | Systems and methods for screening communication sessions |
US20170178664A1 (en) * | 2014-04-11 | 2017-06-22 | Analog Devices, Inc. | Apparatus, systems and methods for providing cloud based blind source separation services |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2425913B (en) * | 2005-05-04 | 2009-07-08 | Arona Ltd | Call handling |
Also Published As
Publication number | Publication date |
---|---|
WO2017070323A1 (en) | 2017-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SEMANTIC MACHINES, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COHEN, JORDAN RIAN;ROTH, DANIEL LAWRENCE;WRIGHT HALL, DAVID LEO;AND OTHERS;SIGNING DATES FROM 20170302 TO 20170404;REEL/FRAME:042077/0578 |
|
AS | Assignment |
Owner name: SEMANTIC MACHINES, INC., MASSACHUSETTS Free format text: CONFIRMATORY ASSIGNMENT;ASSIGNORS:COHEN, JORDAN RIAN;ROTH, DANIEL LAWRENCE;HALL, DAVID LEO WRIGHT;AND OTHERS;SIGNING DATES FROM 20141114 TO 20181012;REEL/FRAME:048012/0430 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SEMANTIC MACHINES, INC.;REEL/FRAME:053904/0601 Effective date: 20200626 |