JP7318064B2

JP7318064B2 - Adaptive management of casting requests and/or user input in rechargeable devices

Info

Publication number: JP7318064B2
Application number: JP2022084509A
Authority: JP
Inventors: アンドレイ・パスコヴィッチ; ヴィクター・リン; ジアンハイ・ジュ; ポール・ギュギ; シュロミ・レゲフ
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2019-04-01
Filing date: 2022-05-24
Publication date: 2023-07-31
Anticipated expiration: 2039-04-01
Also published as: JP2023145561A; JP2022116145A

Description

The limited charge capacity of batteries incorporated in portable electronic devices can affect the usefulness of such devices, especially given that each subsystem of such a device relies on at least some amount of energy to function. Furthermore, when a battery-powered device provides access to an automated assistant, energy resources can be further limited if the device is tasked with constantly processing audio data and/or other data to detect an invocation signal that invokes the automated assistant. When a battery-powered assistant device includes a device system on chip (SoC), the device SoC can consume a substantial proportion of energy relative to other subsystems (e.g., a network processor, a digital signal processor (DSP), etc.) when the user is interacting with the automated assistant. For example, substantial battery charge can be expended on performing speech processing, which can involve a processor eliminating various audio data artifacts such as echo, static, and/or other noise.

The provision or streaming of content from one device for rendering at another device is sometimes referred to as "casting." A battery-powered portable electronic device that can respond to casting requests can expend a substantial amount of battery charge when operating to constantly handle casting requests from local network devices. For example, a battery-powered device that accepts casting-related pings and/or requests to "cast" media to the battery-powered device can employ the device SoC to process data embodied by incoming requests. However, as such requests become more frequent and/or redundant, employing the device SoC to process them can strain the limited charge capacity of the battery-powered device. As a result, although the battery-powered device can still render casted data, the total amount of available casting time is reduced by how often the device SoC is required to process casting-related requests.

Implementations described herein relate to rechargeable devices that adaptively manage cast requests and/or user inputs while providing access to an automated assistant and/or one or more interfaces for rendering casted data provided by a separate computing device. A rechargeable device is generally limited by having a finite power source, such as a battery, which can be depleted by operating the rechargeable device to frequently process cast requests and user inputs (e.g., a spoken utterance such as an invocation phrase). To extend the time between charges, and to eliminate waste of other computational resources, the rechargeable device can employ a variety of different subsystem operating schemes adapted to manage such requests and inputs.

For example, in some implementations, the rechargeable device can include a first processor, such as a digital signal processor (DSP), and a second processor, such as a device system on chip (SoC), for processing various inputs according to the operating mode of the rechargeable device. The operating mode can be one of multiple operating modes, such as a sleep mode in which the device SoC is powered down or is otherwise consuming less power than if the device SoC were operating according to another operating mode (e.g., an operating mode in which the automated assistant can actively interact with the user via the rechargeable device). While the rechargeable device is operating in the sleep mode, the DSP can be powered on in order to monitor, with permission from the user, for user input to the rechargeable device. As an example, the rechargeable device can include one or more microphones and, when the rechargeable device is operating in the sleep mode, the DSP can monitor any output provided by one or more of the microphones (e.g., output characterizing a spoken utterance from the user to the microphone). The DSP can operate a speech recognition model (e.g., an invocation phrase model) to determine whether the user has provided a spoken utterance corresponding to an invocation phrase (e.g., "Assistant...") for invoking the automated assistant. When the DSP determines, using the speech recognition model, that the user has provided an invocation phrase, the DSP can cause the device SoC to initialize for further processing. For example, the device SoC can initialize for a particular period of "wake time" in order to await further instructions and/or input from the user.

The amount of time that the device SoC stays active can change over time according to various features associated with interactions between one or more users and the automated assistant. The amount of time can be adapted and/or determined in order to mitigate waste of computational resources and power that might otherwise be expended operating a speech recognition model at the device SoC. For example, the device SoC can operate another speech recognition model (e.g., a first invocation phrase model and/or a voice activity detector) that is different from the speech recognition model operated by the DSP (e.g., a second invocation phrase model and/or another voice activity detector), and that may require more computational resources and/or power than the speech recognition model operated by the DSP. Therefore, by adapting the amount of "wake time" for the device SoC, unnecessary expenditure of battery power can be avoided while still ensuring that the rechargeable device can provide automated assistant functionality in an effective manner.

In some implementations, the other speech recognition model operated by the device SoC can determine a voice characteristic of the user based on data provided by the DSP, such as audio data generated in response to the user providing an initial spoken utterance. Based on the determined voice characteristic, the device SoC can select a wake time during which the device SoC remains operative to process any subsequent input from the user. As an example, a first user may typically and/or on average delay a few seconds (e.g., 3 seconds) between providing an invocation phrase (e.g., "Assistant...") and providing a command phrase (e.g., "...play my study playlist."). The device SoC can recognize this delay and select a wake time for the device SoC that does not greatly exceed the user's average delay. For example, the selected wake time for the device SoC can be, but is not limited to, wake time = (determined average delay of the user) x (1 + N), where "N" is any number such as, but not limited to, 0.2 or 0.5. The same or a different wake time can be selected for a different user who typically and/or on average delays a few seconds (e.g., 2 seconds) between providing an invocation phrase and providing a command phrase. In this way, a rechargeable device that includes the DSP and the device SoC can adaptively manage the "wake time" for each user in order to ensure responsiveness without wasting power and/or computational resources.
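The per-user wake-time rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `average_delay` and `select_wake_time`, the use of a plain arithmetic mean over past delays, and the sample delay histories are hypothetical; only the formula wake time = (average delay) x (1 + N) comes from the text.

```python
from statistics import mean


def average_delay(observed_delays_s):
    """Mean gap, in seconds, between invocation phrase and command phrase.

    Assumption: a plain arithmetic mean over past interactions; the
    text does not say how the average is maintained.
    """
    if not observed_delays_s:
        raise ValueError("at least one observed delay is required")
    return mean(observed_delays_s)


def select_wake_time(avg_delay_s, n=0.2):
    """Wake time = (average delay) x (1 + N), per the formula in the text."""
    if avg_delay_s < 0 or n < 0:
        raise ValueError("delay and margin must be non-negative")
    return avg_delay_s * (1.0 + n)


# Hypothetical per-user delay histories (seconds), both using N = 0.5.
first_user_wake_s = select_wake_time(average_delay([2.5, 3.0, 3.5]), n=0.5)
second_user_wake_s = select_wake_time(average_delay([2.0, 2.0]), n=0.5)
```

With these sample histories the first user, who averages 3 seconds, gets a 4.5 second window, and the second user, who averages 2 seconds, gets a 3 second window. A smaller N trades responsiveness for battery life.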

In some implementations, the speech recognition model operated by the DSP and the other speech recognition model operated by the device SoC can both be used to determine whether the user has provided a particular invocation phrase for invoking the automated assistant. However, the speech recognition model operated by the DSP can apply a standard for determining whether the user provided the invocation phrase that is less stringent than the standard enforced by the other speech recognition model operated by the device SoC. In other words, the speech recognition model can be associated with a first accuracy threshold for determining whether a particular spoken utterance corresponds to the invocation phrase, and the other speech recognition model can be associated with a second accuracy threshold for making the same determination. As an example, the first accuracy threshold can be satisfied when the user provides a particular spoken utterance that is determined to include at least a portion of the invocation phrase but also some amount of background noise. However, that particular spoken utterance may not satisfy the second accuracy threshold, because the second accuracy threshold may require a higher degree of correlation between the spoken utterance and the invocation phrase than the degree of correlation needed to satisfy the first accuracy threshold.
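The two-threshold arrangement can be illustrated with a short Python sketch. The threshold values, the 0-to-1 score scale, and the names `DSP_THRESHOLD`, `SOC_THRESHOLD`, and `two_stage_check` are assumptions made for illustration; the text only requires that the DSP standard be less stringent than the SoC standard.

```python
# Hypothetical thresholds on a 0..1 correlation score. The only property
# taken from the text is that the DSP threshold is the looser of the two.
DSP_THRESHOLD = 0.5
SOC_THRESHOLD = 0.8


def two_stage_check(dsp_score, soc_score_fn):
    """Return 'ignored', 'rejected_by_soc', or 'invoked'.

    dsp_score: score from the low-power model running on the DSP.
    soc_score_fn: zero-argument callable standing in for the costlier SoC
        model. It is called (that is, the SoC is woken) only when the DSP
        stage passes.
    """
    if dsp_score < DSP_THRESHOLD:
        return "ignored"          # SoC stays in sleep mode
    soc_score = soc_score_fn()    # SoC initialized for further processing
    if soc_score < SOC_THRESHOLD:
        return "rejected_by_soc"  # e.g. partial phrase plus background noise
    return "invoked"
```

Passing the SoC model as a callable makes the power argument visible in the code: the expensive stage never runs for utterances that the DSP stage has already screened out.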

In some implementations, the DSP can operate the speech recognition model using less power, less data, fewer channels of audio, audio at a lower sampling rate, and/or audio of a lower quality than the device SoC uses with the other speech recognition model. For example, the DSP can receive a single channel of audio data when the user provides a spoken utterance to the rechargeable device, whereas the device SoC can receive multiple channels of audio data. Additionally, or alternatively, the DSP can operate using an average amount of power when employing the speech recognition model, and the device SoC can operate using more power than that average amount when employing the other speech recognition model.

In some implementations, the degree of correlation determined by the DSP and/or the device SoC can be used to select the amount of wake time during which the device SoC stays active to process further input from the user. For example, when the device SoC determines a first degree of correlation between a spoken utterance and the invocation phrase, a first wake time can be selected by the device SoC. However, when the device SoC determines a second degree of correlation between another spoken utterance and the invocation phrase, and the second degree of correlation is greater than the first, the device SoC can select a second wake time that is longer than the first wake time. In this way, the amount of time that the device SoC stays active awaiting further input from the user can be adapted according to the degree of accuracy and/or correlation between a spoken utterance from the user and the invocation phrase used to invoke the automated assistant. This can preserve computational resources at the rechargeable device by avoiding a standard "wake time" that does not discriminate between any content and/or context of a user input.

This can be particularly beneficial when the correlation between a spoken utterance from the user and the invocation phrase falls short of the correlation required to invoke the automated assistant. This is because a "near miss" (i.e., a correlation that is almost, but not quite, sufficient to invoke the assistant) is more likely to have resulted from an actual attempt to invoke the automated assistant (than an utterance whose correlation is far from that required), and is therefore more likely to be followed by the user re-attempting to invoke the assistant. Keeping the SoC active for longer when a "near miss" is detected can allow the device to handle the subsequent invocation attempt with less latency. In other words, the device SoC can determine that a spoken utterance falls short of correlating with the invocation phrase by a particular degree, and the device SoC can therefore remain on for an amount of time based on that particular degree (e.g., an amount of time in seconds that is selected based on and/or proportional to the particular degree). However, when the device SoC determines that another spoken utterance falls even further short of correlating with the invocation phrase (e.g., falls short by an even greater degree), the device SoC can shut down much more quickly in order to save power and computational resources.
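One way to picture correlation-dependent wake times, including the "near miss" case, is the Python sketch below. It is a minimal illustration, not the disclosed method: the linear mapping, the 0-to-1 score scale, the function name `plan_wake`, and the default values for `invoke_threshold` and `max_wake_s` are all assumptions. The text only states that a higher correlation can lead to a longer wake time and that the time can be proportional to the degree of correlation.

```python
def plan_wake(correlation, invoke_threshold=0.8, max_wake_s=4.0):
    """Return (invoked, wake_time_s) for a 0..1 correlation score.

    Assumption: wake time scales linearly with the correlation, so a near
    miss keeps the SoC awake almost as long as a successful invocation,
    while a far miss lets it shut down quickly.
    """
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must be within [0, 1]")
    invoked = correlation >= invoke_threshold
    wake_time_s = max_wake_s * correlation
    return invoked, wake_time_s


near_miss = plan_wake(0.75)  # just under the hypothetical 0.8 threshold
far_miss = plan_wake(0.25)   # clearly not an invocation attempt
```

Under these assumed values the near miss keeps the SoC awake for 3 seconds and the far miss for 1 second, and neither invokes the assistant.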

When a spoken utterance is detected by the rechargeable device and the device SoC is initialized for further processing, there can be a difference between the clock settings of a clock operating at the DSP and another clock operating at the device SoC. To further eliminate waste of computational resources involved in awaiting and/or responding to spoken utterances received at the rechargeable device, time synchronization can be performed at the DSP and/or the device SoC so that the device SoC can process audio data that was generated and time stamped at the DSP. Such time synchronization can be particularly useful when, for example, the SoC is outputting audio at the time the spoken utterance is received. Indeed, without time synchronization, processing the captured audio data to remove the audio output by the SoC from the data corresponding to the spoken utterance can be problematic.

In some implementations, time synchronization can be performed by the device SoC using one or more time stamps generated at the device SoC and one or more other time stamps generated at the DSP. As an example, the DSP can generate, using a first clock, a first time stamp corresponding to a local time associated with the DSP. Additionally, the DSP can generate a second time stamp when, for example, the DSP causes the device SoC to initialize in response to the DSP determining that the user has provided the invocation phrase. Upon receiving a signal from the DSP (e.g., a wake and/or interrupt command), the device SoC can generate a third time stamp using a second clock, and the third time stamp can correspond to a local time associated with the device SoC.

To perform time synchronization, the device SoC can generate a time offset using the first time stamp, the second time stamp, and the third time stamp, and thereafter employ the time offset when processing audio data generated at the DSP. In some implementations, the device SoC can determine an average value of the first time stamp and the second time stamp, and then determine a delta value corresponding to the difference between the average value and the third time stamp. The delta value can thereafter be used when processing audio data, such as when the device SoC is performing echo cancellation. During echo cancellation, the device SoC can use the delta value in order to remove, from audio recorded by a microphone, instances of audio being output by the rechargeable device. As an example, when the device SoC is generating audio output corresponding to music playback, and the user provides a spoken utterance to the microphone during the music playback, audio data characterizing the spoken utterance can be processed by the device SoC in order to remove instances of the music playback. Furthermore, this process of removing instances of the music playback from the audio data can be performed accurately using the delta value determined by the device SoC and/or the DSP, thereby allowing the "wake time" of the device SoC to be determined from accurate data. In other words, time stamps generated by the DSP can be converted to correlate with time stamps generated by the device SoC for purposes of performing certain audio processes, such as echo cancellation. Additionally, or alternatively, time stamps generated by the device SoC can be converted to correlate with time stamps generated by the DSP for purposes of performing those audio processes.
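The offset computation can be sketched in Python. The function names, the microsecond units, and the sign convention (delta = third time stamp minus the average of the first two) are assumptions; the text only says that the delta corresponds to the difference between the average value and the third time stamp.

```python
def compute_offset(dsp_t1, dsp_t2, soc_t3):
    """Delta between the SoC clock and the DSP clock.

    dsp_t1, dsp_t2: first and second time stamps, read from the DSP clock.
    soc_t3: third time stamp, read from the SoC clock when the wake and/or
        interrupt signal arrives.
    Assumed sign convention: delta = t3 - mean(t1, t2).
    """
    return soc_t3 - (dsp_t1 + dsp_t2) / 2.0


def dsp_to_soc(dsp_timestamp, offset):
    """Express a DSP time stamp on the SoC clock (e.g. for echo cancellation)."""
    return dsp_timestamp + offset


def soc_to_dsp(soc_timestamp, offset):
    """Express an SoC time stamp on the DSP clock (the reverse conversion)."""
    return soc_timestamp - offset


# Hypothetical readings in microseconds.
offset_us = compute_offset(dsp_t1=1000.0, dsp_t2=1200.0, soc_t3=5100.0)
frame_on_soc_clock = dsp_to_soc(1500.0, offset_us)
```

With these sample readings the offset is 4000 microseconds, so an audio frame stamped 1500 on the DSP clock lines up with 5500 on the SoC clock. That alignment is what lets the SoC subtract its own playback from the microphone capture at the right position.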

In some implementations, the rechargeable device can include one or more interfaces through which to render audio, visual, haptic, and/or any other type of output in response to a casting request from another computing device. However, although such casting requests may be provided by other rechargeable devices, such as a cell phone and/or a laptop computer, the computing device providing the casting request may provide such requests without regard for the power available at the rechargeable device. To handle frequent cast requests while still eliminating waste of rechargeable power, the rechargeable device can offload processing of certain requests to a subsystem of the rechargeable device other than the device SoC. For example, a WiFi chip of the rechargeable device can be delegated to handle particular requests received over a local area network (LAN) to which the rechargeable device and the casting device are connected. In some implementations, the WiFi chip can handle certain cast requests while the device SoC remains in a sleep mode, in order to eliminate waste of power and computational resources. Requests delegated to the WiFi chip for processing, without invoking the device SoC for further processing, can be casting requests that specify one or more particular ports. Additionally, or alternatively, the WiFi chip can be delegated to process mDNS broadcasted data without invoking the device SoC.

As an example, a user can operate a music application at their cellular device in order to stream music and, during playback of the music, the user can initialize casting of the music to the rechargeable device. The cellular device can transmit a casting request, which can include mDNS broadcasted data, to a variety of different devices that are connected to the same LAN as the rechargeable device. The rechargeable device can receive the casting request while operating according to a sleep mode, in which the device SoC is asleep, off, or otherwise in a lower-power mode than if the rechargeable device were not operating according to the sleep mode. The WiFi chip of the rechargeable device can initially process the casting request to determine whether the casting request specifies a particular port and/or includes a particular property.

When the casting request specifies a particular port that corresponds to one or more predetermined ports, the WiFi chip can bypass invoking the device SoC to respond to the casting request. Rather, the WiFi chip can rely on cached data stored in a memory of the WiFi chip in order to generate responsive data to transmit back to the cellular device over the LAN. Additionally, or alternatively, the WiFi chip can bypass invoking the device SoC when the mDNS broadcasted data included with the casting request specifies certain parameters of the casting request. For example, the mDNS broadcasted data provided by the cellular device can indicate that an audio playback service is being requested and/or that a particular application initialized the casting request. The cached data of the WiFi chip can indicate, based on previous interactions with one or more other devices, that the rechargeable device supports the audio playback service and/or the particular application. Therefore, based on the available cached data, the WiFi chip can generate a response to the cellular device using the cached data, without invoking the device SoC for further information. In this way, the rechargeable device can reduce the number of instances in which the device SoC would otherwise be initialized for processing, thereby eliminating waste of the rechargeable power source (e.g., one or more batteries and/or capacitors) and computational resources.
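The WiFi chip's decision to answer from cache or wake the SoC can be sketched in Python as below. Everything named here is hypothetical: the `CastRequest` and `WifiChipCache` types, the field names, the service and application labels, the shape of the response payload, and the placeholder port number. The text specifies neither port numbers nor a message format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CastRequest:
    port: int
    service: str       # e.g. "audio_playback" (hypothetical label)
    application: str   # application that initialized the request


@dataclass
class WifiChipCache:
    # Placeholder port set; the text names no specific port numbers.
    offloaded_ports: frozenset = frozenset({5353})
    supported_services: set = field(default_factory=set)
    supported_apps: set = field(default_factory=set)


def handle_cast_request(request, cache):
    """Return ("respond_from_cache", payload) or ("wake_soc", None)."""
    port_match = request.port in cache.offloaded_ports
    params_match = (request.service in cache.supported_services
                    or request.application in cache.supported_apps)
    if port_match or params_match:
        # Reply using only data already held in the WiFi chip's memory.
        payload = {
            "services": sorted(cache.supported_services),
            "applications": sorted(cache.supported_apps),
        }
        return "respond_from_cache", payload
    return "wake_soc", None
```

For instance, with a cache that records support for "audio_playback", a request for that service on an unlisted port is still answered from cache, while a request that matches neither the port set nor the cached parameters falls through to waking the SoC.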

The above description is provided as an overview of some implementations of the present disclosure. Further description of those implementations, and other implementations, is provided in more detail below.

Other implementations may include a non-transitory computer-readable storage medium storing instructions executable by one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described above and/or elsewhere herein. Yet other implementations may include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above and/or elsewhere herein.

It should be appreciated that all combinations of the foregoing concepts and additional concepts described in greater detail herein are contemplated as being part of the subject matter disclosed herein. For example, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the subject matter disclosed herein.

A diagram in which a user controls a first client device to broadcast media to a separate client device without causing all client devices on a local network to transition out of their sleep states.
A diagram in which a user controls a first client device to broadcast media to a separate client device without causing all client devices on a local network to transition out of their sleep states.
A diagram of a scenario in which a device SoC is left in a sleep mode, according to implementations discussed herein.
A diagram of a scenario in which a device SoC is transitioned out of a sleep mode, according to implementations discussed herein.
A diagram of a scenario in which a device SoC is transitioned out of a sleep mode, according to implementations discussed herein.
A diagram of a client device that can determine a wake time for a device SoC based on one or more properties associated with an interaction between a user and the client device.
A diagram of a system for generating a wake time for a device SoC, generating a time offset between a clock of a DSP and a clock of the device SoC, and/or employing a WiFi chip to respond to casting requests without transitioning the device SoC out of a sleep mode.
A diagram of a method for initializing a particular processor of a computing device for an amount of time that is selected based on one or more features of an interaction between a user and the computing device.
A diagram of a method for processing audio data using a determined time offset corresponding to differences in operation between a first processor and a second processor.
A diagram of a method for providing responsive data to a broadcasting device using a WiFi chip included in a battery-powered computing device.
A block diagram of an example computer system.

FIG. 1A illustrates a diagram 100 in which a user 124 controls a first client device 134 to broadcast media to a separate client device without transitioning all client devices on a local network out of their respective sleep states. The first client device 134 can be a computing device, such as a cellular phone 126 and/or any other device capable of casting media to another device. The first client device 134 can provide access to an automated assistant 136, which can be invoked via input to an assistant interface 138. The first client device 134 can also include one or more applications 144 that can access media that can be cast from the first client device 134 to another client device. In some implementations, the first client device 134 can cast media to a separate client device in response to an input from the user 124 to the automated assistant 136. For example, in response to the user checking for other devices that are available for casting media associated with an application 144, the first client device 134 can transmit mDNS data over a local area network to which multiple different client devices are connected.

The mDNS data broadcast by the first client device 134 can be transmitted to a second client device 102 and/or a third client device 112 over a local area network, such as a WiFi network. For example, mDNS data 130 can be transmitted from the first client device 134 to the second client device 102, and mDNS data 132 can be transmitted from the first client device 134 to the third client device 112. In some implementations, the second client device 102 can be a portable computing device that is powered by a portable power source 110. Furthermore, the third client device 112 can be powered by a portable power source and/or any other power source, such as a power source supplied by a utility service. The second client device 102 and the third client device 112 can be operating in a sleep mode when each device receives its respective mDNS data. In other words, because the devices are operating in a sleep mode, a WiFi chip available at each device can process the mDNS data without transitioning the respective device out of its sleep state. For example, the device SoC 108 and the device SoC 118 can operate in a sleep mode (as indicated by the gradient fill pattern) while the WiFi chip 106 and the WiFi chip 116 receive and respond to the mDNS data. In some implementations, a computing device can be considered to be in a "sleep mode" when at least the device SoC of the computing device is powered down, or is otherwise consuming less power than it would if it were operating in another operating mode.

The WiFi chip 106 of the second client device 102 can process the mDNS data 130 using cached data available in a memory 140 of the WiFi chip 106. Likewise, the WiFi chip 116 of the third client device 112 can process the mDNS data 132 using cached data available in a memory 142 of the WiFi chip 116. The mDNS data broadcast by the first client device 134 can identify an application associated with the broadcast, a port for transmitting the broadcast, a service being requested by the first client device 134, and/or any other feature that a computing device can specify when initializing casting.

FIG. 1B illustrates a diagram 150 of the second client device 102 and the third client device 112 responding to the mDNS data transmitted to each client device. While the response data is being generated and transmitted, each of the device SoC 108 and the device SoC 118 can remain in the sleep mode, thereby eliminating waste of power and computational resources. Response data 148 can indicate whether the second client device 102 includes one or more features requested by the first client device 134, and response data 146 can indicate whether the third client device 112 includes one or more features requested by the first client device 134. In response to the first client device 134 receiving the response data 148 and the response data 146 over the local network, the first client device 134 can provide a graphical interface that identifies one or more client devices that satisfy the request. The user 124 can then select one of the client devices for casting the media. For example, when the application 144 requests to cast to a client device and the user is presented with a list of client devices to select from for casting, the user can select the first client device 134.
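
The exchange described for FIG. 1A and FIG. 1B can be sketched roughly as follows. This is a minimal illustration only, not the claimed implementation: the `CastQuery` and `WifiChip` classes, the service strings, and the port value are all hypothetical names introduced here, and a real WiFi chip would parse actual mDNS packets rather than Python objects. The sketch shows a chip answering a casting query from its own cached data, with no call into the device SoC.

```python
from dataclasses import dataclass, field


@dataclass
class CastQuery:
    """Fields the description says mDNS data can identify (names are hypothetical)."""
    application: str
    port: int
    service: str


@dataclass
class WifiChip:
    """Hypothetical stand-in for a WiFi chip that holds cached data in its own memory."""
    device_name: str
    cached_services: set = field(default_factory=set)
    soc_wake_count: int = 0  # never incremented below: the SoC is not woken

    def handle_mdns(self, query: CastQuery) -> dict:
        # Answer purely from the chip's cache; nothing here touches the SoC.
        supported = query.service in self.cached_services
        return {"device": self.device_name, "supports_request": supported}


def devices_to_list(responses):
    """Devices the casting device could offer to the user for selection."""
    return [r["device"] for r in responses if r["supports_request"]]


# Made-up service names and port, for illustration only.
query = CastQuery(application="music_app", port=5000, service="_example-cast._tcp")
speaker = WifiChip("speaker", {"_example-cast._tcp"})
display = WifiChip("display", {"_example-print._tcp"})
responses = [speaker.handle_mdns(query), display.handle_mdns(query)]
selectable = devices_to_list(responses)
```

Under these assumptions only the speaker is offered to the user, and neither device records an SoC wake-up.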

In response to the selection, the application 144 can communicate directly with the first client device 134 over the local network, or the application can communicate with a separate server to cause the separate server to communicate instructions to the first client device 134 for rendering particular media data via the first client device 134. In some implementations, the first client device 134 can be a standalone speaker device 122 and/or a display device capable of rendering audio and/or visual data. Alternatively, or additionally, the third client device 112 can be a display device, such as a computer monitor and/or a television. The second client device 102 and the third client device 112 can each include a digital signal processor, which can monitor a respective device interface for accessing an automated assistant while the respective device SoC is operating in a sleep mode. Furthermore, the digital signal processor (DSP), WiFi chip, device SoC, and/or any other subsystem of a client device can operate according to any of the implementations discussed herein.

FIG. 2A illustrates a diagram 200 in which a user 220 provides a spoken utterance 218 to a client device 202, causing the client device 202 to process the spoken utterance 218 using a digital signal processor without transitioning a device SoC 208 out of a sleep mode. The client device 202 can be a computing device 222 that operates via a power source 210, which can include a rechargeable power source such as a battery, a capacitor, and/or any other rechargeable energy source. While the client device 202 is operating with the device SoC 208 in the sleep mode, the user 220 can provide the spoken utterance 218, which can be different from an invocation phrase that causes the client device 202 to transition the device SoC 208 out of the sleep mode. For example, the user 220 can provide the spoken utterance 218, "Hello...", which can be received at one or more microphones connected to the client device 202.

A microphone connected to the client device 202 can provide an output in response to the user 220 providing the spoken utterance 218. Although the device SoC 208 is operating in the sleep mode, a digital signal processor (DSP) 204 of the client device 202 can monitor the output of the microphone to determine whether the user has provided an invocation phrase, of one or more invocation phrases that can invoke the client device 202 to perform one or more different actions. In some implementations, the DSP 204 can process audio data 212 that characterizes the spoken utterance 218 according to a process that uses a lower sampling rate than the sampling rate used by the device SoC 208 to process audio data. Alternatively, or additionally, the DSP 204 can process audio data 212 generated from the output of fewer microphones than the number of microphones used to generate the audio processed by the device SoC 208. In other words, the DSP 204 can use fewer channels of audio data than the device SoC 208 uses. Using a lower sampling rate and/or fewer channels can be computationally efficient and can minimize power consumption (and the resulting battery drain). Alternatively, or additionally, the DSP 204 can access a first model 214 for processing the audio data 212 to determine whether the user 220 has provided an invocation phrase.
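
As a rough illustration of processing fewer channels at a lower sampling rate, the following sketch keeps only some microphone channels and every third sample. The function name `dsp_view`, the channel count, and the decimation factor are assumptions made here, not values from the description. Plain decimation is also a simplification: a practical design would normally low-pass filter before dropping samples to avoid aliasing.

```python
def dsp_view(frames, keep_channels=1, decimation=3):
    """Return a cheaper view of multi-channel audio for a low-power detector.

    frames: list of tuples, one tuple per sample instant, one value per
    microphone channel. Keeps the first `keep_channels` channels and every
    `decimation`-th sample (for example 48 kHz down to 16 kHz when decimation=3).
    """
    return [frame[:keep_channels] for frame in frames[::decimation]]


# Four hypothetical microphone channels, twelve sample instants.
frames = [(i, i + 1, i + 2, i + 3) for i in range(12)]
reduced = dsp_view(frames)
```

Here the reduced view holds 4 single-channel samples instead of 12 four-channel samples, which is the kind of reduction that lets the always-on stage do less work per second of audio.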

The first model 214 can be different from a second model 216, which is used by the device SoC 208 to determine whether the user 220 has spoken an invocation phrase. For example, the first model 214 can be a model trained to determine whether audio data characterizes an invocation phrase. The correspondence between the audio data and the invocation phrase can be characterized as one or more values, and the threshold degree of similarity between the audio data and the invocation phrase can be lower than another threshold degree that corresponds to the second model 216. In other words, a spoken utterance can be determined to satisfy the threshold of the first model 214 but not the threshold of the second model 216, whereas a spoken utterance cannot be determined to satisfy the second model 216 without also satisfying the first model 214.

In various implementations, the second model 216 can be larger (in terms of bits) than the first model 214, and can have larger input dimensions (e.g., to process more channels of audio data) and/or a larger quantity of trained nodes. As a result, processing audio data using the second model 216 can be more computationally expensive than processing audio data using the first model 214. However, in some implementations, processing audio data using the second model 216 can yield a more accurate determination of whether the user 220 spoke an invocation phrase, as a result of the second model 216 being larger, more channels of audio data being processed, higher-precision samples being processed, and/or a higher sampling rate of audio data being processed. Accordingly, the DSP 204 can use the more efficient first model 214 to determine whether audio data passes an "initial check" for the presence of an invocation phrase, and the SoC 208 and the less efficient (but more accurate) second model 216 can be used only when the "initial check" is passed. This is more resource-efficient than using only the SoC 208 and the second model 216.
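
The two-stage "initial check" arrangement can be sketched as a simple cascade. The threshold values, the `cascade` function, and the score tables standing in for the first and second models are all hypothetical; the description states only that the first threshold can be lower than the second. The point of the sketch is that the expensive second model runs only after the cheap first model passes.

```python
DSP_THRESHOLD = 0.6  # hypothetical lower bar for the small first model
SOC_THRESHOLD = 0.9  # hypothetical higher bar for the larger second model


def cascade(audio, first_model, second_model):
    """Return (invoked, soc_was_woken) for one utterance."""
    if first_model(audio) < DSP_THRESHOLD:
        return False, False  # initial check failed: the SoC stays asleep
    soc_score = second_model(audio)  # runs only after the initial check passes
    return soc_score >= SOC_THRESHOLD, True


# Stand-in "models": fixed scores per utterance, purely illustrative.
first_scores = {"hello": 0.20, "a sister": 0.70, "assistant": 0.95}
second_scores = {"hello": 0.10, "a sister": 0.50, "assistant": 0.97}

results = {
    text: cascade(text, first_scores.get, second_scores.get)
    for text in first_scores
}
```

With these made-up scores, "hello" never wakes the SoC, "a sister" wakes it but is rejected by the second stage, and "assistant" is confirmed as an invocation (call) phrase.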

In some implementations, the DSP 204 can process audio data at a different bit depth than the bit depth at which the device SoC 208 processes audio data. For example, the DSP 204 can capture audio data as 24-bit audio, convert the audio data to 16-bit audio data, and then use the 16-bit audio data when determining whether the audio data characterizes an invocation phrase provided by the user. When the DSP 204 determines that the 16-bit audio data characterizes an invocation phrase, the DSP 204 can cause the captured 24-bit audio data to be forwarded to the device SoC 208. The device SoC 208 can then process the 24-bit audio data, rather than converting the forwarded audio data to a different bit depth for processing.
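
One way to picture the bit-depth handling is shown below. It is a sketch under stated assumptions: samples are signed integers, the 24-bit to 16-bit conversion simply discards the 8 least significant bits, and `loud_enough` is a placeholder detector invented here in place of the first model. What the sketch preserves from the description is that detection runs on the 16-bit copy while the untouched 24-bit capture is what gets forwarded.

```python
def to_16_bit(samples_24):
    """Convert signed 24-bit samples to signed 16-bit by dropping the low 8 bits."""
    return [s >> 8 for s in samples_24]  # Python's >> is an arithmetic shift


def dsp_screen(samples_24, detector):
    """Run the detector on a 16-bit copy; forward the original 24-bit data on a hit.

    Returns the unmodified 24-bit samples if the detector fires, else None.
    """
    if detector(to_16_bit(samples_24)):
        return samples_24
    return None


def loud_enough(samples_16):
    """Placeholder detector (an assumption), standing in for the first model."""
    return max(abs(s) for s in samples_16) > 1000


captured = [8388607, -8388608, 256, 0]  # includes the 24-bit extremes
converted = to_16_bit(captured)
forwarded = dsp_screen(captured, loud_enough)
quiet = dsp_screen([256, -512, 0], loud_enough)
```

The forwarded data is the same 24-bit capture, so the second stage does not need to undo or repeat the conversion.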

In response to the user 220 providing the spoken utterance 218, the DSP 204 can process the audio data 212 using the first model 214 and determine that the spoken utterance 218 does not correspond to any invocation phrase of the one or more invocation phrases. In response, the DSP 204 can bypass waking the device SoC 208 for further processing. In this way, the device SoC 208 does not need to be frequently initialized to further process the audio data 212 and can remain in the sleep mode. This allows the client device 202 to avoid wasting the energy provided by the power source 210 and the computational resources available at the client device 202.

FIG. 2B illustrates a diagram 230 in which a user provides a spoken utterance 234 to the client device 202, causing the DSP 204 of the client device 202 to wake up the device SoC 208 of the client device 202. The spoken utterance 234 can be captured by one or more microphones of the client device 202, which can be a computing device 222 that operates via a portable and/or rechargeable power source 210. Initially, the device SoC 208 can operate in a sleep mode in order to conserve power and computational resources. While the device SoC 208 is operating in the sleep mode, the DSP 204 can operate to detect when the user 220 provides a spoken utterance corresponding to one or more invocation phrases.

As an example, the user 220 can provide a spoken utterance 234, such as "Assistant," which can correspond to an invocation phrase that, when detected by the DSP 204, can cause the DSP 204 to wake the device SoC 208. To detect the invocation phrase, the DSP 204 can convert output from one or more microphones of the client device 202 into audio data 232. The DSP 204 can use the first model 214 to process the audio data 232 to determine whether the spoken utterance 234 corresponds to the invocation phrase. When the DSP 204 determines that the spoken utterance 234 corresponds to the invocation phrase, the DSP 204 can send a command to the device SoC 208 to wake up the device SoC 208 or otherwise transition the device SoC 208 out of the sleep mode.

When the DSP 204 transitions the device SoC 208 from the sleep mode to an operating mode, the DSP 204 can also transmit audio data to the device SoC 208 for further processing. The device SoC 208 can then process the audio data using the second model 216 to confirm whether the spoken utterance 234 corresponds to the invocation phrase. When the device SoC 208 determines that the spoken utterance 234 did not correspond to an invocation phrase, the device SoC 208 can transition back into the sleep mode to conserve computational resources and power. Alternatively, or additionally, when the device SoC 208 determines that the spoken utterance 234 does not correspond to an invocation phrase, even though the DSP 204 determined that it did, the device SoC 208 can remain active or awake for at least a period of time in anticipation of further input from the user 220. In some implementations, the wake time can be based on a degree of correlation between the spoken utterance 234 and an invocation phrase, a voice identification of the user 220, and/or any other implementation feature discussed herein. As an example, the wake time can be determined based on a comparison between the degree of correlation detected by the device SoC 208 and a threshold degree of correlation. For example, when the degree of correlation detected by the device SoC 208 is 0.87 and the threshold degree of correlation is 0.9, the wake time of the device SoC 208 can be set to a time period X. However, if the degree of correlation detected by the device SoC 208 is 0.79 and the threshold degree of correlation is 0.9, the wake time of the device SoC 208 can be set to a time period Y, where Y is shorter than X.
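
The X and Y example above fixes only an ordering (Y shorter than X), not a formula. The sketch below picks one possible rule, a linear scaling of a maximum wake time by how close the sub-threshold score is to the threshold. The linear rule, the function name, and the 30-second ceiling are assumptions made purely for illustration.

```python
def wake_time_after_near_miss(correlation, threshold=0.9, max_wake_s=30.0):
    """Wake time for a score below the threshold: closer scores stay awake longer.

    The linear scaling and max_wake_s are illustrative assumptions.
    """
    if not 0.0 <= correlation < threshold:
        raise ValueError("expected a score in [0, threshold)")
    return max_wake_s * correlation / threshold


x = wake_time_after_near_miss(0.87)  # plays the role of time period X
y = wake_time_after_near_miss(0.79)  # plays the role of time period Y
```

Any monotonically increasing mapping would reproduce the same ordering; the linear one is simply the easiest to read.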

FIG. 2C illustrates a diagram 240 in which a spoken utterance 244 is provided by the user 220, causing the DSP 204 of the computing device 202 to transition the device SoC 208 out of the sleep mode and to further initialize an automated assistant for further operation. The client device 202 can be a computing device 222 that includes one or more different interfaces for interacting with an automated assistant. To initialize the automated assistant, the user 220 can provide an invocation phrase that is detected by the DSP 204 while the device SoC 208 is in the sleep mode. The invocation phrase can be embodied in the spoken utterance 244 which, when detected by the DSP 204, is processed using audio data 242 and the first model 214. When the DSP 204 determines, using the first model, that the audio data 242 characterizes an invocation phrase, the DSP 204 can provide a wake command to the device SoC 208.

In response to the device SoC 208 receiving the wake command, the device SoC 208 can process audio data corresponding to the spoken utterance 244 using the second model 216. Based on processing the audio data using the second model 216, the device SoC 208 can determine that the spoken utterance 244 included an invocation phrase. Accordingly, based on the device SoC 208 determining that the user 220 provided the invocation phrase, the device SoC 208 can initialize the automated assistant locally and/or provide a network request to initialize the automated assistant via a server device. For example, the device SoC 208 can transmit data to the WiFi chip 106 of the client device 202 in order to initialize the automated assistant. The data can be transmitted over a network, such as the internet, to an automated assistant server so that subsequent requests from the user can be transmitted to the automated assistant server via the client device 202. In some implementations, the automated assistant can be hosted at the client device 202, and requests from the user 220 for the automated assistant to perform particular operations can therefore be processed at the client device 202. By allowing the device SoC 208 to sleep in order to conserve power and other resources, and to wake in order to verify particular spoken utterances from the user 220, the client device 202 can conserve computational and power resources, which can be particularly advantageous for a client device 202 that operates using a rechargeable power source 210.

FIG. 3 illustrates a diagram 300 of a client device 302 that can determine a wake time for a device SoC 308 based on one or more properties associated with an interaction between a user 320 and the client device 302. The user 320 can interact with the client device 302 in order to invoke an automated assistant to perform one or more different functions. For example, the client device 302 can be a standalone speaker device 322 that can render audio, such as music, and/or control various other client devices that are connected to a common network with the client device 302. The client device 302 can be controlled by multiple different users who speak and/or interact with the client device 302 in different ways. To accommodate such differences between users while still eliminating waste of power and computational resources, the client device 302 can determine a wake time 324 for the device SoC 308, which limits the amount of time that the device SoC 308 monitors for input from the user 320.

As an example, the user 320 can provide a spoken utterance 318, such as "Assistant, could you...", and then pause briefly to consider how to continue the utterance. The user 320 can have a habit or history of exhibiting such pauses when interacting with the client device 302. Therefore, data characterizing previous interactions between the user 320 and the client device can be used to determine how long to monitor for further input from the user 320 without wasting resources of the client device 302. For example, in response to the spoken utterance 318, a DSP 304 of the client device 302 can process audio data 312 characterizing the spoken utterance 318 to determine whether the audio data 312 characterizes an invocation phrase, such as "Assistant." When the DSP 304 determines that the spoken utterance 318 includes the invocation phrase, the DSP 304 can communicate with the device SoC 308 to cause the device SoC 308 to transition from a sleep mode to an operating mode. In some implementations, the DSP 304 can also transmit the audio data 312 to the device SoC 308 in order to confirm that the user 320 provided the invocation phrase.

In some implementations, when the device SoC 308 determines that the user 320 did indeed provide the invocation phrase, the device SoC 308 can further process the audio data 312 to identify the user who provided the spoken utterance 318. For example, the device SoC 308 can access a voice identification model, with permission from the user, to identify one or more voice characteristics embodied in the audio data 312. Based on the voice characteristics embodied in the audio data 312, the device SoC 308 can rank one or more different users according to whether the spoken utterance 318 corresponds to their particular voice characteristics. The highest-ranked user can then be selected as the user who provided the spoken utterance 318, and the device SoC 308 can determine the wake time 324 based on identifying the highest-ranked user. Alternatively, or additionally, the user can be selected by the device SoC 308 using one or more models that can be used to generate a prediction of the source of the spoken utterance 318. Alternatively, or additionally, the audio data 312 can be processed using one or more models that can also be used to generate the wake time 324.
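
The ranking step can be illustrated with a toy example. Everything concrete here is an assumption: voice characteristics are represented as two-element vectors, users are compared by cosine similarity, and each user has a stored wake time that would, in practice, be derived from that user's interaction history. The description does not specify any of these details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank_users(utterance_vec, profiles):
    """Return user names ordered from best to worst voice match."""
    return sorted(
        profiles,
        key=lambda name: cosine(utterance_vec, profiles[name]),
        reverse=True,
    )


# Hypothetical voice profiles and per-user wake times (seconds).
PROFILES = {"user_a": [1.0, 0.0], "user_b": [0.0, 1.0]}
WAKE_TIME_S = {"user_a": 4.0, "user_b": 10.0}

ranked = rank_users([0.1, 0.9], PROFILES)
wake_time = WAKE_TIME_S[ranked[0]]  # wake time of the highest-ranked user
```

In this example the utterance is closest to `user_b`, who is assumed to pause longer mid-request, so the longer wake time is chosen.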

In response to determining that the user 320 provided the invocation phrase, the device SoC 308 can communicate with a WiFi chip 306 in order to initialize the automated assistant over a wide area network, such as the internet. However, in some implementations, the device SoC 308 can initialize the automated assistant via a local device that communicates with the client device 302 over a local area network. While the automated assistant is initializing, the device SoC 308 can monitor one or more interfaces of the client device 302 for an amount of time at least equal to the wake time 324. When the wake time 324 expires, the device SoC 308 can return to the sleep mode, and the DSP 304 can take over monitoring output from the one or more interfaces of the client device 302.

In some implementations, the wake time 324 can be based on a determined degree of correlation between the spoken utterance 318 and an invocation phrase. For example, the device SoC 308 and/or the DSP 304 can generate a value that characterizes the degree of correlation between the spoken utterance 318 and the invocation phrase. The amount of wake time 324 can decrease as the degree of correlation increases, and can increase as the degree of correlation decreases. In other words, when the device SoC 308 determines that the spoken utterance 318 is within a 10% tolerance of the threshold for confirming that the spoken utterance 318 includes the invocation phrase, the wake time 324 can be one minute. However, when the device SoC 308 determines that the spoken utterance 318 does include the invocation phrase, and therefore satisfies the threshold, the wake time 324 can be set to five seconds. Note that the wake time can be any number of milliseconds, seconds, minutes, and/or any other time value on which operations of a processor can be based. For example, a spoken utterance that more closely correlates to an invocation phrase can result in a wake time that has fewer total milliseconds than a wake time resulting from a different spoken utterance that correlates less closely to the invocation phrase.
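
The one-minute and five-second example can be written as a small rule. The 0.9 threshold and the reading of "within a 10% tolerance" as "at least 90% of the threshold value" are assumptions, as is the choice to return 0 seconds for scores further below the threshold, a case this passage does not address.

```python
def wake_time_seconds(correlation, threshold=0.9, tolerance=0.10):
    """Map a correlation score to a wake time, following the example above.

    Assumptions: the tolerance band is [threshold * (1 - tolerance), threshold),
    and scores below that band get no wake time at all.
    """
    if correlation >= threshold:
        return 5  # confirmed invocation: a short window is enough
    if correlation >= threshold * (1 - tolerance):
        return 60  # near miss: stay awake longer in case the user retries
    return 0  # assumption: clearly not an invocation, go back to sleep


confirmed = wake_time_seconds(0.95)
near_miss = wake_time_seconds(0.85)
far_miss = wake_time_seconds(0.50)
```

Within the two cases the passage covers, the higher score yields the shorter wake time, matching the stated inverse relationship.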

FIG. 4 illustrates a system 400 for operating a computing device 418 so as to eliminate waste of computational resources by generating a wake time for device SoC 444, generating a time offset between the clock of DSP 442 and the clock of device SoC 444, and/or using WiFi chip 434 to respond to casting requests without transitioning device SoC 444 out of sleep mode. Automated assistant 404 may operate as part of an assistant application provided at one or more computing devices, such as computing device 418 and/or server device 402. A user can interact with automated assistant 404 via an assistant interface, which can be a microphone, a camera, a touch screen display, a user interface, and/or any other apparatus capable of providing an interface between a user and an application.

For example, a user may initialize automated assistant 404 by providing verbal, textual, and/or graphical input to the assistant interface to cause automated assistant 404 to perform a function (e.g., provide data, control a peripheral device, access an agent, generate input and/or output, etc.). Computing device 418 may include a display device, which can be a display panel that includes a touch interface for receiving touch inputs and/or gestures, allowing the user to control applications of computing device 418 via the touch interface. In some implementations, computing device 418 may lack a display device, thereby providing audible user interface output without providing graphical user interface output. Furthermore, computing device 418 may provide a user interface, such as a microphone, for receiving spoken natural language input from a user. In some implementations, computing device 418 may include a touch interface and lack a camera, but may optionally include one or more other sensors.

Computing device 418 and/or other computing devices 434 can communicate with server device 402 over a network 440, such as the Internet. Additionally, computing device 418 and the other computing devices 434 can communicate with each other over a local area network (LAN), such as a WiFi network. Computing device 418 can offload computational tasks to server device 402 in order to conserve computational resources at computing device 418. For example, server device 402 can host automated assistant 404, and computing device 418 can transmit inputs received at one or more assistant interfaces 420 to server device 402. In some implementations, however, automated assistant 404 may be hosted at computing device 418 as a client automated assistant 422.

In various implementations, all or some aspects of automated assistant 404 may be implemented at computing device 418. In some of those implementations, aspects of automated assistant 404 are implemented by client automated assistant 422 of computing device 418, which interfaces with server device 402, and server device 402 implements other aspects of automated assistant 404. Server device 402 may optionally serve multiple users and their associated assistant applications via multiple threads. In implementations in which all or some aspects of automated assistant 404 are implemented by client automated assistant 422 of computing device 418, client automated assistant 422 can be an application that is separate from the operating system of computing device 418 (e.g., installed "on top of" the operating system), or can alternatively be implemented directly by the operating system of computing device 418 (e.g., considered an application of, but integral with, the operating system).

In some implementations, automated assistant 404 and/or client automated assistant 422 can include an input processing engine 406, which may use multiple different modules to process inputs and/or outputs for computing device 418 and/or server device 402. For example, input processing engine 406 may include a speech processing module 408 that can process audio data received at assistant interface 420 to identify text embodied in the audio data. The audio data may be transmitted, for example, from computing device 418 to server device 402 in order to conserve computational resources at computing device 418.

The process for converting audio data to text may include a speech recognition algorithm, which can use neural networks and/or statistical models to identify groups of audio data corresponding to words or phrases. The text converted from the audio data can be parsed by a data parsing module 410 and made available to the automated assistant as textual data that can be used to generate and/or identify command phrases from the user. In some implementations, output data provided by data parsing module 410 may be provided to a parameter module 412 to determine whether the user provided an input that corresponds to a particular action and/or routine that can be performed by automated assistant 404, and/or to an application or agent that can be accessed by automated assistant 404. For example, assistant data 416 can be stored at server device 402 and/or at computing device 418 as client data 432, and may include data that defines one or more actions that can be performed by automated assistant 404 and/or client automated assistant 422, as well as the parameters necessary to perform those actions.

In some implementations, the computing device can include a WiFi chip 434, which may include at least one or more portions of memory 436 and/or a broadcast engine 438. Broadcast engine 438 can receive data broadcast from one or more other client devices over network 440 and generate responsive data using cached data stored in memory 436. WiFi chip 434 may store data characterizing available services, applications, hardware features, and/or any other properties and/or functions that may be associated with computing device 418. When computing device 418 is operating in a sleep mode, in which device SoC 444 consumes less power and/or fewer computational resources than when computing device 418 is operating in a wake mode, WiFi chip 434 may respond to casting requests from other client devices without transitioning device SoC 444 out of sleep mode.

For example, when a request from a client device is received at WiFi chip 434 and the request identifies a target service that is also characterized by data stored in memory 436, broadcast engine 438 can generate responsive data using the cached data from memory 436 and provide the responsive data to the client device. If the client device then selects computing device 418 for use of the target service, the client device can transmit a command to computing device 418, and WiFi chip 434 can process the command and cause device SoC 444 to transition out of sleep mode and into an operating mode. However, if broadcast engine 438 determines that memory 436 does not contain sufficient data to determine whether computing device 418 can provide a particular service, initialize a particular application, and/or otherwise serve the requesting client device, WiFi chip 434 can communicate with device SoC 444 in order to process the request. In this case, device SoC 444 can generate the responsive data and provide it to WiFi chip 434, and WiFi chip 434 can transmit the responsive data to the client device.
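The cache-first handling of casting requests described above can be sketched as follows. The class, field, and service names are hypothetical and chosen only for illustration; the sketch models just the decision of whether the WiFi chip can answer from its own memory or must involve the device SoC.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WifiChipSketch:
    """Hypothetical model of WiFi chip 434 and its cached memory 436."""
    cache: Dict[str, dict]           # service name -> cached description
    ask_soc: Callable[[str], dict]   # fallback that involves the device SoC
    soc_wakeups: List[str] = field(default_factory=list)

    def handle_request(self, target_service: str) -> dict:
        if target_service in self.cache:
            # Enough cached data: respond while the SoC stays asleep.
            return {**self.cache[target_service],
                    "service": target_service,
                    "answered_by": "wifi_chip"}
        # Insufficient cached data: the SoC generates the response,
        # and the WiFi chip relays it to the requesting client.
        self.soc_wakeups.append(target_service)
        return {**self.ask_soc(target_service),
                "service": target_service,
                "answered_by": "device_soc"}
```

In this sketch the list `soc_wakeups` simply records which requests could not be served from the cache, which makes the power-saving effect of a well-populated cache easy to observe.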

In some implementations, computing device 418 includes one or more assistant interfaces 420 that can provide access to client automated assistant 422 and/or automated assistant 404. A user can provide one or more different types of input to invoke client automated assistant 422 and/or automated assistant 404. Such inputs can include spoken inputs, which can be processed by digital signal processor 442 when device SoC 444 is operating in sleep mode. One or more speech recognition models 440 available at computing device 418 may be used to determine whether audio data characterizing a spoken input embodies an invocation phrase for initializing the automated assistant. Furthermore, the one or more speech recognition models 440 may be used by a wake time engine 448 to determine the amount of time device SoC 444 should remain awake in order to detect subsequent inputs from the user. In some implementations, the amount of wake time may be based on the degree of similarity between the user's utterance and the invocation phrase for invoking the automated assistant. Alternatively or additionally, the amount of wake time may be based on audio processing engine 430 processing the audio data corresponding to the utterance and identifying the user who provided the utterance. For example, audio processing engine 430 may use client data 432 and/or assistant data 416 to determine characteristics of interactions between the user and the automated assistant, such as how long the user typically pauses during an interaction with the automated assistant. Wake time engine 448 can use this information to generate a wake time for device SoC 444 during a particular interaction between the user and the automated assistant.

Additionally or alternatively, a power engine 426 of computing device 418 may determine an estimated charge of power source 446 and communicate the estimated charge and/or an amount of operating time to wake time engine 448. The amount of charge and/or amount of operating time estimated by the power engine may be used by wake time engine 448 to determine the wake time for device SoC 444. For example, when a user generally pauses longer than an average user while interacting with the automated assistant and power source 446 is fully charged, wake time engine 448 can assign an extended wake time, at least relative to the wake time that would otherwise be assigned if the estimated charge were below 50%. Alternatively or additionally, when a user generally pauses for a shorter time than an average user interacting with computing device 418 and power source 446 is fully charged, wake time engine 448 can assign a wake time that is shorter than the extended wake time, based at least on the user's historical interactions, in order to conserve power.
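A minimal sketch of how a wake time engine might combine a user's pause history with the estimated charge is shown below. The function name, parameter names, and the three wake time values are assumptions made for illustration; only the 50% charge boundary and the qualitative ordering of the outcomes come from the text. Treating any charge of 50% or more the same as a full charge is also an assumption.

```python
def assign_wake_time(user_avg_pause_s: float,
                     typical_avg_pause_s: float,
                     charge_fraction: float,
                     short_wake_s: float = 5.0,      # assumed value
                     reduced_wake_s: float = 10.0,   # assumed value
                     extended_wake_s: float = 30.0   # assumed value
                     ) -> float:
    """Pick a wake time from pause habits and estimated battery charge."""
    if user_avg_pause_s <= typical_avg_pause_s:
        # Quick speakers do not need a long window, so power is saved
        # even when the battery is full.
        return short_wake_s
    if charge_fraction < 0.5:
        # Slow speaker but limited charge: a less generous window.
        return reduced_wake_s
    # Slow speaker and ample charge: extended window.
    return extended_wake_s
```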

In some implementations, computing device 418 may include a time offset engine 424 for determining an offset between clocks used by computing device 418. For example, DSP 442 can operate a first clock, and device SoC 444 can operate a second clock that may be offset from the first clock during operation of computing device 418. This offset can affect operations at audio processing engine 430, particularly when audio processing engine 430 is performing echo cancellation on spoken inputs to assistant interface 420.

In some implementations, the offset between a first clock, according to which DSP 442 operates, and a second clock, according to which device SoC 444 operates, may be determined using timestamps. A timestamp may correspond to a pair of clock values: one captured using the first clock and another captured using the second clock. When DSP 442 is operating to determine whether an invocation phrase has been detected and device SoC 444 is in sleep mode, DSP 442 may record a clock value corresponding to the "wake" time at which the invocation phrase is detected. When DSP 442 transitions device SoC 444 out of sleep mode, a timestamp may be recorded using the first clock and the second clock. However, to determine the "wake" time expressed relative to the second clock, the second clock value of the timestamp can be "scaled" and/or otherwise adjusted according to the determined time offset between the first clock and the second clock.

The time offset may be determined using a first timestamp and a second timestamp, each of which may be recorded when neither device SoC 444 nor DSP 442 is in sleep mode. The first timestamp can correspond to a first pair of clock values, and the second timestamp can correspond to a second pair of clock values. The first DSP clock value of the first pair can be subtracted from the second DSP clock value of the second pair to generate a first clock difference value. Likewise, the first SoC clock value of the first pair can be subtracted from the second SoC clock value of the second pair to generate a second clock difference value. Thereafter, when DSP 442 wakes device SoC 444 from sleep, a mapping between the first clock difference value and the second clock difference value may be used to determine when the invocation phrase was received. For example, the ratio of the second clock difference value to the first clock difference value can be determined, and a DSP clock value can be multiplied by that ratio to determine the corresponding device SoC clock value. For example, when DSP 442 wakes device SoC 444 from sleep, the DSP clock value corresponding to the time at which the user provided the invocation phrase may be provided to device SoC 444. Device SoC 444 can then map the DSP clock value to a device SoC clock value to determine when the invocation phrase was provided by the user relative to the device SoC clock. This value can then be used during processing of the audio data, such as during echo cancellation, in order to analyze the content of the audio data (e.g., to identify the natural language content of the utterance from the user).
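The two-timestamp mapping described above can be illustrated with a short sketch. The type and function names are hypothetical. Anchoring the mapping at the first timestamp pair is an assumption added here so that the example also works when the two clocks do not start from zero; when both clocks start at zero it reduces to multiplying the DSP clock value by the ratio, as the text describes.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """A pair of clock values captured at the same moment."""
    dsp: float   # value read from the first (DSP) clock
    soc: float   # value read from the second (device SoC) clock


def clock_ratio(first: Timestamp, second: Timestamp) -> float:
    """Ratio of the SoC clock difference to the DSP clock difference."""
    dsp_delta = second.dsp - first.dsp   # first clock difference value
    soc_delta = second.soc - first.soc   # second clock difference value
    if dsp_delta == 0:
        raise ValueError("timestamps must be taken at different times")
    return soc_delta / dsp_delta


def dsp_to_soc(dsp_value: float, first: Timestamp, second: Timestamp) -> float:
    """Map a DSP clock value onto the SoC clock (anchoring is an assumption)."""
    return first.soc + clock_ratio(first, second) * (dsp_value - first.dsp)
```

With such a mapping, the DSP clock value recorded when the invocation phrase was detected can be converted into the SoC time base before echo cancellation aligns captured audio with rendered audio.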

FIG. 5 illustrates a method 500 for initializing a particular processor of a computing device for an amount of time that is selected based on one or more characteristics of an interaction between a user and the computing device. Method 500 may be performed by one or more processors, applications, and/or any other apparatus and/or module capable of providing an interface between a user and an automated assistant. Method 500 may include an operation 502 of determining whether a first processor has detected an utterance from the user. The first processor may be operational while a second processor is operating in sleep mode. The sleep mode can be a mode in which the second processor consumes less power and/or fewer computational resources than in an operating mode in which one or more applications, such as the automated assistant, are being actively executed by the second processor. In some implementations, the first processor can be a digital signal processor and the second processor can be a device SoC. Both the first processor and the second processor may be incorporated into a computing device that operates using a rechargeable power source, such as a battery, a capacitor, and/or any other rechargeable power source.

Method 500 can proceed to operation 504 when the first processor has detected an utterance. Otherwise, when no utterance has been detected by the first processor, the first processor can continue to monitor one or more microphones of the computing device to determine whether the user has provided an utterance. Operation 504 may include determining, by the first processor, whether the utterance includes a particular invocation phrase. The computing device may operate to transition the second processor out of sleep mode when a particular invocation phrase, of one or more different invocation phrases, is provided to the computing device by the user. The invocation phrase can be, for example, "Assistant" and/or any other phrase that can be used to initialize an application. When the first processor determines that the utterance includes the invocation phrase, method 500 can proceed from operation 504 to operation 508.

Operation 508 may include transitioning the second processor from sleep mode to the operating mode. Operation 508 may be performed by the first processor in response to the first processor identifying the invocation phrase. However, when the first processor determines that the utterance does not include the invocation phrase, method 500 can proceed from operation 504 to operation 506. Operation 506 may include refraining from transitioning the second processor from sleep mode to the operating mode. In other words, because the first processor did not detect the invocation phrase in the utterance, the first processor returns to operation 502 to determine whether another utterance has been detected.

Method 500 can proceed from operation 508 to operation 510, which may include providing audio data from the first processor to the second processor. The audio data may correspond to the utterance provided by the user to the computing device. In some implementations, the first processor can operate a first invocation phrase model to determine whether the utterance includes the invocation phrase, while the second processor can operate a second invocation phrase model to determine whether the utterance included the invocation phrase. The first invocation phrase model can correspond to a lower threshold for identifying a correspondence between the utterance and the invocation phrase, while the second invocation phrase model can correspond to a threshold that is higher than that of the first model. Therefore, when the second processor receives the audio data, the second processor can use the second invocation phrase model to determine whether the utterance includes the invocation phrase.
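The flow of operations 504 through 510, with a permissive first model on the first processor and a stricter second model on the second processor, can be sketched as follows. The scores, threshold values, and returned labels are hypothetical; in practice each score would come from the respective invocation phrase model.

```python
def two_stage_invocation_check(first_model_score: float,
                               second_model_score: float,
                               first_threshold: float = 0.5,   # assumed, lower
                               second_threshold: float = 0.8   # assumed, higher
                               ) -> str:
    """Return the outcome of the two-stage invocation phrase check."""
    if first_model_score < first_threshold:
        # Operation 506: the second processor is left in sleep mode.
        return "second_processor_stays_asleep"
    # Operations 508 and 510: the second processor is woken and receives
    # the audio data, then applies its stricter model.
    if second_model_score >= second_threshold:
        return "invocation_confirmed"
    return "invocation_rejected_after_wake"
```

Using the lower threshold on the low-power processor keeps missed invocations rare, while the higher threshold on the second processor filters out the false positives that the permissive first stage lets through.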

Method 500 may include an optional operation 512 of determining, by the second processor, a degree to which the audio data characterizes the invocation phrase. The degree to which the audio data characterizes the invocation phrase can be expressed as one or more metrics that quantify one or more similarities between the audio data and the invocation phrase. In this way, the one or more metrics can be used later to make decisions about how the computing device should subsequently operate. For example, a value characterizing the degree to which the audio data characterizes the invocation phrase may be used to determine the amount of time to operate the second processor in the operating mode before transitioning back to sleep mode (provided that no other audio data is passed to the second processor for processing).

In some implementations, method 500 may include an optional operation 514 of determining, by the second processor, voice characteristics embodied by the audio data. The second processor can operate a voice identification model that, with permission from one or more users, can be used to identify the user who provided the utterance (e.g., that user's corresponding user profile). For example, each user of the computing device may speak with a different and/or unique voice signature, and based on these differences the voice identification model can determine rankings that correspond to predictions of which user provided the utterance. The user corresponding to the highest ranking may be selected as the user who provided the utterance to the computing device. The identification of the user using the voice identification model may be used, with the user's permission, to determine the amount of time to operate the second processor in the operating mode rather than sleep mode. The amount of time may be based on previous interactions between one or more users and the automated assistant. For example, a greater amount of time can be selected for a user who generally has a delay between providing the invocation phrase and providing a subsequent command, while a lesser amount of time can be selected for another user who generally has no delay between providing the invocation phrase and providing a subsequent command.
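The selection of the highest-ranked user and of a per-user amount of time can be sketched as follows. The function names, the margin, and the default value are assumptions for illustration; the rankings would be produced by the voice identification model, and the delay history would be derived from previous interactions with the users' permission.

```python
from typing import Dict


def identify_speaker(rankings: Dict[str, float]) -> str:
    """Select the user profile with the highest ranking."""
    if not rankings:
        raise ValueError("no candidate users were ranked")
    return max(rankings, key=rankings.get)


def operating_time_for_user(user: str,
                            typical_delay_s: Dict[str, float],
                            margin_s: float = 2.0,    # assumed safety margin
                            default_s: float = 5.0    # assumed fallback
                            ) -> float:
    """More time for users who usually pause before a follow-up command."""
    delay = typical_delay_s.get(user)
    if delay is None:
        # No interaction history for this user: fall back to a default.
        return default_s
    return delay + margin_s
```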

Method 500 may further include an operation 516 of operating, by the second processor, in the operating mode for an amount of time that is based at least on one or more characteristics of the interaction between the user and the computing device. For example, in some implementations, the amount of time may be based on the degree to which the audio data characterizes the invocation phrase. Alternatively or additionally, the amount of time may be based on one or more voice characteristics embodied in the audio data and/or on the identification of the user who provided the utterance. Alternatively or additionally, the amount of time may be based on one or more contextual characteristics corresponding to the interaction between the user and the computing device, such as the time of day, the number of available computing devices, network strength, the number of users within a particular proximity of the computing device, and/or any other characteristic that may be associated with the interaction between the user and the computing device.

FIG. 6 illustrates a method 600 of processing audio data using a determined time offset that corresponds to a difference in operation between a first processor and a second processor. Method 600 may be performed by one or more processors, applications, and/or any other apparatus or module capable of processing audio data. The first processor and the second processor identified in method 600 may be incorporated into a computing device that is powered by a portable power source, such as a battery and/or a capacitor, and that provides access to an automated assistant. Method 600 may include an operation 602 of determining, by the first processor, whether an utterance was received at the computing device and/or at another device in communication with the computing device. In particular, the first processor may process output from one or more microphones to determine whether a user has provided an utterance to the one or more microphones. When the first processor determines that no utterance was received, the first processor can continue to monitor the output of the one or more microphones to determine whether an utterance has been received from one or more users.

When the first processor determines that a spoken utterance has been detected, method 600 can proceed from operation 602 to operation 604. Operation 604 can include determining whether the spoken utterance included an invocation phrase. The first processor can make this determination by using a first invocation phrase model that can be executed by the first processor. In particular, the first invocation phrase model can be used to analyze the output of the one or more microphones in order to determine whether the spoken utterance included the invocation phrase. When the spoken utterance is determined to have included the invocation phrase, method 600 can proceed from operation 604 to operation 608.

Operation 608 can include causing the second processor to transition from a sleep mode to an operating mode. Operation 608 can be initialized by the first processor in response to determining that the spoken utterance included the invocation phrase. When the first processor determines that the spoken utterance did not include the invocation phrase, method 600 can proceed from operation 604 to operation 606. Operation 606 can include bypassing causing the second processor to transition from the sleep mode to the operating mode. Instead of transitioning the second processor out of the sleep mode, method 600 can return to operation 602 to detect whether a subsequent spoken utterance has been provided to the one or more microphones.

Method 600 can further include an operation 610 of causing, by the second processor, the computing device to render audio output using audio output data. The audio output can be provided by one or more interfaces connected to the computing device. For example, the computing device can include one or more speakers for emitting audio, and/or the computing device can be in communication with another computing device that includes one or more speakers. The audio output data can be based on data received over a network to which the computing device is connected and/or on data stored in a memory of the computing device. For example, the audio output can be music that is rendered using audio data corresponding to music stored in a memory device of the computing device. The audio output data can include, or be associated with, time data indicating a time at which a portion of the audio was rendered by the computing device and/or output by the one or more speakers.

Method 600 can proceed to an operation 612 of determining whether an invocation phrase has been detected using the second processor. In some implementations, the first processor can be a digital signal processor and the second processor can be a device SoC. The first processor can operate a first speech recognition model and the second processor can operate a second speech recognition model. The first speech recognition model can have a lower threshold for determining whether a spoken utterance includes the invocation phrase, and the second speech recognition model can have a higher threshold for making the same determination. In some implementations, the first processor can process audio data of lower quality than the audio data processed by the second processor. For example, the first processor can monitor the output of one or more microphones of the computing device at a lower sampling rate than the sampling rate at which the second processor monitors the one or more microphones. Alternatively or additionally, the first processor can monitor a smaller total number of audio channels than the number of audio channels monitored by the second processor. For example, the first processor can monitor a single microphone to determine whether a spoken utterance was provided by a user, and the second processor can monitor two or more microphones to determine whether a spoken utterance and/or an invocation phrase was provided by the user.
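The lower-threshold and higher-threshold arrangement described above can be pictured as a two-stage gate. The following is a minimal sketch under stated assumptions: the threshold values, the class and function names, and the idea of passing precomputed scores (rather than running speech recognition models on audio) are hypothetical simplifications, not details from the description.

```python
# Hypothetical thresholds; the description only says the first is lower than the second.
FIRST_STAGE_THRESHOLD = 0.5    # first processor (e.g., DSP), lower threshold
SECOND_STAGE_THRESHOLD = 0.8   # second processor (e.g., device SoC), higher threshold


class SecondProcessor:
    """Stand-in for the higher-power processor, which starts in the sleep mode."""

    def __init__(self) -> None:
        self.mode = "sleep"

    def wake(self) -> None:
        self.mode = "operating"

    def confirm(self, score: float) -> bool:
        # The stricter check only runs once the processor is in the operating mode.
        return self.mode == "operating" and score >= SECOND_STAGE_THRESHOLD


def handle_utterance(first_score: float, second_score: float, soc: SecondProcessor) -> str:
    """Apply the first-stage gate and, if it passes, wake the SoC for the stricter check."""
    if first_score < FIRST_STAGE_THRESHOLD:
        return "ignored"       # SoC is left asleep (compare operation 606)
    soc.wake()                 # SoC transitions to the operating mode (compare operation 608)
    return "invoked" if soc.confirm(second_score) else "rejected"
```

The design intent this sketch tries to capture is that the inexpensive first stage filters most audio, so the more accurate and more power-hungry second stage runs only when an invocation is plausible.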

The second processor can monitor the output of the one or more microphones while the audio output is being rendered by the computing device. When the second processor determines that an invocation phrase was provided by a user, method 600 can proceed from operation 612 to operation 614. When the second processor has not determined that an invocation phrase was provided by a user, the second processor can continue to monitor the output of the one or more microphones of the computing device. Operation 614 can include determining, by the second processor, a time offset between the time data and audio input data characterizing the invocation phrase detected by the second processor. In some implementations, the time offset can be based on a difference between clock operating characteristics of a clock of the first processor and another clock of the second processor. In other implementations, however, the first processor and the second processor can operate according to a single clock.

Method 600 can further include an operation 616 of processing, by the second processor, the audio input data using the time offset, at least in furtherance of removing one or more features of the audio input data. In particular, the time offset can be used during echo cancellation to remove features of the rendered audio output from the audio input provided to the one or more microphones. By accounting for the time offset between the first processor and the second processor, errors that would otherwise arise during echo cancellation can be eliminated. This can lead to less latency between a user providing a spoken utterance and the automated assistant responding to the spoken utterance. Furthermore, because the computing device operates from a rechargeable power source, the operating time obtained from each full charge of the power source can be extended by reducing latency and total operating time, at least with respect to the second processor.
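The role of the time offset in echo removal can be illustrated with a deliberately simplified sketch. It assumes the offset is already known as a whole number of samples and that the echo is a fixed-gain copy of the rendered audio; practical echo cancellers typically use adaptive filtering, which is not shown here. The function and parameter names are hypothetical.

```python
def remove_echo(mic, rendered, offset, gain=1.0):
    """Subtract the rendered audio, delayed by `offset` samples, from the microphone samples.

    mic:      microphone samples (user speech plus echo of the rendered audio)
    rendered: samples of the audio output that the device rendered
    offset:   assumed time offset, in samples, between rendering and capture
    gain:     assumed fixed echo gain (an illustrative simplification)
    """
    cleaned = []
    for i, sample in enumerate(mic):
        j = i - offset  # index of the rendered sample that produced this echo
        echo = gain * rendered[j] if 0 <= j < len(rendered) else 0.0
        cleaned.append(sample - echo)
    return cleaned
```

If the offset is wrong, the subtraction is applied to misaligned samples and leaves residual echo in the result, which is the kind of error that accounting for the time offset is meant to avoid.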

FIG. 7 illustrates a method 700 for providing response data to a broadcasting device using a WiFi chip included in a battery-powered computing device. The method can be performed by one or more applications, processors, and/or any other apparatus or module capable of processing network data. Method 700 can include an operation 702 of determining whether mDNS broadcast data has been received at the WiFi chip. When mDNS broadcast data is determined to have been received at the WiFi chip, method 700 can proceed to operation 704. Operation 704 can include determining whether a particular target port is identified by the mDNS broadcast data. When no mDNS broadcast data is received at the WiFi chip at operation 702, the WiFi chip can continue to monitor network traffic to determine whether any packet of data received at the WiFi chip corresponds to mDNS broadcast data.

When the mDNS broadcast data identifies a particular target port, such as a port designated for casting media between client devices, method 700 can proceed from operation 704 to operation 706. Operation 706 can include determining whether cached data stored in a memory of the WiFi chip characterizes one or more features of the mDNS broadcast data. When the mDNS broadcast data does not identify the particular target port, method 700 can proceed from operation 704 to operation 702, at which the WiFi chip can continue to monitor network traffic.

In some implementations, at operation 706, the WiFi chip can compare the mDNS broadcast data with the cached data stored in the memory of the WiFi chip. For example, the WiFi chip can store packets of data previously provided over the network and/or data generated in response to packets received over the network. For example, the WiFi chip may have previously responded to a cast request from another broadcasting device by indicating that the computing device containing the WiFi chip includes an application that is also included in the other broadcasting device. Alternatively or additionally, the data stored in the memory of the WiFi chip can indicate whether one or more services can be used by the computing device by way of the broadcast request. Alternatively or additionally, the data stored in the memory of the WiFi chip can indicate one or more hardware features of the computing device. Alternatively or additionally, the WiFi chip can determine whether the cached data stored by the WiFi chip characterizes one or more features associated with the mDNS broadcast data. In this way, the WiFi chip can respond to requests broadcast over the network without waking up another processor of the computing device, such as the device SoC.

Method 700 can proceed from operation 706 to operation 708, which can include generating response data based on the cached data. Operation 708 can be performed when the WiFi chip has cached data that characterizes one or more features associated with the mDNS broadcast data. For example, when the cached data identifies an application that is the subject of the mDNS broadcast, the WiFi chip can generate response data to indicate to the broadcasting device that the computing device does include that particular application. In this way, the computing device does not need to wake up another processor in order to respond to the broadcast data, thereby eliminating waste of computational resources and/or power resources, which can be limited for battery-powered devices.

When the cached data of the WiFi chip does not characterize one or more features associated with the mDNS broadcast data, method 700 can proceed from operation 706 to operation 710. Operation 710 can include causing the device SoC of the computing device to transition from a first operating mode to a second operating mode. In some implementations, the first operating mode can be a mode in which the device SoC is executing fewer processes than in the second operating mode. Alternatively or additionally, the first operating mode can correspond to less power consumption by the device SoC than the power consumption of the device SoC when operating in the second operating mode.

Method 700 can proceed from operation 708 and/or operation 710 to operation 712. Operation 712 can include causing the computing device to transmit the response data to be broadcast and/or other response data. The other response data can be generated at least in part by the device SoC when operation 712 is performed. For example, when the cached data does not identify a particular feature, such as a service associated with the mDNS broadcast data, the device SoC can be used to generate the other response data, which can identify one or more features associated with the mDNS broadcast data using data accessible to the device SoC. In some implementations, the cached data can be updated by the WiFi chip and/or the device SoC when the WiFi chip is tasked with transmitting data that was otherwise not accessible through the memory of the WiFi chip. In this way, subsequent queries or requests from other client devices over the network can be answered by the WiFi chip without waking up the device SoC, thereby eliminating waste of power and computational resources.
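The flow of method 700 can be summarized in a short sketch. It is a simplified model under stated assumptions: the port value, the dictionary-based cache, the service names, and the "unsupported" fallback are hypothetical placeholders, and no actual mDNS packet parsing or network transmission is performed.

```python
from dataclasses import dataclass, field

TARGET_PORT = 9000  # hypothetical placeholder for the port designated for casting


@dataclass
class Device:
    cache: dict = field(default_factory=dict)     # data held in the WiFi chip's memory
    soc_data: dict = field(default_factory=dict)  # data reachable only through the device SoC
    soc_awake: bool = False                       # False models the lower-power first mode

    def handle_mdns(self, port: int, service: str):
        """Return response data for an mDNS broadcast, or None if it is not for the target port."""
        if port != TARGET_PORT:
            return None                       # operation 704: keep monitoring traffic
        if service in self.cache:
            return self.cache[service]        # operations 706 and 708: answer from the cache
        self.soc_awake = True                 # operation 710: transition the device SoC
        response = self.soc_data.get(service, "unsupported")
        self.cache[service] = response        # update the cache for later requests
        return response                       # operation 712: response data to transmit
```

In this model, the first request for an uncached service wakes the SoC once; a later request for the same service is answered from the cache while the SoC stays in its lower-power mode.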

FIG. 8 is a block diagram of an example computer system 810. Computer system 810 typically includes at least one processor 814 that communicates with a number of peripheral devices via a bus subsystem 812. These peripheral devices may include, for example, a storage subsystem 824, which includes a memory 825 and a file storage subsystem 826, user interface output devices 820, user interface input devices 822, and a network interface subsystem 816. The input and output devices allow user interaction with computer system 810. Network interface subsystem 816 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems.

User interface input devices 822 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into a display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into computer system 810 or onto a communication network.

User interface output devices 820 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display, such as via audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from computer system 810 to the user or to another machine or computer system.

Storage subsystem 824 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 824 may include the logic to perform selected aspects of method 500, method 600, and method 700, and/or to implement one or more of first client device 134, second client device 102, third client device 112, client device 202, client device 302, server device 402, computing device 418, and/or any other engine, module, chip, processor, application, etc. discussed herein.

These software modules are generally executed by processor 814 alone or in combination with other processors. Memory 825 used in the storage subsystem 824 can include a number of memories, including a main random access memory (RAM) 830 for storage of instructions and data during program execution and a read only memory (ROM) 832 in which fixed instructions are stored. The file storage subsystem 826 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 826 in the storage subsystem 824, or in other machines accessible by the processor 814.

Bus subsystem 812 provides a mechanism for letting the various components and subsystems of computer system 810 communicate with each other as intended. Although bus subsystem 812 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple buses.

Computer system 810 can be of varying types, including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computer system 810 depicted in FIG. 8 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computer system 810 are possible, having more or fewer components than the computer system depicted in FIG. 8.

In situations in which the systems described herein collect personal information about users (or, as often referred to herein, "participants"), or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current geographic location), or to control whether and/or how to receive content from the content server that may be more relevant to the user. Also, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and/or used.

While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.

In some implementations, a method is set forth as including operations such as processing, at a first processor of a computing device, output of a microphone, wherein the output corresponds to a spoken utterance provided by a user to the microphone, and the computing device includes a second processor that is operating in a sleep mode when the spoken utterance is provided by the user. The method can further include determining, at the first processor, whether the output at least partially corresponds to an invocation phrase for invoking an automated assistant that is accessible via the computing device. The method can further include, when the first processor determines that the output at least partially corresponds to the invocation phrase: causing, by the first processor, the second processor to transition from the sleep mode to an operating mode; providing, by the first processor to the second processor, data characterizing the output of the microphone; determining, by the second processor and based on the data received from the first processor, a degree to which the data characterizes the invocation phrase; determining, by the second processor and based on the degree to which the data characterizes the invocation phrase, an amount of wake time for the second processor to remain in the operating mode; and, in response to determining the amount of wake time for the second processor, causing the second processor to operate in the operating mode for at least the amount of wake time.

In some implementations, the method can further include, when the second processor is operating in the operating mode for at least the amount of wake time: receiving, at the second processor from the first processor, additional data characterizing a separate spoken utterance from the user or another user; and causing, by the second processor and based on the additional data, the automated assistant to be responsive to the separate spoken utterance. In some implementations, the first processor operates a first speech recognition model and the second processor operates a second speech recognition model that is different from the first speech recognition model. In some implementations, the first speech recognition model is associated with a first accuracy threshold for determining another degree to which the data characterizes the invocation phrase, and the second speech recognition model is associated with a second accuracy threshold, different from the first accuracy threshold, for determining the degree to which the data characterizes the invocation phrase. In some implementations, the second accuracy threshold is satisfied by a greater degree of correlation between a spoken input and the invocation phrase, the greater degree of correlation being relative to the degree of correlation needed to satisfy the first accuracy threshold.

In some implementations, the first processor is a digital signal processor (DSP), the second processor is a device system on a chip (SoC), and the computing device includes one or more batteries that provide power to the first processor and the second processor when the device SoC is in the operating mode. In some implementations, determining the amount of wake time for the second processor to remain in the operating mode includes identifying a previously determined amount of wake time designated for the second processor, wherein the previously determined amount of wake time is based on one or more interactions between the user and the automated assistant prior to the user providing the spoken utterance. In some implementations, the method can further include, when the first processor determines that the output does not at least partially correspond to the invocation phrase, bypassing, by the first processor, causing the second processor to transition from the sleep mode to the operating mode.

In some implementations, the method can further include, when the first processor determines that the output at least partially corresponds to the invocation phrase, determining, by the second processor and based on the data characterizing the output of the microphone, a voice characteristic of the user that is characterized by the output of the microphone, wherein determining the amount of wake time for the second processor to remain in the operating mode is further based on the voice characteristic of the user characterized by the output of the microphone.

In other implementations, a method is described as including operations such as processing, at a first processor of a computing device, an output of a microphone, wherein the output corresponds to a spoken utterance provided by a user to the microphone, and the computing device includes a second processor that is operating in a sleep mode when the spoken utterance is provided by the user. In some implementations, the method can further include determining, at the first processor, whether the output at least partially corresponds to an invocation phrase for invoking an automated assistant that is accessible via the computing device. In some implementations, the method can further include, when the first processor determines that the output at least partially corresponds to the invocation phrase: causing, by the first processor, the second processor to transition from the sleep mode into an operating mode; determining, by the second processor, a voice characteristic characterized by the output of the microphone; determining, by the second processor and based on the voice characteristic characterized by the output, an amount of wake time for which the second processor is to remain in the operating mode; and, based on determining the amount of wake time for the second processor, causing the second processor to operate in the operating mode for at least the amount of wake time.

In some implementations, the method can further include, after the second processor has operated in the operating mode and while the second processor is subsequently operating in the sleep mode: determining, at the first processor of the computing device, that another output from the microphone at least partially corresponds to the invocation phrase for invoking the automated assistant, wherein the other input is provided in response to a separate user providing a separate spoken utterance to the microphone; causing, by the first processor, the second processor to transition from the sleep mode into the operating mode; determining, by the second processor and based on the other output, another voice characteristic characterized by the other output from the microphone; determining, by the second processor and based on the voice characteristic characterized by the other output, another amount of wake time for which the second processor is to remain in the operating mode, wherein the other amount of wake time is different from the amount of wake time; and, based on determining the amount of wake time for the second processor, causing the second processor to operate in the operating mode for at least the other amount of wake time.

In some implementations, the second processor operates a voice characteristic model when determining whether the spoken utterance was provided to the microphone by the user and/or the separate user. In some implementations, the computing device includes one or more batteries that provide power to the first processor and the second processor when the second processor is operating in the operating mode. In some implementations, the amount of wake time is based on one or more interactions between the user and the automated assistant before the user provided the spoken utterance.
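
By way of non-limiting illustration only, the wake time determination described in the preceding paragraphs could be sketched in Python as follows. The names WakeTimePolicy, wake_time_for_output and DEFAULT_WAKE_SECONDS, the 0.8 threshold, and the choice of the longest previously observed pause as the wake time are all assumptions made for this sketch and are not taken from the disclosure; the speaker identifier stands in for whatever result a voice characteristic model would produce.

```python
from dataclasses import dataclass, field

# Assumed fallback wake time for an unrecognized speaker (not from the source).
DEFAULT_WAKE_SECONDS = 5.0


@dataclass
class WakeTimePolicy:
    """Keeps, per speaker, the pauses observed in earlier assistant interactions."""
    history: dict = field(default_factory=dict)

    def record_interaction(self, speaker_id: str, pause_seconds: float) -> None:
        """Store one observed pause between a speaker's consecutive requests."""
        self.history.setdefault(speaker_id, []).append(pause_seconds)

    def wake_time_for(self, speaker_id: str) -> float:
        """Return a wake time based on prior interactions, or the default."""
        pauses = self.history.get(speaker_id)
        if not pauses:
            return DEFAULT_WAKE_SECONDS
        # Assumption: stay awake long enough to cover the longest pause seen.
        return max(pauses)


def wake_time_for_output(invocation_score, speaker_id, policy, threshold=0.8):
    """Return the wake time for the second processor, or None to leave it asleep.

    invocation_score models the first processor's estimate that the microphone
    output at least partially corresponds to the invocation phrase.
    """
    if invocation_score < threshold:
        # The first processor refrains from waking the second processor.
        return None
    return policy.wake_time_for(speaker_id)
```

In this sketch two different speakers can receive different wake times from the same device, which mirrors the case above in which a separate user's utterance yields another amount of wake time.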

In yet other implementations, a method is described as including operations such as determining, by a processor of a computing device, that an input to a microphone of the computing device at least partially corresponds to an invocation phrase for invoking an automated assistant that is accessible via the computing device. The method can further include causing, by the processor and based on the input to the microphone, another processor of the computing device to transition from a sleep mode into an operating mode. The method can further include, after the other processor transitions from the sleep mode into the operating mode: generating, by the other processor, first data characterizing an audio output provided by the computing device via one or more speakers in communication with the computing device, wherein the first data includes first time data characterizing a time at which the other processor generated the first data; determining, by the processor, that another input has been provided to the microphone of the computing device; generating, by the processor, second data characterizing the other input to the microphone of the computing device, wherein the second data includes second time data characterizing another time at which the processor generated the second data; determining, by the other processor, a time offset between the time at which the other processor generated the first data and the other time at which the processor generated the second data; processing, by the other processor, the second data using the time offset in furtherance of removing one or more features of the audio output provided by the one or more speakers; and determining, by the other processor and based on processing the second data using the time offset, whether the other input to the microphone corresponds to a spoken utterance for invoking the automated assistant that is accessible via the computing device. The method can further include, when the other input to the microphone is determined to correspond to a spoken utterance for invoking the automated assistant, causing, by the other processor, the automated assistant to provide a responsive output via an interface in communication with the computing device.

In some implementations, processing the second data using the time offset in furtherance of removing the one or more features of the audio output includes performing an acoustic echo cancellation (AEC) process using the second data and the audio data. In some implementations, the time offset corresponds to a difference in clock operating characteristics between a clock of the processor and another clock of the other processor. In some implementations, the time offset is based on a difference between a first clock value determined using the clock and a second clock value determined using the other clock. In some implementations, the first clock value and the second clock value are determined while the other processor is in the operating mode. In some implementations, the time offset is determined by multiplying a ratio of the difference between the clock values by a time value corresponding to the other time. In some implementations, the computing device includes one or more batteries that provide power to the processor and the other processor when the other processor is operating in the operating mode. In some implementations, the processor is a digital signal processor (DSP) and the other processor is a device system-on-chip (SoC).
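
By way of non-limiting illustration only, one possible reading of the time offset determination is sketched below in Python. The sketch assumes that the two processors each sample their own clock at two paired instants, that the ratio of the elapsed clock values captures the difference in clock operating characteristics, and that all clock values are in microseconds. The function names, the units, and the final conversion to audio samples are assumptions of this sketch and do not appear in the disclosure; the AEC process itself is not shown.

```python
def clock_rate_ratio(dsp_first, soc_first, dsp_second, soc_second):
    """Ratio of elapsed SoC clock time to elapsed DSP clock time.

    Each (dsp_x, soc_x) pair is assumed to have been read at the same instant.
    """
    dsp_elapsed = dsp_second - dsp_first
    if dsp_elapsed == 0:
        raise ValueError("the two DSP clock readings must differ")
    return (soc_second - soc_first) / dsp_elapsed


def time_offset(dsp_timestamp, dsp_first, soc_first, ratio):
    """Offset to add to a DSP timestamp to express it on the SoC clock.

    The elapsed DSP time since the first paired reading is multiplied by the
    ratio, then anchored at the SoC value of that first reading.
    """
    soc_equivalent = soc_first + (dsp_timestamp - dsp_first) * ratio
    return soc_equivalent - dsp_timestamp


def offset_in_samples(offset_us, sample_rate_hz):
    """Convert a microsecond offset to a whole number of audio samples."""
    return round(offset_us * sample_rate_hz / 1_000_000)
```

An echo canceller could use the resulting sample count to line up the microphone data stamped by the DSP with the speaker data stamped by the device SoC before subtracting the speaker signal.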

In yet other implementations, a method is described as including operations such as receiving, at a WiFi chip of a computing device, multicast Domain Name System (mDNS) broadcast data from a broadcasting device, wherein the computing device includes a device system-on-chip (SoC) that is operating in a first mode of operation when the WiFi chip of the computing device receives the mDNS broadcast data. The method can further include determining, by the WiFi chip and based on the mDNS broadcast data, whether a target port identified by the mDNS broadcast data corresponds to a particular port that is accessible via the computing device. The method can further include, when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device: accessing, based on the target port corresponding to the particular port, cached broadcast device data that is stored in a memory device accessible to the WiFi chip while the device SoC is operating in the first mode of operation; determining, based on the cached broadcast device data stored in the memory, whether the cached broadcast device data characterizes one or more features of the broadcasting device specified by the mDNS broadcast data; and, when the cached broadcast device data characterizes the one or more features of the broadcasting device, generating response data based on the cached broadcast device data and transmitting the response data to the broadcasting device.

In some implementations, the method can further include, when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device and the cached broadcast device data does not characterize the one or more features of the broadcasting device, causing the device SoC to transition from the first mode of operation into a second mode of operation based on the cached broadcast device data not characterizing the one or more features, wherein the second mode of operation is associated with higher power consumption by the device SoC than the power consumption of the device SoC when operating in the first mode of operation.

In some implementations, the computing device includes one or more batteries that provide power to the WiFi chip and the device SoC when the device SoC is operating in the second mode of operation. In some implementations, determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device includes determining whether the cached broadcast device data identifies an application that initialized transmission of the mDNS broadcast data from the broadcasting device. In some implementations, determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device includes determining whether the cached broadcast device data identifies a service being requested by the broadcasting device. In some implementations, the method can further include, when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device and the cached broadcast device data does not characterize the one or more features of the broadcasting device: causing the device SoC to generate other response data based on the mDNS broadcast data; and transmitting, by the WiFi chip, the other response data to the broadcasting device.
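
By way of non-limiting illustration only, the decision made at the WiFi chip while the device SoC remains in the first mode of operation could be sketched in Python as follows. The types MdnsBroadcast, CachedEntry and Action, the function handle_broadcast, and the use of a service name as the cache key are hypothetical and are not taken from the disclosure or from any mDNS library. The handling of a port mismatch is not stated in the paragraphs above; the sketch assumes such data is ignored without waking the device SoC.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"                          # assumed handling of a port mismatch
    RESPOND_FROM_CACHE = "respond_from_cache"  # WiFi chip answers; SoC stays in first mode
    WAKE_SOC = "wake_soc"                      # SoC moves to the higher-power second mode


@dataclass(frozen=True)
class MdnsBroadcast:
    target_port: int
    service: str      # service being requested by the broadcasting device
    application: str  # application that initialized the transmission


@dataclass(frozen=True)
class CachedEntry:
    service: str
    application: str
    response: bytes


def handle_broadcast(msg, served_port, cache):
    """Return (action, response bytes or None) for one received broadcast.

    cache maps a service name to a CachedEntry held in memory that the WiFi
    chip can read without the device SoC.
    """
    if msg.target_port != served_port:
        return Action.IGNORE, None
    entry = cache.get(msg.service)
    if entry is not None and entry.application == msg.application:
        # The cached data characterizes the broadcasting device's features.
        return Action.RESPOND_FROM_CACHE, entry.response
    # Cache miss: the device SoC is needed to generate the other response data.
    return Action.WAKE_SOC, None
```

Only the last branch costs the additional power of the second mode of operation, which is the saving the cached path is meant to provide.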

100 Diagram
102 Second client device
106 WiFi chip
108 Device SoC
110 Portable power source
112 Third client device
116 WiFi chip
118 Device SoC
122 Standalone speaker device
124 User
126 Cellular phone
130 mDNS data
132 mDNS data
134 First client device
136 Automated assistant
138 Assistant interface
140 Memory
142 Memory
144 Application
146 Response data
148 Response data
150 Diagram
200 Diagram
202 Client device
204 Digital signal processor (DSP)
208 Device SoC
210 Power source
212 Audio data
214 First model
216 Second model
218 Spoken utterance
220 User
222 Computing device
230 Diagram
232 Audio data
234 Spoken utterance
240 Diagram
242 Audio data
244 Spoken utterance
300 Diagram
302 Client device
306 WiFi chip
308 Device SoC
312 Audio data
318 Spoken utterance
320 User
322 Standalone speaker device
324 Wake time
400 System
402 Server device
404 Automated assistant
406 Input processing engine
408 Speech processing module
410 Data analysis module
412 Parameter module
416 Assistant data
418 Computing device
420 Assistant interface
422 Client automated assistant
424 Time offset engine
426 Power engine
430 Audio processing engine
432 Client data
434 WiFi chip; other computing device
436 Memory
438 Broadcast engine
440 Network; speech recognition model
444 Device SoC
446 Power source
448 Wake time engine
500 Method
600 Method
810 Computer system
812 Bus subsystem
814 Processor
816 Network interface subsystem
820 User interface output device
822 User interface input device
824 Storage subsystem
825 Memory
826 File storage subsystem
830 Main random access memory (RAM)
832 Read-only memory (ROM)

Claims

A method comprising:
receiving, at a WiFi chip of a computing device, multicast Domain Name System (mDNS) broadcast data from a broadcasting device,
wherein the computing device includes a device system-on-chip (SoC) that is operating in a first mode of operation when the WiFi chip of the computing device receives the mDNS broadcast data;
determining, by the WiFi chip and based on the mDNS broadcast data, whether a target port identified by the mDNS broadcast data corresponds to a particular port that is accessible via the computing device; and
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device:
accessing, based on the target port corresponding to the particular port, cached broadcast device data that is stored in a memory device accessible to the WiFi chip while the device SoC is operating in the first mode of operation;
determining, based on the cached broadcast device data stored in the memory device, whether the cached broadcast device data characterizes one or more features of the broadcasting device specified by the mDNS broadcast data; and
when the cached broadcast device data characterizes the one or more features of the broadcasting device:
generating response data based on the cached broadcast device data; and
transmitting the response data to the broadcasting device.

The method of claim 1, further comprising:
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device, and the cached broadcast device data does not characterize the one or more features of the broadcasting device:
causing the device SoC to transition from the first mode of operation into a second mode of operation based on the cached broadcast device data not characterizing the one or more features, wherein the second mode of operation is associated with higher power consumption by the device SoC than the power consumption of the device SoC when operating in the first mode of operation.

The method of claim 2, wherein the computing device includes one or more batteries that provide power to the WiFi chip and the device SoC when the device SoC is operating in the second mode of operation.

The method of claim 1, wherein determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device includes:
determining whether the cached broadcast device data identifies an application that initialized transmission of the mDNS broadcast data from the broadcasting device.

The method of claim 4, wherein determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device includes:
determining whether the cached broadcast device data identifies a service being requested by the broadcasting device.

The method of claim 1, wherein determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device includes:
determining whether the cached broadcast device data identifies a service being requested by the broadcasting device.

The method of claim 1, further comprising:
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device, and the cached broadcast device data does not characterize the one or more features of the broadcasting device:
causing the device SoC to generate other response data based on the mDNS broadcast data; and
transmitting, by the WiFi chip, the other response data to the broadcasting device.

A portable computing device comprising:
one or more speakers;
one or more batteries;
a device system-on-chip (SoC) that is at least selectively powered by the one or more batteries; and
a WiFi chip that is at least selectively powered by the one or more batteries,
wherein the WiFi chip at least selectively executes WiFi chip instructions stored by the WiFi chip to perform steps of:
receiving, over a local network, multicast Domain Name System (mDNS) broadcast data from a broadcasting device,
wherein the device SoC is operating in a first mode of operation when the WiFi chip receives the mDNS broadcast data;
determining, based on the mDNS broadcast data, whether a target port identified by the mDNS broadcast data corresponds to a particular port that is accessible via the portable computing device; and
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the portable computing device:
accessing, based on the target port corresponding to the particular port, cached broadcast device data that is stored in a memory device accessible to the WiFi chip while the device SoC is operating in the first mode of operation;
determining, based on the cached broadcast device data stored in the memory device, whether the cached broadcast device data characterizes one or more features of the broadcasting device specified by the mDNS broadcast data; and
when the cached broadcast device data characterizes the one or more features of the broadcasting device:
generating response data based on the cached broadcast device data; and
transmitting the response data to the broadcasting device over the local network.

The portable computing device of claim 8, wherein the WiFi chip, when executing the stored WiFi chip instructions, further performs:
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the portable computing device, and the cached broadcast device data does not characterize the one or more features of the broadcasting device:
causing the device SoC to transition from the first mode of operation into a second mode of operation based on the cached broadcast device data not characterizing the one or more features, wherein the second mode of operation is associated with higher power consumption by the device SoC than the power consumption of the device SoC when operating in the first mode of operation.

The portable computing device of claim 9, wherein the one or more batteries power the device SoC only when the device SoC is operating in the second mode of operation.

The portable computing device of claim 8, wherein, in determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device, the WiFi chip:
determines whether the cached broadcast device data identifies an application that initialized transmission of the mDNS broadcast data from the broadcasting device.

The portable computing device of claim 11, wherein, in determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device, the WiFi chip:
determines whether the cached broadcast device data identifies a service being requested by the broadcasting device.

The portable computing device of claim 8, wherein, in determining whether the cached broadcast device data characterizes the one or more features of the broadcasting device, the WiFi chip:
determines whether the cached broadcast device data identifies a service being requested by the broadcasting device.

The portable computing device of claim 8, wherein the WiFi chip, when executing the stored WiFi chip instructions, further performs:
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the portable computing device, and the cached broadcast device data does not characterize the one or more features of the broadcasting device:
causing the device SoC to generate other response data based on the mDNS broadcast data; and
transmitting the other response data to the broadcasting device over the local network.

When executed by a WiFi chip in a computing device, said WiFi chip:
receiving, at the WiFi chip of the computing device, multicast Domain Name System (mDNS) broadcast data from a broadcasting device,
wherein the computing device includes a device system-on-chip (SoC) that is operating in a first mode of operation when the WiFi chip of the computing device receives the mDNS broadcast data;
determining, by the WiFi chip and based on the mDNS broadcast data, whether a target port identified by the mDNS broadcast data corresponds to a particular port that is accessible via the computing device; and
when the target port identified by the mDNS broadcast data corresponds to the particular port that is accessible via the computing device:
accessing, based on the target port corresponding to the particular port, cached broadcast device data that is stored in a memory device accessible to the WiFi chip while the device SoC is operating in the first mode of operation;
determining, based on the cached broadcast device data stored in the memory device, whether the cached broadcast device data characterizes one or more features of the broadcasting device specified by the mDNS broadcast data; and
when the cached broadcast device data characterizes the one or more features of the broadcasting device:
generating response data based on the cached broadcast device data; and
transmitting the response data to the broadcasting device.