TWI833678B - Generative chatbot system for real multiplayer conversational and method thereof - Google Patents
- Publication number
- TWI833678B (application number TW112135679A)
- Authority
- TW
- Taiwan
- Prior art keywords
- message
- portable device
- server host
- response
- timing
Landscapes
- Information Transfer Between Computers (AREA)
Description
The present invention relates to a chatbot system and method, and in particular to a generative chatbot system and method for real multi-person conversation scenarios.
In recent years, artificial intelligence has spread and developed rapidly, and AI applications of many kinds have proliferated. Chatbots have attracted the most attention among them.
Traditional chatbots generally hold one-on-one conversations with a user: they respond only when the user sends a question. However, no existing chatbot can proactively offer appropriate response suggestions or prompts in a real multi-person conversation. For example, in a multi-person conversation, a traditional chatbot cannot proactively and quickly give the user suitable conversation suggestions. Existing chatbots therefore suffer from poor conversational initiative and poor response efficiency.
In summary, the prior art has long suffered from poor conversational initiative and poor response efficiency. Improved technical means are needed to solve this problem.
The invention discloses a generative chatbot system and method for real multi-person conversation scenarios.
First, the invention discloses a generative chatbot system for real multi-person conversation scenarios. The system comprises an artificial intelligence device, a portable device, and a server host. The artificial intelligence device receives context messages and their corresponding timing logic through an application programming interface (API). It inputs them together into a large language model (LLM) to generate response messages, then sends the response messages back through the API.

The portable device includes a sensor, a speaker, a storage device, and a voice processor:
- The sensor continuously senses multiple voice signals.
- The speaker outputs feedback speech.
- The storage device stores multiple feature vectors corresponding to the voice signals, together with the corresponding text messages. Each text message includes a timing mark and a classification mark.
- The voice processor is electrically connected to the sensor, the speaker, and the storage device. It is configured to:
  - convert each sensed voice signal into a corresponding feature vector using Mel-Frequency Cepstral Coefficients (MFCC), in order to classify the voice signals;
  - perform speech-to-text processing to convert each voice signal into a corresponding text message;
  - embed a timing mark and a classification mark in the text message for each voice signal, based on the timing relationship and the classification results, and store the result in the storage device as the context message;
  - on receiving an on-demand dialogue message, perform text-to-speech processing to convert it into feedback speech that is output through the speaker.

The server host is connected to the artificial intelligence device and the portable device. It includes a non-transitory computer-readable storage medium, which stores multiple computer-readable instructions, and a hardware processor electrically connected to that medium. The hardware processor executes the instructions to cause the server host to:
- continuously load the context messages from the storage device of the portable device, and determine the timing logic of the multi-person conversation from the timing marks and classification marks embedded in the text messages; the timing logic includes the number of participants, the sequence, and the topic of the conversation;
- transmit the context messages and timing logic to the artificial intelligence device, receive the corresponding response messages from it, and store them in a response list;
- automatically select at least one of the response messages from the response list as the on-demand dialogue message, and transmit it to the portable device.
In addition, the invention discloses a generative chatbot method for real multi-person conversation scenarios, comprising the following steps:
1. Connect a server host to an artificial intelligence device and to a portable device. The artificial intelligence device receives context messages and their corresponding timing logic through an application programming interface, and transmits response messages through it.
2. The portable device continuously senses multiple voice signals through a sensor. It converts the sensed voice signals into corresponding feature vectors using Mel-Frequency Cepstral Coefficients, in order to classify the voice signals.
3. The portable device performs speech-to-text processing to convert each voice signal into a corresponding text message.
4. Based on the timing relationship and the classification results, the portable device embeds timing marks and classification marks in the text messages for the voice signals. It stores them in its storage device as context messages.
5. The server host continuously loads the context messages from the storage device of the portable device. It determines the timing logic of the multi-person conversation from the embedded timing marks and classification marks; the timing logic includes the number of participants, the sequence, and the topic of the conversation.
6. The server host transmits the context messages and timing logic to the artificial intelligence device, where they are input into the device's large language model to generate corresponding response messages. These are sent back to the server host through the application programming interface.
7. The server host stores the response messages in a response list and automatically selects at least one of them as an on-demand dialogue message. It then transmits the on-demand dialogue message to the portable device.
8. On receiving the on-demand dialogue message, the portable device performs text-to-speech processing to convert it into feedback speech that is output through the speaker.
The system and method of the invention, as described above, differ from the prior art as follows. Multiple voice signals are detected and converted into corresponding feature vectors and text messages. Timing marks and classification marks are embedded in the text messages, which are stored as context messages. The server host uses these context messages to determine the timing logic of the multi-person conversation. It then transmits the context messages and timing logic to the artificial intelligence device, which uses them to determine the current stage of the conversation, track how the topic evolves, and predict how the conversation will develop. The artificial intelligence device proactively generates response messages and stores them on the server host. The server host then selects suitable response messages and sends them to the portable device for output.
Through these technical means, the invention improves conversational initiative and response efficiency.
Embodiments of the invention are described in detail below with reference to the drawings and examples. The aim is that readers fully understand, and can put into practice, how the invention applies technical means to solve the technical problem and achieve its technical effect.
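First, refer to Figure 1, a system block diagram of the generative chatbot system for real multi-person conversation scenarios of the invention. The system includes an artificial intelligence device 110, a portable device 120, and a server host 130. The artificial intelligence device 110 receives context messages and their corresponding timing logic through an application programming interface. It inputs them together into a large language model to generate response messages, then sends the response messages through the API.

In practice, the artificial intelligence device 110 is a chatbot built on a large language model, such as a Generative Pre-trained Transformer (GPT), PaLM, Galactica, LLaMA, LaMDA, or similar. Using the context messages and their timing logic, it can determine the current stage of the conversation and how its topic has evolved. It can also predict how the conversation will develop, and returns the predicted dialogue as response messages.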
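The portable device 120 includes a sensor 121, a speaker 122, a storage device 123, and a voice processor 124. The sensor 121 continuously senses multiple voice signals.

In practice, the sensor 121 may also sense at least one of the user's physiological state, facial expression, and body movements to generate user behavior information. This information is transmitted to the server host 130, which uses it to judge the user's personality and set a personality parameter. For example:
- Physiological features such as blood pressure, heart rate, pulse, and blood glucose may be sensed to judge a physiological state such as happy, excited, or depressed.
- The face, iris, and so on may be sensed to judge facial expression and mood.

The user's personality (for example, extroverted, introverted, enthusiastic, or aloof) can then be inferred from the physiological state, facial expression, and mood.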
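The speaker 122 outputs feedback speech. In practice, it may be earphones, a loudspeaker, or similar. The portable device 120 may also include a display element that shows the on-demand dialogue message while the speaker 122 outputs the feedback speech. This display element may be a display, a dot-matrix light-emitting diode display, or similar.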
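The storage device 123 stores multiple feature vectors corresponding to the voice signals, together with the corresponding text messages. Each text message includes a timing mark and a classification mark. In practice, the storage device 123 may be a hard disk, an optical disc, flash memory, or similar. The storage device 123 also treats all text messages with embedded timing and classification marks, taken together, as the context message.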
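The following minimal Python sketch shows one way tagged text messages could be stored and joined into a single context message. The `TextMessage` fields and the `build_context` helper are hypothetical names and do not come from the patent. The optional `window` argument reflects the embodiment's option of using only a recent period, such as 30 minutes, instead of every stored message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TextMessage:
    """One transcribed utterance (hypothetical field names)."""
    timestamp: datetime  # timing mark
    speaker: str         # classification mark, e.g. "A" or "U01"
    text: str            # speech-to-text output


def build_context(messages, now=None, window=None):
    """Join tagged text messages, oldest first, into one context message.

    If `window` (a timedelta) is given, keep only messages no older than
    `now - window`; otherwise every stored message is used.
    """
    ordered = sorted(messages, key=lambda m: m.timestamp)
    if window is not None:
        now = now or datetime.now()
        ordered = [m for m in ordered if m.timestamp >= now - window]
    return "\n".join(
        f"[{m.timestamp:%Y-%m-%d %H:%M:%S}] {m.speaker}: {m.text}" for m in ordered
    )
```

Each line of the result keeps both marks next to the utterance, so a downstream step can still recover the speaker and the ordering.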
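The voice processor 124 is electrically connected to the sensor 121, the speaker 122, and the storage device 123. It is configured to:
- convert sensed voice signals into corresponding feature vectors using Mel-Frequency Cepstral Coefficients, in order to classify the voice signals;
- perform speech-to-text processing to convert the voice signals into corresponding text messages;
- embed timing marks and classification marks in the text messages for the voice signals, based on the timing relationship and classification results, and store them in the storage device 123;
- on receiving an on-demand dialogue message, perform text-to-speech processing to convert it into feedback speech output through the speaker, for example through wired or wireless (Bluetooth) earphones, a loudspeaker, or similar.

In practice, the voice processor 124 may be a processor dedicated to voice signals, such as a digital signal processor (DSP). The portable device 120 may also convert the user's own voice signals into feature vectors using MFCC and transmit them to the server host 130. The server host 130 compares these with multiple preset personality feature vectors to determine the user's personality, then sets the personality parameter accordingly.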
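The specification names MFCC but does not say how the coefficients are computed. The sketch below follows the textbook steps for a single frame in plain Python: a Hamming window, the power spectrum, a mel-spaced triangular filterbank, a logarithm, and a DCT-II. The function names, the 26-filter and 13-coefficient defaults, and the omission of pre-emphasis and overlapping frames are assumptions made for illustration. The code uses a direct DFT for readability, not speed.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """Return n_coeffs MFCCs for one frame of audio samples.

    Uses a direct O(n^2) DFT, so it is for illustration only; a real
    device would use an FFT (or a DSP library) and overlapping frames.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    # 1. Hamming window to reduce spectral leakage at the frame edges.
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    # 2. Power spectrum of the non-negative frequency bins 0..n//2.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        power.append((re * re + im * im) / n)
    # 3. Triangular filters whose edges are evenly spaced on the mel scale,
    #    mapped back to DFT bin indices.
    top = hz_to_mel(sample_rate / 2)
    edges = [math.floor((n + 1) * mel_to_hz(top * j / (n_filters + 1)) / sample_rate)
             for j in range(n_filters + 2)]
    log_energy = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for k in range(left, center):    # rising slope
            e += power[k] * (k - left) / (center - left)
        for k in range(center, right):   # falling slope
            e += power[k] * (right - k) / (right - center)
        log_energy.append(math.log(e + 1e-10))  # small floor avoids log(0)
    # 4. Unnormalized DCT-II of the log energies; keep the first n_coeffs.
    return [sum(le * math.cos(math.pi * q * (j + 0.5) / n_filters)
                for j, le in enumerate(log_energy))
            for q in range(n_coeffs)]
```

In a full pipeline, each voice signal would be split into short frames (typically 20 to 30 ms). The per-frame vectors would then be aggregated, for example averaged, into the feature vector that is stored and classified.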
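Next, the server host 130 is connected to the artificial intelligence device 110 and the portable device 120. It includes a non-transitory computer-readable storage medium 131 and a hardware processor 132. The non-transitory computer-readable storage medium 131 stores multiple computer-readable instructions, which are executed by the server host 130.

The computer-readable instructions that perform the operations of the invention may be any of the following:
- assembly instructions, instruction-set-architecture instructions, machine instructions, or machine-dependent instructions;
- microinstructions or firmware instructions;
- source code or object code written in any combination of one or more programming languages. These include object-oriented languages such as Common Lisp, Python, C++, Objective-C, Smalltalk, Delphi, Java, Swift, C#, Perl, Ruby, and PHP, and conventional procedural languages such as C or similar languages.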
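The hardware processor 132 is electrically connected to the non-transitory computer-readable storage medium 131 and executes the computer-readable instructions. These cause the server host 130 to:
- continuously load the context messages from the storage device 123 of the portable device 120, and determine the timing logic of the multi-person conversation from the embedded timing marks and classification marks; the timing logic includes the number of participants, the sequence, and the topic of the conversation;
- transmit the context messages and timing logic to the artificial intelligence device 110, receive the corresponding response messages from it, and store them in the response list;
- automatically select at least one response message from the response list as the on-demand dialogue message, and transmit it to the portable device 120.

In practice, the hardware processor 132 may be a central processing unit, a microprocessor, or similar.

To derive the timing logic of a multi-person conversation:
- The number of participants is taken from the number of distinct classification marks.
- The order of the utterances is taken from the timing marks.
- The topic is taken from the content of the context messages, for example by combining the time of day with keywords. If the time is noon and the keyword is "what to eat", the topic can be judged to be "lunch discussion".

The on-demand dialogue message may be chosen at random from among the response messages in the response list that match the personality parameter. The user can set the personality parameter by connecting the portable device 120 to the server host 130.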
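The following rough sketch shows how the server host might derive the timing logic from tagged messages:
- participants from the distinct classification marks;
- order from the timing marks;
- topic from keyword hits inside a time-of-day range, as in the noon plus "what to eat" example.

The `TOPIC_RULES` table and the `timing_logic` function are hypothetical. Plain substring matching stands in for whatever topic detection a real implementation would use.

```python
from datetime import datetime

# Hypothetical topic rules: (topic name, keywords, (start_hour, end_hour)).
TOPIC_RULES = [
    ("lunch discussion", ("what to eat", "lunch", "noodles", "rice"), (11, 14)),
    ("dinner discussion", ("what to eat", "dinner"), (17, 21)),
]


def timing_logic(messages):
    """messages: list of (timestamp, classification_mark, text) tuples."""
    ordered = sorted(messages, key=lambda m: m[0])  # order by timing mark
    speakers = []
    for _, label, _ in ordered:                     # distinct classification marks
        if label not in speakers:
            speakers.append(label)
    topic, best_hits = "unknown", 0
    for name, keywords, (start, end) in TOPIC_RULES:
        # Count utterances in the time range that mention any keyword.
        hits = sum(1 for ts, _, text in ordered
                   if start <= ts.hour < end
                   and any(k in text.lower() for k in keywords))
        if hits > best_hits:
            topic, best_hits = name, hits
    return {
        "participants": len(speakers),
        # Unique serial numbers such as "01:A" record the speaking order.
        "order": [f"{i:02d}:{label}" for i, (_, label, _) in enumerate(ordered, 1)],
        "topic": topic,
    }
```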
Note that, in practice, the invention may be implemented partly or entirely in hardware. For example, one or more components of the system may be implemented by hardware processors such as integrated circuit chips, systems on chip (SoC), complex programmable logic devices (CPLD), or field-programmable gate arrays (FPGA).

The non-transitory computer-readable storage medium of the invention carries computer-readable instructions (also called computer program instructions) that cause a processor to implement the various aspects of the invention. It may be any tangible device that can retain and store instructions for use by an instruction execution device. This includes, but is not limited to, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of these. More specific (non-exhaustive) examples include hard disks, random access memory, read-only memory, flash memory, optical discs, floppy disks, and any suitable combination of these.

As used herein, a non-transitory computer-readable storage medium is not to be construed as a transient signal per se. Such transient signals include radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, optical signals through a fiber-optic cable), and electrical signals transmitted through a wire.

The computer-readable instructions described herein may be downloaded from the non-transitory computer-readable storage medium to the respective computing/processing devices. They may also be downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, hubs, and/or gateways. A network card or network interface in each computing/processing device receives the computer-readable instructions from the network. It then forwards them for storage in a non-transitory computer-readable storage medium within that computing/processing device.
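Refer to Figures 2A and 2B, flowcharts of the generative chatbot method for real multi-person conversation scenarios of the invention. The steps are:
- Step 210: Connect the server host 130 to the artificial intelligence device 110 and to the portable device 120. The artificial intelligence device 110 receives context messages and their corresponding timing logic through an application programming interface, and transmits response messages through it.
- Step 220: The portable device 120 continuously senses multiple voice signals through a sensor. It converts the sensed voice signals into corresponding feature vectors using Mel-Frequency Cepstral Coefficients, in order to classify the voice signals.
- Step 230: The portable device 120 performs speech-to-text processing to convert each voice signal into a corresponding text message.
- Step 240: Based on the timing relationship and classification results, the portable device 120 embeds timing marks and classification marks in the text messages for the voice signals. It stores them in its storage device 123 as context messages.
- Step 250: The server host 130 continuously loads the context messages from the storage device 123 of the portable device 120. It determines the timing logic of the multi-person conversation from the embedded timing marks and classification marks; the timing logic includes the number of participants, the sequence, and the topic of the conversation.
- Step 260: The server host 130 transmits the context messages and timing logic to the artificial intelligence device 110, where they are input into its large language model to generate corresponding response messages. These are sent back to the server host 130 through the application programming interface.
- Step 270: The server host 130 stores the response messages in a response list and automatically selects at least one of them as an on-demand dialogue message. It then transmits the on-demand dialogue message to the portable device 120.
- Step 280: On receiving the on-demand dialogue message, the portable device 120 performs text-to-speech processing to convert it into feedback speech output through the speaker 122.

Through these steps, multiple voice signals are detected and converted into corresponding feature vectors and text messages. Timing marks and classification marks are embedded in the text messages, which are stored as context messages. The server host 130 uses them to determine the timing logic of the multi-person conversation. It then transmits the context messages and timing logic to the artificial intelligence device 110, which uses them to determine the current conversation stage and how the topic has evolved, and to predict how the conversation will develop. The artificial intelligence device 110 proactively generates response messages and stores them on the server host 130. The server host 130 then selects suitable response messages and sends them to the portable device 120 for output.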
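Embodiments are described below with reference to Figures 3 to 5. Figure 3 is a schematic diagram of a portable device to which the invention is applied. In practice, the portable device 120 may be a smartphone 300, a voice recorder, a personal digital assistant (PDA), or another portable device with audio capture. It continuously senses voice signals through a sound-capturing sensor such as a microphone. It then converts the sensed voice signals into corresponding feature vectors using MFCC, in order to classify them.

Taking the smartphone 300 as an example, suppose multiple voice signals are captured through the microphone 310 and the conversion yields three kinds of feature vectors. This indicates that three people may be talking. The smartphone 300 then performs STT processing to convert each voice signal into a corresponding text message, and embeds a timing mark and a classification mark based on timing and classification:
- The timing mark may include the time, date, and so on.
- The classification mark may include at least one of letters, digits, and symbols, and identifies different people. For example, "A" may denote the first person, "B" the second, and so on; or "U01" may denote the first person, "U02" the second, and so on.

If the microphone 310 keeps capturing audio, the smartphone 300 keeps converting it into feature vectors and text messages, and embeds a timing mark and a classification mark in each text message. All text messages in the storage device 123, or only those within a specified period (for example, the last 30 minutes), are treated together as the context message. The server host 130 can thus continuously load the context messages from the storage device 123 and use them to determine the timing logic of the multi-person conversation:
- The number of participants follows from the number of distinct classification marks; three distinct marks mean three people.
- The sequence follows from the timing marks, which give the order of the utterances.
- The topic follows from the content of the context messages, based on how often and when keywords or terms occur. For example, if several foods or dining-related terms are mentioned around noon, the topic can be judged to be a lunch discussion.

In practice, the context messages may be displayed in chronological order on the display element 301 of the smartphone 300, as shown in Figure 3. The smartphone 300 may also output the feedback speech through a Bluetooth headset 320.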
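The embodiment infers the number of speakers from how many kinds of feature vectors appear, but does not specify the classification method. One simple possibility, assumed here, is online nearest-centroid matching with a cosine-similarity threshold, issuing labels in the `U01`, `U02` style. The class name and the threshold value are hypothetical. Practical speaker separation usually relies on dedicated speaker embeddings rather than raw MFCC vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SpeakerLabeler:
    """Hypothetical online classifier that issues U01, U02, ... labels."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.centroids = []  # running mean feature vector per speaker
        self.counts = []     # number of vectors merged into each centroid

    def label(self, vector):
        best, best_sim = None, -1.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is None or best_sim < self.threshold:
            # No sufficiently similar speaker yet: start a new class.
            self.centroids.append(list(vector))
            self.counts.append(1)
            best = len(self.centroids) - 1
        else:
            # Update the matched speaker's running mean.
            n = self.counts[best] + 1
            self.centroids[best] = [c + (x - c) / n
                                    for c, x in zip(self.centroids[best], vector)]
            self.counts[best] = n
        return f"U{best + 1:02d}"

    @property
    def speaker_count(self):
        """Number of distinct classes, i.e. the estimated participant count."""
        return len(self.centroids)
```

The threshold trades off two errors. Set too low, it merges different speakers into one class. Set too high, it splits one speaker into several classes.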
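Figure 4 is a schematic diagram of the context message and timing logic of the invention. In practice, the context message 410 may include timing marks, classification marks, and text messages. From the context message 410, the server host 130 can determine:
- the order of the conversation (the conversation sequence), from the timing marks. It may assign unique serial numbers to distinguish the utterances, for example recording "01 -> 02 -> 03" for the order of the text messages;
- the number of participants, from the classification marks. With the three classes "A", "B", and "C", the number of participants is three;
- the topic, from the keyword "lunch" and the time (for example, around noon). Here the topic is "discussing lunch".

The server host 130 then generates the corresponding timing logic 420 from these results. The context message 410 and timing logic 420 need not be kept separate as in this example. They may also be combined, as in the combined context message and timing logic 430 shown in Figure 4.

When these are sent to the artificial intelligence device 110 to obtain response messages, a particular class in the classification marks can be designated. The device then generates response messages suited to the person in that class. For example, to obtain response messages suitable for "A", the request "Please generate a message for A to respond with" can be added when sending the context message and timing logic. The artificial intelligence device 110 then returns at least one corresponding response message to the server host 130 for storage in the response list. When such a request is present, it may return only response messages that satisfy it.

A specific topic can also be requested by instruction when the topic changes, enabling cross-topic responses. For example, the request "Generate a message for A to respond with, given that the topic is M" can be added, where M denotes a particular topic such as discussing lunch or discussing drinks. This lets the user designate a topic for prompts and responses. In practice, these requests can be entered or set through the portable device 120, for example by voice input or by typing letters, digits, symbols, and so on.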
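The following hedged sketch assembles the combined context message and timing logic (in the spirit of item 430) into a single request. It can optionally append the "message for A" and "given that the topic is M" instructions described above. `build_request` and its text layout are assumptions. The patent specifies neither the prompt format nor the API call used to reach the LLM, so the actual call is not shown.

```python
def build_request(context, logic, respond_as=None, topic=None):
    """Combine context + timing logic and an optional request (hypothetical format).

    context: the joined context message text.
    logic: dict with "participants", "order" (list of serial numbers), "topic".
    respond_as: classification mark to generate a response for, e.g. "A".
    topic: optional topic M to keep responses on, even if the talk has moved on.
    """
    lines = [
        "Conversation so far (timing mark, classification mark, text):",
        context,
        "",
        f"Participants: {logic['participants']}",
        f"Order: {' -> '.join(logic['order'])}",
        f"Topic: {logic['topic']}",
    ]
    if respond_as and topic:
        lines.append(f"Generate a message for {respond_as} to respond with, "
                     f"given that the topic is {topic}.")
    elif respond_as:
        lines.append(f"Please generate a message for {respond_as} to respond with.")
    return "\n".join(lines)
```

For example, `build_request(ctx, logic, respond_as="A")` ends with the request line for speaker A. Omitting `respond_as` sends only the context and timing logic.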
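Figure 5 is a schematic diagram of applying the invention to proactively select a response message from the response list. Suppose the response list 500 already contains multiple response messages. The server host 130 can select from them a response message that matches the personality parameter as the on-demand dialogue message.

For example, if the personality parameter is set to "aloof", the server host 130 excludes response messages that extend or steer the conversation (for example, those containing a question mark). In this example it selects "I want to eat chicken cutlet rice" as the on-demand dialogue message and transmits it to the portable device 120. There it is converted into feedback speech and output through the speaker 122, for example through the Bluetooth headset 320 connected to the smartphone 300, as shown in Figure 3.

In practice, the personality parameter may be set in any of three ways:
- by the user;
- by the server host 130, after judging the user's personality from the user behavior information sensed by the portable device 120;
- by the server host 130, after comparing multiple preset personality feature vectors with the user's feature vector to determine the user's personality.

For example, the feature vector of a low, deep voice can serve as the personality feature vector for "aloof", and that of a high, lively voice as the one for "enthusiastic". When the feature vector of the user's voice matches the "aloof" personality feature vector, the server host 130 can set the personality parameter to "aloof".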
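The following minimal sketch covers both parts of the personality-based selection:
- An "aloof" personality excludes responses containing a question mark, and the on-demand message is picked at random from the remaining candidates.
- The personality itself is inferred by finding the preset personality vector nearest to the user's feature vector.

The two-dimensional preset vectors, the Euclidean distance, and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical preset personality feature vectors (e.g. averaged voice features).
PERSONALITY_VECTORS = {
    "aloof": [-1.0, 0.2],
    "enthusiastic": [1.0, 0.8],
}


def infer_personality(user_vector, presets=PERSONALITY_VECTORS):
    """Return the preset personality whose vector is closest (Euclidean)."""
    return min(presets, key=lambda name: math.dist(user_vector, presets[name]))


def fits_personality(response, personality):
    if personality == "aloof":
        # An aloof user does not extend or steer the conversation,
        # so drop responses that ask a question (ASCII or full-width "?").
        return "?" not in response and "？" not in response
    return True


def pick_response(response_list, personality, rng=random):
    """Randomly choose a matching response, or None if nothing matches."""
    candidates = [r for r in response_list if fits_personality(r, personality)]
    return rng.choice(candidates) if candidates else None
```

Passing a seeded `random.Random` as `rng` makes the random selection reproducible in tests.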
In summary, the invention differs from the prior art as follows. Multiple voice signals are detected and converted into corresponding feature vectors and text messages. Timing marks and classification marks are embedded in the text messages, which are stored as context messages. The server host uses these context messages to determine the timing logic of the multi-person conversation. It then transmits the context messages and timing logic to the artificial intelligence device, which uses them to determine the current conversation stage and how the topic has evolved, and to predict how the conversation will develop. The artificial intelligence device proactively generates response messages and stores them on the server host. The server host then selects suitable response messages and sends them to the portable device for output. This technical means solves the problems of the prior art and improves conversational initiative and response efficiency.
Although the invention has been disclosed through the foregoing embodiments, they are not intended to limit it. Anyone skilled in the art may make changes and refinements without departing from the spirit and scope of the invention. The scope of patent protection of the invention shall therefore be determined by the claims appended to this specification.
110: Artificial intelligence device
120: Portable device
121: Sensor
122: Speaker
123: Storage device
124: Voice processor
130: Server host
131: Non-transitory computer-readable storage medium
132: Hardware processor
300: Smartphone
301: Display element
310: Microphone
320: Bluetooth headset
410: Context message
420: Timing logic
430: Context message and timing logic
500: Response list
Step 210: Connect a server host to an artificial intelligence device and to a portable device. The artificial intelligence device receives a context message and its corresponding timing logic, and transmits at least one response message, through an application programming interface (API)
Step 220: The portable device continuously senses multiple voice signals through at least one sensor and converts the sensed voice signals into corresponding feature vectors using Mel-Frequency Cepstral Coefficients (MFCC), in order to classify the voice signals
Step 230: The portable device performs speech-to-text processing to convert each voice signal into a corresponding text message
Step 240: Based on the timing relationship and the classification results, the portable device embeds the timing marks and classification marks in the text messages for the voice signals and stores them in a storage device of the portable device as the context message
Step 250: The server host continuously loads the context message from the storage device of the portable device and determines the timing logic of the multi-person conversation from the embedded timing marks and classification marks; the timing logic includes the number of participants, the sequence, and the topic of the conversation
Step 260: The server host transmits the context message and the timing logic to the artificial intelligence device, where they are input into its large language model (LLM) to generate the corresponding response messages, which are then sent to the server host through the API
Step 270: The server host stores the response messages in a response list, automatically selects at least one of them as an on-demand dialogue message, and transmits the on-demand dialogue message to the portable device
Step 280: On receiving the on-demand dialogue message, the portable device performs text-to-speech processing to convert it into feedback speech output through the speaker
Figure 1 is a system block diagram of the generative chatbot system for real multi-person conversation scenarios of the invention.
Figures 2A and 2B are flowcharts of the generative chatbot method for real multi-person conversation scenarios of the invention.
Figure 3 is a schematic diagram of a portable device to which the invention is applied.
Figure 4 is a schematic diagram of the context message and timing logic of the invention.
Figure 5 is a schematic diagram of applying the invention to proactively select a response message from the response list.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112135679A TWI833678B (en) | 2023-09-19 | 2023-09-19 | Generative chatbot system for real multiplayer conversational and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112135679A TWI833678B (en) | 2023-09-19 | 2023-09-19 | Generative chatbot system for real multiplayer conversational and method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
TWI833678B true TWI833678B (en) | 2024-02-21 |
Family
ID=90825100
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112135679A TWI833678B (en) | 2023-09-19 | 2023-09-19 | Generative chatbot system for real multiplayer conversational and method thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI833678B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200019370A1 (en) * | 2018-07-12 | 2020-01-16 | Disney Enterprises, Inc. | Collaborative ai storytelling |
TW202133027A (en) * | 2020-02-27 | 2021-09-01 | 中華電信股份有限公司 | Dialogue system and method for human-machine cooperation |
CN115221294A (en) * | 2021-04-15 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Dialogue processing method, dialogue processing device, electronic equipment and storage medium |
US20230015665A1 (en) * | 2019-07-22 | 2023-01-19 | Capital One Services, Llc | Multi-turn dialogue response generation with template generation |
CN116319631A (en) * | 2017-04-07 | 2023-06-23 | 微软技术许可有限责任公司 | Voice forwarding in automatic chat |
CN116662520A (en) * | 2023-07-21 | 2023-08-29 | 六合熙诚(北京)信息科技有限公司 | Multi-round dialogue generation method suitable for psychological role scene simulation |
- 2023-09-19: application TW112135679A filed (TW); patent TWI833678B active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116319631A (en) * | 2017-04-07 | 2023-06-23 | 微软技术许可有限责任公司 | Voice forwarding in automatic chat |
US20200019370A1 (en) * | 2018-07-12 | 2020-01-16 | Disney Enterprises, Inc. | Collaborative ai storytelling |
US20230015665A1 (en) * | 2019-07-22 | 2023-01-19 | Capital One Services, Llc | Multi-turn dialogue response generation with template generation |
US20230252241A1 (en) * | 2019-07-22 | 2023-08-10 | Capital One Services, Llc | Multi-turn dialogue response generation with persona modeling |
TW202133027A (en) * | 2020-02-27 | 2021-09-01 | 中華電信股份有限公司 | Dialogue system and method for human-machine cooperation |
CN115221294A (en) * | 2021-04-15 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Dialogue processing method, dialogue processing device, electronic equipment and storage medium |
CN116662520A (en) * | 2023-07-21 | 2023-08-29 | 六合熙诚(北京)信息科技有限公司 | Multi-round dialogue generation method suitable for psychological role scene simulation |
Similar Documents
Publication | Title |
---|---|
US10586539B2 (en) | In-call virtual assistant |
US10970492B2 (en) | IoT-based call assistant device |
KR102509464B1 (en) | Utterance classifier |
US11508378B2 (en) | Electronic device and method for controlling the same |
JP2014211900A (en) | Systems and methods for haptic augmentation of voice-to-text conversion |
US20210125610A1 (en) | Ai-driven personal assistant with adaptive response generation |
WO2017200080A1 (en) | Intercommunication method, intercommunication device, and program |
EP4091161B1 (en) | Synthesized speech audio data generated on behalf of human participant in conversation |
CN111542814A (en) | Method, computer device and computer readable storage medium for changing responses to provide rich-representation natural language dialog |
US11830502B2 (en) | Electronic device and method for controlling the same |
KR20210042523A (en) | An electronic apparatus and Method for controlling the electronic apparatus thereof |
CN111557001B (en) | Method for providing natural language dialogue, computer device and computer readable storage medium |
CN111556999B (en) | Method, computer device and computer readable storage medium for providing natural language dialogue by providing substantive answer in real time |
JP7063230B2 (en) | Communication device and control program for communication device |
CN109074809A (en) | Information processing equipment, information processing method and program |
TWI833678B (en) | Generative chatbot system for real multiplayer conversational and method thereof |
JP2021113835A (en) | Voice processing device and voice processing method |
JP2021117371A (en) | Information processor, information processing method and information processing program |
Mruthyunjaya et al. | Human-Augmented robotic intelligence (HARI) for human-robot interaction |
Fujii et al. | Open source system integration towards natural interaction with robots |
CN117171323A (en) | System and method for generating chat robot under real multi-person response situation |
Lin et al. | Nonverbal acoustic communication in human-computer interaction |
US11657814B2 (en) | Techniques for dynamic auditory phrase completion |
Guha | Detecting User Emotions From Audio Conversations With the Smart Assistants |
Sun | Intelligible dialogue manager for social robots: An AI dialogue robot solution based on Rasa open-source framework and Pepper robot |