
WO2019095586A1 - Meeting minutes generation method, application server, and computer-readable storage medium - Google Patents


Info

Publication number
WO2019095586A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
content
speakers
meeting
meeting minutes
Application number
PCT/CN2018/077628
Other languages
English (en)
French (fr)
Inventor
王健宗 (Wang Jianzong)
黄章成 (Huang Zhangcheng)
程宁 (Cheng Ning)
肖京 (Xiao Jing)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2019095586A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities

Definitions

  • The present application relates to the field of speech processing technologies, and in particular to a meeting minutes generation method, an application server, and a computer-readable storage medium.
  • The present application provides a meeting minutes generation method, an application server, and a computer-readable storage medium that automatically summarize meeting content records into meeting minutes, thereby saving labor costs.
  • The present application provides an application server that includes a memory, a processor, and a meeting minutes generation system stored in the memory and executable on the processor.
  • When the meeting minutes generation system is executed by the processor, the following steps are performed: acquiring audio record information of a meeting, and extracting each speaker's speech content from the audio record information according to that speaker's voice features; performing keyword extraction on each speaker's speech content; and generating meeting minutes for the meeting according to the extracted keywords.
  • The present application further provides a meeting minutes generation method applied to an application server, the method comprising: acquiring audio record information of a meeting, and extracting each speaker's speech content from the audio record information according to that speaker's voice features; performing keyword extraction on each speaker's speech content; and generating meeting minutes for the meeting according to the extracted keywords.
  • The present application further provides a computer-readable storage medium storing a meeting minutes generation system that is executable by at least one processor, so that the at least one processor performs the steps of the meeting minutes generation method described above.
  • The meeting minutes generation method, application server, and computer-readable storage medium proposed by the present application first acquire audio record information of a meeting and extract each speaker's speech content from it according to that speaker's voice features; second, perform keyword extraction on each speaker's speech content; and finally, generate meeting minutes for the meeting according to the extracted keywords.
  • As a result, participants can concentrate on the content and flow of the meeting.
  • The meeting minutes are concise and accurate and can also serve as a reference for others who need them. Compared with traditional manual note-taking, this solution is more efficient and accurate and saves labor costs.
  • FIG. 1 is a schematic diagram of an optional application environment for the embodiments of the present application.
  • FIG. 2 is a schematic diagram of an optional hardware architecture of the application server of the present application.
  • FIG. 3 is a program module diagram of a first embodiment of the meeting minutes generation system of the present application.
  • FIG. 4 is a program module diagram of a second embodiment of the meeting minutes generation system of the present application.
  • FIG. 5 is a schematic flowchart of a first embodiment of the meeting minutes generation method of the present application.
  • FIG. 6 is a schematic diagram of the implementation process of a second embodiment of the meeting minutes generation method of the present application.
  • As shown in FIG. 1, the present application is applicable to an application environment including, but not limited to, a terminal device 1, an application server 2, and a network 3.
  • The terminal device 1 may be a mobile terminal such as a mobile phone, smartphone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable multimedia player), navigation device, or in-vehicle device, or a fixed terminal such as a digital TV, desktop computer, broadband phone, or server.
  • The application server 2 may be a computing device such as a rack server, a blade server, or a tower server.
  • The application server 2 may be a standalone server or a server cluster composed of multiple servers.
  • The network 3 may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a voice-call network.
  • The application server 2 can be connected to one or more terminal devices 1 through the network 3 for data transmission and interaction.
  • FIG. 2 shows an optional hardware architecture of the application server 2 of the present application.
  • The application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicably connected to one another through a system bus. Note that FIG. 2 only shows the application server 2 with components 11-13; not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc.
  • The memory 11 may be an internal storage unit of the application server 2, such as its hard disk or memory.
  • The memory 11 may also be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 2.
  • The memory 11 may also include both the internal storage unit of the application server 2 and its external storage device.
  • The memory 11 is generally used to store the operating system and the various application software installed on the application server 2, such as the program code of the meeting minutes generation system 100. The memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • The processor 12 is typically used to control the overall operation of the application server 2, for example, performing control and processing related to data interaction or communication with the terminal device 1.
  • The processor 12 is configured to run program code or process data stored in the memory 11, for example, to run the meeting minutes generation system 100.
  • The network interface 13 may include a wireless or wired network interface and is typically used to establish a communication connection between the application server 2 and other electronic devices.
  • The network interface 13 is mainly used to connect the application server 2 to one or more terminal devices 1 through the network 3, establishing a data transmission channel and a communication connection between the application server 2 and the one or more terminal devices 1.
  • The present application proposes a meeting minutes generation system 100.
  • FIG. 3 is a program module diagram of the first embodiment of the meeting minutes generation system 100 of the present application.
  • The meeting minutes generation system 100 includes a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the meeting minutes generation operations of the embodiments of the present application are implemented.
  • The meeting minutes generation system 100 can be divided into one or more modules according to the specific operations implemented by the various parts of the computer program instructions. For example, in FIG. 3, the meeting minutes generation system 100 is divided into a content acquisition module 101, an extraction module 102, and a generation module 103, where:
  • The content acquisition module 101 is configured to acquire audio record information of a meeting and to extract each speaker's speech content from the audio record information according to that speaker's voice features.
  • The application server 2 collects the meeting's voice content through each terminal device 1: it receives and saves the voice content sent by each terminal device 1, which can be stored in a specified audio format such as MP3, WMA, or WAV.
  • The terminal device 1 collects the voice content through a sound collection device (for example, a microphone).
  • The terminal device 1 can send the collected voice content to the application server 2 in real time or periodically, or it can send the continuously collected voice content to the application server 2 when the participant on its side finishes speaking.
  • After receiving the voice content sent by the terminal device 1, the application server 2 saves it.
  • Because the complete voice content of the meeting is saved on the application server 2, the content acquisition module 101 can acquire the meeting's audio record information from it.
  • The audio record information is preferably the voice content of the meeting.
  • When the meeting is a video conference, the meeting record received and saved by the application server 2 contains both audio and video (voice and video frames), and the audio record information acquired by the content acquisition module 101 is still preferably the voice content of the meeting.
  • The voice features of each speaker can be acquired before the meeting. Specifically, each participant is assigned a unique ID number in advance. Before the meeting, each participant's voice features are enrolled, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, which makes it possible to confirm a participant's identity.
  • The participants can be local or remote speakers.
  • Each speaker's voice features may be used to generate a speaker model, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
  • To identify the speaker of a segment of voice content, the speaker's sound features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to those sound features exists in the index table, so the speaker ID number is obtained and the speaker's identity is confirmed. Otherwise, no matching speaker model exists in the index table, so a new speaker model and a new ID number are generated from the sound features and stored in the identity index table for future matching.
  • For matching and scoring, a universal background model (UBM) together with an i-vector extraction algorithm can be used.
  • An i-vector is computed for each of the two pieces of speech content being compared and serves as the sound feature of its speaker.
  • The two i-vectors are then scored with a dot-product algorithm or a probabilistic linear discriminant analysis (PLDA) algorithm. If the score exceeds a certain threshold, the two pieces of speech content are considered to belong to the same speaker.
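The matching-and-enrollment logic above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes i-vectors have already been extracted (UBM training and i-vector extraction are out of scope), uses dot-product scoring on length-normalized vectors rather than PLDA, and the class name `IdentityIndexTable`, the `spkNNN` ID format, and the 0.7 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field


def length_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot_product_score(a, b):
    """Dot-product score of two i-vectors after length normalization."""
    return sum(x * y for x, y in zip(length_normalize(a), length_normalize(b)))


@dataclass
class IdentityIndexTable:
    """Hypothetical identity index table mapping speaker IDs to speaker models."""
    threshold: float = 0.7  # assumed "preset score"; must be tuned on real data
    models: dict = field(default_factory=dict)
    next_id: int = 1

    def enroll(self, ivector):
        """Store a new speaker model under a freshly generated ID number."""
        speaker_id = f"spk{self.next_id:03d}"
        self.next_id += 1
        self.models[speaker_id] = length_normalize(ivector)
        return speaker_id

    def identify(self, ivector):
        """Return the best-matching speaker ID, or enroll a new speaker if no
        model reaches the threshold."""
        best_id, best_score = None, float("-inf")
        for speaker_id, model in self.models.items():
            score = dot_product_score(ivector, model)
            if score > best_score:
                best_id, best_score = speaker_id, score
        if best_id is not None and best_score >= self.threshold:
            return best_id
        return self.enroll(ivector)
```

For example, after `table.enroll([0.9, 0.1, 0.0])`, a query such as `[0.8, 0.2, 0.1]` scores about 0.98 and is attributed to the same speaker, while `[0.0, 0.1, 0.9]` scores about 0.01 and is enrolled under a new ID. Length normalization is a common choice because it makes the dot product insensitive to vector magnitude.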
  • In this way, the content acquisition module 101 can extract each speaker's speech content from the audio record information according to that speaker's voice features.
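Once every segment of the recording has a speaker ID, collecting each speaker's speech content is a grouping step. The sketch below assumes a hypothetical `Segment` record with start and end times in seconds; the patent does not describe how segment boundaries are found, so that part is taken as given.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One attributed span of the recording (hypothetical structure)."""
    start: float      # seconds from the start of the meeting
    end: float
    speaker_id: str   # ID obtained from the identity index table


def group_by_speaker(segments):
    """Collect each speaker's (start, end) spans in chronological order."""
    per_speaker = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        per_speaker[seg.speaker_id].append((seg.start, seg.end))
    return dict(per_speaker)
```

Each speaker's spans can then be cut from the audio and passed to speech-to-text.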
  • The extraction module 102 is configured to perform keyword extraction on each speaker's speech content.
  • Before keyword extraction, each speaker's voice content may be converted into corresponding text.
  • The extraction module 102 may first sort the multiple segments of text content in a certain order, for example along the time axis (e.g., by the order in which the text was generated, by sentence number, or by serial number).
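Sorting along the time axis might look like the following sketch. The `TextSegment` fields are assumptions: a generation time as the primary key and a serial (e.g. sentence) number as the tie-breaker, mirroring the examples given in the text.

```python
from typing import NamedTuple


class TextSegment(NamedTuple):
    """A transcribed piece of speech (hypothetical fields)."""
    speaker_id: str
    start: float   # generation time on the meeting time axis, in seconds
    serial: int    # tie-breaker, e.g. a sentence or sequence number
    text: str


def order_on_time_axis(segments):
    """Sort text segments by start time, then by serial number."""
    return sorted(segments, key=lambda s: (s.start, s.serial))


def speaker_text(segments, speaker_id):
    """Concatenate one speaker's segments in time order into a single speech text."""
    return " ".join(s.text for s in order_on_time_axis(segments) if s.speaker_id == speaker_id)
```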
  • The extraction module 102 can use the TF-IDF algorithm to extract keywords from each speaker's speech content.
  • The TF-IDF algorithm assesses how important a word is to a speech text. A word's importance increases in proportion to the number of times it appears in that text, but is offset by how often the word appears across the other texts in the collection.
  • The TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the higher its TF-IDF value.
  • TF: term frequency.
  • IDF: inverse document frequency.
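The text does not fix a particular TF-IDF formula. One common formulation, assuming each speaker's speech text is treated as one document d among N speech texts, with n_{t,d} the number of occurrences of word t in d and df(t) the number of speech texts containing t, is:

$$
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
$$

For example, a word that occurs 3 times in a 100-word speech text and appears in 2 of 10 speech texts has tf = 0.03, idf = ln 5 ≈ 1.609, and a TF-IDF value of about 0.048. A word that appears in every speech text gets idf = ln 1 = 0, so common filler words never become keywords.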
  • The extraction module 102 can take the words with the highest TF-IDF values as the keywords of the speech text; for example, the five words with the highest TF-IDF values are used as its keywords.
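The top-five selection can be sketched with the standard library alone. This assumes the tf and idf definitions tf = count / tokens and idf = ln(N / df), and a simple lowercase English tokenizer; a Chinese transcript would need a proper word segmenter instead, and a production system might use a library implementation with smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (assumption: English text, no word segmentation needed)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_keywords(documents, k=5):
    """Return the k highest-TF-IDF words of each document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where N is the number of documents
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each document counts once per word
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for deterministic output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append([word for word, _ in ranked[:k]])
    return results
```

With the three texts "the budget budget is approved", "the launch is on friday", and "the budget review is next week", the first text's top two keywords are `approved` and `budget`, while `the` and `is` score zero because they occur in every text.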
  • The generation module 103 is configured to generate meeting minutes for the meeting according to the extracted keywords.
  • The generation module 103 may generate the meeting minutes from the extracted keywords together with the speech content to which each keyword belongs. In other implementations of the present application, the generation module 103 may also use the speaker's intonation as a parameter when generating the meeting minutes (generally, the higher the intonation of a piece of voice content, the more important that content is).
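The patent does not give a scoring rule for combining keywords with intonation. The sketch below is one hypothetical rule: each sentence's score is its keyword hit count, scaled up or down by how its mean pitch compares with the speaker's average pitch. The `Sentence` fields, the use of mean pitch as the intonation measure, and the `pitch_weight` and `per_speaker` parameters are all assumptions.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    """A transcribed sentence plus an intonation measure (hypothetical fields)."""
    speaker_id: str
    start: float          # seconds on the meeting time axis
    text: str
    mean_pitch_hz: float  # assumed proxy for "intonation"


def summarize(sentences, keywords, per_speaker=2, pitch_weight=0.5):
    """Keep each speaker's top-scoring sentences and list them chronologically.

    score = keyword hits * (1 + pitch_weight * relative_pitch), where
    relative_pitch = sentence pitch / speaker's average pitch - 1.
    """
    picked = []
    for speaker in sorted({s.speaker_id for s in sentences}):
        own = [s for s in sentences if s.speaker_id == speaker]
        avg_pitch = sum(s.mean_pitch_hz for s in own) / len(own)
        kw = {w.lower() for w in keywords.get(speaker, [])}

        def score(s):
            hits = sum(1 for w in s.text.lower().split() if w.strip(".,!?") in kw)
            rel = s.mean_pitch_hz / avg_pitch - 1 if avg_pitch else 0.0
            return hits * (1 + pitch_weight * rel)

        ranked = sorted(own, key=score, reverse=True)[:per_speaker]
        picked.extend(s for s in ranked if score(s) > 0)  # drop keyword-free sentences
    picked.sort(key=lambda s: s.start)
    return "\n".join(f"[{s.start:.1f}s] {s.speaker_id}: {s.text}" for s in picked)
```

Because pitch only rescales the keyword count, a sentence with no keywords is never selected however high its intonation; that keeps the extracted keywords as the primary signal, in line with the text.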
  • The generation module 103 may further process the generated meeting minutes with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes.
  • An NLP analysis engine based on such an algorithm can collect and store a large real-world corpus in advance and use it to correct the linguistic usage of the wording in the meeting minutes.
  • In the second embodiment (FIG. 4), the meeting minutes generation system 100 likewise includes a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the meeting minutes generation operations of the embodiments of the present application are implemented.
  • The meeting minutes generation system 100 can be divided into one or more modules according to the specific operations implemented by the various parts of the computer program instructions.
  • In this embodiment, the meeting minutes generation system 100 can be divided into a content acquisition module 101, an extraction module 102, a generation module 103, a feature establishment module 104, and a sending module 105.
  • Program modules 101-103 are the same as in the first embodiment of the meeting minutes generation system 100, with the feature establishment module 104 and the sending module 105 added, where:
  • The feature establishment module 104 is configured to acquire a voice sample from each speaker and extract each speaker's sound features from that sample.
  • Each participant is required to check in to the meeting by voice to provide a voice sample, so that each participant's voice is enrolled in advance and its sound features are extracted.
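How the check-in samples become speaker models is not specified. A minimal sketch, assuming each check-in utterance has already been converted to a fixed-length feature vector (for example an i-vector), is to store the length-normalized mean of a participant's vectors under that participant's ID. The function name and the averaging choice are hypothetical.

```python
import math


def enroll_from_checkin(samples_by_participant):
    """Build speaker models from voice check-in samples.

    samples_by_participant maps a participant ID to a list of feature vectors,
    one per check-in utterance. Each model is the length-normalized mean vector.
    """
    table = {}
    for participant_id, vectors in samples_by_participant.items():
        if not vectors:
            raise ValueError(f"no voice sample for {participant_id}")
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid division by zero
        table[participant_id] = [x / norm for x in mean]
    return table
```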
  • The sending module 105 is configured to send the meeting minutes generated by the generation module 103 to preset users by email or fax, or to provide the preset users with a link for obtaining the meeting minutes.
  • The preset users may be participants or other pre-designated people.
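The email path could be built with Python's standard `email.message.EmailMessage`. The sketch only composes the message (actual delivery, for example via `smtplib`, and the fax path are omitted); the addresses, subject line, attachment name `minutes.txt`, and the optional link are illustrative assumptions.

```python
from email.message import EmailMessage


def build_minutes_email(minutes_text, sender, recipients, subject="Meeting minutes", link=None):
    """Compose (but do not send) an email carrying the minutes as a text attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    body = "Please find the meeting minutes attached."
    if link:
        body += f"\nThey are also available at: {link}"
    msg.set_content(body)
    # A str payload becomes a text/plain part; passing filename marks it as an attachment.
    msg.add_attachment(minutes_text, filename="minutes.txt")
    return msg
```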
  • Before storing or sending the meeting minutes, the sending module 105 may also encrypt them to ensure data security.
  • For example, the meeting minutes are compressed and encrypted, and the decompression password is a designated password or a password known to or agreed upon by the participants.
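The compression step and the derivation of an encryption key from the shared password can be sketched with the standard library. This is deliberately partial: the Python standard library has no symmetric cipher, so the encryption itself (for example AES-GCM from a vetted cryptography library) is not shown. The function name, the 16-byte salt, and the 200,000 PBKDF2 iterations are assumptions.

```python
import hashlib
import os
import zlib


def prepare_for_encryption(minutes_text, password, salt=None, iterations=200_000):
    """Compress the minutes and derive a 256-bit key from the shared password.

    Returns (compressed_bytes, key, salt). The salt must be stored alongside the
    ciphertext so that recipients who know the password can re-derive the key.
    """
    salt = salt if salt is not None else os.urandom(16)
    compressed = zlib.compress(minutes_text.encode("utf-8"), level=9)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    return compressed, key, salt
```

Deriving the key with a slow, salted function such as PBKDF2, instead of using the password directly, makes guessing a weak shared password considerably more expensive.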
  • The present application also proposes a meeting minutes generation method.
  • FIG. 5 is a schematic flowchart of the first embodiment of the meeting minutes generation method of the present application.
  • The order in which the steps in the flowchart of FIG. 5 are executed may be changed according to different requirements, and some steps may be omitted.
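Because steps may be reordered or omitted, one way to structure an implementation is as a list of interchangeable step functions over a shared context. In the sketch below, the step functions are hypothetical placeholders standing in for S502, S504, and S506; a real system would call speaker recognition, speech-to-text, TF-IDF, and minutes generation at these points.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def run_pipeline(steps: List[Step], context: Context) -> Context:
    """Apply each step in order; each step returns an extended copy of the context."""
    for step in steps:
        context = step(context)
    return context


def s502_extract_speech(ctx: Context) -> Context:
    # Placeholder: assume the recording is already split per speaker.
    return {**ctx, "speech": dict(ctx["recording"])}


def s504_extract_keywords(ctx: Context) -> Context:
    # Placeholder: most frequent word per speaker, ties broken alphabetically.
    keywords = {}
    for speaker, text in ctx["speech"].items():
        words = text.lower().split()
        keywords[speaker] = min(set(words), key=lambda w: (-words.count(w), w)) if words else ""
    return {**ctx, "keywords": keywords}


def s506_generate_minutes(ctx: Context) -> Context:
    lines = [f"{spk}: {kw}" for spk, kw in sorted(ctx["keywords"].items())]
    return {**ctx, "minutes": "\n".join(lines)}
```

Running `run_pipeline([s502_extract_speech, s504_extract_keywords, s506_generate_minutes], ...)` executes the first embodiment's order; the second embodiment's S500 and S508 would simply be additional functions in the list.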
  • Step S502: acquire audio record information of a meeting, and extract each speaker's speech content from the audio record information according to that speaker's voice features.
  • The application server 2 collects the meeting's voice content through each terminal device 1: it receives and saves the voice content sent by each terminal device 1, which can be stored in a specified audio format such as MP3, WMA, or WAV.
  • The terminal device 1 collects the voice content through a sound collection device (for example, a microphone).
  • The terminal device 1 can send the collected voice content to the application server 2 in real time or periodically, or it can send the continuously collected voice content to the application server 2 when the participant on its side finishes speaking.
  • After receiving the voice content sent by the terminal device 1, the application server 2 saves it.
  • The meeting's audio record information can be acquired from the application server 2.
  • The audio record information is preferably the voice content of the meeting.
  • When the meeting is a video conference, the meeting record received and saved by the application server 2 contains both audio and video (voice and video frames), and the acquired audio record information is still preferably the voice content of the meeting.
  • The voice features of each speaker can be acquired before the meeting. Specifically, each participant is assigned a unique ID number in advance. Before the meeting, each participant's voice features are enrolled, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, which makes it possible to confirm a participant's identity.
  • The participants can be local or remote speakers.
  • Each speaker's voice features may be used to generate a speaker model, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
  • To identify the speaker of a segment of voice content, the speaker's sound features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to those sound features exists in the index table, so the speaker ID number is obtained and the speaker's identity is confirmed. Otherwise, no matching speaker model exists in the index table, so a new speaker model and a new ID number are generated from the sound features and stored in the identity index table for future matching.
  • For matching and scoring, a universal background model (UBM) together with an i-vector extraction algorithm can be used.
  • An i-vector is computed for each of the two pieces of speech content being compared and serves as the sound feature of its speaker.
  • The two i-vectors are then scored with a dot-product algorithm or a probabilistic linear discriminant analysis (PLDA) algorithm. If the score exceeds a certain threshold, the two pieces of speech content are considered to belong to the same speaker.
  • In this way, each speaker's speech content can be extracted from the audio record information according to that speaker's voice features.
  • Step S504: perform keyword extraction on each speaker's speech content.
  • Before keyword extraction, each speaker's voice content may be converted into corresponding text.
  • The multiple segments of text content may first be sorted in a certain order.
  • For example, the segments can be sorted along the time axis (e.g., by the order in which the text was generated, by sentence number, or by serial number).
  • The TF-IDF algorithm may be used to extract keywords from each speaker's speech content.
  • The TF-IDF algorithm assesses how important a word is to a speech text. A word's importance increases in proportion to the number of times it appears in that text, but is offset by how often the word appears across the other texts in the collection.
  • The TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the higher its TF-IDF value. Therefore, the words with the highest TF-IDF values can be used as the keywords of the speech text; for example, the five words with the highest TF-IDF values are used as its keywords.
  • Step S506: generate meeting minutes for the meeting according to the extracted keywords.
  • The meeting minutes may be generated from the extracted keywords together with the speech content to which each keyword belongs.
  • The speaker's intonation may also be used as a parameter when generating the meeting minutes (generally, the higher the intonation of a piece of voice content, the more important that content is).
  • The generated meeting minutes may be further processed with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes.
  • An NLP analysis engine based on such an algorithm can collect and store a large real-world corpus in advance and use it to correct the linguistic usage of the wording in the meeting minutes.
  • The meeting minutes generation method proposed by the present application first acquires the meeting's audio record information and extracts each speaker's speech content from it according to that speaker's voice features; second, it performs keyword extraction on each speaker's speech content; next, it generates meeting minutes for the meeting according to the extracted keywords; and finally, it sends the generated meeting minutes to preset users by email or fax, or provides the preset users with a link for obtaining them.
  • As a result, participants can concentrate on the content and flow of the meeting.
  • The meeting minutes are concise and accurate and can also serve as a reference for others who need them. Compared with traditional manual note-taking, this solution is more efficient and accurate and saves labor costs.
  • FIG. 6 is a schematic diagram of the implementation process of the second embodiment of the meeting minutes generation method of the present application.
  • The order in which the steps in the flowchart of FIG. 6 are executed may be changed according to different requirements, and some steps may be omitted.
  • Step S500: acquire a voice sample from each speaker, and extract each speaker's sound features from that sample.
  • Each participant is required to check in to the meeting by voice to provide a voice sample, so that each participant's voice is enrolled in advance and its sound features are extracted.
  • Step S502: acquire audio record information of a meeting, and extract each speaker's speech content from the audio record information according to that speaker's voice features.
  • The application server 2 collects the meeting's voice content through each terminal device 1: it receives and saves the voice content sent by each terminal device 1, which can be stored in a specified audio format such as MP3, WMA, or WAV.
  • The terminal device 1 collects the voice content through a sound collection device (for example, a microphone).
  • The terminal device 1 can send the collected voice content to the application server 2 in real time or periodically, or it can send the continuously collected voice content to the application server 2 when the participant on its side finishes speaking.
  • After receiving the voice content sent by the terminal device 1, the application server 2 saves it.
  • The meeting's audio record information can be acquired from the application server 2.
  • The audio record information is preferably the voice content of the meeting.
  • When the meeting is a video conference, the meeting record received and saved by the application server 2 contains both audio and video (voice and video frames), and the acquired audio record information is still preferably the voice content of the meeting.
  • The voice features of each speaker can be acquired before the meeting. Specifically, each participant is assigned a unique ID number in advance. Before the meeting, each participant's voice features are enrolled, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, which makes it possible to confirm a participant's identity.
  • The participants can be local or remote speakers.
  • Each speaker's voice features may be used to generate a speaker model, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
  • To identify the speaker of a segment of voice content, the speaker's sound features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to those sound features exists in the index table, so the speaker ID number is obtained and the speaker's identity is confirmed. Otherwise, no matching speaker model exists in the index table, so a new speaker model and a new ID number are generated from the sound features and stored in the identity index table for future matching.
  • For matching and scoring, a universal background model (UBM) together with an i-vector extraction algorithm can be used.
  • An i-vector is computed for each of the two pieces of speech content being compared and serves as the sound feature of its speaker.
  • The two i-vectors are then scored with a dot-product algorithm or a probabilistic linear discriminant analysis (PLDA) algorithm. If the score exceeds a certain threshold, the two pieces of speech content are considered to belong to the same speaker.
  • In this way, each speaker's speech content can be extracted from the audio record information according to that speaker's voice features.
  • Step S504: perform keyword extraction on each speaker's speech content.
  • Before keyword extraction, each speaker's voice content may be converted into corresponding text.
  • The multiple segments of text content may first be sorted in a certain order.
  • For example, the segments can be sorted along the time axis (e.g., by the order in which the text was generated, by sentence number, or by serial number).
  • The TF-IDF algorithm may be used to extract keywords from each speaker's speech content.
  • The TF-IDF algorithm assesses how important a word is to a speech text. A word's importance increases in proportion to the number of times it appears in that text, but is offset by how often the word appears across the other texts in the collection.
  • The TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the higher its TF-IDF value. Therefore, the words with the highest TF-IDF values can be used as the keywords of the speech text; for example, the five words with the highest TF-IDF values are used as its keywords.
  • Step S506: generate meeting minutes corresponding to the meeting according to the extracted keywords.
  • The meeting minutes may be generated from the extracted keywords combined with the speech content to which each keyword belongs.
  • The speaker's intonation may further be taken as a consideration parameter in generating the meeting minutes (generally, the higher the intonation of a piece of speech, the more important that speech is).
  • The generated meeting minutes may be further processed with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes.
  • An NLP analysis engine built on the NLP algorithm can collect and store a large amount of real-world corpus in advance, so that flawed or non-standard wording in the meeting minutes can be revised.
  • Step S508: send the meeting minutes to preset users by email or fax, or provide the preset users with a link for obtaining the meeting minutes.
  • A preset user may be a participant or another pre-designated person.
  • The meeting minutes may also be encrypted before they are stored or sent, to ensure data security. For example, the minutes are compressed and encrypted, and the decompression password is a specified password or one known to or agreed upon by all participants.
  • The meeting minutes generation method proposed by the present application first acquires a voice sample of each speaker and extracts each speaker's voice features from it; second, acquires the audio recording information of the meeting and extracts each speaker's speech content from it according to those voice features; then performs keyword extraction on each speaker's speech content; next, generates the meeting minutes corresponding to the meeting from the extracted keywords; and finally sends the generated minutes to preset users by email or fax, or provides the preset users with a link for obtaining them.
  • The methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
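The bullets above describe scoring two i-vectors with a dot-product (or PLDA) back end against a threshold. The following is a minimal Python sketch of the dot-product option only, assuming the i-vectors have already been extracted (training the UBM and total-variability matrix is out of scope). Both vectors are length-normalized so that the dot product equals their cosine similarity; the function names and the 0.7 threshold are illustrative assumptions, not values from the application.

```python
import math
from typing import Sequence


def length_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot_product_score(ivec_a: Sequence[float], ivec_b: Sequence[float]) -> float:
    """Dot product of two length-normalized i-vectors (equal to their cosine similarity)."""
    if len(ivec_a) != len(ivec_b):
        raise ValueError("i-vectors must have the same dimension")
    a, b = length_normalize(ivec_a), length_normalize(ivec_b)
    return sum(x * y for x, y in zip(a, b))


def same_speaker(ivec_a: Sequence[float], ivec_b: Sequence[float], threshold: float = 0.7) -> bool:
    """Treat two segments as the same speaker when the score exceeds the (assumed) threshold."""
    return dot_product_score(ivec_a, ivec_b) > threshold


# Toy 3-dimensional vectors; real i-vectors typically have a few hundred dimensions.
print(same_speaker([0.9, 0.1, 0.3], [0.8, 0.2, 0.35]))  # True
print(same_speaker([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))   # False
```

A PLDA back end would replace `dot_product_score` with a trained likelihood-ratio scorer; the thresholding step stays the same.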

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Toys (AREA)
  • Telephonic Communication Services (AREA)

Abstract

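The present application discloses a meeting minutes generation method, including: acquiring audio recording information of a meeting, and extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker; performing keyword extraction on the speech content of each speaker; and generating meeting minutes corresponding to the meeting according to the extracted keywords. The present application further provides an application server and a computer-readable storage medium. The meeting minutes generation method, application server, and computer-readable storage medium provided by the present application can automatically summarize the recorded meeting content and generate meeting minutes, saving human-resource costs.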

Description

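Meeting minutes generation method, application server, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 201711141751.5, filed with the China Patent Office on November 17, 2017 and entitled "Meeting minutes generation method, application server, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of speech processing technology, and in particular to a meeting minutes generation method, an application server, and a computer-readable storage medium.
Background
In the daily work of governments and companies, almost every working day may involve meetings of all kinds, ranging from important decision-making meetings where leadership issues instructions to small in-group discussions of a particular issue or explorations of a feature, all of which are carried out in the form of a "meeting". During a meeting, participants generally concentrate on following its content and progress; after the meeting ends, the minutes often have to be collected and compiled by dedicated staff based on the proceedings, so compiling the minutes requires an investment of labor. For some small in-group meetings, owing to time and staffing constraints, there is often no dedicated staff member to compile the minutes, which hinders the building and growth of the team.
Summary
In view of this, the present application proposes a meeting minutes generation method, an application server, and a computer-readable storage medium, which can automatically summarize the recorded meeting content and generate meeting minutes, saving human-resource costs.
First, to achieve the above objective, the present application proposes an application server comprising a memory and a processor, the memory storing a meeting minutes generation system executable on the processor, the meeting minutes generation system, when executed by the processor, implementing the following steps: acquiring audio recording information of a meeting, and extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker; performing keyword extraction on the speech content of each speaker; and generating meeting minutes corresponding to the meeting according to the extracted keywords.
In addition, to achieve the above objective, the present application further provides a meeting minutes generation method applied to an application server, the method comprising: acquiring audio recording information of a meeting, and extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker; performing keyword extraction on the speech content of each speaker; and generating meeting minutes corresponding to the meeting according to the extracted keywords.
Further, to achieve the above objective, the present application also provides a computer-readable storage medium storing a meeting minutes generation system, the meeting minutes generation system being executable by at least one processor to cause the at least one processor to perform the steps of the meeting minutes generation method described above.
Compared with the prior art, the meeting minutes generation method, application server, and computer-readable storage medium proposed by the present application first acquire audio recording information of a meeting and extract each speaker's speech content from it according to each speaker's voice features; second, perform keyword extraction on each speaker's speech content; and finally generate meeting minutes corresponding to the meeting according to the extracted keywords. In this way, meeting minutes can be automatically summarized and generated from the recorded meeting content, which makes it easy for participants to review the meeting and lets them concentrate on the content and progress of the meeting while it is under way; after the meeting, concise and accurate minutes are also available for others who need to consult or cite them. Compared with traditional manual recording and compilation, this solution is more efficient and accurate and saves human-resource costs.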
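Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional application environment of the embodiments of the present application;
FIG. 2 is a schematic diagram of an optional hardware architecture of the application server of the present application;
FIG. 3 is a schematic diagram of the program modules of a first embodiment of the meeting minutes generation system of the present application;
FIG. 4 is a schematic diagram of the program modules of a second embodiment of the meeting minutes generation system of the present application;
FIG. 5 is a schematic flowchart of a first embodiment of the meeting minutes generation method of the present application;
FIG. 6 is a schematic flowchart of a second embodiment of the meeting minutes generation method of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.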
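Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and is not within the protection scope claimed by the present application.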
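Referring to FIG. 1, which is a schematic diagram of an optional application environment of the embodiments of the present application.
In this embodiment, the present application is applicable to, but not limited to, an application environment including a terminal device 1, an application server 2, and a network 3. The terminal device 1 may be a mobile device such as a mobile phone, a smartphone, a laptop, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, or an in-vehicle device, or a fixed terminal such as a digital TV, a desktop computer, a notebook, a broadband telephone, or a server. The application server 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server; it may be a standalone server or a server cluster composed of multiple servers. The network 3 may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
The application server 2 may be communicatively connected to one or more terminal devices 1 through the network 3 for data transmission and interaction.
Referring to FIG. 2, which is a schematic diagram of an optional hardware architecture of the application server 2 of the present application.
In this embodiment, the application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicatively connected to one another through a system bus. It should be noted that FIG. 2 shows only the application server 2 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 2, such as its hard disk or internal memory. In other embodiments, the memory 11 may be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the application server 2. Of course, the memory 11 may include both an internal storage unit and an external storage device of the application server 2. In this embodiment, the memory 11 is generally used to store the operating system and various application software installed on the application server 2, such as the program code of the meeting minutes generation system 100. In addition, the memory 11 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the application server 2, for example to perform control and processing related to data interaction or communication with the terminal device 1. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the meeting minutes generation system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the application server 2 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the application server 2 to one or more terminal devices 1 through the network 3 and to establish a data transmission channel and communication connection between the application server 2 and the one or more terminal devices 1.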
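The hardware structure and functions of the devices involved in the present application have now been described in detail. The embodiments of the present application are presented below on that basis.
First, the present application proposes a meeting minutes generation system 100.
Referring to FIG. 3, which is a program module diagram of a first embodiment of the meeting minutes generation system 100 of the present application.
In this embodiment, the meeting minutes generation system 100 includes a series of computer program instructions stored in the memory 11; when executed by the processor 12, these instructions implement the meeting minutes generation operations of the embodiments of the present application. In some embodiments, based on the specific operations implemented by each part of the computer program instructions, the meeting minutes generation system 100 may be divided into one or more modules. For example, in FIG. 3, the meeting minutes generation system 100 may be divided into a content acquisition module 101, an extraction module 102, and a generation module 103, where:
The content acquisition module 101 is configured to acquire audio recording information of a meeting and extract the speech content of each speaker from the audio recording information according to the voice features of each speaker.
In one embodiment, after a conference call starts, the application server 2 collects the meeting's speech through each terminal device 1, and receives and saves the speech content sent by each terminal device 1; the speech content can be saved in a specified audio format, such as MP3, WMA, or WAV.
Specifically, when a participant on the side of a terminal device 1 starts speaking, the terminal device 1 collects the speech through a sound collection device (such as a microphone). The terminal device 1 may send the collected speech to the application server 2 in real time or at regular intervals, or it may send the continuously collected speech to the application server 2 only after the participant on its side finishes an utterance. After receiving the speech content sent by the terminal device 1, the application server 2 saves it.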
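As a rough illustration of this collection flow, the sketch below shows a hypothetical server-side store that accepts speech chunks from terminal devices, whichever upload timing a terminal uses, and writes the whole meeting out as WAV with Python's standard `wave` module. The class and method names, the millisecond timestamps supplied by the terminal, and the 16-bit mono PCM format are all assumptions; the application does not specify a transport or sample format.

```python
import io
import wave
from dataclasses import dataclass, field


@dataclass(order=True)
class AudioChunk:
    # Only start_ms takes part in ordering; the other fields are payload.
    start_ms: int
    terminal_id: str = field(compare=False)
    pcm: bytes = field(compare=False, repr=False)


class MeetingRecorder:
    """Hypothetical server-side store for speech uploaded by terminal devices."""

    def __init__(self) -> None:
        self._chunks: list[AudioChunk] = []

    def receive(self, terminal_id: str, start_ms: int, pcm: bytes) -> None:
        # Works the same whether a terminal streams in real time, uploads
        # periodically, or uploads once after an utterance ends.
        self._chunks.append(AudioChunk(start_ms, terminal_id, pcm))

    def chunks_in_order(self) -> list[AudioChunk]:
        return sorted(self._chunks)

    def to_wav_bytes(self, sample_rate: int = 16000) -> bytes:
        # Assumes 16-bit mono PCM; chunks are concatenated in start-time
        # order, so silent gaps between chunks are not reproduced.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(sample_rate)
            for chunk in self.chunks_in_order():
                wav.writeframes(chunk.pcm)
        return buf.getvalue()
```

Concatenation preserves the order of speech but not the silence between chunks; a fuller recorder would likely pad gaps or mix overlapping speech from different terminals.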
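Since the application server 2 stores the speech content of the entire meeting, the content acquisition module 101 can acquire the audio recording information of the meeting. In this implementation, the audio recording information is preferably the speech content of the meeting. In other implementations of the present application, if the conference call is a video conference call, the meeting record received and saved by the application server 2 is audio-video content (speech and video frames); in this case, the audio recording information acquired by the content acquisition module 101 is likewise preferably the speech content of the meeting.
The voice features of each speaker (participant) can be acquired in advance, before the meeting. Specifically, each participant is preassigned a unique ID number. The voice features of each participant are recorded before the meeting, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, so that the identities of participants can be confirmed. The participants may be local or remote speakers.
In one implementation, a speaker model may be generated from a speaker's voice features, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
After the identity index table of the participants has been built, when it is necessary to determine which speaker a segment of speech in the audio recording information belongs to, the speaker's voice features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to the voice features exists in the index table, so the speaker's ID number is obtained and the speaker's identity is confirmed. Otherwise, no speaker model corresponding to the voice features exists in the index table; a new speaker model and a new ID number are generated from the voice features and stored in the identity index table to facilitate subsequent lookup and matching.
For matching and scoring, a UBM (universal background model) and an i-vector extraction algorithm can be used. For example, an i-vector is computed from each of two speech segments and used as the voice feature of that segment's speaker. The two computed i-vectors are scored with a dot-product algorithm or a PLDA algorithm; if the score exceeds a certain threshold, the two speech segments are considered to be spoken by the same speaker.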
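The identity-index-table logic described above (score against every stored speaker model, accept the best match if it reaches the preset score, otherwise register a new speaker) can be sketched as follows. This is a minimal Python sketch under stated assumptions: each speaker model is represented simply as a feature vector (for example an i-vector), scoring uses cosine similarity in place of a trained PLDA back end, and the class name and the 0.8 preset score are hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class IdentityIndexTable:
    """Hypothetical identity index table mapping speaker ID numbers to speaker models."""

    def __init__(self, preset_score: float = 0.8) -> None:
        self.preset_score = preset_score          # assumed matching threshold
        self.models: dict[int, list[float]] = {}  # speaker ID -> model (feature vector)

    def enroll(self, features: Sequence[float], speaker_id: Optional[int] = None) -> int:
        """Store a speaker model under a pre-assigned ID, or the next free ID."""
        if speaker_id is None:
            speaker_id = max(self.models, default=0) + 1
        self.models[speaker_id] = list(features)
        return speaker_id

    def identify(self, features: Sequence[float]) -> tuple[int, bool]:
        """Return (speaker_id, is_new) for the voice features of one speech segment."""
        best_id, best_score = None, -math.inf
        for speaker_id, model in self.models.items():
            score = cosine(features, model)
            if score > best_score:
                best_id, best_score = speaker_id, score
        if best_id is not None and best_score >= self.preset_score:
            return best_id, False
        # No model reaches the preset score: register a new speaker model and ID.
        return self.enroll(features), True


table = IdentityIndexTable()
table.enroll([1.0, 0.0, 0.0], speaker_id=101)  # IDs pre-assigned before the meeting
table.enroll([0.0, 1.0, 0.0], speaker_id=102)
print(table.identify([0.9, 0.1, 0.0]))  # (101, False): matches an existing model
print(table.identify([0.0, 0.0, 1.0]))  # (103, True): new model and ID created
```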
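By establishing a mapping between each segment of speech in the audio recording information and the participants' ID numbers, the content acquisition module 101 can extract the speech content of each speaker from the audio recording information according to each speaker's voice features.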
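Once every segment carries a speaker ID, collecting each speaker's speech content amounts to grouping the segments by ID while keeping them in timeline order (which also covers the optional sorting step described for the extraction module). A minimal sketch, assuming each segment already has start and end times and a transcript from some speech recognizer, neither of which is specified here; `Segment` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    start_s: float   # segment start time in the recording, in seconds
    end_s: float
    speaker_id: int  # ID number assigned via the identity index table
    text: str        # transcript of the segment (assumed to come from a speech recognizer)


def speech_by_speaker(segments: Iterable[Segment]) -> dict[int, list[Segment]]:
    """Group labeled segments by speaker ID, keeping each speaker's segments in time order."""
    grouped: dict[int, list[Segment]] = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start_s):
        grouped[seg.speaker_id].append(seg)
    return dict(grouped)


def speaker_transcript(segments: list[Segment]) -> str:
    """Join one speaker's segment transcripts into a single speech text."""
    return " ".join(seg.text for seg in segments)
```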
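The extraction module 102 is configured to perform keyword extraction on the speech content of each speaker.
In one embodiment, the speech content of each speaker may first be converted into corresponding text before keyword extraction. Optionally, when the conversion yields multiple pieces of text, the extraction module 102 may first sort them in a certain order, for example along the timeline (e.g., by the order in which the text was generated, by sentence count, or by sequence number).
The extraction module 102 may use a TF-IDF algorithm to extract keywords from each speaker's speech content. The TF-IDF algorithm can be used to evaluate how important a word is to a speech text. The importance of a word increases in proportion to the number of times it appears in the text. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the larger its TF-IDF value. Therefore, the extraction module 102 can take the top-ranked words by TF-IDF value as the keywords of the speech text, for example the five words with the highest TF-IDF values.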
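The TF-IDF keyword step can be sketched in a few lines of Python. This sketch assumes each speaker's transcript is already split into words (Chinese text would first need a word segmenter, not shown), treats each speaker's text as one document when computing IDF, and uses the smoothed IDF form log((1 + N) / (1 + df)) + 1 so that a word appearing in every document still gets a non-zero weight; the application does not say which TF-IDF variant it uses.

```python
import math
from collections import Counter


def tfidf_keywords(docs: dict[int, list[str]], top_k: int = 5) -> dict[int, list[str]]:
    """Return the top_k words by TF-IDF for each speaker's (pre-tokenized) speech text."""
    n_docs = len(docs)
    # Document frequency: in how many speakers' texts each word appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    keywords = {}
    for speaker_id, tokens in docs.items():
        if not tokens:
            keywords[speaker_id] = []
            continue
        tf = Counter(tokens)
        scores = {
            # TF = relative frequency; IDF uses the smoothed form log((1+N)/(1+df)) + 1.
            word: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so the output is deterministic.
        keywords[speaker_id] = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    return keywords
```

With `top_k=5` this matches the "top five words" example in the text; a stop-word list would normally be applied before scoring, since very common words still receive some weight under the smoothed IDF.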
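The generation module 103 is configured to generate meeting minutes corresponding to the meeting according to the extracted keywords.
In one implementation, the generation module 103 may generate the meeting minutes from the extracted keywords combined with the speech content to which each keyword belongs. In other implementations of the present application, the generation module 103 may further take the speaker's intonation as a consideration parameter in generating the meeting minutes (generally, the higher the intonation of a piece of speech, the more important that speech is).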
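One possible reading of "generate the minutes from the keywords combined with the speech content they belong to, optionally weighted by intonation" is an extractive draft: keep each speaker's sentences that contain keywords, rank them by keyword hits multiplied by a relative-pitch weight, and list the best ones in time order. The sketch below is a hypothetical illustration of that reading, not the application's method; per-sentence mean pitch is assumed to be available from some pitch tracker, and keyword matching is a naive substring test.

```python
from typing import Optional


def draft_minutes(
    sentences: dict[int, list[str]],
    keywords: dict[int, list[str]],
    pitch: Optional[dict[int, list[float]]] = None,
    max_per_speaker: int = 2,
) -> str:
    """Build a plain-text minutes draft from keyword-bearing sentences of each speaker.

    sentences: speaker ID -> sentences in time order
    keywords:  speaker ID -> extracted keywords
    pitch:     optional speaker ID -> mean pitch per sentence (aligned with sentences)
    """
    lines = []
    for speaker_id, spoken in sentences.items():
        words = keywords.get(speaker_id, [])
        mean_pitch = None
        if pitch is not None:
            mean_pitch = sum(pitch[speaker_id]) / len(pitch[speaker_id])
        scored = []
        for i, sentence in enumerate(spoken):
            hits = sum(1 for w in words if w in sentence)  # naive substring match
            if hits == 0:
                continue
            # Relative intonation: sentences above the speaker's mean pitch weigh more.
            weight = pitch[speaker_id][i] / mean_pitch if mean_pitch else 1.0
            scored.append((hits * weight, i, sentence))
        best = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_per_speaker]
        best.sort(key=lambda t: t[1])  # restore chronological order
        if best:
            summary = " ".join(s for _, _, s in best)
            lines.append(f"Speaker {speaker_id} ({', '.join(words)}): {summary}")
    return "\n".join(lines)
```

The NLP post-processing step described next would then operate on this draft text.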
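In one implementation, the generation module 103 may further process the generated minutes with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes. An NLP analysis engine built on the NLP algorithm can collect and store a large amount of real-world corpus in advance, so that flawed or non-standard wording in the meeting minutes can be revised.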
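Referring to FIG. 4, which is a program module diagram of a second embodiment of the meeting minutes generation system 100 of the present application. In this embodiment, the meeting minutes generation system 100 includes a series of computer program instructions stored in the memory 11; when executed by the processor 12, these instructions implement the meeting minutes generation operations of the embodiments of the present application. In some embodiments, based on the specific operations implemented by each part of the computer program instructions, the meeting minutes generation system 100 may be divided into one or more modules. For example, in FIG. 4, the meeting minutes generation system 100 may be divided into the content acquisition module 101, the extraction module 102, the generation module 103, a feature establishment module 104, and a sending module 105. Program modules 101-103 are the same as in the first embodiment of the meeting minutes generation system 100, and the feature establishment module 104 and the sending module 105 are added on that basis, where:
The feature establishment module 104 is configured to acquire a voice sample of each speaker and extract each speaker's voice features from that voice sample.
Specifically, before the meeting, each participant may be asked to check in to the meeting by voice so that a voice sample is obtained, thereby recording each participant's voice in advance and extracting its voice features.
The sending module 105 is configured to send the meeting minutes generated by the generation module 103 to preset users by email or fax, or to provide the preset users with a link for obtaining the meeting minutes. A preset user may be a participant or another pre-designated person.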
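For the email delivery option, the sketch below composes a message with Python's standard `email.message.EmailMessage`, attaching the minutes as a UTF-8 text file and optionally including a retrieval link in the body. The function name, attachment file name, and addresses are placeholders; fax delivery and link hosting are not shown.

```python
from email.message import EmailMessage
from typing import Optional, Sequence


def build_minutes_email(
    sender: str,
    recipients: Sequence[str],
    subject: str,
    minutes_text: str,
    link: Optional[str] = None,
) -> EmailMessage:
    """Compose an email carrying the minutes as a UTF-8 text attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    body = "The meeting minutes are attached."
    if link:
        body += f"\nThey can also be retrieved from: {link}"
    msg.set_content(body)
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(minutes_text, subtype="plain", filename="minutes.txt")
    return msg
```

The resulting message could then be handed to a mail relay with `smtplib.SMTP(host).send_message(msg)`, where `host` is whatever SMTP server the deployment uses.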
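In one implementation, the sending module 105 may also encrypt the meeting minutes before they are stored or sent, to ensure data security. For example, the minutes are compressed and encrypted, and the decompression password is a specified password or one known to or agreed upon by all participants.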
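In addition, the present application further proposes a meeting minutes generation method.
Referring to FIG. 5, which is a schematic flowchart of a first embodiment of the meeting minutes generation method of the present application. In this embodiment, depending on requirements, the execution order of the steps in the flowchart shown in FIG. 5 may be changed, and some steps may be omitted.
Step S502: acquire audio recording information of a meeting, and extract the speech content of each speaker from the audio recording information according to the voice features of each speaker.
In one embodiment, after a conference call starts, the application server 2 collects the meeting's speech through each terminal device 1, and receives and saves the speech content sent by each terminal device 1; the speech content can be saved in a specified audio format, such as MP3, WMA, or WAV.
Specifically, when a participant on the side of a terminal device 1 starts speaking, the terminal device 1 collects the speech through a sound collection device (such as a microphone). The terminal device 1 may send the collected speech to the application server 2 in real time or at regular intervals, or it may send the continuously collected speech to the application server 2 only after the participant on its side finishes an utterance. After receiving the speech content sent by the terminal device 1, the application server 2 saves it.
Since the application server 2 stores the speech content of the entire meeting, the audio recording information of the meeting can be acquired from the application server 2. In this implementation, the audio recording information is preferably the speech content of the meeting. In other implementations of the present application, if the conference call is a video conference call, the meeting record received and saved by the application server 2 is audio-video content (speech and video frames); in this case, the acquired audio recording information is likewise preferably the speech content of the meeting.
The voice features of each speaker (participant) can be acquired in advance, before the meeting. Specifically, each participant is preassigned a unique ID number. The voice features of each participant are recorded before the meeting, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, so that the identities of participants can be confirmed. The participants may be local or remote speakers.
In one implementation, a speaker model may be generated from a speaker's voice features, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
After the identity index table of the participants has been built, when it is necessary to determine which speaker a segment of speech in the audio recording information belongs to, the speaker's voice features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to the voice features exists in the index table, so the speaker's ID number is obtained and the speaker's identity is confirmed. Otherwise, no speaker model corresponding to the voice features exists in the index table; a new speaker model and a new ID number are generated from the voice features and stored in the identity index table to facilitate subsequent lookup and matching.
For matching and scoring, a UBM (universal background model) and an i-vector extraction algorithm can be used. For example, an i-vector is computed from each of two speech segments and used as the voice feature of that segment's speaker. The two computed i-vectors are scored with a dot-product algorithm or a PLDA algorithm; if the score exceeds a certain threshold, the two speech segments are considered to be spoken by the same speaker.
By establishing a mapping between each segment of speech in the audio recording information and the participants' ID numbers, the speech content of each speaker can be extracted from the audio recording information according to each speaker's voice features.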
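Step S504: perform keyword extraction on the speech content of each speaker.
In one embodiment, the speech content of each speaker may first be converted into corresponding text before keyword extraction. Optionally, when the conversion yields multiple pieces of text, they may first be sorted in a certain order, for example along the timeline (e.g., by the order in which the text was generated, by sentence count, or by sequence number).
In one implementation, a TF-IDF algorithm may be used to extract keywords from each speaker's speech content. The TF-IDF algorithm can be used to evaluate how important a word is to a speech text. The importance of a word increases in proportion to the number of times it appears in the text. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the larger its TF-IDF value. Therefore, the top-ranked words by TF-IDF value can be taken as the keywords of the speech text, for example the five words with the highest TF-IDF values.
Step S506: generate meeting minutes corresponding to the meeting according to the extracted keywords.
In one implementation, the meeting minutes may be generated from the extracted keywords combined with the speech content to which each keyword belongs. In other implementations of the present application, the speaker's intonation may further be taken as a consideration parameter in generating the meeting minutes (generally, the higher the intonation of a piece of speech, the more important that speech is).
In one implementation, the generated minutes may further be processed with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes. An NLP analysis engine built on the NLP algorithm can collect and store a large amount of real-world corpus in advance, so that flawed or non-standard wording in the meeting minutes can be revised.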
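Through steps S502-S506 above, the meeting minutes generation method proposed by the present application first acquires audio recording information of the meeting and extracts each speaker's speech content from it according to each speaker's voice features; second, performs keyword extraction on each speaker's speech content; and finally generates meeting minutes corresponding to the meeting according to the extracted keywords. In this way, meeting minutes can be automatically summarized and generated from the recorded meeting content, which makes it easy for participants to review the meeting and lets them concentrate on the content and progress of the meeting while it is under way; after the meeting, concise and accurate minutes are also available for others who need to consult or cite them. Compared with traditional manual recording and compilation, this solution is more efficient and accurate and saves human-resource costs.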
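Referring to FIG. 6, which is a schematic flowchart of a second embodiment of the meeting minutes generation method of the present application. In this embodiment, depending on requirements, the execution order of the steps in the flowchart shown in FIG. 6 may be changed, and some steps may be omitted.
Step S500: acquire a voice sample of each speaker, and extract each speaker's voice features from that voice sample.
Specifically, before the meeting, each participant may be asked to check in to the meeting by voice so that a voice sample is obtained, thereby recording each participant's voice in advance and extracting its voice features.
Step S502: acquire audio recording information of a meeting, and extract the speech content of each speaker from the audio recording information according to the voice features of each speaker.
In one embodiment, after a conference call starts, the application server 2 collects the meeting's speech through each terminal device 1, and receives and saves the speech content sent by each terminal device 1; the speech content can be saved in a specified audio format, such as MP3, WMA, or WAV.
Specifically, when a participant on the side of a terminal device 1 starts speaking, the terminal device 1 collects the speech through a sound collection device (such as a microphone). The terminal device 1 may send the collected speech to the application server 2 in real time or at regular intervals, or it may send the continuously collected speech to the application server 2 only after the participant on its side finishes an utterance. After receiving the speech content sent by the terminal device 1, the application server 2 saves it.
Since the application server 2 stores the speech content of the entire meeting, the audio recording information of the meeting can be acquired from the application server 2. In this implementation, the audio recording information is preferably the speech content of the meeting. In other implementations of the present application, if the conference call is a video conference call, the meeting record received and saved by the application server 2 is audio-video content (speech and video frames); in this case, the acquired audio recording information is likewise preferably the speech content of the meeting.
The voice features of each speaker (participant) can be acquired in advance, before the meeting. Specifically, each participant is preassigned a unique ID number. The voice features of each participant are recorded before the meeting, and an identity index table is built from each participant's voice features and ID number. The identity index table stores the correspondence between each participant's voice features and ID, so that the identities of participants can be confirmed. The participants may be local or remote speakers.
In one implementation, a speaker model may be generated from a speaker's voice features, and the speaker model and the corresponding speaker ID number are stored in the identity index table.
After the identity index table of the participants has been built, when it is necessary to determine which speaker a segment of speech in the audio recording information belongs to, the speaker's voice features are first extracted from that segment and compared with each speaker model in the identity index table to obtain a matching score. If the matching score reaches a preset score, a speaker model corresponding to the voice features exists in the index table, so the speaker's ID number is obtained and the speaker's identity is confirmed. Otherwise, no speaker model corresponding to the voice features exists in the index table; a new speaker model and a new ID number are generated from the voice features and stored in the identity index table to facilitate subsequent lookup and matching.
For matching and scoring, a UBM (universal background model) and an i-vector extraction algorithm can be used. For example, an i-vector is computed from each of two speech segments and used as the voice feature of that segment's speaker. The two computed i-vectors are scored with a dot-product algorithm or a PLDA algorithm; if the score exceeds a certain threshold, the two speech segments are considered to be spoken by the same speaker.
By establishing a mapping between each segment of speech in the audio recording information and the participants' ID numbers, the speech content of each speaker can be extracted from the audio recording information according to each speaker's voice features.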
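Step S504: perform keyword extraction on the speech content of each speaker.
In one embodiment, the speech content of each speaker may first be converted into corresponding text before keyword extraction. Optionally, when the conversion yields multiple pieces of text, they may first be sorted in a certain order, for example along the timeline (e.g., by the order in which the text was generated, by sentence count, or by sequence number).
In one implementation, a TF-IDF algorithm may be used to extract keywords from each speaker's speech content. The TF-IDF algorithm can be used to evaluate how important a word is to a speech text. The importance of a word increases in proportion to the number of times it appears in the text. In the TF-IDF calculation, the TF-IDF value of a word is obtained from its term frequency (TF) and inverse document frequency (IDF); the more important the word is to the speech text, the larger its TF-IDF value. Therefore, the top-ranked words by TF-IDF value can be taken as the keywords of the speech text, for example the five words with the highest TF-IDF values.
Step S506: generate meeting minutes corresponding to the meeting according to the extracted keywords.
In one implementation, the meeting minutes may be generated from the extracted keywords combined with the speech content to which each keyword belongs. In other implementations of the present application, the speaker's intonation may further be taken as a consideration parameter in generating the meeting minutes (generally, the higher the intonation of a piece of speech, the more important that speech is).
In one implementation, the generated minutes may further be processed with an NLP (natural language processing) algorithm to produce more fluent and standardized minutes. An NLP analysis engine built on the NLP algorithm can collect and store a large amount of real-world corpus in advance, so that flawed or non-standard wording in the meeting minutes can be revised.
Step S508: send the meeting minutes to preset users by email or fax, or provide the preset users with a link for obtaining the meeting minutes. A preset user may be a participant or another pre-designated person.
In one implementation, the meeting minutes may also be encrypted before they are stored or sent, to ensure data security. For example, the minutes are compressed and encrypted, and the decompression password is a specified password or one known to or agreed upon by all participants.
Through steps S500-S508 above, the meeting minutes generation method proposed by the present application first acquires a voice sample of each speaker and extracts each speaker's voice features from it; second, acquires audio recording information of the meeting and extracts each speaker's speech content from it according to each speaker's voice features; next, performs keyword extraction on each speaker's speech content; then generates meeting minutes corresponding to the meeting according to the extracted keywords; and finally sends the generated minutes to preset users by email or fax, or provides the preset users with a link for obtaining them. In this way, meeting minutes can be automatically summarized and generated from the recorded meeting content, which makes it easy for participants to review the meeting and lets them concentrate on the content and progress of the meeting while it is under way; after the meeting, concise and accurate minutes are also available for others who need to consult or cite them. Compared with traditional manual recording and compilation, this solution is more efficient and accurate and saves human-resource costs.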
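The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present application.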

Claims (20)

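  1. A meeting minutes generation method, applied to an application server, characterized in that the method comprises:
    acquiring audio recording information of a meeting, and extracting speech content of each speaker from the audio recording information according to voice features of each speaker;
    performing keyword extraction on the speech content of each speaker; and
    generating meeting minutes corresponding to the meeting according to the extracted keywords.
  2. The meeting minutes generation method according to claim 1, characterized in that the method further comprises:
    acquiring a voice sample of each speaker, and extracting the voice features of each speaker from the voice sample of each speaker.
  3. The meeting minutes generation method according to claim 1, characterized in that the step of extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker comprises:
    setting an ID number for each speaker, and establishing a speaker model according to the voice features of each speaker;
    extracting a speaker's voice features from a first speech segment in the audio recording information;
    comparing the extracted voice features with the plurality of speaker models to obtain matching scores; and
    determining the ID number of the speaker of the first speech segment according to the matching scores.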
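  4. The meeting minutes generation method according to claim 2, characterized in that the step of extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker comprises:
    setting an ID number for each speaker, and establishing a speaker model according to the voice features of each speaker;
    extracting a speaker's voice features from a first speech segment in the audio recording information;
    comparing the extracted voice features with the plurality of speaker models to obtain matching scores; and
    determining the ID number of the speaker of the first speech segment according to the matching scores.
  5. The meeting minutes generation method according to claim 1, characterized in that the step of performing keyword extraction on the speech content of each speaker comprises:
    converting the speech content of each speaker into text content;
    calculating a TF-IDF value of each word in the text content by means of a TF-IDF algorithm; and
    identifying the top-ranked words by TF-IDF value as keywords of the speech content and extracting them.
  6. The meeting minutes generation method according to claim 2, characterized in that the step of performing keyword extraction on the speech content of each speaker comprises:
    converting the speech content of each speaker into text content;
    calculating a TF-IDF value of each word in the text content by means of a TF-IDF algorithm; and
    identifying the top-ranked words by TF-IDF value as keywords of the speech content and extracting them.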
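  7. The meeting minutes generation method according to claim 1, characterized in that the step of generating the meeting minutes corresponding to the meeting according to the extracted keywords comprises:
    generating meeting gist content according to the extracted keywords; and
    processing the meeting gist content with a natural language algorithm to generate the meeting minutes corresponding to the meeting.
  8. The meeting minutes generation method according to claim 1, characterized in that the method further comprises:
    sending the meeting minutes to a preset user by email or fax, or providing the preset user with a link for obtaining the meeting minutes.
  9. An application server, characterized in that the application server comprises a memory and a processor, the memory storing a meeting minutes generation system executable on the processor, the meeting minutes generation system, when executed by the processor, implementing the following steps:
    acquiring audio recording information of a meeting, and extracting speech content of each speaker from the audio recording information according to voice features of each speaker;
    performing keyword extraction on the speech content of each speaker; and
    generating meeting minutes corresponding to the meeting according to the extracted keywords.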
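  10. The application server according to claim 9, characterized in that the meeting minutes generation system, when executed by the processor, further implements the step of:
    acquiring a voice sample of each speaker, and extracting the voice features of each speaker from the voice sample of each speaker.
  11. The application server according to claim 9, characterized in that the step of extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker specifically comprises:
    setting an ID number for each speaker, and establishing a speaker model according to the voice features of each speaker;
    extracting a speaker's voice features from a first speech segment in the audio recording information;
    comparing the extracted voice features with the plurality of speaker models to obtain matching scores; and
    determining the ID number of the speaker of the first speech segment according to the matching scores.
  12. The application server according to claim 9, characterized in that the step of performing keyword extraction on the speech content of each speaker specifically comprises:
    converting the speech content of each speaker into text content;
    calculating a TF-IDF value of each word in the text content by means of a TF-IDF algorithm; and
    identifying the top-ranked words by TF-IDF value as keywords of the speech content and extracting them.
  13. The application server according to claim 9, characterized in that the step of generating the meeting minutes corresponding to the meeting according to the extracted keywords comprises:
    generating meeting gist content according to the extracted keywords; and
    processing the meeting gist content with a natural language algorithm to generate the meeting minutes corresponding to the meeting.
  14. The application server according to claim 9, characterized in that the meeting minutes generation system, when executed by the processor, further implements the step of:
    sending the meeting minutes to a preset user by email or fax, or providing the preset user with a link for obtaining the meeting minutes.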
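  15. A computer-readable storage medium storing a meeting minutes generation system, the meeting minutes generation system being executable by at least one processor to cause the at least one processor to perform the following steps:
    acquiring audio recording information of a meeting, and extracting speech content of each speaker from the audio recording information according to voice features of each speaker;
    performing keyword extraction on the speech content of each speaker; and
    generating meeting minutes corresponding to the meeting according to the extracted keywords.
  16. The computer-readable storage medium according to claim 15, characterized in that the meeting minutes generation system, when executed by the processor, further implements the step of:
    acquiring a voice sample of each speaker, and extracting the voice features of each speaker from the voice sample of each speaker.
  17. The computer-readable storage medium according to claim 15, characterized in that the step of extracting the speech content of each speaker from the audio recording information according to the voice features of each speaker specifically comprises:
    setting an ID number for each speaker, and establishing a speaker model according to the voice features of each speaker;
    extracting a speaker's voice features from a first speech segment in the audio recording information;
    comparing the extracted voice features with the plurality of speaker models to obtain matching scores; and
    determining the ID number of the speaker of the first speech segment according to the matching scores.
  18. The computer-readable storage medium according to claim 15, characterized in that the step of performing keyword extraction on the speech content of each speaker specifically comprises:
    converting the speech content of each speaker into text content;
    calculating a TF-IDF value of each word in the text content by means of a TF-IDF algorithm; and
    identifying the top-ranked words by TF-IDF value as keywords of the speech content and extracting them.
  19. The computer-readable storage medium according to claim 15, characterized in that the step of generating the meeting minutes corresponding to the meeting according to the extracted keywords comprises:
    generating meeting gist content according to the extracted keywords; and
    processing the meeting gist content with a natural language algorithm to generate the meeting minutes corresponding to the meeting.
  20. The computer-readable storage medium according to claim 15, characterized in that the meeting minutes generation system, when executed by the processor, further implements the step of:
    sending the meeting minutes to a preset user by email or fax, or providing the preset user with a link for obtaining the meeting minutes.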
PCT/CN2018/077628 2017-11-17 2018-02-28 Meeting minutes generation method, application server, and computer-readable storage medium WO2019095586A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711141751.5A CN108022583A (zh) 2017-11-17 2017-11-17 Meeting minutes generation method, application server, and computer-readable storage medium
CN201711141751.5 2017-11-17

Publications (1)

Publication Number Publication Date
WO2019095586A1 true WO2019095586A1 (zh) 2019-05-23

Family

ID=62080675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077628 WO2019095586A1 (zh) 2017-11-17 2018-02-28 Meeting minutes generation method, application server, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108022583A (zh)
WO (1) WO2019095586A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014540A (zh) * 2020-11-24 2021-06-22 腾讯科技(深圳)有限公司 Data processing method and apparatus, device, and storage medium

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
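CN109525800A (zh) * 2018-11-08 2019-03-26 江西国泰利民信息科技有限公司 Speech recognition data transmission method for remote conferences
CN109361825A (zh) * 2018-11-12 2019-02-19 平安科技(深圳)有限公司 Meeting minutes recording method, terminal, and computer storage medium
CN109473103A (zh) * 2018-11-16 2019-03-15 上海玖悦数码科技有限公司 Meeting minutes generation method
CN109543173A (zh) * 2018-11-30 2019-03-29 苏州麦迪斯顿医疗科技股份有限公司 Rescue record generation method and apparatus, electronic device, and storage medium
CN109803059A (zh) * 2018-12-17 2019-05-24 百度在线网络技术(北京)有限公司 Audio processing method and apparatus
CN111415128B (zh) * 2019-01-07 2024-06-07 阿里巴巴集团控股有限公司 Method, system, apparatus, device, and medium for controlling a meeting
CN109960743A (zh) * 2019-01-16 2019-07-02 平安科技(深圳)有限公司 Meeting content differentiation method and apparatus, computer device, and storage medium
CN110049270B (zh) * 2019-03-12 2023-05-30 平安科技(深圳)有限公司 Multi-person meeting speech transcription method, apparatus, system, device, and storage medium
CN110010130A (zh) * 2019-04-03 2019-07-12 安徽阔声科技有限公司 Intelligent method for synchronously transcribing speech into text for meeting participants
CN110134756A (zh) * 2019-04-15 2019-08-16 深圳壹账通智能科技有限公司 Meeting record generation method, electronic apparatus, and storage medium
CN110298252A (zh) * 2019-05-30 2019-10-01 平安科技(深圳)有限公司 Meeting minutes generation method and apparatus, computer device, and storage medium
CN110322872A (zh) * 2019-06-05 2019-10-11 平安科技(深圳)有限公司 Meeting speech data processing method and apparatus, computer device, and storage medium
CN111277589A (zh) * 2020-01-19 2020-06-12 腾讯云计算(北京)有限责任公司 Meeting document generation method and apparatus
CN111626061A (zh) * 2020-05-27 2020-09-04 深圳前海微众银行股份有限公司 Meeting record generation method, apparatus, device, and readable storage medium
CN111666746B (zh) * 2020-06-05 2023-09-29 中国银行股份有限公司 Meeting minutes generation method and apparatus, electronic device, and storage medium
CN113782026A (zh) * 2020-06-09 2021-12-10 北京声智科技有限公司 Information processing method, apparatus, medium, and device
CN111787172A (zh) * 2020-06-12 2020-10-16 深圳市珍爱捷云信息技术有限公司 Method, apparatus, server, and storage medium for implementing a conference call on a mobile terminal
CN111797226B (zh) * 2020-06-30 2024-04-05 北京百度网讯科技有限公司 Meeting minutes generation method and apparatus, electronic device, and readable storage medium
CN111899742B (zh) * 2020-08-06 2021-03-23 广州科天视畅信息科技有限公司 Method and system for improving meeting efficiency
CN112687272B (zh) * 2020-12-18 2023-03-21 北京金山云网络技术有限公司 Meeting minutes recording method and apparatus, and electronic device
CN113766170A (zh) * 2021-09-18 2021-12-07 苏州科天视创信息科技有限公司 Audio/video-based multi-terminal resource sharing method and system for online meetings
CN114757155B (zh) * 2022-06-14 2022-09-27 深圳乐播科技有限公司 Meeting document generation method and apparatus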

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572372A (zh) * 2011-12-28 2012-07-11 中兴通讯股份有限公司 Method and apparatus for extracting meeting minutes
CN104427292A (zh) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and apparatus for extracting meeting minutes
US20150348538A1 (en) * 2013-03-14 2015-12-03 Aliphcom Speech summary and action item generation
CN106448675A (zh) * 2016-10-21 2017-02-22 科大讯飞股份有限公司 Recognized text correction method and system
CN106802885A (zh) * 2016-12-06 2017-06-06 乐视控股(北京)有限公司 Method and apparatus for automatically recording meeting minutes, and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9560206B2 (en) * 2010-04-30 2017-01-31 American Teleconferencing Services, Ltd. Real-time speech-to-text conversion in an audio conference session
CN105957531B (zh) * 2016-04-25 2019-12-31 上海交通大学 Cloud-platform-based speech content extraction method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014540A (zh) * 2020-11-24 2021-06-22 腾讯科技(深圳)有限公司 Data processing method and apparatus, device, and storage medium
CN113014540B (zh) * 2020-11-24 2022-09-27 腾讯科技(深圳)有限公司 Data processing method and apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN108022583A (zh) 2018-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18878912

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18878912

Country of ref document: EP

Kind code of ref document: A1