
CN113096674A - Audio processing method and device and electronic equipment - Google Patents

Audio processing method and device and electronic equipment

Info

Publication number
CN113096674A
Authority
CN
China
Prior art keywords
audio data
target
audio
application
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110341744.XA
Other languages
Chinese (zh)
Other versions
CN113096674B (en
Inventor
徐杰
朱洪雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202110341744.XA
Publication of CN113096674A
Application granted
Publication of CN113096674B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an audio processing method, an audio processing device, and electronic equipment. The method is applied to a virtual collector that is part of the operating system of the electronic equipment. In response to a received audio acquisition instruction, the virtual collector mixes acquired first audio data with acquired second audio data to obtain target audio data, and outputs the target audio data to a target application so that the target application outputs the target audio. Mixing at least two streams of audio data in the virtual collector satisfies the target application's need to output several audio streams as one mix and improves the audio processing effect.

Description

Audio processing method and device and electronic equipment
Technical Field
The present application relates to the field of information processing technologies, and in particular, to an audio processing method and apparatus, and an electronic device.
Background
With the development of communication technology, more and more users choose video or audio conferences in place of conventional face-to-face meetings.
When a user starts a conference with online conference software, the software captures and transmits only the voice of the current speaker. If the speaker needs to play a video or audio clip for the other participants, they cannot hear its sound, which degrades the online conference.
Disclosure of Invention
In view of this, the present application provides the following technical solutions:
an audio processing method is applied to a virtual collector, wherein the virtual collector is a part of an operating system of electronic equipment, and the method comprises the following steps:
receiving an audio acquisition instruction of a target application;
responding to the audio acquisition instruction, acquiring first audio data and second audio data at least matched with the first audio data, wherein the first audio data is audio generated based on a target object received by a microphone of the electronic equipment, and the second audio data is audio generated by the electronic equipment;
performing sound mixing processing on the first audio data and the second audio data to obtain target audio data;
outputting the target audio data to the target application to cause the target application to output the target audio data.
Optionally, the acquiring first audio data and second audio data at least matching the first audio data includes:
receiving first audio data;
and, in response to an audio selection instruction generated by the target object that produces the first audio data, acquiring second audio data matched with the audio selection instruction from a local storage area of the electronic equipment.
Optionally, the acquiring first audio data and second audio data at least matching the first audio data includes:
receiving first audio data;
analyzing the first audio data;
and if the analysis result comprises the audio acquisition keywords, acquiring second audio data matched with the audio acquisition keywords.
Optionally, the mixing the first audio data and the second audio data to obtain target audio data includes:
respectively carrying out sampling processing on the first audio data and the second audio data according to a target sampling rate;
synthesizing the sampled first audio data and the sampled second audio data into target audio data, wherein the target audio data has the target sampling rate.
Optionally, the method further comprises:
sensitive word filtering is carried out on the target audio data, and filtered target audio data are obtained;
and sending the filtered target audio data to a target application, so that the target application outputs the filtered target audio data.
Optionally, the performing sensitive word filtering on the target audio data to obtain filtered target audio data includes:
acquiring attribute information of a receiver corresponding to the audio data output by the target application;
determining sensitive words matched with the attribute information;
and filtering the sensitive words of the target audio data to obtain filtered target audio data.
Optionally, the performing sensitive word filtering on the target audio data to obtain filtered target audio data includes:
and deleting the audio clip corresponding to the sensitive word in the target audio data to obtain the deleted target audio data.
Optionally, the method further comprises:
acquiring associated information matched with the target audio data;
and deleting the associated information segment matched with the audio segment corresponding to the sensitive word to obtain the deleted associated information.
An audio processing apparatus applied to a virtual collector, the virtual collector being a part of an operating system of an electronic device, the apparatus comprising:
the receiving unit is used for receiving an audio acquisition instruction of a target application;
the acquiring unit is used for responding to the audio acquiring instruction, acquiring first audio data and second audio data at least matched with the first audio data, wherein the first audio data is audio generated based on a target object received by a microphone of the electronic equipment, and the second audio data is audio generated by the electronic equipment;
the processing unit is used for carrying out sound mixing processing on the first audio data and the second audio data to obtain target audio data;
an output unit configured to output the target audio data to the target application so that the target application outputs the target audio data.
An electronic device comprising a memory and a virtual collector that is part of an operating system of the electronic device, wherein,
the memory is used for storing an application program and data generated by the operation of the application program;
the virtual collector is used for executing the application program to realize that:
receiving an audio acquisition instruction of a target application;
responding to the audio acquisition instruction, acquiring first audio data and second audio data at least matched with the first audio data, wherein the first audio data is audio generated based on a target object received by a microphone of the electronic equipment, and the second audio data is audio generated by the electronic equipment;
performing sound mixing processing on the first audio data and the second audio data to obtain target audio data;
outputting the target audio data to the target application to cause the target application to output the target audio data.
A storage medium having stored thereon computer program code which, when executed by a processor, implements an audio processing method as described in any of the above.
According to the above technical solution, the present application discloses an audio processing method, an audio processing device, and electronic equipment. The method is applied to a virtual collector that is part of the operating system of the electronic equipment. In response to a received audio acquisition instruction, the virtual collector mixes acquired first audio data with acquired second audio data to obtain target audio data, and outputs the target audio data to a target application so that the target application outputs the target audio. At least two streams of audio data are thereby mixed and output through the virtual collector, which satisfies the target application's need to output several audio streams as one mix and improves the audio processing effect.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an audio processing method according to an embodiment of the present application;
Fig. 2 is a schematic view of an application scenario of a teleconference according to an embodiment of the present application;
Fig. 3 is a schematic diagram illustrating the flow of audio data according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another audio processing method according to an embodiment of the present application;
Fig. 5 is a flowchart illustrating filtering of sensitive words in audio data according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides an audio processing method applied to a virtual collector. The virtual collector is part of the operating system of an electronic device, or a program file integrated in the operating system, and can provide audio data to an application program (APP) on the electronic device. Referring to fig. 1, a flowchart of an audio processing method provided by an embodiment of the present application is shown. The method may include the following steps:
S101, receiving an audio acquisition instruction of a target application.
The target application is an application installed on the electronic device that needs to input and output audio data; for example, it may be a conference application or a live-streaming application. The audio acquisition instruction is generated by the target application; for example, it may be an instruction to obtain the voice of the current speaker, or to output local audio data through the target application.
S102, in response to the audio acquisition instruction, acquiring first audio data and second audio data at least matched with the first audio data.
The audio processing method of the embodiment of the application mainly targets scenes in which at least two audio streams must be mixed and output. Therefore, after the audio acquisition instruction is received, first audio data and second audio data at least matched with the first audio data are acquired, where the first audio data is audio generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device. The microphone of the electronic device is an audio receiving device associated with the electronic device, that is, a physical microphone; it may be integrated in the electronic device or connected to it through a data transmission interface such as a USB interface. The first audio data is the audio of the current speaker received by the microphone in the current environment, for example the voice of a speaker in a conference scene or a teacher's explanation in an online education scene. The second audio data is audio generated by the electronic device: it differs from the first audio data and is not produced by the target object or by objects of the same kind, that is, it is not produced by a speaker. The second audio data may be audio data stored locally in the electronic device, audio data stored in an external storage device connected to the electronic device, or audio data transmitted to the electronic device, such as audio data obtained through a network search.
The second audio data matches the first audio data, that is, the two have an association. The association may be temporal, such as a preset point during the generation of the first audio data at which the second audio data is inserted, or content-based, such as invoking the second audio data when the first audio data contains specific information.
S103, performing sound mixing processing on the first audio data and the second audio data to obtain target audio data.
S104, outputting the target audio data to the target application so that the target application outputs the target audio data.
In the embodiment of the present application, mixing the first audio data and the second audio data means combining them into one stream; for example, in a conference scene, the speaker's voice may be combined with background music to obtain the target audio data. It should be noted that the mixing is completed in the virtual collector. The target application does not need to receive each audio stream separately and then combine them; the audio data it receives is already the mixed target audio data. Mixed audio can therefore be used and output whether or not the target application on the electronic device has a mixing function. After the virtual collector obtains the target audio data by mixing, it outputs the target audio data to the target application so that the target application outputs it. Taking the same conference example, the audio that remote participants receive through the conference application contains both the speaker's voice and the background music.
The audio processing method provided in the embodiment of the application is applied to a virtual collector that is part of the operating system of an electronic device. In response to a received audio acquisition instruction, the virtual collector mixes acquired first audio data with acquired second audio data to obtain target audio data, and outputs the target audio data to a target application so that the target application outputs the target audio. At least two streams of audio data are thereby mixed and output through the virtual collector, which satisfies the target application's need to output several audio streams as one mix and improves the audio processing effect.
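Steps S101 to S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `VirtualCollector`, the callback parameters, and the representation of audio as lists of normalized float samples are all assumptions made for the example, and mixing is shown as a clipped sample-wise sum, which is one common way to combine two streams.

```python
from typing import Callable, List

Samples = List[float]  # assumed format: mono samples normalized to [-1.0, 1.0]


def mix(first: Samples, second: Samples) -> Samples:
    """Sum two streams sample by sample and clip the result to [-1.0, 1.0]."""
    length = max(len(first), len(second))
    mixed = []
    for i in range(length):
        a = first[i] if i < len(first) else 0.0   # pad the shorter stream with silence
        b = second[i] if i < len(second) else 0.0
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed


class VirtualCollector:
    """Hypothetical stand-in for the OS-level virtual collector."""

    def __init__(self, read_microphone: Callable[[], Samples],
                 read_system_audio: Callable[[], Samples]) -> None:
        self.read_microphone = read_microphone      # source of first audio data
        self.read_system_audio = read_system_audio  # source of second audio data

    def handle_acquisition(self) -> Samples:
        """Serve one audio acquisition instruction (S101) from the target application."""
        first = self.read_microphone()     # S102: first audio data
        second = self.read_system_audio()  # S102: second audio data
        return mix(first, second)          # S103 mix, S104 hand back to the caller


collector = VirtualCollector(lambda: [0.5, 0.25, -0.5], lambda: [0.25, 0.25])
target_audio = collector.handle_acquisition()
```

Because the target application only ever sees the return value of `handle_acquisition`, it needs no mixing logic of its own, which is the point the description makes.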
In the embodiment of the present application, the second audio data may be chosen by the object that produces the first audio data, or obtained by automatic recognition of the first audio data.
In a possible implementation, the obtaining first audio data and second audio data at least matching the first audio data includes: receiving first audio data; and, in response to an audio selection instruction generated by the target object that produces the first audio data, acquiring second audio data matched with the audio selection instruction from a local storage area of the electronic equipment.
In this embodiment, the second audio data is selected by a user, namely the user who produces the first audio data. For example, if the user needs to give an explanation over background audio in a conference scene, the target object producing the first audio data selects the background audio to be played from the local storage area of the electronic device. The background audio is then the second audio data matched with the first audio data, and the two audio streams are merged on output, so the audio data received by other users of the target application includes both the target object's speech and the background audio. Correspondingly, stopping the background audio may also be decided by the target object; that is, the target object may control the playing, pausing, and stopping of the background audio. The target object can hear the background audio, and other users of the target application receive the background audio and the target object's voice at the same time.
In another possible implementation, the obtaining first audio data and second audio data at least matching the first audio data includes: receiving first audio data; analyzing the first audio data; and if the analysis result comprises the audio acquisition keywords, acquiring second audio data matched with the audio acquisition keywords.
In this embodiment, the first audio data is parsed automatically to decide whether to add second audio data and which second audio data to add. For example, the first audio data may be speech from the target object such as: "The text we explain today is … … from the second lesson of the third unit; let us listen to the song '… …' related to the text." When the keyword "song" is recognized, the local storage area is searched for the song and the song is played; its audio data is then output as the second audio data, mixed with the audio the target object produces during playback, to the other users of the target application. It should be noted that, besides searching the local storage area by keyword, the network module of the electronic device may send a search request to a corresponding audio cloud platform and use the returned network data as the second audio data.
To achieve a better playback effect after the mixed audio data is transmitted, in an embodiment of the present application, the mixing processing the first audio data and the second audio data to obtain target audio data includes: respectively carrying out sampling processing on the first audio data and the second audio data according to a target sampling rate; and combining the sampled first audio data and the sampled second audio data to obtain target audio data, wherein the target audio data has the target sampling rate.
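The keyword-triggered lookup can be sketched as below. The sketch assumes the first audio data has already been converted to text by some speech recognizer, which is outside the example; the library contents, file paths, the trigger keyword constant, and the function name `find_second_audio` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical index of the local storage area: lowercase title -> file path.
LOCAL_LIBRARY: Dict[str, str] = {
    "spring morning": "/audio/spring_morning.wav",
    "ode to joy": "/audio/ode_to_joy.wav",
}

TRIGGER_KEYWORD = "song"  # assumed audio acquisition keyword


def find_second_audio(transcript: str, library: Dict[str, str]) -> Optional[str]:
    """Return the path of the audio named after the keyword, or None.

    `transcript` is the recognized text of the first audio data.
    """
    text = transcript.lower()
    if TRIGGER_KEYWORD not in text:
        return None  # no keyword in the analysis result: nothing to add
    # Only look at what the speaker says after the keyword.
    tail = text.split(TRIGGER_KEYWORD, 1)[1]
    for title, path in library.items():
        if title in tail:
            return path
    return None  # keyword present but no local match; a cloud search could go here
```

A `None` result for a recognized keyword is the point at which an implementation could fall back to the network search the description mentions.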
The process of combining at least two audio streams is called mixing; in the embodiment of the present application, the two streams are the first audio data and the second audio data. Audio data can be understood as a record of how a point (the diaphragm) vibrates along one axis over time. The audio sampling rate is the number of times per second that the recording device samples the sound signal; the higher the sampling rate, the more faithfully and naturally the sound is reproduced. In the embodiment of the application, after the virtual collector acquires the first audio data and the second audio data, it resamples both so that they share one sampling rate, the target sampling rate, and the resulting target audio data has that rate. This improves the resolution of the mix, so the output target audio data has higher sound quality, the audio heard by users of the target application is clearer, and the audio processing effect is improved.
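The resample-then-mix step can be illustrated with a small pure-Python sketch. The patent does not name a resampling algorithm; linear interpolation is used here only because it is short, and a real implementation would more likely use a band-limited resampler. The function names and the float-list sample format are assumptions.

```python
from typing import List


def resample(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (simple, not band-limited)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate  # position in the source stream
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]  # hold the last sample at the end
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def mix_at_rate(first: List[float], first_rate: int,
                second: List[float], second_rate: int,
                target_rate: int) -> List[float]:
    """Bring both streams to target_rate, then sum and clip to [-1.0, 1.0]."""
    a = resample(first, first_rate, target_rate)
    b = resample(second, second_rate, target_rate)
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))  # pad the shorter stream with silence
    b += [0.0] * (length - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]
```

Resampling before summing matters because adding samples index by index is only meaningful when index `n` refers to the same instant in both streams.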
The following describes an audio processing method in the embodiment of the present application, taking a target application as a teleconference application as an example. Referring to fig. 2, a schematic diagram of an application scenario of a teleconference, provided by an embodiment of the present application, is shown.
In the scenario shown in fig. 2, user A is the speaker of the current conference and needs to present music theory with reference to music played by the electronic device. User A's explanation audio, audio A, and music B played by the electronic device are mixed to obtain target audio C. Target audio C is output to the teleconference application that user A is using, and the teleconference application outputs it to the other users participating in the teleconference, such as user B, user C, and user D.
In the embodiment of the present application, a virtual collector is created on the electronic device. The virtual collector is not part of an application program but part of the operating system, and may also be understood as a program file integrated in the operating system. The sound input device of the electronic device's current operating system is set to the virtual collector. The data of the real physical microphone is then recorded, the sound of the system playback device is captured, and the two audio streams are combined and sent to the teleconference application through the virtual collector, so that participants hear both the speaker's voice and the sound played by the speaker's electronic device.
Referring to fig. 3, a schematic diagram of audio data streaming provided by an embodiment of the present application is shown. It should be noted that the modules, devices, and data streams included in the embodiment shown in fig. 3 are only selected for explaining the audio processing method according to the embodiment of the present application, and the actual application process may be flexibly selected based on the structure of the electronic device and the specific application scenario.
The system sound input device is set to a virtual collector, which is part of the operating system and may also be understood as a virtual microphone driver. The microphone recording module records the speaker's voice from the microphone (that is, a physical microphone of the electronic device). The sound recording module records playback sound from the system output device, that is, the sound played by the electronic device; it may be started on an instruction from the speaker, for example when the speaker selects local audio data to play, in which case the sound recording module records that local audio data. The mixing module then mixes the speaker's voice data from the microphone recording module with the local audio data from the sound recording module to obtain target audio data and sends it to the virtual collector. The virtual collector passes the target audio data to the target application, which transmits it to the other participants.
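The module arrangement of fig. 3 can be sketched frame by frame. This is only an illustration under assumptions: the class and function names are hypothetical, frames are equal-length lists of normalized float samples, and the speaker's start and stop instructions are modeled as plain method calls.

```python
from typing import List, Optional

Frame = List[float]  # one fixed-size block of mono samples in [-1.0, 1.0]


class SoundRecordingModule:
    """Captures system playback only while the speaker has started it."""

    def __init__(self) -> None:
        self.active = False

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False

    def capture(self, playback: Frame) -> Optional[Frame]:
        return list(playback) if self.active else None


def mixing_module(mic: Frame, playback: Optional[Frame]) -> Frame:
    """Mix one microphone frame with an optional playback frame of equal length."""
    if playback is None:
        return list(mic)  # nothing playing: pass the microphone through unchanged
    return [max(-1.0, min(1.0, m + p)) for m, p in zip(mic, playback)]


def virtual_collector_frame(mic: Frame, playback: Frame,
                            recorder: SoundRecordingModule) -> Frame:
    """Frame handed to the target application for one capture period."""
    return mixing_module(mic, recorder.capture(playback))
```

With the recorder stopped, the target application receives the microphone signal alone, so the virtual collector behaves like an ordinary microphone until the speaker chooses to share playback.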
Therefore, in the embodiment of the application, the virtual collector is used for mixing at least two paths of sounds and outputting the mixed sounds to the corresponding target application, and the mixed audio data can be output to other users no matter whether the target application has the sound mixing function, so that the software development cost is reduced.
To improve the user experience and make the output target audio data suitable for every user, the audio processing method provided in the embodiment of the present application further includes filtering specific audio information, such as audio related to sensitive words. Referring to fig. 4, a schematic flowchart of another audio processing method provided in the embodiment of the present application is shown. The method may include the following steps:
S201, receiving an audio acquisition instruction of a target application;
S202, responding to the audio acquisition instruction, acquiring first audio data and second audio data at least matched with the first audio data;
S203, performing sound mixing processing on the first audio data and the second audio data to obtain target audio data;
S204, performing sensitive word filtering on the target audio data to obtain filtered target audio data;
S205, sending the filtered target audio data to the target application, so that the target application outputs the filtered target audio data.
In the embodiment of the application, after the target audio data is obtained, sensitive words are filtered from it, and the filtered target audio data is output. To filter sensitive words, the target audio may be converted into text, the text matched against the sensitive words, and the matched parts filtered; alternatively, audio segments whose audio features correspond to the sensitive words may be filtered directly from the target audio data. The specific filtering operation can be chosen according to the application scene: for example, an audio clip containing a sensitive word may be deleted before output, or silenced and then output. The sensitive words may be defined by the user for each application scene, or stored in a sensitive word library against which the audio data is compared.
It should be noted that, in the embodiment of the present application, sensitive word filtering may be performed on the target audio data, or on the first audio data and/or the second audio data before mixing, in which case the filtered first and second audio data are mixed to obtain the target audio data. Alternatively, only designated audio data may be filtered. For example, if the first audio data is the speaker's voice, the speaker can be expected to avoid sensitive words, while the second audio data is local audio; in that case only the second audio data need be filtered for the output target audio data to meet the filtering requirement.
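The text-matching variant of the filter, with both the delete and the silence options, can be sketched as follows. The sketch assumes a speech recognizer has already produced word-level sample ranges; the `Word` type, the function name, and the `mode` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Word:
    text: str
    start: int  # index of the word's first sample
    end: int    # index one past the word's last sample


def filter_sensitive(samples: List[float], words: List[Word],
                     sensitive: Set[str], mode: str = "mute") -> List[float]:
    """Silence or delete the sample ranges of words found in `sensitive`."""
    flagged = [(w.start, w.end) for w in words if w.text.lower() in sensitive]
    if mode == "mute":
        out = list(samples)
        for start, end in flagged:
            for i in range(start, min(end, len(out))):
                out[i] = 0.0  # keep the timing, remove the sound
        return out
    if mode == "delete":
        drop = set()
        for start, end in flagged:
            drop.update(range(start, end))
        return [x for i, x in enumerate(samples) if i not in drop]
    raise ValueError(f"unknown mode: {mode}")
```

Muting preserves the length of the audio, which keeps it aligned with any accompanying video or text; deleting shortens it, so anything synchronized with the audio has to be adjusted as well.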
The corresponding sensitive word may be determined according to different application scenarios or different information of the recipient of the target audio data, and the sensitive word may be filtered. That is, in this embodiment of the present application, the performing sensitive word filtering on target audio data to obtain filtered target audio data includes: acquiring attribute information of a receiver corresponding to audio data output by a target application; determining sensitive words matched with the attribute information; and performing sensitive word filtering on the target audio data to obtain filtered target audio data.
The receiver is the user on the target application side who receives the audio after the virtual collector sends the target audio data to the target application. The attribute information may be characteristic information of the user, such as age, gender, and occupation. For example, the sensitive words for users aged 6-10 differ from those for users aged 18-35, so the sensitive words to be filtered may be determined from the user's age and then filtered. Specifically, in an online education scene, minors commonly use electronic devices to watch videos or listen to audio, so audio clips unsuitable for minors can be deleted before playback. This protects minors while online education audio is transmitted in real time: the audio data is filtered without relying on later reprocessing of the audio, which improves processing efficiency.
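Selecting the sensitive words from the receiver's attribute information can be sketched with age as the only attribute. The age ranges come from the example in the text; the library contents are placeholders, and the names `LIBRARIES` and `sensitive_words_for` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical sensitive word libraries keyed by inclusive age range.
LIBRARIES: Dict[Tuple[int, int], Set[str]] = {
    (6, 10): {"word_a", "word_b"},
    (18, 35): {"word_b"},
}


def sensitive_words_for(age: int) -> Set[str]:
    """Return the sensitive words that apply to a receiver of the given age."""
    for (low, high), words in LIBRARIES.items():
        if low <= age <= high:
            return set(words)  # copy so callers cannot modify the library
    return set()  # no library configured for this age
```

The returned set would then be passed to whatever filter is applied to the target audio data before it is sent to that receiver.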
Correspondingly, performing sensitive word filtering on the target audio data to obtain filtered target audio data includes: deleting the audio clip corresponding to the sensitive word from the target audio data to obtain the target audio data after deletion.
In this embodiment, an identified audio segment that matches a sensitive word is deleted, modified (for example, silenced), or replaced with a target audio segment that is output in its place.
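The three ways of handling a matched segment (deletion, silencing, replacement) can be sketched on a plain list of samples. The function name `filter_segment`, the `mode` strings, and the use of sample indices for the segment boundaries are assumptions made for illustration.

```python
def filter_segment(samples, start, end, mode="delete", replacement=None):
    """Return a new sample list in which samples[start:end] is handled per mode.

    mode="delete"  removes the segment,
    mode="silence" overwrites it with zeros of the same length,
    mode="replace" substitutes the given replacement samples.
    """
    if not 0 <= start <= end <= len(samples):
        raise ValueError("segment out of range")
    if mode == "delete":
        middle = []
    elif mode == "silence":
        middle = [0] * (end - start)
    elif mode == "replace":
        if replacement is None:
            raise ValueError("replacement samples are required")
        middle = list(replacement)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(samples[:start]) + middle + list(samples[end:])
```

Silencing keeps the overall duration unchanged, which may matter when other streams must stay aligned with the audio; deletion and replacement can change the length.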
In another embodiment of the present application, associated information corresponding to the target audio data may be filtered synchronously. That is, associated information matching the target audio data is obtained, and the associated information segment matching the audio segment that corresponds to the sensitive word is deleted to obtain the associated information after deletion.
For example, if audio segment A is identified as matching sensitive word a, audio segment A is deleted, and the associated information corresponding to audio segment A, such as text segment B and image C, is deleted at the same time.
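One way to picture the synchronous deletion is to give every audio segment and every piece of associated information a time span, and to drop the associated items that overlap a flagged audio segment. The source does not say how the match between audio and associated information is established, so the time-overlap rule and the `Span` type below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A half-open time interval [start, end) in seconds with a label."""
    start: float
    end: float
    payload: str


def overlaps(a, b):
    """True if the two half-open intervals share any time."""
    return a.start < b.end and b.start < a.end


def drop_associated(flagged_segments, associated):
    """Keep only associated items that overlap no flagged audio segment."""
    return [
        item for item in associated
        if not any(overlaps(item, segment) for segment in flagged_segments)
    ]
```

With audio segment A spanning seconds 2 to 3, a text segment B over the same span and an image C shown at 2.5 to 2.6 would both be dropped, while items before second 2 or from second 3 onward are kept.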
It should be noted that, when sensitive word filtering is performed in the embodiment of the present application, the set of sensitive words may be updated dynamically, and different sensitive words may be used for different user attributes; for example, different sensitive word libraries may be used according to the current user's age. The method can filter out the audio segments corresponding to sensitive words in any audio data played or received on the electronic device.
The sensitive word filtering method of this embodiment is likewise applied to a virtual collector. That is, the sound playing device of the electronic device's current system is set to the virtual collector, so that audio played by all audio applications or network platforms passes through the virtual collector. A voice recognition module added to the virtual collector recognizes the text content of the speech, and the text content is then analyzed. If sensitive words are present, the corresponding audio data segments are modified or deleted, and the processed audio data is played or output through the virtual collector.
Fig. 5 shows a schematic flow chart of sensitive word filtering of audio data provided in an embodiment of the present application. In this method, the system sound playing device is set to the virtual collector, and the audio data played by all video applications, audio applications, browser applications, and the like is transmitted to the virtual collector. The virtual collector passes the audio data to a filter, which may include a voice recognition module and a sensitive word library. The voice recognition module converts the audio data into text, and the text is compared with the sensitive words in the library. If a sensitive word appears in the text, the filter deletes the corresponding audio segment, and the processed audio data is transmitted to a physical playback device to be played. In another possible implementation, the processed audio data may instead be returned to the virtual collector, which transmits it to other applications or electronic devices for playing. The audio data can thus be filtered while the application is running, which improves processing efficiency. In addition, any audio content received or generated by the electronic device can be filtered without additional sensitive word recognition software, which reduces recognition cost.
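The filter stage of fig. 5 can be sketched as below. The speech recognizer itself is outside the scope of this sketch: its output is assumed to be a list of `(word, start_seconds, end_seconds)` tuples, which is a hypothetical interface rather than anything the source specifies. The function name `filter_audio` and the lower-case comparison against the word library are likewise assumptions.

```python
def filter_audio(samples, sample_rate, recognized, lexicon):
    """Delete the sample ranges of recognized words found in the lexicon.

    recognized: iterable of (word, start_seconds, end_seconds) tuples, as a
    speech recognizer with word timestamps might produce (assumed format).
    lexicon: set of lower-case sensitive words.
    """
    kept = []
    cursor = 0  # index of the first sample not yet copied or skipped
    for word, start_s, end_s in sorted(recognized, key=lambda r: r[1]):
        if word.lower() not in lexicon:
            continue
        start = max(cursor, int(round(start_s * sample_rate)))
        end = min(len(samples), int(round(end_s * sample_rate)))
        if end <= start:
            continue  # nothing left to delete for this word
        kept.extend(samples[cursor:start])  # copy audio before the word
        cursor = end                        # skip the word itself
    kept.extend(samples[cursor:])           # copy the remaining audio
    return kept
```

The returned samples would then go to the physical playback device or back to the virtual collector, as the two implementations above describe.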
In an embodiment of the present application, an audio processing apparatus is further provided, where the audio processing apparatus is applied to a virtual collector, where the virtual collector is a part of an operating system of an electronic device, and referring to fig. 6, the apparatus includes:
a receiving unit 10, configured to receive an audio acquisition instruction of a target application;
an obtaining unit 20, configured to obtain, in response to the audio acquisition instruction, first audio data and at least second audio data matching the first audio data, where the first audio data is audio that is generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device;
a processing unit 30, configured to perform audio mixing processing on the first audio data and the second audio data to obtain target audio data;
an output unit 40, configured to output the target audio data to the target application, so that the target application outputs the target audio data.
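The four units form a single request-handling path, which the following Python sketch illustrates. It is not the patented implementation: the class name `VirtualCollector`, the two reader callables, the `deliver` callback standing in for the target application, and the plain sample-wise sum used for mixing are all assumptions.

```python
class VirtualCollector:
    """Illustrative stand-in for the receive, obtain, mix, and output units."""

    def __init__(self, read_microphone, read_system_audio):
        # Callables returning lists of float samples (hypothetical sources).
        self._read_microphone = read_microphone
        self._read_system_audio = read_system_audio

    def handle_acquisition(self, deliver):
        """Serve one audio acquisition instruction from a target application."""
        first = list(self._read_microphone())     # first audio data
        second = list(self._read_system_audio())  # second audio data
        # Pad the shorter stream with silence so both have equal length.
        length = max(len(first), len(second))
        first += [0.0] * (length - len(first))
        second += [0.0] * (length - len(second))
        target = [a + b for a, b in zip(first, second)]  # mixing step
        deliver(target)  # output the target audio data to the application
        return target
```

From the target application's point of view, the collector behaves like an ordinary capture device whose stream already contains both the microphone audio and the device's own audio.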
The application discloses an audio processing apparatus applied to a virtual collector, where the virtual collector is a part of an operating system of an electronic device. The receiving unit receives an audio acquisition instruction; in response, the obtaining unit obtains first audio data and second audio data, and the processing unit mixes them to obtain target audio data; the output unit outputs the target audio data to the target application so that the target application outputs it. Mixing at least two pieces of audio data through the virtual collector in this way meets the target application's need to mix and output multiple pieces of audio data and improves the audio processing effect.
In one embodiment, the obtaining unit 20 includes:
a receiving subunit, configured to receive first audio data;
a first obtaining subunit, configured to, in response to an audio selection instruction generated by the target object that produced the first audio data, obtain second audio data matching the audio selection instruction from a local storage area of the electronic device.
In another embodiment, the obtaining unit 20 includes:
a receiving subunit, configured to receive first audio data;
an analysis subunit, configured to analyze the first audio data;
a second obtaining subunit, configured to obtain second audio data matching an audio acquisition keyword if the analysis result includes the audio acquisition keyword.
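The keyword-triggered variant can be sketched as a lookup on the analysis result. The source does not specify the form of the analysis; the sketch assumes it yields a text transcript of the first audio data, and the `KEYWORD_TO_CLIP` table with its keywords and file names is hypothetical.

```python
# Hypothetical mapping from audio acquisition keywords to stored clips.
KEYWORD_TO_CLIP = {
    "play applause": "applause.wav",
    "background music": "bgm.wav",
}


def second_audio_for(transcript):
    """Return the clip matching the first keyword found, or None if none is."""
    text = transcript.lower()
    for keyword, clip in KEYWORD_TO_CLIP.items():
        if keyword in text:
            return clip
    return None
```

When no keyword is present the function returns `None`, which corresponds to the case in which no second audio data is obtained from the analysis.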
In one embodiment, the processing unit 30 includes:
a sampling subunit, configured to sample the first audio data and the second audio data respectively at a target sampling rate;
a synthesizing subunit, configured to synthesize the sampled first audio data and the sampled second audio data into target audio data, where the target audio data has the target sampling rate.
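The sampling and synthesizing steps can be sketched as resampling both streams to the target rate and then summing them. The source does not name a resampling or mixing algorithm; linear interpolation, zero-padding of the shorter stream, and clipping to the range [-1.0, 1.0] are assumptions chosen for brevity, and a production resampler would also low-pass filter before downsampling.

```python
def resample(samples, src_rate, dst_rate):
    """Resample by linear interpolation (simplified; no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source stream
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def mix(first, first_rate, second, second_rate, target_rate):
    """Bring both streams to target_rate, then sum them sample by sample."""
    a = resample(first, first_rate, target_rate)
    b = resample(second, second_rate, target_rate)
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))         # pad the shorter stream
    b += [0.0] * (length - len(b))
    # Clip so the sum stays within the normalized float sample range.
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]
```

Because both inputs are converted before summation, the result has the target sampling rate regardless of the rates of the two sources.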
Optionally, the apparatus further includes:
a filtering unit, configured to perform sensitive word filtering on the target audio data to obtain filtered target audio data;
a sending unit, configured to send the filtered target audio data to the target application so that the target application outputs the filtered target audio data.
In one embodiment, the filtering unit is specifically configured to:
acquire attribute information of a receiver corresponding to the audio data output by the target application;
determine sensitive words matching the attribute information;
perform sensitive word filtering on the target audio data to obtain filtered target audio data.
In another embodiment, the filtering unit is specifically configured to:
delete the audio clip corresponding to the sensitive word from the target audio data to obtain the target audio data after deletion.
Optionally, the apparatus further includes:
an associated information obtaining unit, configured to obtain associated information matching the target audio data;
an associated information filtering unit, configured to delete the associated information segment matching the audio segment that corresponds to the sensitive word, to obtain the associated information after deletion.
An embodiment of the present application further provides an electronic device. The technical solution of this embodiment mainly mixes multiple channels of audio data so that all users of the target application can obtain the mixed audio data. Specifically, the electronic device includes a memory and a virtual collector, the virtual collector being a part of an operating system of the electronic device, wherein:
the memory is configured to store an application program and data generated by running the application program;
the virtual collector is configured to execute the application program to implement:
receiving an audio acquisition instruction of a target application;
in response to the audio acquisition instruction, acquiring first audio data and at least second audio data matching the first audio data, where the first audio data is audio that is generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device;
mixing the first audio data and the second audio data to obtain target audio data;
outputting the target audio data to the target application so that the target application outputs the target audio data.
The application discloses an electronic device in which the method is applied to a virtual collector that is a part of the operating system of the electronic device. In response to a received audio acquisition instruction, the acquired first audio data and second audio data are mixed to obtain target audio data, and the target audio data is output to the target application so that the target application outputs it. Mixing at least two pieces of audio data through the virtual collector in this way meets the target application's need to mix and output multiple pieces of audio data and improves the audio processing effect.
It should be noted that, for the specific implementation of the virtual collector in this embodiment, reference may be made to the corresponding contents in the foregoing, and details are not described here.
In an embodiment of the present application, a storage medium is further provided, where the storage medium stores computer program code, and the computer program code implements the audio processing method as described in any one of the above when executed by a processor.
It should be noted that, for the specific implementation of the storage medium in the present embodiment, reference may be made to the corresponding contents in the foregoing, and details are not described here.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An audio processing method, applied to a virtual collector, wherein the virtual collector is a part of an operating system of an electronic device, the method comprising:
receiving an audio acquisition instruction of a target application;
in response to the audio acquisition instruction, acquiring first audio data and at least second audio data matching the first audio data, wherein the first audio data is audio that is generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device;
mixing the first audio data and the second audio data to obtain target audio data;
outputting the target audio data to the target application so that the target application outputs the target audio data.
2. The method of claim 1, wherein the acquiring first audio data and at least second audio data matching the first audio data comprises:
receiving first audio data;
in response to an audio selection instruction generated by the target object that produced the first audio data, acquiring second audio data matching the audio selection instruction from a local storage area of the electronic device.
3. The method of claim 1, wherein the acquiring first audio data and at least second audio data matching the first audio data comprises:
receiving first audio data;
analyzing the first audio data;
if the analysis result includes an audio acquisition keyword, acquiring second audio data matching the audio acquisition keyword.
4. The method of claim 1, wherein the mixing the first audio data and the second audio data to obtain target audio data comprises:
sampling the first audio data and the second audio data respectively at a target sampling rate;
synthesizing the sampled first audio data and the sampled second audio data into target audio data, wherein the target audio data has the target sampling rate.
5. The method of claim 1, further comprising:
performing sensitive word filtering on the target audio data to obtain filtered target audio data;
sending the filtered target audio data to the target application so that the target application outputs the filtered target audio data.
6. The method of claim 5, wherein the performing sensitive word filtering on the target audio data to obtain filtered target audio data comprises:
acquiring attribute information of a receiver corresponding to the audio data output by the target application;
determining sensitive words matching the attribute information;
performing sensitive word filtering on the target audio data to obtain filtered target audio data.
7. The method of claim 5, wherein the performing sensitive word filtering on the target audio data to obtain filtered target audio data comprises:
deleting the audio clip corresponding to the sensitive word from the target audio data to obtain the target audio data after deletion.
8. The method of claim 7, further comprising:
acquiring associated information matching the target audio data;
deleting the associated information segment matching the audio segment that corresponds to the sensitive word, to obtain the associated information after deletion.
9. An audio processing apparatus, applied to a virtual collector, the virtual collector being a part of an operating system of an electronic device, the apparatus comprising:
a receiving unit, configured to receive an audio acquisition instruction of a target application;
an acquiring unit, configured to acquire, in response to the audio acquisition instruction, first audio data and at least second audio data matching the first audio data, wherein the first audio data is audio that is generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device;
a processing unit, configured to mix the first audio data and the second audio data to obtain target audio data;
an output unit, configured to output the target audio data to the target application so that the target application outputs the target audio data.
10. An electronic device, comprising a memory and a virtual collector that is a part of an operating system of the electronic device, wherein:
the memory is configured to store an application program and data generated by running the application program;
the virtual collector is configured to execute the application program to implement:
receiving an audio acquisition instruction of a target application;
in response to the audio acquisition instruction, acquiring first audio data and at least second audio data matching the first audio data, wherein the first audio data is audio that is generated by a target object and received by a microphone of the electronic device, and the second audio data is audio generated by the electronic device;
mixing the first audio data and the second audio data to obtain target audio data;
outputting the target audio data to the target application so that the target application outputs the target audio data.
CN202110341744.XA 2021-03-30 2021-03-30 Audio processing method and device and electronic equipment Active CN113096674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110341744.XA CN113096674B (en) 2021-03-30 2021-03-30 Audio processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110341744.XA CN113096674B (en) 2021-03-30 2021-03-30 Audio processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113096674A (en) 2021-07-09
CN113096674B (en) 2023-02-17

Family

ID=76671260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110341744.XA Active CN113096674B (en) 2021-03-30 2021-03-30 Audio processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113096674B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150381934A1 (en) * 2014-06-30 2015-12-31 Brother Kogyo Kabushiki Kaisha Teleconference Method, Storage Medium Storing Program for Teleconference, and Terminal Apparatus
CN105323534A (en) * 2014-07-14 2016-02-10 深圳市潮流网络技术有限公司 Conference processing method of third party application and communication equipment
CN109559763A (en) * 2017-09-26 2019-04-02 华为技术有限公司 A kind of method and device of real time digital audio signal audio mixing
CN109767777A (en) * 2019-01-31 2019-05-17 迅雷计算机(深圳)有限公司 A kind of sound mixing method that software is broadcast live
CN110534113A (en) * 2019-08-26 2019-12-03 深圳追一科技有限公司 Audio data desensitization method, device, equipment and storage medium
CN110826319A (en) * 2019-10-30 2020-02-21 维沃移动通信有限公司 Application information processing method and terminal equipment
CN111107442A (en) * 2019-11-25 2020-05-05 北京大米科技有限公司 Method and device for acquiring audio and video files, server and storage medium
CN112423009A (en) * 2020-11-09 2021-02-26 珠海格力电器股份有限公司 Method and equipment for controlling live broadcast audio

Also Published As

Publication number Publication date
CN113096674B (en) 2023-02-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant