CN112863474A - Real-time digital audio signal sound mixing method and device

Info

Publication number: CN112863474A
Application number: CN202110045076.6A
Authority: CN
Inventors: 张硕; 刘炜刚; 韩晓征
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-09-26
Filing date: 2017-09-26
Publication date: 2021-05-28
Also published as: WO2019062541A1; CN109559763A; CN109559763B

Abstract

The embodiments of this application disclose an audio signal processing method and a terminal. The method mixes, in real time, at least one frame of audio data acquired through a media playing interface with a real-time audio signal originating from an audio input device. Because the media playing interface acquires an audio file frame by frame rather than as a whole, the mixing process can switch to any frame of audio data of another audio file at any time through the media playing interface. The at least one frame of audio data is transmitted over a data path in system software, so no external environmental noise is introduced. The method also introduces an audio detection mechanism that adaptively adjusts the volume of the at least one frame of audio data participating in the mixing, which improves the mixing experience.

Description

Real-time digital audio signal sound mixing method and device

Technical Field

The present application relates to the field of communications, and in particular, to a method and an apparatus for processing an audio signal.

Background

With the continuous development of mobile communication services and internet services, more and more users record or share moments of their lives through short voice messages, short videos, and live network video broadcasts. Live network video broadcasting is popular among internet users because it is intuitive, fast, and highly interactive.

In some audio recording scenarios, in order to obtain an audio or video file with a specific effect, background music specified by a user needs to be mixed into a captured voice file through a mixing technique, or a specific accompaniment or song needs to be inserted into the recorded video during video recording or live network video broadcasting. In existing mixing techniques, whether analog or digital, the background audio files participating in the mix cannot be adjusted or modified during mixing, so flexibility is poor.

Disclosure of Invention

The embodiment of the application provides an audio signal processing method and device, so as to realize flexible real-time digital audio mixing.

A first aspect of the application provides a method of audio signal processing, the method comprising: acquiring at least one frame of audio data through a media playing interface, wherein the media playing interface is an application programming interface; transmitting the at least one frame of audio data to a digital mixing module through a data path in system software; acquiring a real-time audio signal; and mixing the at least one frame of audio data with the real-time audio signal in a digital audio mixing module to obtain an audio-mixed audio signal.
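The mixing step of the first aspect can be illustrated with a minimal sketch. It assumes that both inputs are already frames of signed 16-bit PCM samples of equal length and that mixing is plain sample-wise addition with saturation; the application does not prescribe a particular mixing formula, and the names `mix_frame` and `mix_stream` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_frame(background, realtime):
    """Mix two equal-length frames of signed 16-bit PCM samples.

    Assumption: mixing is sample-wise addition, clamped to the 16-bit range.
    """
    if len(background) != len(realtime):
        raise ValueError("frames must have the same number of samples")
    mixed = []
    for b, r in zip(background, realtime):
        total = b + r
        # Saturate instead of wrapping around on overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed

def mix_stream(background_frames, realtime_frames):
    """Yield mixed frames one at a time, pairing frames in arrival order."""
    for bg, rt in zip(background_frames, realtime_frames):
        yield mix_frame(bg, rt)
```

Because `mix_stream` is a generator, each mixed frame is available as soon as its two input frames are, which matches the frame-by-frame nature of the method.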

The audio signal processing method mixes, in real time, at least one frame of audio data acquired through the media playing interface with a real-time audio signal from the audio input device. The media playing interface acquires an audio file frame by frame instead of acquiring the whole audio file, so that during mixing it can flexibly switch, at any time, to any frame of audio data of another audio file, thereby realizing real-time digital audio mixing. In addition, the at least one frame of audio data is transmitted over a data path in system software, so no external environmental noise is introduced.
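The frame-by-frame acquisition and switching described above can be sketched as follows. The class `FrameSource` is a hypothetical stand-in for the media playing interface, and the audio files are modelled as in-memory lists of already decoded samples; neither is taken from the application.

```python
class FrameSource:
    """Hypothetical stand-in for a media playing interface that serves frames."""

    def __init__(self, files, frame_size):
        self.files = files            # assumed mapping: file name -> decoded samples
        self.frame_size = frame_size  # samples per frame
        self.current = None
        self.position = 0

    def switch_to(self, name, frame_index=0):
        """Jump to any frame of any known file, even in the middle of mixing."""
        if name not in self.files:
            raise KeyError(name)
        self.current = name
        self.position = frame_index * self.frame_size

    def next_frame(self):
        """Return the next frame, or None when the current file is exhausted."""
        if self.current is None:
            return None
        samples = self.files[self.current]
        frame = samples[self.position:self.position + self.frame_size]
        if not frame:
            return None
        self.position += len(frame)
        return frame
```

Since only one frame is fetched per call, a switch takes effect on the very next frame that is handed to the mixer.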

In one possible design, before the at least one frame of audio data is transferred to the digital mixing module through a data path in the system software, the method further includes: the at least one frame of audio data is decoded from a compressed form to an original data form.

Optionally, the media playing interface may be coupled to a music player. When the music player is started, its music playing flow completes the decoding of the at least one frame of audio data. When any frame of audio data reaches the digital mixing module, it is already in original (uncompressed) data form, so it does not need to be decoded again in the digital mixing module, which improves the efficiency of real-time mixing.

In one possible design, the system software includes operating system software. Optionally, the system software may also include other driver software or platform software besides application software, such as open source system software, middleware, or widgets.

In one possible design, the data path includes at least one of: a track source node, an audio regulator, an audio hardware abstraction layer, or a hardware driver layer.

Optionally, the data path is a transmission path of audio data inside the operating system. The track source node is the starting point from which multiple audio data streams are sent to multiple different audio tracks; the audio regulator is used for processing audio data in the system software and managing audio hardware devices; the audio hardware abstraction layer is a software interface obtained by abstracting the audio hardware devices; and the hardware driver layer is the driver of the audio hardware devices.

In one possible design, during the process of transferring the at least one frame of audio data to the digital mixing module through the data path in the system software, the playing of the at least one frame of audio data is prohibited.

The playing flow of the at least one frame of audio data in this method differs from the ordinary playing flow: after being output from the media playing interface, the at least one frame of audio data is not played out externally but is sent directly to the digital mixing module. The decoding of the at least one frame of audio data is completed by means of the existing playing flow, and the unnecessary background noise that external playback would introduce is avoided.

In one possible design, inhibiting playback of the at least one frame of audio data includes at least one of: closing an audio output data stream of the at least one frame of audio data by an audio regulator in the data path; or controlling a hardware driver layer in the data path to disable an audio output device for the at least one frame of audio data based on an audio hardware abstraction layer in the data path.
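As an illustration only, the idea of forwarding frames to the mixer while suppressing audible output can be modelled as below. The class `DataPath` and its two sinks are hypothetical; the sketch does not reproduce any real audio regulator or audio hardware abstraction layer API.

```python
class DataPath:
    """Toy model of a data path with a mixer sink and a speaker sink."""

    def __init__(self, mixer_sink, speaker_sink):
        self.mixer_sink = mixer_sink      # frames consumed by the digital mixing module
        self.speaker_sink = speaker_sink  # frames that would be played out loud
        self.playback_inhibited = False

    def inhibit_playback(self):
        """Models closing the audio output data stream for the frames."""
        self.playback_inhibited = True

    def allow_playback(self):
        self.playback_inhibited = False

    def deliver(self, frame):
        """Always forward to the mixer; reach the speaker only if not inhibited."""
        self.mixer_sink.append(list(frame))
        if not self.playback_inhibited:
            self.speaker_sink.append(list(frame))
```

With playback inhibited, the frame still reaches the mixer unchanged, but nothing is emitted through the speaker that a microphone could pick up again.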

In one possible design, prior to acquiring the real-time audio signal, the method further includes: detecting whether the real-time audio signal is input; when the real-time audio signal input is detected, the volume of the at least one frame of audio data is reduced.

In one possible design, prior to acquiring the real-time audio signal, the method further includes: and acquiring a real-time analog audio signal, and converting the real-time analog audio signal into the real-time audio signal.

The above design adaptively adjusts the volume of at least one frame of audio data participating in audio mixing based on the presence or absence of the real-time audio signal, so as to highlight the real-time audio signal.

In one possible design, reducing the volume of the at least one frame of audio data includes at least one of: reducing the volume of the at least one frame of audio data by controlling an audio regulator in the data path; or the volume of the at least one frame of audio data is reduced by controlling the digital mixing module.
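The adaptive volume reduction can be sketched as a simple ducking step. The detection criterion (an RMS level threshold), the threshold value, and the reduced gain are all assumptions made for illustration; the application only states that the volume is reduced when real-time input is detected.

```python
import math

def rms(frame):
    """Root-mean-square level of a frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def duck_background(background, realtime, threshold=500.0, ducked_gain=0.25):
    """Lower the background frame's volume while real-time input is present.

    Assumption: input counts as present when its RMS reaches `threshold`.
    """
    if rms(realtime) >= threshold:
        return [round(s * ducked_gain) for s in background]
    return list(background)
```

When the real-time signal falls silent again, the background frame passes through at its original volume.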

In one possible design, before mixing the at least one frame of audio data with the real-time audio signal in the digital mixing module, the method further includes: the real-time audio signal is subjected to at least one of the following processes: signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control.

Processing the real-time audio signal in this way before mixing improves its sound quality, reduces the noise it contains, prevents unnecessary interference from being introduced during mixing, and avoids audio overflow during mixing and the resulting distortion of the mixed audio.
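Two of the listed pre-processing steps, noise suppression and gain control, can be illustrated with a deliberately simplified sketch. The noise gate and the peak limiter below are stand-ins chosen for brevity, not the algorithms of the application, and the values `floor` and `peak_limit` are assumptions.

```python
def noise_gate(frame, floor=100):
    """Crude noise suppression: zero out samples below an assumed noise floor."""
    return [s if abs(s) >= floor else 0 for s in frame]

def limit_gain(frame, peak_limit=16383):
    """Crude gain control: scale the frame so its peak stays within peak_limit.

    If both mixer inputs are limited to 16383, their sum (at most 32766 in
    magnitude) cannot overflow a signed 16-bit sample.
    """
    peak = max((abs(s) for s in frame), default=0)
    if peak <= peak_limit:
        return list(frame)
    return [round(s * peak_limit / peak) for s in frame]
```

Applying `limit_gain` to both the real-time frame and the background frame before addition is one way to rule out overflow in the mixer without clipping.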

In one possible design, after obtaining the mixed audio signal, the method further includes: acquiring a video image signal; and mixing the video image signal and the audio mixing audio signal to obtain an audio mixing video signal.

In one possible design, after obtaining the mixed audio signal, the method further includes at least one of: playing the mixed audio signal, storing the mixed audio signal locally, and sending the mixed audio signal to other devices or uploading the mixed audio signal to a network.

The mixed audio signal can be played in real time after being obtained, and can also be stored for later playback or shared to other terminal users or internet users in real time.
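Local storage of the mixed audio signal can be sketched with Python's standard `wave` module. The choice of a mono, 16-bit WAV container and the 48 kHz default sample rate are assumptions; the application does not specify a storage format.

```python
import io
import struct
import wave

def save_mixed_pcm(target, samples, sample_rate=48000):
    """Write signed 16-bit mono samples to a WAV file or file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)          # mono (assumed)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

def load_pcm(source):
    """Read the samples back for later playback."""
    with wave.open(source, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

For example, passing an `io.BytesIO()` buffer as `target` keeps the result in memory, while passing a file path stores it on disk.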

In one possible design, the method further includes: and activating the digital sound mixing module through a digital sound mixing interface, wherein the digital sound mixing interface is an application programming interface.

A second aspect of the present application provides an apparatus for audio signal processing, the apparatus comprising: the system comprises a media playing interface, a data path positioned in system software, a real-time signal acquisition module and a digital audio mixing module; the media playing interface is used for acquiring at least one frame of audio data, and is an application programming interface; the data path is used for transmitting the at least one frame of audio data to the digital mixing module; the real-time signal acquisition module is used for acquiring a real-time audio signal; the digital audio mixing module is used for mixing the at least one frame of audio data with the real-time audio signal to obtain an audio mixing audio signal.

In one possible design, the apparatus further includes: a decoding module for decoding the at least one frame of audio data from a compressed form to an original data form before the data path delivers the at least one frame of audio data to the digital mixing module.

In one possible design, the data path is further configured to: and prohibiting the playing of the at least one frame of audio data in the process of transmitting the at least one frame of audio data to the digital mixing module.

In one possible design, an audio regulator in the data path is configured to turn off an audio output data stream of the at least one frame of audio data; or an audio hardware abstraction layer in the data path is configured to control a hardware driver layer in the data path to disable an audio output device for the at least one frame of audio data.

In one possible design, the apparatus further includes: an audio detection module to: before the real-time signal acquisition module acquires a real-time audio signal, detecting whether the real-time audio signal is input; and when the real-time audio signal input is detected, controlling to reduce the volume of the at least one frame of audio data.

In one possible design, the audio detection module is configured to perform at least one of: sending a control signal to the audio regulator in the data path to reduce the volume of the at least one frame of audio data; or sending a control signal to the digital mixing module to reduce the volume of the at least one frame of audio data.

In one possible design, the apparatus further includes a preprocessing module to: the real-time audio signal is subjected to at least one of the following processes: signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control.

In one possible design, the apparatus further includes: a video processing module to: acquiring a video image signal; and mixing the video image signal and the audio mixing audio signal to obtain an audio mixing video signal.

In one possible design, the apparatus further includes at least one of the following modules: the playing module is used for playing the audio mixing audio signal; the storage module is used for storing the audio mixing audio signal; the transmitting module is used for transmitting the audio mixing audio signal to other devices; or the uploading module is used for uploading the mixed audio signal to a network.

In one possible design, the apparatus further includes: and the digital sound mixing interface is used for receiving the enabling information and forwarding the enabling information to activate the digital sound mixing module, and is an application programming interface.

A third aspect of the present application provides an apparatus for audio signal processing, the apparatus comprising: a processor and an audio processor; the processor is configured to read software instructions stored in the memory, execute the software instructions to perform operations comprising: acquiring at least one frame of audio data through a media playing interface, wherein the media playing interface is an application programming interface; transmitting the at least one frame of audio data to the audio processor via a data path in the system software; the audio processor is configured to: acquiring a real-time audio signal; and mixing the at least one frame of audio data with the real-time audio signal to obtain a mixed audio signal.

In one possible design, the apparatus further includes the memory.

In one possible design, the apparatus further includes: a decoder for decoding at least one frame of audio data from a compressed form to an original data form before the data path delivers the at least one frame of audio data to the audio processor.

In one possible design, the processor is configured to execute the software instructions to further perform the following: and prohibiting the playing of the at least one frame of audio data in the process of transmitting the at least one frame of audio data to the digital mixing module.

In one possible design, the processor is configured to execute the software instructions to further perform the following: closing an audio output data stream of the at least one frame of audio data through the audio regulator in the data path; or controlling a hardware driver layer in the data path, through the audio hardware abstraction layer, to disable an audio output device for the at least one frame of audio data.

In one possible design, the audio processor is further to: before acquiring a real-time audio signal, detecting whether the real-time audio signal is input; and when the real-time audio signal input is detected, controlling to reduce the volume of the at least one frame of audio data.

In one possible design, the audio processor is specifically configured to: send a control signal to the audio regulator in the data path to reduce the volume of the at least one frame of audio data; or send a control signal to a digital mixing module in the audio processor to reduce the volume of the at least one frame of audio data.

In one possible design, the audio processor is further to: the real-time audio signal is subjected to at least one of the following processes: signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control.

In one possible design, the processor is configured to execute the software instructions to further perform the following: acquiring a video image signal; and mixing the video image signal and the audio mixing audio signal to obtain an audio mixing video signal.

In one possible design, the processor is configured to execute the software instructions to further perform the following: playing the mixed audio signal, storing the mixed audio signal in the memory, and sending the mixed audio signal to other devices through a transmission interface or uploading the mixed audio signal to a network through a network interface. The network interface may be a wireless transceiver, a Radio Frequency (RF) circuit, or a wired interface, etc. The transmission interface may be an input/output interface.

A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer or processor, cause the computer or processor to perform a method as set forth in the first aspect or any one of its possible designs above.

A fifth aspect of the present application provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the method as set forth in the first aspect or any one of its possible designs above.

A sixth aspect of the present application provides an apparatus comprising a processor configured to read software instructions in a memory and execute the software instructions to implement a method as described in the first aspect above or any one of its possible designs.

Optionally, the apparatus further comprises said memory for storing said software instructions.

Drawings

Fig. 1 is a schematic structural diagram of an apparatus according to an embodiment of the present disclosure;

Fig. 2 is a schematic structural diagram of a peripheral device according to an embodiment of the present disclosure;

Fig. 3 is a schematic view of an application scenario of a sound mixing technique according to an embodiment of the present application;

Fig. 4 is a block diagram of a structure including a specific architecture of system software and corresponding hardware components according to an embodiment of the present application;

Fig. 5 is a flowchart of a method for real-time digital audio mixing according to an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of another apparatus provided in an embodiment of the present application.

Detailed Description

The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, system, article, or apparatus.

Fig. 1 is a schematic diagram of an apparatus 100 according to an embodiment of the present disclosure. The apparatus 100 may include an antenna system 110; the antenna system 110 may be one or more antennas, or may be an antenna array composed of multiple antennas. The apparatus 100 may also include Radio Frequency (RF) circuitry 120, which may include one or more analog RF transceivers and may also include one or more digital RF transceivers; the RF circuitry 120 is coupled to the antenna system 110. It should be appreciated that in various embodiments of the present application, coupled refers to being interconnected in a particular manner, including being directly connected or indirectly connected through other devices, such as through various interfaces, transmission lines, buses, and the like. The RF circuitry 120 may be used for various types of cellular wireless communications.

The apparatus 100 may also include a processing system 130, and the processing system 130 may include a communication processor operable to control the RF circuitry 120 to enable reception and transmission of signals, which may be voice signals, media signals, or control signals, through the antenna system 110. The communication processor in the processing system 130 may also be used to manage the above signals, and specifically, the signal management may include signal enhancement, signal filtering, coding, signal modulation, signal mixing, signal separation, or other various known signal processing procedures, as well as new signal processing procedures that may occur in the future. The Processing System 130 may include various general-purpose Processing devices, such as a general-purpose Central Processing Unit (CPU), a System On Chip (SOC), a processor integrated on the SOC, a separate processor Chip or controller, etc.; the processing system 130 may also include a special purpose processing device, such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or the like. The processing system 130 may be a processor complex of multiple processors coupled to each other via one or more buses. The processing system may include an Analog-to-Digital Converter (ADC), a Digital-to-Analog Converter (DAC) to realize signal connection between different components of the apparatus, for example, an Analog voice signal collected by a microphone may be converted into a Digital voice signal by the ADC and transmitted to a Digital signal processor for processing, or a Digital voice signal in the processor may be converted into an Analog voice signal by the DAC and played through a speaker. 
The processing system may include a media processing system 131 configured to process media signals such as images, audio, and video. Specifically, the media processing system 131 may include a sound processing system 132, which may be a general-purpose or special-purpose sound processing device, such as an audio processing subsystem integrated on an SOC or a sound processing module integrated on a processor chip. Optionally, the sound processing module may be a software module or a hardware module, or may be a stand-alone sound processing chip. The sound processing system 132 is configured to implement the related processing of audio signals.
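The ADC and DAC conversions mentioned above for the processing system can be illustrated numerically. The sketch assumes uniform 16-bit quantization of an analog value normalized to a full-scale range; real converters differ in resolution and behaviour, and the function names are hypothetical.

```python
def adc_quantize(analog, full_scale=1.0):
    """Convert analog values in [-full_scale, full_scale] to 16-bit PCM.

    Assumption: uniform quantization; out-of-range values are clipped.
    """
    pcm = []
    for value in analog:
        clipped = max(-full_scale, min(full_scale, value))
        pcm.append(round(clipped / full_scale * 32767))
    return pcm

def dac_reconstruct(pcm, full_scale=1.0):
    """Map 16-bit PCM samples back to analog-style values for playback."""
    return [sample / 32767 * full_scale for sample in pcm]
```

The quantized samples are the form in which the microphone signal would reach a digital mixing stage, and `dac_reconstruct` models the opposite direction toward the speaker.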

The apparatus 100 may also include a memory 140, the memory 140 coupled to the processing system 130, and in particular, the memory 140 may be coupled to the processing system 130 through one or more memory controllers. The memory 140 may be used to store computer program instructions, including a computer Operating System (OS) and various user applications, such as audio processing programs with mixing functionality, applications with live broadcast functionality, video players, music players, and possibly other applications; the memory 140 may also be used to store user data such as calendar information, contact information, captured image information, audio information or other media files, and the like. Processing system 130 may read computer program instructions or user data from memory 140 or store computer program instructions or user data to memory 140 to implement the associated processing functions. For example, the audio file stored in the memory 140 may be read by the processing system and played by a music player, or the audio file in the memory 140 may be read into the processor for a series of operations such as decoding, mixing, encoding, and the like. 
The memory 140 may be a nonvolatile memory, such as an Embedded MultiMedia Card (eMMC), Universal Flash Storage (UFS), a Read-Only Memory (ROM), or another type of static storage device capable of storing static information and instructions; a volatile memory, such as a Random Access Memory (RAM), or another type of dynamic storage device capable of storing information and instructions; an Electrically Erasable Programmable Read-Only Memory (EEPROM); a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, optical disc, etc.); a magnetic disk storage medium or other magnetic storage device; or any other computer-readable storage medium that can be used to carry or store program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 140 may be stand-alone, or the memory 140 may be integrated with the processing system.

The apparatus 100 may further include a Wireless transceiver 150, where the Wireless transceiver 150 may provide Wireless connection capability to other devices, and the other devices may be peripheral devices such as a Wireless headset, a bluetooth headset, a Wireless mouse, a Wireless keyboard, and may also be a Wireless Network, such as a Wireless Fidelity (WiFi) Network, a Wireless Personal Area Network (WPAN), or other Wireless Local Area Networks (WLAN). The wireless transceiver 150 may be a bluetooth compatible transceiver for wirelessly coupling the processing system 130 to a peripheral device such as a bluetooth headset, wireless mouse, etc., or a WiFi compatible transceiver 150 for wirelessly coupling the processing system 130 to a wireless network or other device.

The apparatus 100 may also include audio circuitry 160, the audio circuitry 160 coupled with the processing system 130. The audio circuitry 160 may include a microphone 161 and a speaker 162. The microphone 161 may receive sound input from the outside, which may be user speech input, music input from outside, noise input, or other forms of sound from the outside. The microphone 161 may be a built-in microphone integrated with the apparatus 100, or may be an external microphone coupled to the apparatus 100 via an interface, such as a headphone microphone coupled to the apparatus via a headphone interface. The speaker 162 may play back audio data from a microphone, a music file stored in memory, or an audio file processed by the processing system; the speaker is one type of audio transducer, may enable enhancement of audio signals, and may be replaced with other types of audio transducers. It should be understood that the device 100 may have one or more microphones and one or more earphones, and the number of the microphones and the earphones is not limited in the embodiments of the present application. The processing system 130 drives or controls the audio circuit through an audio controller (not shown in Fig. 1); specifically, at least one of the microphone or the speaker is enabled or disabled according to instructions of the processing system 130. When an audio signal needs to be received from the microphone, the processing system enables the microphone through related control instructions and receives the audio signal input by the microphone; the audio signal can be processed in the processing system 130, stored in the memory 140, played by the speaker, transmitted to a network or other device through the antenna system 110 via the RF circuitry 120, or transmitted to the network or other device through the wireless transceiver 150. When an audio file needs to be played, the processing system enables the speaker through the related control instructions to play the audio signal. Correspondingly, when the microphone and speaker are not needed, they are disabled by the associated control instructions.

The device 100 may also include a display screen 170 for displaying information entered by the user, various menus of information provided to the user, which menus are associated with particular modules or functions within, and the display screen 170 may also accept user input, such as accepting control information, such as enabling or disabling. Specifically, the display screen 170 may include a display panel 171 and a touch panel 172. The Display panel 171 may be configured by a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), a Light-Emitting Diode (LED) Display device, a Cathode Ray Tube (CRT), or the like. The touch panel 172, also referred to as a touch screen, a touch sensitive screen, etc., may collect contact or non-contact operations (e.g., operations performed by a user on or near the touch panel 172 using any suitable object or accessory such as a finger, a stylus, etc., and may also include body sensing operations; including single-point control operations, multi-point control operations, etc.) on or near the touch panel 172, and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 172 may include two parts, a touch detection device and a touch controller. The touch detection device detects a signal brought by touch operation of a user and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts the touch information into information that can be processed by the processing system 130, sends the information to the processing system 130, and receives and executes commands sent by the processing system 130. 
Further, the touch panel 172 may cover the display panel 171. A user may operate on or near the touch panel 172 covering the display panel 171 according to the content displayed on the display panel 171 (the display content includes, but is not limited to, a soft keyboard, a virtual mouse, virtual keys, icons, etc.). The touch panel 172 detects the operation on or near it and transmits the operation to the processing system 130 through the I/O subsystem 10 to determine the user input, and the processing system 130 then provides a corresponding visual output on the display panel 171 through the I/O subsystem 10 according to the user input. Although in Fig. 1 the touch panel 172 and the display panel 171 are shown as two separate components to implement the input and output functions of the device 100, in some embodiments the touch panel 172 may be integrated with the display panel 171 to implement the input and output functions of the device 100. Taking the digital mixing operation described below as an example: a user touches an enable button associated with the digital mixing module on the display screen; the touch detection device detects the enable signal caused by the touch operation and transmits it to the touch controller; the touch controller converts the enable signal into information that can be processed by a processor and transmits the information to the processor in the processing system through the display controller 13 in the I/O subsystem 10. After receiving the enable signal, the processor activates the digital mixing module and completes the digital mixing operation, transmits the processed mixed audio to a player for playing, and displays related information about the mixed audio playback on the display screen through the display controller 13 in the I/O subsystem 10. The related information may include playing time, real-time lyrics, and the like.

The apparatus 100 may also include one or more sensors 180 coupled with the processing system 130, which sensors 180 may include image sensors, motion sensors, proximity sensors, ambient noise sensors, sound sensors, accelerometers, temperature sensors, gyroscopes, or other types of sensors, as well as various forms of combinations thereof. The processing system 130 drives the sensor 180 through the sensor controller 12 in the I/O subsystem 10 to receive various information such as audio signals, image signals, motion information, etc., and the sensor 180 transmits the received information to the processing system 130 for processing.

The apparatus 100 may also include other input devices 190 coupled to the processing system 130 to receive various user inputs, such as to receive entered numbers, names, addresses, and selections of media, such as music or other audio files, video files in various video formats, still and motion pictures, etc., and other input devices 190 may include a keyboard, physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, a joystick, a click wheel, a light mouse (a light mouse is a touch-sensitive surface that does not display visual output, or is an extension of a touch-sensitive surface formed by a touch screen), etc.

The apparatus 100 may also include the I/O subsystem 10 described above, the I/O subsystem 10 may include other input device controllers 11 for receiving signals from other input devices 190 or transmitting control or drive information for the processing system 130 to other input devices 190, the I/O subsystem 10 may also include the sensor controller 12 and the display controller 13 described above for enabling the exchange of data and control information between the sensors 180 and the display screen 170, respectively, and the processing system 130.

The device 100 may also include a power source 101, which may be a rechargeable or non-rechargeable lithium-ion battery or nickel-metal hydride battery, to provide power to the other components of the device 100, including components 110 to 190. Further, when the power source 101 is a rechargeable battery, it may be coupled to the processing system 130 via a power management system, which manages charging, discharging, and power consumption adjustment.

Although not shown, the apparatus 100 may further include a camera for acquiring a single-frame image or a video image of a plurality of consecutive frames, according to an operation mode of the apparatus 100, and transmitting the acquired image information to the processing system 130 for processing. Specifically, the processing system 130 may integrate an image processing unit, or include a separate image processor or image processing chip. Optionally, the processing system 130 may further include a video codec (Video Codec) module for fusing the image signal and the audio signal to obtain a video signal. When the apparatus operates in the video shooting mode, the video images of consecutive frames acquired by the camera and the audio signal acquired by the audio circuit 160 are fused in the Video Codec to obtain a video signal with sound. When a mixed video needs to be recorded, the video images of consecutive frames acquired by the camera and the mixed audio signal produced by the digital mixing module are fused in the Video Codec to obtain the mixed video signal. Optionally, the Video Codec may be a processing module integrated in the processing system 130, or may be a separate Video Codec chip.

It should be understood that the apparatus 100 of fig. 1 is merely an example and does not limit the specific configuration of the apparatus 100; the apparatus 100 may include other components not shown in fig. 1, whether existing now or added in the future.

In an alternative, the RF circuit 120, the processing system 130 and the memory 140 may be partially or completely integrated on one chip, or may be three chips independent of each other. RF circuitry 120, processing system 130, and memory 140 may include one or more integrated circuits disposed on a Printed Circuit Board (PCB).

Fig. 2 is an example of a peripheral device 200 according to an embodiment of the present application, where the peripheral device 200 exchanges data and signals with the apparatus 100, and the peripheral device may include a processor 210, a microphone 220, an audio transducer 230, a wireless transceiver 240, and one or more sensors 250.

The processor 210 is coupled to the microphone 220 and the audio transducer 230, and can drive or disable them. The processor 210 can receive audio data from the microphone 220 or transmit audio data to the audio transducer 230. In particular, the processor 210 can be coupled to an audio codec to exchange signals with the microphone 220 and the audio transducer 230, and the audio transducer 230 can be a speaker.

The wireless transceiver 240 is also coupled to the processor 210, and the wireless transceiver 240 is described with reference to the wireless transceiver 150 in fig. 1, and the peripheral device 200 is wirelessly connected to the apparatus 100 or other devices with similar functions via the wireless transceiver 240, for example, the peripheral device 200 can transmit signals to the apparatus 100 or receive signals from the apparatus 100 via the wireless transceiver 240 under the control of the processor 210.

The processor 210 is coupled to one or more sensors 250, the sensors 250 being operable to detect data information such as user activity, various environmental information, etc., the sensors 250 being partially or wholly identical to the sensors 180 of the device 100, the sensors 250 passing the detected data to the processor 210 for processing.

The peripheral device 200 shown in fig. 2 may be a wireless headset, and the peripheral device 200 may exchange data with the apparatus 100 through a bluetooth interface, and it should be understood that although fig. 2 shows a specific form of peripheral device, this is not a limitation on the form of peripheral device, and in some alternatives, the peripheral device 200 may also be a wired headset, a wired keyboard, a wireless keyboard, a wired or wireless cursor control device, and other wired or wireless input or output devices.

The system of apparatus 100 and peripheral device 200 may be a variety of types of data processing systems, for example, an audio data processing system, an image data processing system, a temperature data processing system, or other data processing system.

Fig. 3 is a schematic view of an application scenario of a mixing technique provided in an embodiment of the present application, where 310 is a terminal including a mixing module, the terminal 310 may be a specific form of the apparatus 100 shown in fig. 1, and the terminal 310 includes all or part of the structure of the apparatus 100; 311 is a microphone, which may be a microphone carried by the terminal 310, or may be a separate microphone, and as a peripheral device 200, data exchange is realized with the terminal 310 through wired communication or wireless communication, for example, the microphone may be a microphone of a wireless earphone, or a microphone on a wired earphone, or may be a sound-sensitive device or an external sound card with a sound collection function, or the like.

In this application scenario, the user can make an on-site recording in which the voice content shared by the user is fused with specific background music, making the shared content more engaging. Optionally, the voice content shared by the user may be singing, a talk show, a voice lesson, or another form of voice sharing. The terminal 310 mixes the user voice collected by the microphone with the preset background music to obtain mixed audio. Optionally, the user can play the mixed audio locally in real time through the loudspeaker 312; optionally, the user may also upload the mixed audio data to the network 320 using a wireless communication technology and share it with a plurality of internet users through the network. An internet user can acquire the mixed audio from the network 320 through various devices with wireless communication functions. As shown in fig. 3, listener 1 acquires the mixed audio data transmitted downstream from the network via the smartphone 330; listener 2 acquires it via a desktop computer 340 connected to the network; and listener 3 acquires it via the notebook computer 350. It should be understood that a greater number of listeners may simultaneously acquire the mixed audio data shared by the user on the network; only 3 listeners are shown in fig. 3 as an example. It should also be understood that the device used by an internet user may be an audio player, an in-vehicle device, a wearable device, a tablet, a Bluetooth headset, a radio, or the like with wireless communication capabilities.
Alternatively, the user may store the mixed audio data in a storage device of the local terminal after obtaining the mixed audio data, for example, the storage device may be the memory 140 of the apparatus 100, so that the user can play back or share the stored mixed audio data when necessary.

Optionally, in this application scenario, the user can carry out a live network video broadcast. The live video content can be a dance performance, a handicraft exhibition and explanation, online etiquette teaching, or other content that can be presented through video, and the user can mix in different background music according to the style of the video content to make it more interesting and entertaining. The terminal 310 obtains image information of the user through a camera (not shown in fig. 3) and voice information of the user through the microphone 311, mixes the user voice collected by the microphone with background music specified by the user in the mixing module of the terminal 310 to obtain mixed audio, and then fuses the image information obtained by the camera with the mixed audio in a video codec (Video Codec) module (not shown in fig. 3) of the terminal 310 to obtain mixed video data. The user uploads the mixed video data to the network 320 using a wireless communication technology, and internet users can acquire the mixed video from the network 320 and watch it through various devices with wireless communication and video playing functions. Optionally, the user may watch the mixed video on the local terminal 310; alternatively, the user may save the recorded mixed video in a storage device of the terminal 310 for subsequent playing and sharing.

It should be understood that the wireless communication technology used in the application scenario described above may be any technology capable of providing cellular wireless communication services such as voice call, video, data, broadcast, or others. For example, it may be Global System for Mobile Communications (GSM) technology, Code Division Multiple Access (CDMA) technology, Wideband Code Division Multiple Access (WCDMA) technology, Long Term Evolution (LTE) technology, future fifth generation (5G) mobile communication technology, or future evolved Public Land Mobile Network (PLMN) technology. The type of cellular wireless communication technology is not limited in the present embodiment.

It should be understood that the way in which the user-specified background music is obtained in the above scenario is not limited. Optionally, the background music may be played out into the environment and, like the user's voice, enter the mixing module of the terminal 310 through the microphone. Optionally, the background music may be read directly from the local memory of the terminal without being played out. Optionally, cached background music may also be obtained from the network.

Fig. 4 is a block diagram of a structure including a specific system software architecture and corresponding hardware components, where the specific system software architecture and the corresponding hardware components are capable of implementing a real-time digital audio mixing method according to an embodiment of the present disclosure. As shown, the device 410 is an Application Processor (AP) in the processing system 130, or an Application chip, the Application Processor is used for running system software, for example, the Application Processor is used for running operating system software, for example, the operating system software may be at least one of an Android system, an iOS system, or a Linux system; the system software may also include other driver software or platform software besides application software (also called application), such as open source system software, middleware, or widgets. Optionally, the application processor may be further configured to run code related to an application program, where the application program may be a webcast platform, an audio player, a video player, a beauty camera, an application with a communication capability, or the like; optionally, the application processor supports application extensions; optionally, the application processor may be configured to run user interface related code. The memory 420 may be memory 140 that may be used to store operating system software running in the application processor, application software, user interface related software, or other computer program code that may be executed by the application processor, and may also store local audio files, local video files, a user phonebook or other user data. 
The touch screen 430 may be the display screen 170, and may be used to display various menus and function icons of the apparatus. The icons are associated with specific modules or functions inside the apparatus, for example, an icon associated with an audio player APP, a video player APP, a beauty camera APP, a live web platform APP, or another application; the user runs the corresponding module by touching the relevant icon on the touch screen. The sound processing module 440 may be configured to implement audio-related processing; for example, it may be an independent hardware processor configured to mix multiple audio signals in real time. The sound processing module 440 may specifically be a High Fidelity (HiFi) device. It may be an independent sound processing chip, a processing module integrated in the application processor, a processing module integrated in a processor chip outside the application processor, or software executed by the application processor, which is not limited in this embodiment. When the sound processing module 440 is implemented by software, it is a processing unit formed by the application processor executing instructions of driver software, similar to the application layer or application framework layer formed by the application processor.
Optionally, the sound processing module 440 includes a digital mixing module 441, which is configured to implement a real-time digital mixing algorithm; optionally, the sound processing module 440 further includes a pre-processing module 442 for performing acoustic processing on the audio signal, and optionally, the pre-processing module 442 may perform processing including signal aliasing cancellation, jitter removal, oversampling removal, noise suppression, echo cancellation, gain control, or other acoustic processing algorithms. The audio codec 450 is configured to perform at least one of analog-to-digital conversion or digital-to-analog conversion on an audio signal, for example, the audio codec 450 may convert audio data processed by the sound processing module into an analog signal and play the analog signal through the speaker 470, a wired headset or a bluetooth headset (not shown in fig. 4), and optionally, the audio codec 450 may convert an analog audio signal collected by the microphone 460 or other audio input device into a digital audio signal and transmit the digital audio signal to the sound processing module or other processor for various audio processing; the audio codec 450 may be a separate audio codec chip, or an audio codec module integrated in a processor chip or a HiFi chip.

Alternatively, the application processor 410, the sound processing module 440, and the audio codec 450 may collectively form the sound processing system 132 of fig. 1 for implementing various forms of audio signal processing procedures.

As mentioned above, the application processor 410 may be used to run operating system software. A specific application framework of the operating system software for the audio/video architecture is shown in fig. 4; for example, the application framework is an Android application framework. As shown in fig. 4, the application framework includes:

An Application (APP) layer, which is the uppermost layer of the whole audio and video software architecture. Optionally, the layer is implemented based on a Java architecture. The APP layer comprises audio-related Application Programming Interfaces (APIs), video-related APIs, or other types of APIs. Each API is bound to a specific internal application program (APP); the API sends control parameters to call the corresponding APP or receives a return value from the APP.

An application framework (Framework) layer, which is the logic scheduling layer of the whole audio and video software architecture. It is the policy control center of the architecture and can schedule and distribute policies for the whole audio and video processing process. The layer also includes API interfaces for processing audio and video data streams and controlling audio and video hardware devices. Optionally, the core architecture of the layer is written in at least one of Java, C++, or C.

A Hardware Abstraction Layer (HAL), which is the interface layer between the operating system software of the audio/video architecture and the audio/video hardware devices. It provides an interface for interaction between the upper-layer software and the lower-layer hardware. The HAL layer abstracts the underlying hardware into software that includes the corresponding hardware interfaces, so the underlying hardware devices can be configured by accessing the HAL layer; for example, the related hardware devices can be enabled or disabled in the HAL layer. Optionally, the core architecture of the HAL layer is written in at least one of C++ or C.

A Kernel layer, which includes a hardware driver layer and is configured to directly control the underlying hardware devices according to control information input by the hardware abstraction layer, for example, to drive or disable a hardware device. Optionally, the core architecture of the Kernel layer is written in at least one of C++ or C.

The application processor 410 and the sound processing module 440 implement interaction of data and control information through a relay communication layer, specifically, the relay communication layer may be a MailBox (MailBox) communication mechanism, and implement interaction between system software or application software of the application processor 410 and the sound processing module 440. When the sound processing module 440 is formed of software instructions run by the application processor 410, the MailBox is an interface of the sound processing module 440 and the system software. When the sound processing module 440 is a separate hardware or software outside the application processor 410, the MailBox may be an interface including hardware. In a typical case, the sound processing module 440 is a separate piece of hardware, such as a co-processor, microprocessor or logic circuit, that performs a different function than the application processor.
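The relay communication layer described above can be pictured as a bounded message queue shared by the two sides. The following is a minimal sketch under stated assumptions, not the MailBox mechanism of the embodiment: the class names `Mailbox`, `DataMessage`, and `ControlMessage`, the capacity, and the command string are all hypothetical, and a real mailbox between an application processor and a separate sound processing chip would involve shared memory or hardware registers rather than a Python queue.

```python
import queue
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DataMessage:
    """Carries one frame of PCM samples (hypothetical message type)."""
    frame: List[int]


@dataclass
class ControlMessage:
    """Carries control information, e.g. a volume change (hypothetical message type)."""
    command: str
    value: float


Message = Union[DataMessage, ControlMessage]


class Mailbox:
    """Bounded FIFO standing in for the relay communication layer."""

    def __init__(self, capacity: int = 8) -> None:
        self._queue: "queue.Queue[Message]" = queue.Queue(maxsize=capacity)

    def post(self, message: Message) -> bool:
        """Sender side: returns False instead of blocking when the mailbox is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            return False

    def fetch(self) -> Optional[Message]:
        """Receiver side: returns None when nothing is pending."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


mailbox = Mailbox(capacity=2)
accepted = [
    mailbox.post(DataMessage(frame=[1, 2, 3])),
    mailbox.post(ControlMessage(command="set_volume", value=0.5)),
    mailbox.post(DataMessage(frame=[4, 5, 6])),  # rejected: mailbox is full
]
```

Carrying data and control messages over the same ordered channel keeps a volume change aligned with the frames it is meant to affect, which is one reason a single mailbox is a convenient illustration here.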

Fig. 5 is a flowchart of a real-time digital mixing method according to an embodiment of the present application, and fig. 6 is a logic block diagram of an apparatus for implementing the method. The method of fig. 5 is explained based on the operating system software architecture and corresponding hardware components shown in fig. 4 and the apparatus shown in fig. 6. For ease of understanding, the embodiments of the present application describe the method in the form of steps; although an order is shown in the flowchart of fig. 5, in some cases the described steps may be performed in a different order. It should be understood that the block diagram of fig. 4 and the apparatus of fig. 6 do not limit each other.

The real-time digital sound mixing method comprises the following steps: step 501, at least one frame of audio data is obtained through a media playing interface. The at least one frame of audio data may be already existing audio data including background sound, music, or accompanying music, etc.

The media playing interface corresponds to the media playing interface 610 of the device 600, and the media playing interface is specifically an application programming interface API, and is located in the application layer of the application processor 410 in fig. 4, and optionally, the media playing interface may be specifically an audio player API, a video player API, a live network platform API, or another application API with an audio/video playing function. Taking the structure shown in fig. 4 as an example, in an alternative scheme, when real-time mixing is required, an audio player function icon located on the touch screen 430 is touched, where the function icon is associated with the audio playing interface API to enable the audio playing APP to be called through the audio playing interface to read at least one frame of audio data, where the at least one frame of audio data may be at least one frame of audio data in a local audio file stored in a memory (for example, the memory 140 in the apparatus 100 or the memory 420 in the block diagram of the structure in fig. 4), and where the at least one frame of audio data may also be at least one frame of audio data in an audio file cached or downloaded from the internet.

In the method, the audio file is acquired by calling the media player in a frame unit, and the audio file is flexibly processed through the playing process, for example, the audio file can be flexibly switched to any frame of audio data of other audio files at any time through the media playing interface, or the volume of at least one frame of audio data is adjusted in the playing process. The playing flow herein specifically includes: the entire flow of calling an audio file from a media player and ultimately playing the audio file through a speaker or other audio output device.
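Frame-by-frame acquisition with switching between files can be illustrated with a small sketch. It assumes audio files are already available as lists of PCM samples; the class `FramePlayer`, its methods, and the frame size of 4 samples are hypothetical names and values chosen for readability, not the media playing interface of the embodiment.

```python
from typing import Dict, List, Optional

FRAME_SIZE = 4  # samples per frame; hypothetical value chosen for readability


class FramePlayer:
    """Hypothetical stand-in for a media playing interface that serves one frame at a time."""

    def __init__(self, files: Dict[str, List[int]], frame_size: int = FRAME_SIZE) -> None:
        self.files = files                    # file name -> list of PCM samples
        self.frame_size = frame_size
        self.current: Optional[str] = None    # name of the file being read
        self.position = 0                     # index of the next frame to deliver

    def seek(self, name: str, frame_index: int = 0) -> None:
        """Switch to any frame of any known file; takes effect on the next read."""
        if name not in self.files:
            raise KeyError(name)
        self.current = name
        self.position = frame_index

    def next_frame(self) -> List[int]:
        """Return the next frame, or an empty list at end of file."""
        if self.current is None:
            return []
        samples = self.files[self.current]
        start = self.position * self.frame_size
        frame = samples[start:start + self.frame_size]
        if frame:
            self.position += 1
        return frame


player = FramePlayer({"song_a": list(range(12)), "song_b": list(range(100, 112))})
player.seek("song_a")
first = player.next_frame()   # frame 0 of song_a
player.seek("song_b", 2)      # switch mid-stream to frame 2 of song_b
second = player.next_frame()
```

Because only one frame is fetched per call, the switch to another file costs nothing more than changing two fields, which is the flexibility the paragraph above attributes to frame-unit acquisition.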

Optionally, the method for real-time digital mixing includes step 502, decoding at least one frame of audio data from a compressed state to an original data form.

Decoding the at least one frame of audio data of the called audio file from a compressed state to an original data form is performed in the playing flow. Optionally, this step may be performed by a decoding module 650 of the apparatus 600. Optionally, the decoding module 650 may be the audio codec 450 in fig. 4; optionally, the decoding module 650 may be an audio codec of the media player, which may be implemented by a software module or by hardware; optionally, the decoding module 650 may be a separate audio codec chip, or an audio codec module integrated in a processor chip or a HiFi chip. Optionally, the audio data in the original data form may be a Pulse Code Modulation (PCM) data stream. Optionally, the audio data in the compressed state may be data compressed using lossy or lossless compression techniques such as Windows Media Audio (WMA), Adaptive Predictive Encoding (APE), Free Lossless Audio Codec (FLAC), or Moving Picture Experts Group Audio Layer III (MP3). The audio data in the original data form is the result of decoding the audio data in the compressed state using the corresponding decoding technique.
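To make the "original data form" concrete, the sketch below shows how a PCM frame can be viewed as signed integer samples. It does not decode any compressed format; it assumes the decoder has already produced 16-bit little-endian PCM bytes, which is one common layout and an assumption here, and the function names are hypothetical.

```python
import struct
from typing import List


def pcm16_bytes_to_samples(data: bytes) -> List[int]:
    """Interpret raw little-endian 16-bit PCM bytes as signed integer samples."""
    if len(data) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data))


def samples_to_pcm16_bytes(samples: List[int]) -> bytes:
    """Pack signed integer samples back into little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(samples), *samples)


frame_bytes = samples_to_pcm16_bytes([0, 1, -1, 32767, -32768])
frame_samples = pcm16_bytes_to_samples(frame_bytes)
```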

The real-time digital mixing method includes step 503, transferring at least one frame of audio data to the digital mixing module through a data path in the system software. Optionally, the data path may be data path 620 of device 600. Optionally, the system software may be operating system software, for example, an Android operating system, a Linux operating system, an iOS system, or other types of operating system software. The data path is a transmission path of audio data inside the operating system, where the data path runs through the whole architecture of the operating system software, and specifically, the data path may include at least one of the following: a track source node 621, an audio tuner 622, an audio hardware abstraction layer 623, or a hardware driver layer 624.

After obtaining at least one frame of audio data of the audio file, the at least one frame of audio data is transferred from the media player interface of the application layer to the track source node of the application framework layer, and optionally, the at least one frame of audio data arriving at the track source node is in the form of original data, for example, the at least one frame of audio data may be a PCM data stream. As shown in fig. 4, the track source node is located in a Framework layer of the operating system software, and specifically, the track source node is an AudioTrack interface in an Audio system. The AudioTrack interface is an API interface externally provided by the Audio system. The AudioTrack is a source node of multiple audio tracks or may be referred to as a starting point of multiple audio tracks. Audio data with different parametric characteristics are aggregated at the AudioTrack interface. The AudioTrack selects different tracks for the audio data according to the parametric characteristics of the audio data, the tracks being audio standards with fixed parametric characteristics. The AudioTrack may implement output of audio data on the operating system platform, and optionally, the parameter characteristics of the audio data may include a sampling rate, a bit depth, a number of channels, a type of an audio stream, and the like.
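The role of the parameter characteristics can be sketched as follows. This is an illustrative model only and does not use the Android AudioTrack API: `AudioParams`, `TrackSource`, the field names, and the 20 ms frame duration are hypothetical. It shows how sampling rate, bit depth, and channel count determine the size of a PCM frame, and how frames could be routed to a track by their parameters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AudioParams:
    """Parameter characteristics named in the text; field names are hypothetical."""
    sample_rate_hz: int
    bit_depth: int
    channels: int
    stream_type: str = "music"

    def frame_bytes(self, frame_ms: int) -> int:
        """Size in bytes of one PCM frame lasting frame_ms milliseconds."""
        samples_per_channel = self.sample_rate_hz * frame_ms // 1000
        return samples_per_channel * (self.bit_depth // 8) * self.channels


class TrackSource:
    """Hypothetical source node: routes each frame to the track matching its parameters."""

    def __init__(self) -> None:
        self.tracks: Dict[AudioParams, List[bytes]] = {}

    def write(self, params: AudioParams, frame: bytes) -> None:
        self.tracks.setdefault(params, []).append(frame)


cd_stereo = AudioParams(sample_rate_hz=44100, bit_depth=16, channels=2)
voice_mono = AudioParams(sample_rate_hz=16000, bit_depth=16, channels=1)

source = TrackSource()
source.write(cd_stereo, bytes(cd_stereo.frame_bytes(20)))    # 20 ms of silence
source.write(voice_mono, bytes(voice_mono.frame_bytes(20)))
source.write(cd_stereo, bytes(cd_stereo.frame_bytes(20)))
```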

At least one frame of audio data streamed from the track source node 621 arrives at the audio tuner 622, which is located at the Framework layer of the operating system software, as shown in fig. 4. Specifically, the audio tuner is AudioFlinger, the working engine of the Audio system. AudioFlinger manages all input and output audio streams of the audio system and can control reading from and writing to the underlying hardware devices. For example, AudioFlinger may adjust the volume of the audio data, or close the audio data output stream to prevent the audio data from reaching the underlying hardware device, which may optionally be the speaker 162 of the apparatus 100, the speaker 470 in fig. 4, or another audio output device. In an alternative scheme, when the audio data passing through the playing flow is not intended to be played out through the audio output device, the audio data output stream can be closed through AudioFlinger so that the audio data does not reach the underlying hardware device.
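The two functions attributed to the audio tuner, volume adjustment and closing the output stream, can be sketched as below. This is not AudioFlinger code: the class `AudioTuner`, its fields, and the 16-bit sample range are assumptions for illustration. The frame is still forwarded toward the mixer when the output stream is closed; only the copy destined for the speaker is suppressed.

```python
from typing import List, Optional, Tuple


def clip16(value: float) -> int:
    """Clamp a scaled sample to the signed 16-bit range."""
    return max(-32768, min(32767, int(round(value))))


class AudioTuner:
    """Hypothetical audio tuner: scales volume and can close the speaker output."""

    def __init__(self, gain: float = 1.0) -> None:
        self.gain = gain
        self.output_open = True

    def process(self, frame: List[int]) -> Tuple[List[int], Optional[List[int]]]:
        """Return (frame for the mixer, frame for the speaker or None if closed)."""
        scaled = [clip16(sample * self.gain) for sample in frame]
        to_speaker = scaled if self.output_open else None
        return scaled, to_speaker


tuner = AudioTuner(gain=0.5)
to_mixer, to_speaker = tuner.process([1000, -2000, 30000])

tuner.output_open = False   # background music must not be played out locally
to_mixer_closed, to_speaker_closed = tuner.process([1000, -2000, 30000])
```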

At least one frame of audio data passes through the audio tuner 622 before reaching the audio hardware abstraction layer 623, which is located at the HAL layer, as shown in fig. 4. Specifically, the Audio hardware abstraction layer is an Audio HAL, the Audio HAL is a software abstraction for underlying Audio hardware devices, and each underlying Audio hardware device has a corresponding software interface in the Audio HAL layer. Through which the corresponding audio hardware device can be controlled, for example certain audio hardware devices can be enabled or disabled.

At least one frame of audio data passes through the audio hardware abstraction layer 623 before reaching the hardware driver layer (Driver) 624, which is the direct executor of control actions. A control command set at the Audio HAL layer for an underlying hardware device is carried out by the Driver; for example, when "drive speaker operation" is set in the Audio HAL layer, the Driver drives the speaker under that control command. In an alternative scheme, when the audio data passing through the playing flow is not intended to be played out through the audio output device, "disable speaker" may be set in the Audio HAL layer, and the Driver disables the speaker under that control command.

To travel from the application layer of the operating system software architecture to the digital mixing module, the at least one frame of audio data further needs to pass through a relay communication layer. As shown in fig. 4, the relay communication layer may be a MailBox communication mechanism for exchanging data or control information between the system software or application software of the application processor and the sound processing module.

Optionally, the real-time digital mixing method may include steps 504 to 506. Wherein, step 504: detecting whether a real-time audio signal is input, and executing step 505 when the real-time audio signal is input; when no real-time audio signal input is detected, step 506 is performed.

When two audio signals are mixed, it is usually desirable to emphasize one of them. In the real-time digital mixing method provided by the embodiments of the application, the real-time audio signal is the one to be emphasized, so a better mixing experience is obtained by attenuating the at least one frame of audio data acquired through the media playing interface while the real-time audio signal is present.

In an alternative, this step can be performed by the audio detection module 660 of the apparatus 600, and the audio detection module 660 can be a software module or a hardware circuit integrated in the processor or the sound processing module, or can be a separate chip. In an alternative scheme, an audio Detection module may be added to the sound processing module 440 in fig. 4, and after the digital mixing module is activated through the digital mixing interface, whether a real-time audio signal is input to the audio input device may be detected based on the audio Detection module, and optionally, the audio Detection module may be a Voice Activity Detection (VAD) module.

Step 505: reducing the volume of the at least one frame of audio data.

Step 506, increasing the volume of the at least one frame of audio data.

Specifically, according to whether the audio detection module 660 detects a real-time audio signal input, it sends a control signal to the audio tuner 622 in the data path 620 to decrease the volume of the at least one frame of audio data (input detected) or to increase it (no input detected). Optionally, the audio tuner 622 is AudioFlinger, which decreases or increases the volume of the at least one frame of audio data after receiving the control signal.

In an alternative scheme, the audio detection module 660 sends control information to the digital mixing module 640 to decrease or increase the volume of the at least one frame of audio data. Optionally, the volume may be changed through a volume-related variable in the digital mixing module 640: decreasing the variable decreases the volume of the at least one frame of audio data, and increasing the variable increases it. Optionally, the digital mixing module may also be the digital mixing module 441 in fig. 4.
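Steps 504 to 506 can be sketched as a detect-then-adjust loop. The sketch below swaps in a simple energy threshold in place of a full Voice Activity Detection module, which the text names only as one option; the threshold of 500, the gains 0.25 and 1.0, and all function names are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of a frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def voice_detected(mic_frame: Sequence[int], threshold: float = 500.0) -> bool:
    """Step 504 (simplified): treat a frame above the energy threshold as real-time input."""
    return rms(mic_frame) >= threshold


def background_gain(mic_frame: Sequence[int], ducked: float = 0.25,
                    normal: float = 1.0, threshold: float = 500.0) -> float:
    """Step 505 lowers the background volume when input is present; step 506 raises it."""
    return ducked if voice_detected(mic_frame, threshold) else normal


def apply_gain(frame: Sequence[int], gain: float) -> List[int]:
    """Scale a background frame, clamping to the signed 16-bit range."""
    return [max(-32768, min(32767, int(sample * gain))) for sample in frame]


music_frame = [1000, -1000, 2000, -2000]
silent_mic = [0, 0, 0, 0]
talking_mic = [4000, -4000, 4000, -4000]

music_when_silent = apply_gain(music_frame, background_gain(silent_mic))
music_when_talking = apply_gain(music_frame, background_gain(talking_mic))
```

A practical implementation would likely ramp the gain over several frames rather than switch it instantly, to avoid audible steps in the background volume; that smoothing is left out here to keep the control flow visible.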

The real-time digital audio mixing method includes step 507, acquiring a real-time audio signal. The real-time audio signal may be a digital audio signal obtained after processing sounds from humans or nature.

Optionally, the real-time audio signal may be acquired by the real-time signal acquisition module 630 of the apparatus 600. Optionally, the real-time signal acquisition module 630 may be an interface for receiving a real-time audio signal sent by another device, where the real-time audio signal may be a digital signal that has or has not undergone at least one of the following processes: signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control. It should be understood that real-time herein means that there is no time delay, or only a negligibly small time delay, between the sound source emitting a sound and that sound being acquired.

Optionally, before acquiring the real-time audio signal, the method for real-time digital mixing further includes: acquiring a real-time analog audio signal. Optionally, the apparatus 600 may include an audio input device and an analog-to-digital converter (ADC), where the real-time analog audio signal may be obtained through the audio input device, and the real-time audio signal may then be obtained by converting the real-time analog audio signal through the ADC. Optionally, the audio input device may be a microphone of the apparatus itself, a sound-sensitive device, or another device with a sound collection function, such as the microphone 161 in fig. 1, the microphone 220 in fig. 2, or the microphone 460 in fig. 4; alternatively, the audio input device may be an additional device shown in fig. 2, for example, a wireless headset or a wired headset; alternatively, the audio input device may be a chip with sound collection capability, or may be an audio codec (Codec) connected to a microphone, a sound-sensitive device, or another device with sound collection capability.

The real-time digital audio mixing method includes step 508 of mixing at least one frame of audio data with a real-time audio signal in a digital audio mixing module to obtain a mixed audio signal. For example, the at least one frame of audio data may be background music, voice-over, accompaniment music, etc., and the real-time audio signal may be sound from human or natural sources, thereby realizing audio mixing.
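As a non-limiting illustration of step 508, the following sketch mixes one frame of audio data with one frame of the real-time audio signal by sample-wise addition with saturation. The embodiment does not specify the mixing arithmetic; additive mixing of signed 16-bit PCM samples and the name `mix_frames` are assumptions made for illustration only.

```python
def mix_frames(audio_frame, realtime_frame):
    """Sum two equal-length 16-bit PCM frames sample by sample, with saturation."""
    if len(audio_frame) != len(realtime_frame):
        raise ValueError("frames must have the same number of samples")
    mixed = []
    for a, b in zip(audio_frame, realtime_frame):
        # Clamp so that loud passages saturate instead of wrapping around.
        mixed.append(max(-32768, min(32767, a + b)))
    return mixed

background = [1000, -2000, 30000, -30000]  # e.g. one frame of accompaniment music
voice = [500, 500, 10000, -10000]          # one frame of the real-time audio signal
mixed = mix_frames(background, voice)
```

The last two samples show why saturation is used here: the plain sums fall outside the 16-bit range and are limited to 32767 and -32768.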

Optionally, the digital mixing module may be the digital mixing module 640 of the apparatus 600; alternatively, the digital mixing module may be the digital mixing module 441 in the sound processing module 440 shown in fig. 4; optionally, the digital mixing module may be a software module, for example, a function; optionally, the digital mixing module may also be implemented by hardware logic; optionally, the digital mixing module may be separate hardware, for example, a coprocessor, a microprocessor, or another processor core. In an optional scheme, when the digital mixing module is a software module, a new API corresponding to the digital mixing module may be added to the application layer of the audio/video software architecture, and interaction with the internal digital mixing module is implemented through this API. For example, the API may be a digital mixing interface. Control information may be sent to the digital mixing module through the digital mixing interface (as indicated by the downward dotted arrow in fig. 4) to control the digital mixing module, and the digital mixing interface may receive audio data from the digital mixing module (as indicated by the upward solid arrow from the digital mixing module to the digital mixing interface in fig. 4). Specifically, the audio data may be a mixed audio signal obtained by mixing in real time, and the control information may include digital mixing module enable information, digital mixing module disable information, and the like. Optionally, the audio source management interface shown in fig. 4 may be an AudioRecord, which manages audio sources and is responsible for collecting audio data on an operating system platform (for example, an Android platform); for example, the audio source management interface may record audio data using the audio input hardware of the platform.
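The interaction between the application-layer digital mixing interface and a software digital mixing module can be sketched as follows. This is only an illustrative model: the class names, the string-valued control information, and the choice to pass the real-time signal through unchanged while the module is disabled are all assumptions, not details of the embodiment.

```python
class DigitalMixingModule:
    """Hypothetical software mixing module controlled by enable/disable information."""

    def __init__(self):
        self.enabled = False

    def handle_control(self, info):
        # Control information arrives from the digital mixing interface.
        if info == "enable":
            self.enabled = True
        elif info == "disable":
            self.enabled = False
        else:
            raise ValueError("unknown control information: %r" % (info,))

    def process(self, audio_frame, realtime_frame):
        if not self.enabled:
            # Assumption: when disabled, the real-time signal passes through unmixed.
            return list(realtime_frame)
        return [max(-32768, min(32767, a + b))
                for a, b in zip(audio_frame, realtime_frame)]


class DigitalMixingInterface:
    """Hypothetical application-layer API wrapping the mixing module."""

    def __init__(self, module):
        self._module = module

    def set_enabled(self, enabled):
        # Downward path: send enable or disable information to the module.
        self._module.handle_control("enable" if enabled else "disable")

    def read_mixed(self, audio_frame, realtime_frame):
        # Upward path: receive the audio data produced by the module.
        return self._module.process(audio_frame, realtime_frame)


api = DigitalMixingInterface(DigitalMixingModule())
before = api.read_mixed([100, 200], [1, 2])
api.set_enabled(True)
after = api.read_mixed([100, 200], [1, 2])
```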

In an alternative, before mixing at least one frame of audio data with the real-time audio signal, the real-time audio signal may be subjected to at least one of the following processes: signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control. Optionally, the above-described processing may be performed in a preprocessing module 680 of the apparatus 600. The preprocessing module 680 may be implemented by a software module or hardware logic, for example, a software module in HiFi or a hardware logic integrated in HiFi, and optionally, the preprocessing module may be integrated with the digital mixing module 640 in a chip, or may be an independent chip. Optionally, the pre-processing module may be the pre-processing module 442 of fig. 4 or may be a pre-processing module in the sound processing system 132 of the apparatus 100.
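Two of the listed pre-processing operations can be illustrated with deliberately simplified stand-ins: a noise gate in place of noise suppression, and peak-based scaling in place of gain control. The thresholds, the target peak, and the function names are hypothetical; practical implementations of these operations are considerably more sophisticated.

```python
def noise_gate(frame, floor=50):
    """Crude stand-in for noise suppression: zero out samples below a floor."""
    return [0 if abs(s) < floor else s for s in frame]

def gain_control(frame, target_peak=16000, max_gain=4.0):
    """Crude stand-in for gain control: scale the frame toward a target peak."""
    peak = max((abs(s) for s in frame), default=0)
    if peak == 0:
        return list(frame)
    gain = min(max_gain, target_peak / peak)
    return [max(-32768, min(32767, int(round(s * gain)))) for s in frame]

def preprocess(frame):
    """Apply the chosen processes in sequence before the frame is mixed."""
    return gain_control(noise_gate(frame))

raw = [20, -4000, 8000, -30]   # real-time samples with low-level noise
clean = preprocess(raw)
```

The order of the stages is a design choice in this sketch: gating first prevents the gain stage from amplifying samples that are treated as noise.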

In an alternative, at least one of the above processes may be implemented in an audio codec, which may be the audio codec 450 in fig. 4, an audio codec module integrated in the audio circuit 160 of the apparatus 100, or an audio codec located in the processing system 130. Further, when an analog audio signal is acquired before the real-time audio signal is acquired, the processing may further include analog-to-digital conversion (ADC) for converting the analog audio signal into a digital real-time audio signal.
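The quantization part of analog-to-digital conversion can be modelled in software as follows. This is a simplified sketch under stated assumptions: the analog signal is represented as already-sampled values normalized to the range -1.0 to 1.0, the output is signed PCM, and the function name `quantize` is hypothetical. Sampling and anti-aliasing filtering, which a real ADC also involves, are not modelled.

```python
def quantize(analog_samples, bits=16):
    """Map normalized analog sample values to signed integer PCM codes."""
    full_scale = 2 ** (bits - 1)
    codes = []
    for x in analog_samples:
        code = int(round(x * full_scale))
        # Positive full scale does not fit in the signed range, so clamp it.
        codes.append(max(-full_scale, min(full_scale - 1, code)))
    return codes

pcm = quantize([0.0, 0.5, -1.0, 1.0])
```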

In an alternative scheme, when real-time digital mixing is required, the enabling information may be sent to the digital mixing module by touching a function icon associated with the digital mixing interface on the display 430, and the digital mixing module is activated to mix the real-time audio signal from the audio input device with at least one frame of audio data called by the player, so as to obtain a mixed audio signal.

Further, as shown in fig. 4, after the mixed audio signal is obtained, it can be directly transmitted to the speaker 470 coupled with the audio codec for real-time playing; optionally, the obtained mixed audio signal may also be transmitted to a digital mixing interface located at an application layer through an upward data path in the system software, and provided to an upper layer application for subsequent operations. Optionally, the subsequent operation may include at least one of: the obtained mixed audio signal is stored in the local storage 420, and the obtained mixed audio signal is uploaded to a network terminal or transmitted to a third party media player for playing, for example, the third party media player may be a live network platform, various music players, a video player, and the like.
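One of the subsequent operations, storing the mixed audio signal, can be sketched with the Python standard-library `wave` module. The WAV container, the mono 16-bit layout, the 48000 Hz sample rate, and the helper names are assumptions chosen for illustration; the embodiment does not specify a storage format. An in-memory buffer is used here in place of the local storage.

```python
import io
import struct
import wave

def write_wav(samples, sample_rate=48000):
    """Serialize mono 16-bit PCM samples into WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def read_wav(data):
    """Parse WAV bytes back into (sample_rate, samples)."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        count = reader.getnframes()
        raw = reader.readframes(count)
        return reader.getframerate(), list(struct.unpack("<%dh" % count, raw))

mixed = [1500, -1500, 32767, -32768]
stored = write_wav(mixed)
rate, restored = read_wav(stored)
```

The same bytes could equally be handed to an upload routine or a player, which corresponds to the other subsequent operations named above.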

Optionally, the real-time digital mixing method may include step 509, acquiring a video image signal.

In an alternative scheme, the audio input device acquires a real-time audio signal and simultaneously acquires a video image signal, specifically, the video image signal may be acquired through a camera of the device or other devices with image capturing functions, and the video image signal is an image signal of a plurality of consecutive frames. In an alternative scheme, the video image signal may be obtained from a local storage or a network side, and the video image signal may be a video signal formed by a plurality of temporally or spatially consecutive pictures or several non-consecutive pictures.

Optionally, the real-time digital mixing method may include step 510 of mixing the video image signal with the mixed audio signal to obtain a mixed video signal.

Optionally, the mixed audio signal obtained in step 508 and the video image signal obtained in step 509 may be transmitted to a video processing module for fusion to obtain a mixed video signal, and optionally, the video processing module may be the video processing module 670 of the apparatus 600. Optionally, the video processing module 670 may be a software module stored in a memory, or may be implemented by a hardware logic circuit, and the video processing module may also be a separate chip; for example, the video processing module may be a video codec. In an alternative, the video processing module may be a software module or a piece of hardware circuitry in the media processing system 131 of the apparatus 100, or may be a software module stored in the memory 420 in fig. 4 for implementing video mixing.
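The fusion of step 510 can be illustrated, in a highly simplified form, by pairing each video frame with the span of mixed audio samples that covers its display interval. The dictionary layout, the presentation timestamp field `pts`, and the assumption that the sample rate is an integer multiple of the frame rate are all hypothetical; an actual video codec or container multiplexer is far more involved.

```python
def fuse(video_frames, mixed_audio, sample_rate, fps):
    """Attach to each video frame the audio samples spanning its display time."""
    if sample_rate % fps != 0:
        raise ValueError("this sketch assumes sample_rate is a multiple of fps")
    samples_per_frame = sample_rate // fps
    fused = []
    for index, image in enumerate(video_frames):
        start = index * samples_per_frame
        fused.append({
            "pts": index / fps,  # presentation time in seconds
            "image": image,
            "audio": mixed_audio[start:start + samples_per_frame],
        })
    return fused

# Toy values: 2 frames per second and 8 audio samples per second.
video = ["frame0", "frame1"]
audio = [0, 1, 2, 3, 4, 5, 6, 7]
mixed_video = fuse(video, audio, sample_rate=8, fps=2)
```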

Further, after the mixed video signal is obtained, the mixed video signal can be transmitted to a video player for playing, or can be shared to a plurality of internet users in real time through a network, or the mixed video signal can be stored in a local storage for subsequent playback by the users.

It should be understood that the device embodiments provided in the present application are merely schematic. The division into units in fig. 6 is only a division by logical function, and other division ways are possible in actual implementation. For example, multiple modules may be combined or may be integrated into another system. The coupling of the various modules to each other may be through interfaces that are typically electrical communication interfaces, but mechanical or other forms of interfaces are not excluded. Thus, modules described as separate components may or may not be physically separate, may be located in one place, or may be distributed in different locations on the same or different devices.

An embodiment of the present application further provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform one or more steps of any one of the above-mentioned real-time digital mixing methods. The respective constituent modules of the above-described apparatus may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.

Based on such understanding, the embodiments of the present application also provide a computer program product containing instructions. The part of the technical solution that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, and the computer program product includes instructions for causing a computer device, a mobile terminal, or a processor therein to execute all or part of the steps of the method described in the embodiments of the present application. For the storage medium, refer to the description of the memory 140. For example, the sound processing module 440 may be implemented in software. In this case, the sound processing module 440 may be an arithmetic unit formed by software running in the application processor 410. That is, the application processor 410 implements the related method flow of the embodiments of the present application by executing software instructions.

In the embodiment, at least one frame of audio data acquired by the media playing interface is mixed in real time with a real-time audio signal from the audio input device. Because the media playing interface acquires an audio file frame by frame instead of acquiring the whole audio file, the mixing can be flexibly switched at any time to any frame of audio data of another audio file through the media playing interface, and real-time digital audio mixing is realized. In addition, a data path in system software is used to transmit the at least one frame of audio data, so external environment noise is not introduced. Furthermore, because the mixing is performed in real time and the existing music playing process is reused, with the at least one frame of audio data of the application layer software being called through the media playing interface, the method is simple to implement and highly flexible.
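The frame-by-frame acquisition that makes mid-mix switching possible can be modelled with a small sketch. The class `FramePlayer`, its methods, and the choice to pad a short final frame with silence are hypothetical stand-ins for the media playing interface; decoded audio files are represented here simply as lists of samples.

```python
class FramePlayer:
    """Hypothetical stand-in for a media playing interface that serves frames."""

    def __init__(self, tracks, frame_size):
        self._tracks = tracks            # track name -> list of decoded samples
        self._frame_size = frame_size
        self._current = None
        self._pos = 0

    def select(self, name, frame_index=0):
        """Switch to any frame of any track; takes effect on the next frame."""
        if name not in self._tracks:
            raise KeyError(name)
        self._current = name
        self._pos = frame_index * self._frame_size

    def next_frame(self):
        """Return the next frame only, never the whole file."""
        if self._current is None:
            raise RuntimeError("no track selected")
        samples = self._tracks[self._current]
        frame = samples[self._pos:self._pos + self._frame_size]
        self._pos += len(frame)
        # Assumption: pad with silence once the track runs out.
        return frame + [0] * (self._frame_size - len(frame))

tracks = {"song_a": [1, 2, 3, 4, 5, 6], "song_b": [10, 20, 30, 40, 50, 60]}
player = FramePlayer(tracks, frame_size=2)
player.select("song_a")
first = player.next_frame()
player.select("song_b", frame_index=1)   # switch files in the middle of mixing
second = player.next_frame()
third = player.next_frame()
fourth = player.next_frame()
```

Because only one frame is held at a time, the switch costs no more than repositioning an index, which is the flexibility the paragraph above attributes to frame-wise acquisition.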

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application. For example, some specific operations in an apparatus embodiment may refer to previous method embodiments.

Claims

1. An apparatus for audio signal processing, the apparatus comprising: an audio processor and a transmission interface;

the audio processor is configured to:

acquiring at least one frame of audio data transmitted from a data path in system software through the transmission interface; the system software runs on an application processor;

acquiring a real-time audio signal, wherein the real-time audio signal is obtained by processing an original audio signal acquired from a microphone;

and mixing the at least one frame of audio data with the real-time audio signal to obtain a mixed audio signal.

2. The apparatus of claim 1, wherein the at least one frame of audio data is obtained through a media playback interface, and wherein the media playback interface is an application programming interface.

3. The apparatus of claim 2, wherein the media playing interface is coupled to a music player, and wherein the audio processor is configured to:

and acquiring the at least one frame of audio data from the music player through the data path in the system software, wherein the at least one frame of audio data is decoded original audio data.

4. The apparatus of any of claims 1 to 3, wherein the system software comprises operating system software.

5. The apparatus of any of claims 1 to 4, wherein the data path comprises at least one of:

a track source node, an audio regulator, an audio hardware abstraction layer, or a hardware driver layer.

6. The apparatus of any of claims 1 to 5, wherein the audio processor is further configured to:

and prohibiting the playing of the at least one frame of audio data in the process of acquiring the at least one frame of audio data from the data path in the system software.

7. The apparatus of claim 6, wherein the audio processor is configured to:

closing an audio output data stream of the at least one frame of audio data by an audio regulator in the data path; or

controlling a hardware driver layer in the data path to disable an audio output device for the at least one frame of audio data based on an audio hardware abstraction layer in the data path.

8. The apparatus of any of claims 1 to 7, wherein prior to said acquiring the real-time audio signal, the audio processor is further configured to:

detecting whether the real-time audio signal is input;

when the real-time audio signal input is detected, the volume of the at least one frame of audio data is reduced.

9. The apparatus of any of claims 1 to 8, wherein prior to said acquiring the real-time audio signal, the audio processor is further configured to:

acquiring the original audio signal from the microphone;

processing the original audio signal to obtain the real-time audio signal by at least one of the following processes:

signal aliasing cancellation, jitter cancellation, oversampling cancellation, noise suppression, echo cancellation, or gain control.

10. A method of audio signal processing, the method comprising:

acquiring at least one frame of audio data transmitted from a data path in system software; the system software runs on an application processor;

11. The method of claim 10, wherein the at least one frame of audio data is obtained through a media playback interface, and wherein the media playback interface is an application programming interface.

12. The method of claim 11, wherein the media playing interface is coupled to a music player, the method further comprising:

13. The method of any of claims 10 to 12, wherein the system software comprises operating system software.

14. The method of any of claims 10 to 13, wherein the data path comprises at least one of: a track source node, an audio regulator, an audio hardware abstraction layer, or a hardware driver layer.

15. The method according to any one of claims 10 to 14, further comprising:

16. The method according to claim 15, wherein said prohibiting the playback of the at least one frame of audio data comprises:

17. The method of any of claims 10 to 16, wherein prior to said acquiring the real-time audio signal, the method further comprises:

detecting whether the real-time audio signal is input;

18. The method of any of claims 10 to 17, wherein prior to said acquiring the real-time audio signal, the method further comprises:

acquiring the original audio signal from the microphone;

19. An apparatus for audio signal processing, the apparatus comprising a processor and a transmission interface;

the processor is configured to read software instructions in the memory, execute the software instructions to:

acquiring at least one frame of audio data through a media playing interface, wherein the media playing interface is an application programming interface;

transmitting the at least one frame of audio data to a digital mixing module through a data path in system software;

acquiring a real-time audio signal;

and mixing the at least one frame of audio data with the real-time audio signal in the digital mixing module to obtain a mixed audio signal.

20. The apparatus of claim 19, wherein the media playing interface is coupled to a music player, and wherein the processor is configured to read the software instructions in the memory to:

21. The apparatus of claim 19 or 20, wherein the system software comprises operating system software.

22. The apparatus of any of claims 19 to 21, wherein the data path comprises at least one of: a track source node, an audio regulator, an audio hardware abstraction layer, or a hardware driver layer.

23. The apparatus according to any one of claims 19 to 22, wherein the processor is further configured to:

and prohibiting the playing of the at least one frame of audio data in the process of transmitting the at least one frame of audio data to the digital mixing module through a data path in system software.

24. The apparatus of claim 23, wherein the processor is configured to:

25. The apparatus of any of claims 19 to 24, wherein prior to said acquiring the real-time audio signal, the processor is further configured to:

detecting whether the real-time audio signal is input;

26. The apparatus of any of claims 19 to 25, wherein prior to said acquiring the real-time audio signal, the processor is further configured to:

acquiring the original audio signal from the microphone;

27. A terminal, comprising: an application processor and an audio processor, wherein system software runs on the application processor;

the application processor is configured to:

transmitting the at least one frame of audio data to the audio processor via a data path in the system software;

the audio processor is configured to:

28. The terminal of claim 27, further comprising: a microphone, wherein the microphone is configured to acquire the original audio signal.

29. The terminal of claim 27, wherein the media playing interface is coupled to a music player, and wherein the application processor is configured to:

and acquiring the at least one frame of audio data from the music player through the media playing interface, wherein the at least one frame of audio data is decoded original audio data.

30. A terminal according to any of claims 27 to 29, wherein the system software comprises operating system software.

31. A terminal according to any of claims 27 to 30, wherein the data path comprises at least one of: a track source node, an audio regulator, an audio hardware abstraction layer, or a hardware driver layer.

32. The terminal according to any of claims 27 to 31, wherein the application processor is further configured to:

and prohibiting the playing of the at least one frame of audio data in the process of transmitting the at least one frame of audio data to the audio processor through a data path in system software.

33. The terminal of claim 32, wherein the application processor is configured to:

34. The terminal of any of claims 27 to 33, wherein prior to said acquiring the real-time audio signal, the audio processor is further configured to:

detecting whether the real-time audio signal is input;

35. The terminal of any of claims 27 to 34, wherein prior to said acquiring the real-time audio signal, the audio processor is further configured to:

acquiring the original audio signal from the microphone;

36. A computer-readable storage medium having stored therein instructions, which when run on a computer or processor, cause the computer or processor to perform the method of any one of claims 10-18.

37. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the method of any of claims 10-18.