EP4093046A1

EP4093046A1 - Multi-microphone audio capture

Info

Publication number: EP4093046A1
Application number: EP21175256.3A
Authority: EP
Inventors: Lasse Juhani Laaksonen; Miikka Tapani Vilermo; Arto Juhani Lehtiniemi
Original assignee: Nokia Technologies Oy
Current assignee: Nokia Technologies Oy
Priority date: 2021-05-21
Filing date: 2021-05-21
Publication date: 2022-11-23

Abstract

An apparatus, computer program and method are described, comprising: identifying a microphone of a multi-microphone audio capture device that is blocked; and determining a position of at least one first visual indication of the blocked microphone for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.

Description

Field

Example embodiments relate to multi-microphone audio capture, such as identifying when one or more microphones of an audio capture device are blocked.

Background

The number of microphones in many audio capture devices, such as smartphones, is tending to increase. Increasing the number of microphones can have advantages, such as aiding noise suppression techniques and beamforming, and can enable new features such as spatial audio capture. Spatial audio capture may be useful for communications and user-generated content (UGC). For example, a user may generate videos that incorporate spatial audio. Spatial audio capture is also extending to domains beyond mobile phones. Examples include standalone cameras, police wearable cameras, action cameras and portable recorders. There remains a need for further developments in this field.

Summary

In a first aspect, this specification describes an apparatus comprising means for performing: identifying at least one microphone of a multi-microphone audio capture device that is blocked (e.g. blocked by a user of the device); and determining a position of at least one first visual indication of the blocked microphone(s) for output to a user on a display (e.g. an integrated display) of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device. The audio capture device may be used to capture spatial audio, for example for use in generating user-generated content (UGC) such as video with spatial audio content. The display may be a screen, such as an integrated screen of the user device. In some example embodiments, the display may be separate to the user device (e.g. a monitor or a headset display).
Some example embodiments further comprise means for performing: determining whether any part of the display is blocked (e.g. such that the relevant part of the display is not visible to the user), wherein the position of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on a location of any part of the display that is blocked. The means for performing determining the position of the at least one first visual indication of the blocked microphone(s) may further comprise means for performing determining the position of the at least one first visual indication such that at least part of the or each first visual indication is on a part of the display not blocked. The means for performing determining the position of the at least one first visual indication of the blocked microphone(s) may further comprise determining the position of the first visual indication such that the or each first visual indication corresponds as closely as possible to the position of the blocked microphone.
Some example embodiments further comprise means for performing: determining an extent of the at least one first visual indication of the blocked microphone(s). The extent of the at least one first visual indication of the blocked microphone(s) may be based, at least in part, on a location of any part of the display that is blocked. The extent of the at least one first visual indication of the blocked microphone(s) may be based, at least in part, on one or more of: the location of the respective blocked microphone; a functionality of the respective blocked microphone; or an impact of the respective blocked microphone to an audio output (e.g. audio degradation). A priority order for unblocking multiple blocked microphones may be provided.
In some example embodiments, a nature of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on one or more of: whether the respective blocked microphone is on a front face of the audio capture device; whether the respective blocked microphone is on a rear face of the audio capture device; whether the respective blocked microphone is neither on the front face nor on the rear face of the audio capture device; whether the respective blocked microphone is on a face of the audio capture device visible to and/or being viewed by the user; a distance between the first visual indication and the position of the respective blocked microphone; or a microphone type of the respective blocked microphone.
Some example embodiments further comprise means for performing: determining a position of a user gaze on the display of the audio capture device. For example, means may be provided for performing: providing a further visual indication on the display directing the user gaze from the position of the user gaze to the position of the at least one first visual indication. Some example embodiments further comprise means for performing: determining whether to provide the further visual indication on the display based, at least in part, on one or more of: a distance between the user gaze and the at least one first visual indication (e.g. whether the distance is above a threshold distance); an impact of the respective blocked microphone to an audio output; or whether the at least one first visual indication is provided on a part of the display that is blocked.
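Purely as an illustrative sketch, the decision whether to provide the further visual indication might be expressed in Python as follows. The function name needs_gaze_cue, the normalised impact value in the range 0 to 1, the threshold values and the particular way in which the three criteria are combined are all assumptions made for this example; the specification only states that the decision is based on one or more of the criteria.

```python
from math import hypot


def needs_gaze_cue(gaze, indication, impact, indication_blocked,
                   distance_threshold=30.0, impact_threshold=0.5):
    """Decide whether to show a further indication guiding the user gaze.

    gaze, indication: (x, y) display positions in assumed units (e.g. mm).
    impact: assumed score in [0, 1] for the degradation caused by the
        blocked microphone (hypothetical scale).
    indication_blocked: True if the first visual indication lies on a part
        of the display that is blocked.
    """
    # A blocked indication cannot be seen, so a cue is always given here.
    if indication_blocked:
        return True
    # Otherwise cue only when the gaze is far away and the impact matters.
    distance = hypot(gaze[0] - indication[0], gaze[1] - indication[1])
    return distance > distance_threshold and impact >= impact_threshold


print(needs_gaze_cue((10.0, 10.0), (60.0, 130.0), 0.9, False))
```

Other combinations of the criteria (for example, treating each criterion as sufficient on its own) would be equally consistent with the description.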
The at least one first visual indication may be based on an impact of the respective blocked microphone on possible audio formats.
Some example embodiments further comprise means for performing: setting an audio capture mode based, at least in part, on the at least one blocked microphone (e.g. based on reduced performance capabilities). A user may be required to confirm a change in audio capture mode (or to cease blocking the relevant microphone(s)).
The at least one first visual indication may be provided on the display only in the event that a selected audio mode is not possible due to the respective blocked microphone.
Some example embodiments further comprise means for performing: providing a virtual indication to the user on an augmented reality display, wherein the virtual indication directs a user gaze to the at least one first visual indication.
Some example embodiments further comprise means for performing: providing the at least one first visual indication on the display in the determined position(s).
The means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.
In a second aspect, this specification describes an audio capture device comprising: a plurality of microphones for capturing audio; a display for outputting at least one first visual indication of at least one blocked microphone in response to a determination that one of the plurality of microphones is blocked, wherein the display provides the at least one first visual indication based, at least in part, on a location of the respective blocked microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.
One or more microphones of the plurality may be on a front side of the audio capture device and one or more other microphones of the plurality may be on a rear side of the audio capture device.
The audio capture device may comprise means (such as one or more sensors) for performing: determining whether any part of the display is blocked (e.g. such that the relevant part of the display is not visible to the user), wherein the position of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on a location of any part of the display that is blocked.
The audio capture device may comprise means for performing: determining a position of a user gaze on the display of the audio capture device.
The at least one first visual indication may be provided on the display only in the event that a selected audio mode is not possible due to the respective blocked microphone.
The audio capture device may further comprise any aspect of the apparatus as described with reference to the first aspect.
In a third aspect, this specification describes a method comprising: identifying at least one microphone of a multi-microphone audio capture device that is blocked; and determining a position of at least one first visual indication of the at least one blocked microphone for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.
Some example embodiments further comprise: determining whether any part of the display is blocked (e.g. such that the relevant part of the display is not visible to the user), wherein the position of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on a location of any part of the display that is blocked. The position of the at least one first visual indication of the blocked microphone(s) may be such that at least part of the or each first visual indication is on a part of the display not blocked. The position of the at least one first visual indication of the blocked microphone(s) may be such that the or each first visual indication corresponds as closely as possible to the position of the blocked microphone.
The method may further comprise determining an extent of the at least one first visual indication of the blocked microphone(s). The extent of the at least one first visual indication of the blocked microphone(s) may be based, at least in part, on a location of any part of the display that is blocked. The extent of the at least one first visual indication of the blocked microphone(s) may be based, at least in part, on one or more of: the location of the respective blocked microphone; a functionality of the respective blocked microphone; or an impact of the respective blocked microphone to an audio output (e.g. audio degradation). A priority order for unblocking multiple blocked microphones may be provided.
In some example embodiments, a nature of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on one or more of: whether the respective blocked microphone is on a front face of the audio capture device; whether the respective blocked microphone is on a rear face of the audio capture device; whether the respective blocked microphone is neither on the front face nor on the rear face of the audio capture device; whether the respective blocked microphone is on a face of the audio capture device visible to and/or being viewed by the user; a distance between the first visual indication and the position of the respective blocked microphone; or a microphone type of the respective blocked microphone.
The method may further comprise: determining a position of a user gaze on the display of the audio capture device. For example, a further visual indication may be provided on the display, directing the user gaze from the position of the user gaze to the position of the at least one first visual indication. Some example embodiments further comprise: determining whether to provide the further visual indication on the display based, at least in part, on one or more of: a distance between the user gaze and the at least one first visual indication (e.g. whether the distance is above a threshold distance); an impact of the respective blocked microphone to an audio output; or whether the at least one first visual indication is provided on a part of the display that is blocked.
The at least one first visual indication may be based on an impact of the respective blocked microphone on possible audio formats.
Some example embodiments further comprise: setting an audio capture mode based, at least in part, on the at least one blocked microphone (e.g. based on reduced performance capabilities). A user may be required to confirm a change in audio capture mode (or to cease blocking the relevant microphone(s)).
The at least one first visual indication may be provided on the display only in the event that a selected audio mode is not possible due to the respective blocked microphone. Some example embodiments further comprise: providing a virtual indication to the user on an augmented reality display, wherein the virtual indication directs a user gaze to the at least one first visual indication.
Some example embodiments further comprise: providing the at least one first visual indication on the display in the determined position(s).
In a fourth aspect, this specification describes an apparatus configured to perform (at least) any method as described with reference to the third aspect.
In a fifth aspect, this specification describes computer-readable instructions which, when executed by a computing apparatus, cause the computing apparatus to perform (at least) any method as described with reference to the third aspect.
In a sixth aspect, this specification describes a computer-readable medium (such as a non-transitory computer-readable medium) comprising program instructions stored thereon for performing (at least) any method as described with reference to the third aspect.
In a seventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to perform (at least) any method as described with reference to the third aspect.
In an eighth aspect, this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: identifying at least one microphone of a multi-microphone audio capture device that is blocked; and determining a position of at least one first visual indication of the at least one blocked microphone for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.
In a ninth aspect, this specification describes an apparatus comprising: one or more sensors (or some other means) for identifying at least one microphone of a multi-microphone audio capture device that is blocked; and a display controller (or some other means) for determining a position of at least one first visual indication of the at least one blocked microphone for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.

Brief description of the drawings

Example embodiments will now be described, by way of example only, with reference to the following schematic drawings, in which:

FIG. 1 is a front view of a device in accordance with an example embodiment;
FIG. 2 is a rear view of the device of FIG. 1;
FIG. 3 is a block diagram showing an example use of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 4 is a block diagram of a device in accordance with an example embodiment;
FIG. 5 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 6 shows a view from above of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 7 is a block diagram of a device in accordance with an example embodiment;
FIG. 8 is a block diagram of a display in accordance with an example embodiment;
FIG. 9 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 10 shows a view from above of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 11 is a block diagram of a device in accordance with an example embodiment;
FIG. 12 is a block diagram of a display in accordance with an example embodiment;
FIG. 13 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 14 shows a view from above of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 15 is a block diagram of a display in accordance with an example embodiment;
FIG. 16 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 17 is a block diagram of a device in accordance with an example embodiment;
FIG. 18 is a block diagram of a display in accordance with an example embodiment;
FIG. 19 is a block diagram showing an example use of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 20 is a block diagram of a display in accordance with an example embodiment;
FIG. 21 is a block diagram of a display in accordance with an example embodiment;
FIG. 22 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 23 is a block diagram of a display in accordance with an example embodiment;
FIG. 24 is a block diagram of a display in accordance with an example embodiment;
FIG. 25 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 26 is a block diagram showing an example use of the device of FIGS. 1 and 2 in accordance with an example embodiment;
FIG. 27 is a block diagram of components of a system in accordance with an example embodiment; and
FIGS. 28A and 28B show tangible media, respectively a removable non-volatile memory unit and a compact disc (CD), storing computer-readable code which when run by a computer performs operations according to an example embodiment.

Detailed description

The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in the specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.
FIG. 1 is a front view of a device, indicated generally by the reference numeral 10, in accordance with an example embodiment. FIG. 2 is a rear view of the device 10.
As shown in FIG. 1, a display 11 is provided on the front side of the device 10. As shown in FIG. 2, a camera hump 12 is provided on the rear side of the device 10. The display 11 may be provided by a screen of the device 10.
The device 10 includes six microphones (although alternative embodiments could include more or fewer microphones). As shown in FIG. 1, a first microphone 21 is provided near an earpiece of the device 10, and second and third microphones 22 and 23 are provided near the bottom of the device 10. As shown in FIG. 2, fourth and fifth microphones 24 and 25 are provided at opposite ends of the rear side and a sixth microphone 26 forms part of the camera hump 12.
FIG. 3 is a block diagram, indicated generally by the reference numeral 30, showing an example use of the device 10 in accordance with an example embodiment.
The system 30 shows a user 32 holding the device 10 whilst capturing audio-visual content. For example, the user 32 may be recording a video with spatial audio capture.
As shown in FIG. 3, the right hand 34 of the user 32 is blocking some of the microphones of the device 10 (e.g. the second and third microphones 22 and 23 at the bottom of the device). While it is often apparent to the user if they are blocking the camera, it can be quite difficult for the user to observe that they are blocking any of the microphones. For example, spatial audio monitoring is seldom carried out when a user records user-generated content (UGC). Furthermore, it can be difficult for a user to hear a specific spatial audio problem in some circumstances (e.g. when there is no apparent sound source in the direction that is mainly affected).
As discussed in detail below, some type of indication can be provided to alert the user to the blockage of one or more microphones. For example, an indication of which of a plurality of microphones is blocked and/or the impact of the blockage may be provided. For example, the indication may seek to prevent microphones that are particularly important for high-quality capture of a currently selected spatial audio format from being blocked.
FIG. 4 is a block diagram of a device, indicated generally by the reference numeral 40, in accordance with an example embodiment. The device 40 is a simplified illustration of the device 10 described above. In the device 40, the positions of the microphones 21 to 26 are indicated by the labels 1 to 6 respectively. The first to third microphones (that are on the front side of the device) are indicated with filled circles and the fourth to sixth microphones (that are on the rear side of the device) are indicated with empty circles (or donut shapes).
FIG. 5 is a flow chart showing an algorithm, indicated generally by the reference numeral 50, in accordance with an example embodiment.
The algorithm 50 starts at operation 52 where a microphone of a multi-microphone audio capture device (such as the device 10) that is blocked is identified. A microphone may be blocked by a user of the device (as in the system 30). The user might not know where the microphones are and hence may accidentally block a microphone. Of course, two or more blocked microphones may be identified in the operation 52. One or more sensors (or some other means) may be provided for identifying blocked microphone(s).
At operation 54, a position of a first visual indication of the blocked microphone(s) for output to a user on a display of the audio capture device (such as the display 11) is determined. The position of a visual indication may be determined (for example by a display controller or some other means) based, at least in part, on a location of the blocked microphone on the audio capture device. For example, the position of the first visual indication may correspond more closely to the position of the blocked microphone than to the position of any other microphone of the multi-microphone audio capture device. The display may be an integrated display (as in the device 10), but this is not essential to all example embodiments. For example, the display may be a separate display (e.g. a monitor or a headset display).
At operation 56, the first visual indication is provided on the display in the position determined in the operation 54.
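By way of illustration only, operations 52 to 56 of the algorithm 50 can be sketched in Python as follows. The sketch assumes a hypothetical coordinate system (millimetres, as seen from the front of the device), a hypothetical microphone layout loosely following FIGS. 1, 2 and 4, and a rectangular display; the names Microphone, MICS, DISPLAY, indication_position and is_closest_to_blocked are illustrative and do not appear in the specification. The position is taken as the point of the display nearest to the blocked microphone, which is only one possible choice, and the result is then checked against the condition that it is closer to the blocked microphone than to any other microphone.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Microphone:
    label: int   # position label 1 to 6, as in FIG. 4
    x: float     # assumed coordinates in mm, seen from the front of the device
    y: float
    face: str    # "front" or "rear"


# Hypothetical layout for a 70 mm x 150 mm device (all values are assumptions).
MICS = [
    Microphone(1, 35.0, 3.0, "front"),    # near the earpiece
    Microphone(2, 20.0, 147.0, "front"),  # near the bottom
    Microphone(3, 50.0, 147.0, "front"),  # near the bottom
    Microphone(4, 10.0, 145.0, "rear"),   # one end of the rear side
    Microphone(5, 60.0, 5.0, "rear"),     # opposite end of the rear side
    Microphone(6, 15.0, 20.0, "rear"),    # part of the camera hump
]

# Assumed display rectangle: (left, top, right, bottom) in the same coordinates.
DISPLAY = (3.0, 8.0, 67.0, 142.0)


def indication_position(blocked, display):
    """Operation 54 (sketch): the display point nearest to the blocked microphone."""
    left, top, right, bottom = display
    x = max(left, min(right, blocked.x))
    y = max(top, min(bottom, blocked.y))
    return (x, y)


def is_closest_to_blocked(position, blocked, microphones):
    """Check that the position is nearer to the blocked microphone than to any other."""
    d_blocked = hypot(position[0] - blocked.x, position[1] - blocked.y)
    return all(
        d_blocked < hypot(position[0] - m.x, position[1] - m.y)
        for m in microphones
        if m.label != blocked.label
    )


# Operation 52 is assumed to have reported microphone 1 as blocked.
blocked = MICS[0]
position = indication_position(blocked, DISPLAY)
# Operation 56 would draw the indication at `position`; here it is printed.
print(position, is_closest_to_blocked(position, blocked, MICS))
```

How the blocked microphone is detected in operation 52 is outside the scope of this sketch.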
FIG. 6 shows a view from above of the device 10 in accordance with an example embodiment. The camera hump 12 is visible on the rear side of the device 10. A finger 62 of a user is shown on the front side of the device 10. As discussed further below, the finger blocks the first microphone 21.
FIG. 7 is a block diagram of a device, indicated generally by the reference numeral 70, in accordance with an example embodiment. The device 70 is a simplified illustration of the device 10 and is similar to the device 40 described above.
The device 70 includes a representation 72 of the user's finger (i.e. the finger 62 described above) that is blocking the first microphone 21 at position 1 of the device 10. Note that the fifth microphone 25 at position 5 is not blocked since that microphone is on the rear of the device 10 and the user's finger is covering a portion of the front of the device only.
In an implementation of the algorithm 50, the operation 52 determines that the first microphone 21 at position 1 is blocked. In operation 54, a position of a visual indication of the blocked microphone for output to a user on a display of the audio capture device is determined.
FIG. 8 is a block diagram of a display, indicated generally by the reference numeral 80, in accordance with an example embodiment. The display 80 may be the display 11 of the device 10 described above.
The display 80 includes a first visual indication 82 provided in an example implementation of the operation 56 of the algorithm 50. Also shown in FIG. 8 is the representation of the user's finger 72 such that the position and extent of the visual indication 82 relative to the user's finger can be seen.
In this way, a visual indication is provided to the user. This indication may be provided as close as possible to the position of the blocked microphone (the first microphone 21 in this example).
Thus, the device 10 is an audio capture device (e.g. an audio-visual capture device) having a plurality of microphones for capturing audio and a display for outputting a first visual indication of a blocked microphone in response to a determination that one of the plurality of microphones is blocked. The display provides the first visual indication based, at least in part, on a location of the microphone on the audio capture device, wherein the position of the first visual indication corresponds more closely to the position of the blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.
FIG. 9 is a flow chart showing an algorithm, indicated generally by the reference numeral 90, in accordance with an example embodiment.
The algorithm 90 starts at operation 92, where a determination is made regarding whether any part of the display is blocked (e.g. by the user), such that the relevant part of the display is not visible to the user. A region of the display being blocked may be determined, for example based on camera-based detection, hover-based detection, touchscreen-based detection, or any other suitable method.
At operation 94, the position of the first visual indication of the blocked microphone is based, at least in part, on a location of any part of the display that is blocked (i.e. not visible to the user).
Thus, consideration may be given to parts of the display that are blocked when providing the indication, for example by providing the indication as close as possible to the microphone position in a visible part of the display. In the example display 80, the user's finger does not block a very large portion of the display, and therefore the indication location need not be changed. However, the system may, in some example embodiments, modify the extent (e.g. size) of the visual indication in order to make it clearly visible and intuitive enough for the user to understand which microphone is intended. The system may also check that the indication is not closer to an unblocked microphone than to the blocked microphone. If this is not possible, then some other indication may be provided. For example, the indication may be provided as close as possible to the blocked microphone within the visible part of the display. In this configuration, the indication may take a different form (e.g. a different colour or the form of an arrow pointing to the blocked microphone) to distinguish this scenario from the default condition in which the indication is not closer to an unblocked microphone than to the blocked microphone.
In some example embodiments, if an indication of a blocked microphone is only partially blocked, then it is not modified. This may be advantageous since the user may move their hand to prevent the indication from being partially blocked and may unblock the respective microphone in the process.
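The placement logic of the algorithm 90 can be sketched, under stated assumptions, as follows. The sketch assumes that blocked parts of the display are reported as axis-aligned rectangles, searches candidate positions on a 1 mm grid (an arbitrary choice made for simplicity), and returns a hypothetical style label ("default" or "arrow") to stand for the different forms of indication mentioned above. None of the names used here come from the specification.

```python
from math import hypot


def place_indication(blocked_mic, other_mics, display, blocked_regions, step=1.0):
    """Return (position, style) for the indication of one blocked microphone.

    blocked_mic: (x, y) of the blocked microphone.
    other_mics: list of (x, y) for the remaining microphones.
    display: (left, top, right, bottom) of the display.
    blocked_regions: list of (left, top, right, bottom) rectangles that the
        user cannot see (assumed representation).
    """
    left, top, right, bottom = display

    def visible(x, y):
        return not any(l <= x <= r and t <= y <= b
                       for (l, t, r, b) in blocked_regions)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    # Grid search for the visible display point nearest to the blocked microphone.
    best = None
    y = top
    while y <= bottom:
        x = left
        while x <= right:
            if visible(x, y):
                d = dist((x, y), blocked_mic)
                if best is None or d < best[0]:
                    best = (d, (x, y))
            x += step
        y += step

    if best is None:
        return None, "none"  # the whole display is blocked

    d_blocked, position = best
    # Default form only if no other microphone is at least as close.
    unambiguous = all(d_blocked < dist(position, m) for m in other_mics)
    return position, ("default" if unambiguous else "arrow")


display = (0.0, 0.0, 60.0, 120.0)
hand = [(0.0, 100.0, 60.0, 120.0)]  # assumed: lower part of the display covered
print(place_indication((30.0, 125.0), [(30.0, -5.0)], display, hand))
```

In this sketch a fully visible display simply yields the display point nearest to the microphone, so the unblocked case of the algorithm 50 is recovered as a special case.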
FIG. 10 shows a view from above of the device 10 in accordance with an example embodiment. The camera hump 12 is visible on the rear side of the device 10. A hand 102 of a user is shown. As discussed further below, the user blocks the second microphone 22 and third microphone 23 and also obscures some of the display 11.
FIG. 11 is a block diagram of a device, indicated generally by the reference numeral 110, in accordance with an example embodiment. The device 110 is a simplified illustration of the device 10 and is similar to the devices 40 and 70 described above. FIG. 11 shows a scenario in which a user's hand 112 is grabbing the device at the edge and on top of the display.
The device 110 includes a representation of the user's hand 112 (i.e. the hand 102 described above) that is blocking the second and third microphones 22 and 23 at positions 2 and 3 of the device 10. Note that the fourth microphone 24 at position 4 (on the rear of the device 10) is not blocked.
FIG. 12 is a block diagram of a display, indicated generally by the reference numeral 120, in accordance with an example embodiment. The display 120 may be the display 11 of the device 10 described above.
The display 120 includes a first visual indication 122 and a second visual indication 123 provided in an example implementation of the operation 56 of the algorithm 50. Also shown in FIG. 12 is the representation of the user's hand 112, such that the position and extent of the first and second visual indications 122, 123 relative to the user's hand 112 can be seen.
In this way, a visual indication is provided to the user. This indication may be provided as close as possible to the position of the blocked microphones (the second and third microphones 22 and 23 at positions 2 and 3 in this example). However, the positions of the indications may take into account the partial blocking of the display by the user's hand 112. In this example, even though the microphones are at the very edge of the device, the indications are moved slightly towards the centre such that the user can see them. If the user were now to drag their hand towards the edge of the device, the indications could correspondingly move towards the edge to show which microphones are blocked. Note that if the visual indications are only partially blocked, then they may not be modified in this way in some example embodiments.
FIG. 13 is a flow chart showing an algorithm, indicated generally by the reference numeral 130, in accordance with an example embodiment. The algorithm 130 may be used to generate the display 120 described above.
The algorithm 130 starts at operation 52 where (as described above with respect to the algorithm 50) one or more microphones of a multi-microphone audio capture device (such as the device 10) that are blocked are identified. In an implementation of the algorithm 130, the operation 52 determines that the second and third microphones 22 and 23 are blocked.
The algorithm moves to operation 54 where, as described above, a position of a first visual indication of the blocked microphone for output to a user on a display of the audio capture device is determined.
In operation 132 of the algorithm 130, the nature and/or the extent of the visual indication of the blocked microphone(s) is determined.
Finally, at operation 134, the visual indication(s) (such as the first and second visual indications 122 and 123 described above) are provided on the display.
The extent of the visual indication of a blocked microphone may be based, at least in part, on a location of any part of the display that is blocked by the user. Thus, as described above, the first and second visual indications 122 and 123 may be moved to avoid the portion of the display blocked by the user's finger/hand.
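By way of illustration only, the repositioning described above can be sketched as a search for the unblocked display point closest to the blocked microphone. This is a minimal sketch under stated assumptions: the blocked part of the display is approximated by rectangles, the search is a coarse grid search, and the names `Rect` and `indication_position` and all coordinate values are hypothetical rather than taken from this specification.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in display coordinates (hypothetical units)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def indication_position(mic_xy, display, blocked_regions, step=1.0):
    """Return the unblocked grid point on the display closest to the microphone.

    Returns None when every grid point of the display is covered.
    """
    best, best_dist = None, float("inf")
    y = display.top
    while y <= display.bottom:
        x = display.left
        while x <= display.right:
            if not any(r.contains(x, y) for r in blocked_regions):
                dist = hypot(x - mic_xy[0], y - mic_xy[1])
                if dist < best_dist:
                    best, best_dist = (x, y), dist
            x += step
        y += step
    return best


display = Rect(0.0, 0.0, 10.0, 20.0)
hand = Rect(0.0, 0.0, 2.0, 10.0)  # assumed area of the display covered by the hand
mic = (0.0, 5.0)                  # assumed microphone position at the left edge

print(indication_position(mic, display, []))      # nothing blocked
print(indication_position(mic, display, [hand]))  # indication moved towards the centre
```

With no blocked region the indication sits at the microphone position itself; once the assumed hand region covers that point, the indication moves just far enough towards the centre to be visible.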
Alternatively, or in addition, the extent of the visual indication of the blocked microphone may be based, at least in part, on one or more of:

the location of the blocked microphone;
a functionality of the blocked microphone; and
an impact of the blocked microphone to an audio output.

Alternatively, or in addition, the nature of the visual indication of the blocked microphone may be based, at least in part, on one or more of:

whether the blocked microphone is on a front face of the audio capture device;
whether the blocked microphone is on a rear face of the audio capture device;
whether the blocked microphone is neither on the front face nor on the rear face of the audio capture device;
whether the blocked microphone is on a face of the audio capture device visible to and/or being viewed by the user;
a distance between the first visual indication and the position of the blocked microphone; and
a microphone type of the blocked microphone.

FIG. 14 shows a view from above of the device 10 in accordance with an example embodiment. The camera hump 12 is visible on the rear side of the device 10. A finger or hand 142 of a user is shown. As discussed further below, the user blocks the second microphone 22, the third microphone 23 and the fourth microphone 24 and also obscures some of the display 11. Thus, microphones on both the front and the rear of the device 10 are blocked.
FIG. 15 is a block diagram of a display, indicated generally by the reference numeral 150, in accordance with an example embodiment. The display 150 may be the display 11 of the device 10 described above.
The display 150 includes a first visual indication 152, a second visual indication 153 and a third visual indication 154 provided in an example implementation of the operation 56 of the algorithm 50. Also shown in FIG. 15, the representation of the user's finger or hand 142 is such that the position and extent of the visual indications 152 to 154 relative to the user's finger or hand 142 can be seen.
The third visual indication 154 corresponding to the fourth microphone 24 is different to the first and second visual indications 152 and 153 corresponding to the second and third microphones 22 and 23, since the fourth microphone is on the rear of the device and the second and third microphones are on the front.
By way of example, the display 150 shown in FIG. 15 may be provided on the front of the device, with the visual indications 152 and 153 being provided in solid form and the visual indication 154 in dotted form. Of course, many alternative display types are possible, such as the use of different colours to distinguish between which side of the device a particular blocked microphone is on.
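The selection of the nature of an indication from the face on which the microphone sits can be sketched as follows. This is an illustrative sketch only: the `Face` enumeration, the function name `indication_styles` and the rule "solid on the viewed face, dotted otherwise" are assumptions that mirror the solid/dotted example above, not a definitive implementation.

```python
from enum import Enum


class Face(Enum):
    FRONT = "front"
    REAR = "rear"
    SIDE = "side"


def indication_styles(blocked_mics, viewed_face=Face.FRONT):
    """Map each blocked microphone to a drawing style.

    blocked_mics maps a microphone reference to the face it is on.
    Microphones on the face being viewed get a solid indication;
    all others get a dotted indication (an assumed convention).
    """
    return {
        mic: "solid" if face is viewed_face else "dotted"
        for mic, face in blocked_mics.items()
    }


# The FIG. 14 scenario: microphones 22 and 23 on the front, 24 on the rear.
styles = indication_styles({22: Face.FRONT, 23: Face.FRONT, 24: Face.REAR})
print(styles)
```

A colour-based scheme could be substituted by returning colour names instead of line styles.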
Some example embodiments make use of gaze tracking. This can be particularly useful in the case of larger displays. For example, a user may be looking at a first part of the display while blocking one or more microphones relating to a second part of the display. Gaze tracking can be used to obtain the position of the user's gaze on the display and, in some circumstances, direct the user's attention to an indication of a blocked microphone.
FIG. 16 is a flow chart showing an algorithm, indicated generally by the reference numeral 160, in accordance with an example embodiment.
FIG. 17 is a block diagram of a device, indicated generally by the reference numeral 170, in accordance with an example embodiment. The device 170 is a simplified illustration of the device 10 and is similar to the devices 40, 70 and 110 described above. FIG. 17 shows a scenario in which a user's hand 172 is grabbing the device at one edge of the display but is looking at the other side of the display. As discussed further below, the device 170 may be used in an implementation of the algorithm 160.
FIG. 18 is a block diagram of a display, indicated generally by the reference numeral 180, in accordance with an example embodiment. The display 180 may be the display 11 of the device 10 described above. The display 180 includes first visual indications 182 and 183 relating to the positions of blocked microphones, as discussed above. The display also includes a further visual indication 184 discussed further below.
The algorithm 160 starts at operation 52 where, as described above, a microphone of a multi-microphone audio capture device (such as the device 10) that is blocked is identified. Then, at operation 54, a position of a first visual indication of the blocked microphone for output to a user on a display of the audio capture device (such as the display 11) is determined. The position may be determined based, at least in part, on a location of the blocked microphone on the audio capture device.
By way of example, the second and third microphones 22 and 23 of the device 10 may be blocked by the user's hand 172 (as shown in FIG. 17).
At operation 162 of the algorithm 160, a position of a user gaze on the display of the audio capture device is determined. An example user gaze position 174 is shown in FIG. 17. The user gaze position may be determined, for example, by imaging the pupils of the user; alternative methods will be apparent to those of ordinary skill in the art.
At operation 164 of the algorithm 160, a determination is made regarding whether a further visual indication directing the user gaze from the position of the user gaze to the position of the first visual indication should be provided on the display. The further visual indication 184 shown in FIG. 18 is an example of such a further visual indication.
The decision made in the operation 164 may be based on a variety of factors. These may include one or more of:

a distance between the user gaze and the first visual indication(s), such as whether the distance is above a threshold distance;
an impact of the blocked microphone to an audio output; and
whether the first visual indication is provided on a part of the display that is blocked by the user.

If, in the operation 164, a decision is taken to provide a further visual indication, then the algorithm 160 moves to operation 166, where the first and further visual indications are provided (such as the first visual indications 182 and 183, and the further visual indication 184 shown in FIG. 18).
Alternatively, if, in the operation 164, a decision is taken not to provide a further visual indication, then the algorithm 160 moves to operation 168, where only the first visual indication(s) are provided (such that the further visual indication 184 shown in FIG. 18 is not provided).
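The decision of operation 164 can be sketched as a simple predicate. This is a minimal sketch, assuming one particular way of combining the factors listed above (the specification leaves the combination open); the function name `needs_gaze_cue`, its parameters and the numeric values are hypothetical.

```python
from math import hypot


def needs_gaze_cue(gaze_xy, indication_xy, *, distance_threshold,
                   audio_impact=1.0, min_impact=0.0, indication_hidden=False):
    """Decide whether a further visual indication should guide the user's gaze.

    Assumed rule: always cue when the first indication is on a blocked part
    of the display; otherwise cue only when the gaze is farther away than
    distance_threshold and the audio impact is at least min_impact.
    """
    if indication_hidden:
        return True
    distance = hypot(gaze_xy[0] - indication_xy[0], gaze_xy[1] - indication_xy[1])
    return distance > distance_threshold and audio_impact >= min_impact


# Gaze at the far side of the display from the indication (cf. FIG. 17).
print(needs_gaze_cue((900, 100), (50, 400), distance_threshold=300))  # far away
print(needs_gaze_cue((60, 410), (50, 400), distance_threshold=300))   # already close
```

When the predicate is true, the flow corresponds to operation 166; otherwise it corresponds to operation 168.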
FIG. 19 is a block diagram, indicated generally by the reference numeral 190, showing an example use of the device 10 in accordance with an example embodiment.
The block diagram 190 includes the user 32 holding the device 10 whilst capturing audio-visual content, as described above. As in the example described above, the right hand 34 of the user 32 may be blocking some of the microphones of the device 10.
The spatial audio capture of the microphones of the device 10 may determine 3D audio consisting of audio in a horizontal direction 192 and audio in a vertical direction 194. In the event that one or more of the microphones is blocked, the audio in one direction (e.g. the vertical direction 194) may be degraded. Visual information may be provided that is based on the impact of the blocked microphone(s) on the 3D audio format that can be provided.
FIG. 20 is a block diagram of a display, indicated generally by the reference numeral 200, in accordance with an example embodiment. The display 200 provides a user interface consisting of a horizontal audio indication 202 and a vertical audio indication 204. The horizontal audio indication is shown in solid form, but the vertical audio indication is shown in dashed form. This output may be used to indicate that one or more blocked microphones is inhibiting the vertical direction of a 3D audio format. Thus, the user interface may provide a visual indication based on an impact of the blocked microphone(s) on possible audio formats. Of course, the solid and dashed forms are provided by way of example only; many alternative display types are possible.
FIG. 21 is a block diagram of a display, indicated generally by the reference numeral 210, in accordance with an example embodiment. The display 210 provides the horizontal audio indication 202 and the vertical audio indication 204 described above. In addition, the display 210 provides a visual indication 212 of the location of a blocked microphone. The visual indication 212 may be provided in accordance with the principles described in detail above.
In some example embodiments, the user interface indicates to the user when audio degradation happens due to blocked microphone(s). For example, a system may allow blocking of at least one microphone if this blocking does not affect (or does not significantly affect) the (spatial) audio capture quality according to at least one of: current spatial audio content format, current device orientation (e.g., portrait or landscape, or free orientation).
By way of example, the system can indicate to the user at least one of: degradation of capture according to the current spatial audio format (as shown in FIG. 20), a recommended switch to a lesser spatial audio format, no ability to switch to a higher spatial audio format (if the blocking is not removed).
In order to provide the displays 200 or 210 described above, a control module may be provided that:

Determines at least one microphone that is currently blocked (e.g., based on any suitable microphone blocking detection technique).
Obtains at least currently selected (spatial) audio format information. This step may also obtain, e.g., at least one lesser or one higher (spatial) audio format.
Determines whether the at least one blocked microphone affects at least the currently selected (spatial) audio format capture performance. This step may also determine, e.g., whether the at least one blocked microphone affects the at least one lesser or one higher (spatial) audio format capture performance.
If the at least one blocked microphone affects the capture performance, an indication of the fact that audio capture according to current format is degraded and optionally how the audio capture is degraded may be provided (e.g. using the horizontal and vertical audio indications 202 and 204 described above).
If the at least one blocked microphone affects the capture performance, the blocked microphone may be indicated (e.g. as shown in FIG. 21).
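The control module steps above can be sketched as follows. This is an illustrative sketch only: the table of formats and the microphones each format needs (`FORMAT_REQUIREMENTS`), the ordering `FORMAT_ORDER` and the function `assess_capture` are hypothetical, and a real device would derive this information from its own microphone layout and capture pipeline.

```python
# Hypothetical table: audio format -> microphones the format relies on.
FORMAT_REQUIREMENTS = {
    "stereo": {1, 2},
    "planar": {1, 2, 3},
    "3d": {1, 2, 3, 4},
}
# Hypothetical ordering from lesser to higher (spatial) audio format.
FORMAT_ORDER = ["stereo", "planar", "3d"]


def assess_capture(blocked, current_format):
    """Report whether blocked microphones degrade the current format.

    Returns the microphones to indicate on the display and, if the capture
    is degraded, the highest lesser format that is unaffected (or None).
    """
    affected = FORMAT_REQUIREMENTS[current_format] & blocked
    recommended = None
    if affected:
        lesser = FORMAT_ORDER[:FORMAT_ORDER.index(current_format)]
        for name in reversed(lesser):
            if not FORMAT_REQUIREMENTS[name] & blocked:
                recommended = name
                break
    return {
        "degraded": bool(affected),
        "mics_to_indicate": sorted(affected),
        "recommended_format": recommended,
    }


print(assess_capture({4}, "3d"))      # degraded; a lesser format is still clean
print(assess_capture({4}, "planar"))  # blocked microphone not needed: no indication
```

In this sketch a blocked microphone that the current format does not rely on produces no indication, matching the case in which blocking is tolerated because capture quality is unaffected.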

Many variants are possible. For example, the control module may indicate an availability to switch to at least one lower or one higher (spatial) audio format, with or without degradation due to at least one blocked microphone. Alternatively, or in addition, the control module may indicate that changing the device orientation would overcome the effect of at least one microphone being blocked (as discussed further below).
In some examples, the severity of a degrading effect may be indicated. Example indications of how a specific format may be degraded include: indication that height information will be lost, indication that differentiating between front and back will not be possible, indication that a specific direction (e.g., sector) will have unreliable direction or distance estimates for audio sources, indication that audio zoom is not possible or will be limited (e.g., to 50% effect).
FIG. 22 is a flow chart showing an algorithm, indicated generally by the reference numeral 220, in accordance with an example embodiment.
The algorithm 220 starts at operation 52 where, as discussed in detail above, one or more blocked microphones are identified.
At operation 222 of the algorithm 220, an audio capture mode is set based, at least in part, on the identified blocked microphone(s). For example, the audio capture mode may be set based on reduced performance capabilities. In one example embodiment a user may be required to confirm a change in audio capture mode, but in other example embodiments the change in audio capture mode may be automatic.
In the operation 224, a visual indication of the blocked microphone is provided on a display. As discussed in detail above, the position of the visual indication may be based, at least in part, on a location of the microphone on the audio capture device, wherein the position of the first visual indication corresponds more closely to the position of the blocked microphone than to the position of any other microphone of the multi-microphone audio capture device. In some example embodiments, the visual indication of the blocked microphone is provided on the display only in the event that a selected audio mode is not possible due to the blocked microphone.
Thus, in some examples, the algorithm 220 may automatically switch to, e.g., a lesser spatial audio format or spatial audio capture mode and indicate this switching in addition to indicating the at least one blocked microphone. A user may be able to confirm this selection by simply continuing the capture. If the user decides that the mode switch due to reduced capability should not be made, the user can act to remove the blocking of the at least one microphone indicated on the display.
FIG. 23 is a block diagram of a display, indicated generally by the reference numeral 230, in accordance with an example embodiment.
FIG. 24 is a block diagram of a display, indicated generally by the reference numeral 240, in accordance with an example embodiment.
The display 230 provides an indication to a user of how to hold the device, based on the capture format, so as not to block important microphone(s), such as the microphones most critical to a selected (spatial) audio format. For example, in the example of FIG. 23, a user has selected a planar-FOA representation (a first order Ambisonics (FOA) format that does not have height information). Such a reduced FOA representation can be useful for certain applications, e.g., audio calls where height information may not be particularly important. As the user has the device in portrait orientation, the system may first suggest rotating the device to landscape orientation. This may, for example, maximize separation between the microphones that will be used to derive the spatial audio according to the selected format.
When the user has the device in the correct orientation, the system may then indicate, using the display 240, how the user should hold it in order to avoid blocking the most important microphones. For example, the system may indicate the positions of the microphones on the device it intends to use for the (spatial) audio capture.
There may be situations where the current capture is adjusted to the available non-blocked microphones, but a user wants to use a feature that would require some of the currently blocked microphones to operate optimally.
In a first example, a user has a device with microphones on the front and back sides. Assume that the user's grip is blocking some of the front side microphones, but that this does not matter in the current operation, as the user is shooting video with spatial audio using the back camera and is thus able to capture spatial audio with three back microphones. Now, the user wants to switch to the front camera with spatial audio. In this case the use of the front microphones would result in much better quality, but the user is blocking (some of) them.
In a second example, the user has a device with five microphones on the back side on a 'super-zoom device'. Assume now that the user's grip is blocking two out of the five microphones. Spatial audio capture is available when the user is shooting video with the back camera. Now, the user wishes to do serious audio zooming with maximum values that would require the use of the blocked microphones as well.
In both of these examples, the user may receive an indication that the requested functionality is not available in the current state followed by an indication as described earlier regarding which microphone(s) need to be unblocked. The preferred functionality or state may be automatically applied and displayed on the user interface when the user changes his grip, and the required microphones are free.
The device may also indicate a fall-back operation for such a function if the microphones are still being blocked. Examples include suboptimal spatial audio, reverting to stereo or, in the zooming example, only modest zoom values being available in the user interface, complemented with a visualization of which microphones would need to be free for higher values.
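The zoom fall-back can be sketched as follows. This is a minimal sketch under stated assumptions: the table `ZOOM_REQUIREMENTS` relating zoom factors to required microphones is hypothetical, as are the function name `zoom_fallback` and the microphone numbering for the five-microphone example.

```python
# Hypothetical table: zoom factor -> microphones needed to support it.
ZOOM_REQUIREMENTS = {
    1.0: {1, 2, 3},
    2.0: {1, 2, 3, 4},
    4.0: {1, 2, 3, 4, 5},
}


def zoom_fallback(blocked, requested_zoom):
    """Return (available_zoom, mics_to_unblock) for a requested zoom factor.

    available_zoom is the largest supported factor, not above the request,
    whose microphones are all free (None if there is no such factor).
    mics_to_unblock lists the blocked microphones the request depends on.
    requested_zoom must be a key of ZOOM_REQUIREMENTS.
    """
    usable = [
        zoom for zoom, mics in ZOOM_REQUIREMENTS.items()
        if zoom <= requested_zoom and not mics & blocked
    ]
    available = max(usable, default=None)
    to_unblock = sorted(ZOOM_REQUIREMENTS[requested_zoom] & blocked)
    return available, to_unblock


# Two of the five microphones are blocked by the user's grip.
print(zoom_fallback({4, 5}, 4.0))   # modest zoom only; microphones 4 and 5 to be freed
# After the user changes grip, the requested zoom becomes available.
print(zoom_fallback(set(), 4.0))
```

Re-evaluating the function whenever the set of blocked microphones changes corresponds to applying the preferred functionality automatically once the required microphones are free.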
FIG. 25 is a flow chart showing an algorithm, indicated generally by the reference numeral 250, in accordance with an example embodiment.
The algorithm 250 starts at operation 52 where, as discussed in detail above, one or more blocked microphones are identified.
At operation 252, a position of a visual indication(s) of the one or more blocked microphones for output to a user on a display of the audio capture device (such as the display 11) is determined. As discussed in detail above, the position may be determined based, at least in part, on a location of the blocked microphone on the audio capture device. The visual indication is then provided to the user in operation 254.
At operation 256, a further visual indication is provided to the user on an augmented reality display, wherein the further visual indication directs a user gaze to the first visual indication.
FIG. 26 is a block diagram, indicated generally by the reference numeral 260, showing an example use of the device of FIGS. 1 and 2 in accordance with an example embodiment. That use may, for example, implement the algorithm 250.
The block diagram 260 includes the user 32 holding the device 10 whilst capturing audio-visual content, as described above. As in the example described above, the right hand 34 of the user 32 may be blocking some of the microphones of the device 10. The user 32 is wearing augmented reality (AR) glasses 262.
A visual indication 264 is shown on the device 10. That visual indication may be provided in the operation 254 of the algorithm 250.
A virtual indication 265 is an augmented reality indication that is visible to the user through the AR glasses 262. The virtual indication 265 may be aligned, for example, to point to the side of the device where the problem is. This virtual indication 265 may be particularly useful if the AR system is not able to track the device precisely enough. Also, it can help a second user, who is not wearing AR glasses, to realize there is a problem. In one example embodiment, the virtual indication 265 may be used to direct the user's gaze to the device 10, since the position of the visual indication 264 on the device 10 may be more accurate than the position of the virtual indication 265 provided in augmented reality.
The example embodiments described above may be implemented using a smartphone or some other audio capture device that implements spatial audio capture or any type of enhanced mono or stereo capture. In other words, at least two microphones are used to capture audio, where the audio signal resulting from the capture processing may be a mono, stereo, or spatial audio signal. As discussed above, user attention may be drawn to at least one specific microphone being blocked by placement of a visual indication on the device display in a way that allows the user to localize the blocked microphone intuitively.
For completeness, FIG. 27 is a schematic diagram of components of one or more of the example embodiments described previously, which hereafter are referred to generically as a processing system 300. The processing system 300 may, for example, be the apparatus referred to in the claims below.
The processing system 300 may have a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and a ROM 312, and, optionally, a user input 310 and a display 318. The processing system 300 may comprise one or more network/apparatus interfaces 308 for connection to a network/apparatus, e.g. a modem which may be wired or wireless. The network/apparatus interface 308 may also operate as a connection to other apparatus, such as a device/apparatus which is not a network side apparatus. Thus, direct connection between devices/apparatus without network participation is possible.
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 50, 90, 130, 160, 220 and 250 described above. Note that, in the case of a small device/apparatus, the memory may be of a type suited to small-size usage, i.e. a hard disk drive (HDD) or a solid state drive (SSD) is not always used.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, a plurality of microcontrollers, a processor, or a plurality of processors.
The processing system 300 may be a standalone computer, a server, a console, or a network thereof. The processing system 300 and any necessary structural parts may all be inside a device/apparatus, such as an IoT device/apparatus, i.e. embedded in a very small size.
In some example embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device/apparatus and may run partly or exclusively on the remote server device/apparatus. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device/apparatus in order to utilize the software application stored there.
FIGS. 28A and 28B show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to example embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The internal memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used. Tangible media can be any device/ apparatus capable of storing data/ information which data/ information can be exchanged between devices/ apparatus/ network.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, "computer-readable medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices/apparatus and other devices/apparatus. References to computer program, instructions, code etc. should be understood to express software for a programmable processor firmware such as the programmable content of a hardware device/apparatus as instructions for a processor or configured or configuration settings for a fixed function device/apparatus, gate array, programmable logic device/apparatus, etc.
If desired, the different functions discussed herein may be performed in a different order and/ or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 5, 9, 13, 16, 22 and 25 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/ or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described example embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims

An apparatus comprising means for performing:
identifying at least one microphone of a multi-microphone audio capture device that is blocked; and

determining a position of at least one first visual indication of the blocked microphone(s) for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the positions of the at least one first visual indication correspond more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.
An apparatus as claimed in claim 1, further comprising means for performing:
determining whether any part of the display is blocked,

wherein the position of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on a location of any part of the display that is blocked.
An apparatus as claimed in claim 2, wherein the means for performing determining the position of the at least one first visual indication of the blocked microphone(s) further comprises means for performing determining the position of the at least one first visual indication such that at least part of the or each first visual indication is on a part of the display not blocked.
An apparatus as claimed in claim 2 or claim 3, wherein the means for performing determining the position of the at least one first visual indication of the blocked microphone(s) further comprises determining the position of the first visual indication such that the or each first visual indication corresponds as closely as possible to the position of the blocked microphone.
An apparatus as claimed in any one of the preceding claims, further comprising means for performing:
determining an extent of the at least one first visual indication of the blocked microphone(s) based, at least in part, on a location of any part of the display that is blocked.
An apparatus as claimed in any one of the preceding claims, wherein a nature of the at least one first visual indication of the blocked microphone(s) is based, at least in part, on one or more of:
whether the respective blocked microphone is on a front face of the audio capture device;

whether the respective blocked microphone is on a rear face of the audio capture device;

whether the respective blocked microphone is neither on the front face nor on the rear face of the audio capture device;

whether the respective blocked microphone is on a face of the audio capture device visible to and/or being viewed by the user;

a distance between the first visual indication and the position of the respective blocked microphone; or

a microphone type of the respective blocked microphone.
An apparatus as claimed in any one of the preceding claims, further comprising means for performing:
determining a position of a user gaze on the display of the audio capture device.
An apparatus as claimed in claim 7, further comprising means for performing:
providing a further visual indication on the display directing the user gaze from the position of the user gaze to the position of the at least one first visual indication.
An apparatus as claimed in claim 8, further comprising means for performing:
determining whether to provide the further visual indication on the display based, at least in part, on one or more of:
a distance between the user gaze and the at least one first visual indication;

an impact of the respective blocked microphone to an audio output; or

whether the at least one first visual indication is provided on a part of the display that is blocked.
An apparatus as claimed in any one of the preceding claims, wherein the at least one first visual indication is based on an impact of the respective blocked microphone on possible audio formats.
An apparatus as claimed in any one of the preceding claims, further comprising means for performing:
setting an audio capture mode based, at least in part, on the at least one blocked microphone.
An apparatus as claimed in any one of the preceding claims, wherein the at least one first visual indication is provided on the display only in the event that a selected audio mode is not possible due to the respective blocked microphone.
An apparatus as claimed in any one of the preceding claims, further comprising means for performing:
providing a virtual indication to the user on an augmented reality display,
wherein the virtual indication directs a user gaze to the at least one first visual indication.
An apparatus as claimed in any one of the preceding claims, further comprising means for performing:
providing the at least one first visual indication on the display in the determined position(s).
A method comprising:
identifying at least one microphone of a multi-microphone audio capture device that is blocked; and

determining a position of at least one first visual indication of the at least one blocked microphone for output to a user on a display of the audio capture device based, at least in part, on a location of the respective microphone on the audio capture device, wherein the position of the at least one first visual indication corresponds more closely to the position of the respective blocked microphone than to the position of any other microphone of the multi-microphone audio capture device.