NL1038762C2 - Voice immersion smartphone application or headset for reduction of mobile annoyance. - Google Patents
- Publication number
- NL1038762C2
- Authority
- NL
- Netherlands
- Prior art keywords
- sound
- intensity
- foreground
- user
- background
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/58—Anti-side-tone circuits
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/60—Substation equipment, e.g. for use by subscribers including speech amplifiers
- H04M1/6025—Substation equipment, e.g. for use by subscribers including speech amplifiers implemented as integrated speech networks
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
Description
Voice Immersion Smartphone Application or Headset for reduction of mobile annoyance
FIELD OF THE INVENTION
The invention is in the field of sound processing.
BACKGROUND OF THE INVENTION
With billions of mobile phones in use today, many people talk too loudly in crowded places. The French scientist Étienne Lombard discovered in 1909 that people have an involuntary reflex to increase the intensity of their voice when speaking in noisy environments. Studies have found this so-called Lombard reflex to be too strong to be inhibited by instructions alone. Only feedback has shown results.
The invention relates to a method for providing a dynamic feedback signal closely resembling the speaking behaviour of the user of e.g. a smartphone, or a headset connected thereto, in socially sensitive environments, as well as to a device.
US 2009017670 (A1) discloses systems and methods for altering a cellular phone user's speech so that the speech can be less bothersome to third parties in the surrounding area and so that the user has more privacy.
Sound cancellation can be used to cancel, reduce, or modify the user's voice so third parties cannot hear the voice as easily or so that the user's voice cannot be understood. Furthermore, the user device can encourage the user to speak in a lower voice. The user device can accomplish this encouragement by indicating to the user their level of speech. In this manner, the user knows when he may lower his voice and yet still provide an adequate volume of speech for the cellular phone. Additionally, the user device can encourage the user to speak in a lower voice by audibly playing back the user's voice in real time.
US 2004242160 (A1) discloses a mobile phone with a means of measuring a background sound level of an environment. When a user either initiates or receives a phone call, sound levels before and during the call are compared. Once it is decided that the voice is too loud based on predetermined criteria, the phone gives the user feedback indicating that the voice is too loud and potentially a disruption to other people. Furthermore, the disclosed phone provides feedback for a voice adjustment by utilizing a sidetone adaptive signal that filters the user's own speech directly to an earpiece.
Various problems are associated with prior art solutions.
For instance, it is difficult to distinguish whether the user is speaking if a second microphone is positioned at the same distance to the mouth as the first.
It is not always possible to provide an accurate and reliable feedback signal using only one microphone because of the mixing of foreground and background sound input. Often such a sidetone signal may be provided when not needed and, conversely, may be absent when needed.
It is not possible to be understood by a listener when speaking at low voice intensity.
As measurements are not very accurate, it is difficult to adapt the above solutions to boundary conditions, such as an environment.
Background sound levels are not measured accurately. Therefore the application does not function properly.
Also foreground sound levels are not measured accurately. Therefore the application does not function properly.
The prior art solutions do not relate to real time solutions. Typically these solutions do not correct for variations in boundary conditions, such as variations in background noise levels. If corrections are provided, these are slow, and therefore annoying to a user.
Prior art solutions are further typically optimised in one aspect, for instance in optimizing block diagrams or flow charts illustrating the performance.
The present invention is aimed at overcoming one or more of the above mentioned problems without jeopardizing advantageous effects.
SUMMARY OF THE INVENTION
The present invention relates in a first aspect to a method for providing a real time feedback signal to a user providing foreground sound input, comprising the steps of:
providing at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially different from a distance from a second audio input means to the same source of foreground sound,
providing at least one means for providing an output signal, such as a speaker, and at least one processor,
obtaining a total sound signal intensity comprising one or more of foreground sound input intensity and background sound input intensity,
obtaining the foreground sound input intensity and the background sound input intensity,
optionally obtaining a running average of the foreground sound input intensity and/or of the background sound input intensity, such as a running average over 1 msec (millisecond) - 2 sec,
comparing the intensity of the optionally averaged foreground sound input and the intensity of the optionally averaged background sound input, and
adding part of the foreground sound input intensity as a feedback signal to the output signal in case the foreground sound input intensity is larger than an intensity comprising a predetermined threshold value and the intensity of the optionally averaged frozen background sound input.
The present invention solves one or more of the above mentioned problems and provides excellent performance, as is highlighted and detailed below.
With the term "real time" it is meant that the method is carried out within processing time, typically within 0.1-10 msec, or in other words without a substantial delay. In an example the processing time is less than 2 msec.
The feedback signal is provided to a user. The user may be a person using a mobile telephone, a smartphone, etc. The feedback signal provides a user for instance with information on the relative sound intensity of his/her voice, relative to a background. The user may subsequently adapt his behaviour, e.g. by lowering his voice.
At least two audio input means are provided, such as a microphone, being adapted to determine sound intensity. To quantify sound intensity the decibel (dB) is used, being a logarithmic unit that indicates the ratio of a sound pressure level quantity (usually power or intensity in W/m²) relative to a specified or implied reference level. A ratio in decibels is ten times the logarithm to base 10 of the ratio of two power quantities. A decibel is one tenth of a bel, a seldom-used unit. The decibel is used for a wide variety of measurements in science and engineering, most prominently in acoustics, electronics, and control theory. In order to improve the quality of the present method and device in terms of accuracy, reliability, etc., preferably three or four microphones are provided, or even more. Therewith foreground sound and background sound can be determined more accurately.
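By way of a non-limiting illustration, the decibel definition above can be sketched as follows. The function names are hypothetical and are not part of the described method; the sketch only restates that a level in dB is ten times the base-10 logarithm of a ratio of two power quantities.

```python
import math


def intensity_ratio_to_db(intensity, reference):
    """Return the level in dB of `intensity` relative to `reference`.

    Both arguments are power-like quantities (e.g. W/m^2) and must be positive.
    """
    if intensity <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return 10.0 * math.log10(intensity / reference)


def db_to_intensity_ratio(level_db):
    """Inverse conversion: turn a level in dB back into a power ratio."""
    return 10.0 ** (level_db / 10.0)
```

With these definitions a doubling of intensity corresponds to roughly 3 dB, and a tenfold increase to 10 dB.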
The audio input means are spaced apart, such that a distance from a first audio input means to a source of foreground sound such as the user's voice is substantially different from a distance from a second audio input means to that source. In an example the input means are spaced apart at a distance of at least a few cm, more preferably at least 5 cm, even more preferably at least 10 cm, in order to obtain superior results. As a consequence a foreground sound intensity received by a first audio input means is substantially different from a foreground sound intensity received by a second audio input means. The difference in intensity is in an example at least 1 dB(A), preferably at least 2 dB(A), more preferably at least 3 dB(A), such as at least 10 dB(A).
In an example the speaker relates to an ear speaker, such as one being present in a mobile phone, smartphone, or headset. Clearly two or more speakers may be present, such as at least one speaker per ear.
It is noted that the output means may also relate to an optical means, providing an optical signal, such as text, light, etc. The output means may also provide a signal to any other sense, such as taste, smell or touch. As such one or more of a variety of feedback signals may be provided.
The present processor is typically a microprocessor or the like, capable of processing analogue and/or digital data. In an example the processor forms part of a device, the device further comprising other elements mentioned above in the present method.
The at least one audio input means will in use receive a sound signal. This sound signal is referred to as "total sound signal", which signal typically has an intensity. For an audio signal the intensity is typically expressed in dB(A), being a logarithmic scale.
The sound signal typically comprises various elements, being categorised in foreground elements and background elements. In an example a user of a headset will provide a foreground sound signal when speaking into the at least one audio input means, in this case microphones. In an example a background sound signal will originate from other people speaking to each other, from information being made public, e.g. by a speaker, from traffic, such as cars and trains, from the wind, etc. Sometimes a source of background sound signal may be very close to a source of foreground sound signal, such as when a user is speaking into a headset, and a neighbouring person is talking.
It is noted that a foreground sound signal need not be present, such as when a user is quiet. Likewise, a background sound signal need not be present, or at least may be below a certain threshold, such as in a very quiet environment, e.g. in nature.
By determining the background sound signal intensity, specifically when the user is quiet, the total sound signal intensity can be split (or divided) into a background sound signal intensity and a foreground sound signal intensity. In an example the foreground sound signal intensity is determined by subtracting the background sound signal intensity from the total sound signal intensity.
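A minimal sketch of the subtraction described above is given below. It assumes, as the description does not state this explicitly, that levels expressed in dB are first converted to linear intensities before subtracting, since levels in dB cannot be subtracted directly to remove a contribution. The function name is hypothetical.

```python
import math


def foreground_level_db(total_db, background_db):
    """Estimate the foreground level from total and background levels (in dB).

    Assumption: the subtraction is done on linear intensities.
    Returns None when the total does not exceed the background,
    i.e. when no foreground contribution can be separated.
    """
    total = 10.0 ** (total_db / 10.0)
    background = 10.0 ** (background_db / 10.0)
    difference = total - background
    if difference <= 0:
        return None
    return 10.0 * math.log10(difference)
```

For instance, a total level about 3 dB above the background yields a foreground level roughly equal to the background level, because both then contribute equally to the total.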
In an example a running average of the foreground sound input intensity and/or of the background sound input intensity, such as a running average over 1 msec - 2 sec, is obtained. Thereby peaks in intensity are smoothed out. Preferably a median value of the intensity is obtained, in order to correct for peaks possibly being present. It has been found experimentally that a running average over a time frame of 5-250 msec is typically sufficient. In an example an average over 100 msec is used.
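The averaging described above may, purely as an illustration, be sketched with a fixed-length window. The class name and the choice to express the window as a number of samples are assumptions; how many samples correspond to e.g. 100 msec depends on the measurement rate of the device.

```python
from collections import deque
from statistics import median


class RunningLevel:
    """Keep the most recent `window_size` intensity values."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self._samples = deque(maxlen=window_size)

    def add(self, value):
        # The deque drops the oldest value once the window is full.
        self._samples.append(value)

    def mean(self):
        if not self._samples:
            raise ValueError("no samples yet")
        return sum(self._samples) / len(self._samples)

    def median(self):
        if not self._samples:
            raise ValueError("no samples yet")
        return median(self._samples)
```

The median is less affected by a single peak than the mean, which is the reason given above for preferring it.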
Once the foreground sound signal intensity and background sound signal intensity have been obtained, optionally averaged over time, the intensities can be compared. The result of this comparison provides relative intensities, e.g. in terms of dB(A). An aim of the invention is to provide a feedback signal to a user, especially when the foreground sound signal intensity is relatively large compared to the background sound signal intensity. Thereto a predetermined threshold is provided, which serves as a guide to determine if the foreground sound signal intensity is relatively large enough to provide a feedback signal. If the foreground sound signal intensity is larger than the sum of the background sound signal intensity and the predetermined threshold, a proportional feedback signal is provided. The feedback signal is in an example chosen to be an audio sidetone signal. In a further example an audio signal is provided by adding a foreground sound signal to the output means, at a certain intensity level. The intensity level may depend on various parameters, such as the relative difference between foreground and background intensities. At a large difference a stronger, more intense, signal may be provided, and vice versa. In an example the user hears his own voice in real time, in this case less than a few msec later, at an intensity level as determined a short period earlier, e.g. 100 msec earlier. Adaptively the intensity of the feedback signal may decrease as a user lowers his voice, and vice versa. In an example the intensity is proportional to the foreground sound level. The intensity is preferably not too low, as it will then possibly not be noticed. Also the intensity is preferably not too high, as it will become annoying and the like. A certain level is loud enough. In an example the upper intensity of the feedback signal is limited to a level far below a level where ear damage could occur.
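The comparison described above can be illustrated by the following sketch. The function names and the default threshold of 3 dB(A) are illustrative assumptions; all levels are taken to be in dB(A).

```python
def feedback_needed(foreground_db, background_db, threshold_db=3.0):
    """True when the foreground exceeds background plus threshold."""
    return foreground_db > background_db + threshold_db


def excess_db(foreground_db, background_db, threshold_db=3.0):
    """Amount by which the foreground exceeds background plus threshold.

    Returns 0.0 when no feedback is needed, so the value can be used
    directly to scale a proportional feedback signal.
    """
    return max(0.0, foreground_db - (background_db + threshold_db))
```

A user speaking at 70 dB(A) against a 60 dB(A) background would, in this sketch, receive feedback scaled by an excess of 7 dB, whereas a user at 62 dB(A) would receive none.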
The method may be provided as an application, such as a downloadable application. The application may be provided with a switch, for activating or de-activating the application.
The application may be provided with means for calibration, in order to adapt the application for a specific device being used, such as type of headset, type of phone, brand of phone, brand of headset, varying circumstances, such as voice input intensity, background sound intensity, etc.
In an example the present method further comprises providing the first audio input means, the second audio input means and the means for providing the output signal as part of one device, such as a smartphone, telephone, mobile telephone, headset, computer, providing at least the first audio input means with a directional sensitivity, such as with a polar pattern, directing the first audio input means towards a mouth of the user, and aligning the directional sensitivity with a virtual axis that, when seen in top view parallel to a cranial axis of the user, is at an angle of 0-60 degrees to a forward-backward axis of the user.
For a user one device offers improved usability.
In an example the first audio input means is adapted to pick up an audio signal selectively from a spatially limited region and/or direction, such as a direction wherein a sound source, such as a mouth of the user, is located. In an example one of the input means is directed at a source of foreground sound input, such as a mouth of a user. In order to obtain superior results the input means is directed as indicated above. A further advantage hereof is that a second input means, not having the directional sensitivity and/or input, receives a significantly different input intensity. As a consequence hereof the reliability, accuracy, etc. of the device is improved significantly.
In a further example such a first audio input means is located as mentioned above, in order to receive a sound signal in an optimal way. Thereby also background input is reduced, which improves the quality of a signal transferred to a receiver of spoken information of the user.
In an example the intensity of the background sound input is obtained when the running average foreground sound input intensity is less than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones.
In an example the intensity of the background sound input is frozen to the last but one logged value when the running average foreground sound input intensity is higher than the running average background sound intensity plus 3 dB(A), corrected for the characteristics of the microphones, e.g. during the speaking period of the user.
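The logging and freezing of the background level described in the two preceding examples may be sketched as follows. The class name is hypothetical, the correction for microphone characteristics is omitted for brevity, and the fallback used when nothing has been logged yet is an assumption not taken from the description.

```python
class BackgroundTracker:
    """Log the background level while the user is quiet; freeze it while speaking."""

    def __init__(self, margin_db=3.0):
        self.margin_db = margin_db
        self._log = []        # at most the two most recent logged values
        self.frozen = None    # frozen background level during a speaking period

    def update(self, foreground_db, background_db):
        """Return the background level to use for this measurement."""
        if foreground_db < background_db + self.margin_db:
            # User is considered quiet: log the background and release any freeze.
            self._log = (self._log + [background_db])[-2:]
            self.frozen = None
            return background_db
        # User is considered to be speaking: freeze once, at the start of the period.
        if self.frozen is None:
            # Last but one logged value; assumed fallback is the current reading.
            self.frozen = self._log[0] if self._log else background_db
        return self.frozen
```

Taking the last but one value, rather than the last, presumably avoids using a background reading that is already contaminated by the onset of speech.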
In an example the present method further comprises the step of filtering the foreground sound input and/or the background sound input, such as by filtering out low frequency and high frequency, thereby obtaining a frequency window.
The filtering improves the quality and accuracy of the method significantly. In an example it reduces noise and unwanted frequencies by some 10 dB(A).
In an example of the present method a low frequency threshold for the frequency window is 200 Hz, preferably 100 Hz, and a high frequency threshold for the frequency window is 1000 Hz, preferably 2000 Hz, wherein preferably the frequency window is a weighted window correcting the sound intensity for the sensitivity of the human ear by an A-weighting curve.
It has been found experimentally that a low frequency threshold can be set at 200 Hz, preferably at 100 Hz. It has been found experimentally that a high frequency threshold can be set at 1000 Hz, preferably at 2000 Hz. Thereby optimal results are obtained, e.g. in terms of noise reduction, accuracy, reliability, reduction of possibly annoying sounds, etc.
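As an illustration of the weighted frequency window, the sketch below evaluates the standard A-weighting curve (as commonly given for IEC 61672) and checks whether a frequency lies inside the preferred window of 100-2000 Hz. The function names are hypothetical, and applying the weighting per frequency component is an assumption about how the window could be realised.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting correction in dB for a frequency in Hz (about 0 dB at 1 kHz)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


def in_frequency_window(freq_hz, low_hz=100.0, high_hz=2000.0):
    """True when the frequency lies inside the window (defaults: preferred values)."""
    return low_hz <= freq_hz <= high_hz
```

The curve attenuates low frequencies strongly (roughly 19 dB at 100 Hz), reflecting the lower sensitivity of the human ear in that range.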
In an example of the present method the intensity of the foreground sound input is amplified or reduced in view of the intensity of the background sound input. As such a receiver of the foreground sound signal, such as a listener on the other end of a line to the voice of the user, can better pick up the signal.
In an example of the present method a feedback signal, such as a sidetone, is added in the means for output with an intensity proportional to a difference between the intensity of the optionally averaged foreground sound input and the intensity of the optionally averaged background sound input plus the threshold, preferably with a negligible latency, such as less than 20 msec, preferably less than 5 msec, such as less than 2 msec, such as about 0.1 msec. The latency is typically determined by electronic limitations, and is in an example virtually absent.
As such the user perceives that the feedback signal is provided in real time, not being able to detect a delay between his voice and the feedback signal. Only the intensity may be different, as is aimed at. Experimentally good results were obtained with a delay of less than 2 msec, such as 1 msec. In an optimal configuration the latency was less than 0.02 msec.
In an example of the present method a first input means obtains a first sound intensity, wherein a second input means obtains a second sound intensity, wherein an optional difference in sound intensity is used to determine the foreground sound input intensity, and wherein the background sound input intensity is determined when the first and second sound intensities differ by less than 3 dB(A), preferably are substantially the same.
In the example the first and second sound intensities are compared. If these are substantially the same, it is assumed no foreground sound intensity is present, as otherwise one of the input means, e.g. microphones, would detect a larger intensity than another. Thus the intensity obtained is regarded as the background sound input intensity. Typically a difference between input intensities will be negligibly small, as a source of background sound input will be relatively far away, and the at least one input means, e.g. microphones, will detect a similar input intensity, if not the same input intensity. If a difference is determined, the difference is attributed to a foreground sound input intensity, e.g. a user speaking into an input means. A background sound input level determined and frozen previously, in this case a few msec earlier, can now be used to determine by how much the current foreground sound input intensity exceeds it, by subtracting this frozen background sound input intensity plus the threshold.
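The two-microphone comparison described above may be sketched as follows. The function name and the returned labels are hypothetical; the 3 dB(A) margin follows the example given above.

```python
def classify_measurement(first_db, second_db, same_margin_db=3.0):
    """Classify a pair of microphone levels (both in dB(A)).

    When both microphones report nearly the same level, the sound is
    attributed to a distant (background) source; otherwise the
    difference is attributed to a nearby (foreground) source.
    """
    if abs(first_db - second_db) < same_margin_db:
        return "background"
    return "foreground"
```

A distant machine would be received at nearly equal levels by both microphones, whereas the user's mouth, being much closer to one microphone, produces a clear difference.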
In an example of the present method the feedback signal intensity comprises a delayed foreground sound input intensity, wherein the delay is preferably smaller than 500 msec, more preferably smaller than 250 msec, even more preferably smaller than 100 msec, such as smaller than 50 msec.
As indicated above the intensities need to be determined, which relates to a process which inevitably involves some time, in the order of msecs. Therefore a result of such a process is always available somewhat later, at a delay in time. Providing the intensity of the feedback signal therefore is delayed, whereas the content of the signal is provided without (substantial) delay, i.e. real time.
In an example of the present method the part of the foreground sound input intensity being added is proportional to the foreground sound input intensity, such as from 1%-300% thereof, preferably from 5%-200% thereof, more preferably from 30%-125% thereof, such as from 50%-100% thereof. It has been found experimentally that the intensity is preferably not too small and not too large as indicated above. Good results were obtained at the above levels.
In an example the feedback signal intensity may be a linear, exponential, logarithmic, or step function, etc., of the foreground sound input intensity. Even further the intensity may vary, such as increase or decrease, likewise in time, i.e. become larger or smaller. Even further, the function, variation, time interval, etc. may be adjustable by a user. In a further example the feedback signal may be switched off or on, as desired by the user. Typically the maximum output is limited, e.g. in terms of maximum pressure of an output means, such as a speaker, and/or in view of possible ear damage. In other words, an upper limit is provided.
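The selectable feedback characteristics and the upper limit may be illustrated by the sketch below. The function name, the parameter names and all numerical defaults (slope, step height, an upper limit of 1.25 corresponding to 125%) are illustrative assumptions, not values prescribed by the method.

```python
import math


def feedback_fraction(excess_db, shape="linear", slope=0.1, step=0.5, upper_limit=1.25):
    """Fraction of the foreground signal to add as sidetone.

    `excess_db` is the amount by which the foreground exceeds the
    background plus threshold; no feedback is given at or below zero.
    """
    if excess_db <= 0:
        return 0.0
    if shape == "linear":
        value = slope * excess_db
    elif shape == "logarithmic":
        value = slope * math.log1p(excess_db)
    elif shape == "exponential":
        value = slope * math.expm1(excess_db / 10.0)
    elif shape == "step":
        value = step
    else:
        raise ValueError("unknown shape: " + shape)
    # The upper limit keeps the sidetone well below harmful levels.
    return min(value, upper_limit)
```

A user setting could select the shape and adjust the parameters, in line with the adjustability mentioned above.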
In an example of the present method the predetermined threshold value is at least 3 dB(A), preferably at least 5 dB(A), more preferably at least 10 dB(A), such as at least 20 dB(A). The threshold is preferably not set too low, as otherwise feedback is provided when a user speaks softly, though somewhat louder than the background sound intensity. It is noted that a level of 3 dB(A) reflects a factor of two in sound intensity. Guided by the sidetone to keep his voice intensity below twice the volume of the background noise, the user's voice will immerse in the background and can hardly be heard by people in the vicinity. In crowded places this will increase privacy and efficiency of the call and prevent irritation. At that point no adaptation of voice level seems needed.
In an example of the present method no feedback signal is added to the output signal when the absolute value of the optional running average foreground sound signal intensity is below the sum of the threshold value and the absolute value of the optional running average of the frozen background sound signal intensity.
In an example the present method is aimed at providing a preset default static minimal background sound signal intensity, and preventing a feedback signal if the background sound signal intensity is below the preset default static minimal background sound signal intensity.
In an example of the present method the dynamic foreground sound signal that is proportionally added to the output signal for the ear speaker cannot be heard by the caller on the other end of the line.
In an example of the present method, when switched on, the amplification of the noise-cancelled foreground sound microphone signal towards the person on the other end of the line is automatically increased in pre-programmed steps when the running average continuous background signal stays in categorized volume ranges below the normal preset volume range of standard mobile communication.
In an example of the present method, if the microprocessor has difficulty meeting the latency demand because of the required real-time processing of the signals, a possible embodiment of the feedback circuitry is an analogue mix circuit. Steered by the digital, delayed Delta(x) value, the circuit injects the analogue foreground sound signal from microphone 31 with enhancement factor Delta(x) into the Earplug signal. Optionally analogue noise cancelling can be implemented by subtracting the analogue background signal from microphone 32 with factor Delta(x) from the analogue voice signal, corrected for the characteristics of microphones 31 and 32.
In an example of the present method the software is implemented in a headset positioned at the user's ear, wherein the microphones and the ear speaker form part of the same headset, wherein the headset is connected via Bluetooth to the smartphone, of which the microphones and ear speaker are switched off, and wherein the distance between the voice microphone of the headset and the ear speaker is at least 10 centimetres.
In a second aspect the present invention relates to a device comprising
at least two audio input means, such as a microphone, the at least two input means being spaced apart at a mutual distance, such that a distance from a first audio input means to a source of foreground sound is substantially different from a distance from a second audio input means to the same source of foreground sound,
at least one means for providing an output signal, such as a speaker, and
at least one processor, the processor being adapted for processing sound input, providing an output signal and providing a feedback signal.
In an example according to the invention the device is selected from the group of smartphone, telephone, mobile telephone, headset, computer and combinations thereof.
The various aspects and features described and shown in the specification can be applied, individually, wherever possible. These individual aspects, in particular the aspects and features described in the attached dependent claims, can be made subject of divisional patent applications.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be elucidated on the basis of an exemplary embodiment shown in the attached drawings, wherein elements therein are of exemplary nature, in which:
Figure 1 shows a smartphone containing the Application according to the invention;
Figure 2A shows a side view of the head of a user of a headset according to the invention;
Figure 2B shows the opposite side of the headset according to figure 2A, as seen from the head of the user;
Figure 3 is a schematic top view of the user of the headset according to figures 2A and 2B;
Figure 4 shows the human hearing sensitivity, with the frequency on the horizontal axis and the intensity in dB on the vertical axis;
Figure 5 shows the A, B and C-weighting curves.
DETAILED DESCRIPTION OF THE INVENTION
Only accurate feedback to the user with minimal latency will motivate him to inhibit the Lombard reflex. Guided by the sidetone to keep his voice intensity below twice the volume of the background noise, the user's voice will immerse in the background and can hardly be heard by people in the vicinity. In crowded places this will increase privacy and efficiency of the call and prevent irritation.
Figure 1 shows a smartphone 30 on which the invention 40 is downloaded as an Application. Alternatively, the invention is implemented in the headset 10, as will be explained later.
The smartphone 30 typically comprises a housing 34 that carries an interactive display 35, a foreground microphone 31 to pick up the user's voice and a background noise microphone 32 for measuring the background sound signal, and an ear speaker 33 to reproduce the sound of the voice from the other end of the line into the user's ear, including the feedback sidetone. The smartphone 30 is controlled by an inner electronic circuit comprising a microprocessor that can process digital signals in real time, a rechargeable power supply and a wireless Bluetooth transceiver or optionally a connector for a cable that are connected to the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 33 and to provide the microprocessor with a digital foreground sound signal from the foreground microphone 31 and the ambient/background sound microphone 32. The foreground sound signal is proportional to an average voice sound pressure at the voice sound microphone 31 for the moments the user is speaking. The background sound signal is proportional to an average background sound pressure at the background sound microphone 32. To enhance the voice quality of the user to the other end of the line, a part of the background signal from microphone 32 is subtracted from the signal of the foreground sound microphone 31, so-called noise cancelling. The above-mentioned corrections are adjusted for the respective microphone characteristics.
To activate the invention, the user presses the embodiment "App" 40 on the display. The embodiment can be enriched by adding the input of user settings, e.g. to adjust the threshold value of the feedback as well as feedback volume characteristics, e.g. linear, logarithmic, exponential, constant, height of step-in volume, etc. By going back to the menu of the smartphone the App can be started and stopped before, during and after the user activates the smartphone telephone application 36 to make a call.
As a first functionality when switched on, the invention takes the following steps to keep the feedback sidetone signal continuously closely resembling reality:
a) Continuous comparison of Foreground and Background sound intensities to determine whether the user is speaking or not
b) Running average Background sound intensity and freeze
c) Determining Delta of Foreground minus frozen Background sound intensity value
a) Continuous comparison of Foreground and Background sounds.
Microphone 31 continuously measures the foreground sound intensity. Microphone 32 continuously measures the background sound intensity.
When the absolute value of the running average of the foreground intensity is more than 3 dB(A) above the absolute value of the running average background intensity times m, the invention decides that the user is speaking.
If this is not the case, the user is not speaking.
Correction factor m is such that when the foreground sound source, usually the voice of the user, is silent and a remote sound source is clearly audible, e.g. a machine at more than 4 meters from the invention, the intensity of the foreground sound microphone equals the intensity of the background sound microphone times m: FI = BI x m.
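The role of correction factor m in step a) may be sketched as follows. The function names are hypothetical, and it is an assumption of this sketch that FI and BI are linear intensities, with the 3 dB(A) margin applied to their ratio after correction by m.

```python
import math


def calibrate_m(foreground_samples, background_samples):
    """Estimate m from paired samples taken while the user is silent.

    With only a remote source present, FI = BI x m, so m is the
    average ratio FI / BI over the calibration samples.
    """
    ratios = [fi / bi for fi, bi in zip(foreground_samples, background_samples) if bi > 0]
    if not ratios:
        raise ValueError("no usable calibration samples")
    return sum(ratios) / len(ratios)


def user_is_speaking(fi, bi, m, margin_db=3.0):
    """True when FI exceeds BI x m by more than `margin_db` decibels."""
    return 10.0 * math.log10(fi / (bi * m)) > margin_db
```

After calibration, a remote source alone gives a difference of about 0 dB and is not mistaken for speech.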
b) Running average Background sound intensity and freeze
When it is detected that the user is not speaking, the absolute value of the running average background intensity is logged. As soon as it is detected that the user is speaking, the last but one value of the background sound intensity is frozen for that speaking slot.
c) Determining Delta of Foreground minus Background Signal
When it is detected that the user is speaking, at the start x of the speaking slot the factor Delta(x-1), being the difference between the absolute value of the running average Foreground Signal Intensity FI(x) and the frozen Background Signal Intensity BI(x-1), is set to zero. The microprocessor needs time to acquire reliable background and foreground signal levels. To calculate the first accurate Delta, typically 1-10 seconds sample time is required, in which the value stays zero.
While the user is speaking, the foreground signal intensity is calculated, corrected for the final acoustic specifications of the different microphones 31 and 32, and compared to the frozen BI(x-1); an accurate Delta(x) is calculated after every sample. To prevent Delta from becoming negative, Delta(x) is the maximum of zero and the difference of the absolute value of Foreground Signal Intensity(x) and the absolute value of Background Signal Intensity(x). In formula: Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x))). To prevent irritating stochastic feedback to the user, the value Delta is also softened by a running average: Delta(x) = (Delta(x-1) + c * Delta(x))/(1+c). The value of c will be larger than the value used for the progressing averages of the background and foreground sound signals, but must be optimised in the final design.
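The two formulas of step c) can be illustrated by the following sketch. The function names are hypothetical, and the value of c used in the usage note is an arbitrary assumption, since c is to be optimised in the final design.

```python
def raw_delta(fi, bi):
    """Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x)))."""
    return max(0.0, abs(fi) - abs(bi))


def smoothed_delta(previous_delta, current_delta, c):
    """Delta(x) = (Delta(x-1) + c * Delta(x)) / (1 + c)."""
    return (previous_delta + c * current_delta) / (1.0 + c)


def delta_sequence(fi_values, bi_frozen, c):
    """Smoothed Delta over one speaking slot, starting from Delta = 0."""
    delta = 0.0
    result = []
    for fi in fi_values:
        delta = smoothed_delta(delta, raw_delta(fi, bi_frozen), c)
        result.append(delta)
    return result
```

For example, with c = 1 and a constant foreground 10 dB above the frozen background, the smoothed Delta rises through 5, 7.5 and 8.75 towards 10, so the feedback builds up gradually rather than abruptly.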
When Delta is more than a threshold value of 3 dB(A), a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritation. A typical maximum latency value is 2 msec.
The feedback of his own voice gets stronger as his voice becomes increasingly stronger compared to the average dynamic frozen background sound. The caller on the other end of the line continues to hear the user clearly without hearing the increasing feedback.
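A possible mapping from Delta to the amount of own-voice feedback is sketched below. Only the 3 dB(A) threshold comes from the text. The linear ramp above the threshold, the personal_factor parameter standing in for the "personalised part", and the cap max_gain are assumptions made for illustration.

```python
THRESHOLD_DB = 3.0  # threshold value stated in the description


def sidetone_gain(delta_db, personal_factor=0.1, max_gain=1.0):
    """Gain applied to the user's own voice; zero at or below the threshold."""
    if delta_db <= THRESHOLD_DB:
        return 0.0
    # Assumed shape: the gain grows linearly with the excess above the threshold.
    return min(max_gain, personal_factor * (delta_db - THRESHOLD_DB))


def mix_ear_signal(caller_sample, voice_sample, delta_db, personal_factor=0.1):
    """Ear speaker sample: caller audio plus the scaled own-voice feedback."""
    return caller_sample + sidetone_gain(delta_db, personal_factor) * voice_sample
```

Only the ear speaker signal is modified in this sketch, which matches the statement that the caller on the other end of the line does not hear the feedback.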
This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity, in other words to inhibit his Lombard reflex.
The value of Delta(x-1) is set to zero at the end of each of the user's speaking slots. This means that when the user remembers the invention's feedback during his last too loud sentence and starts speaking much more softly in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
As the invention contains a preset default threshold of background sound signal, the user remains able to continue speaking softly but normally in case the background sound signal falls to silence, which prevents the user from getting uncontrollable feedback or being forced to whisper.
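One way to read the preset default threshold is as a floor under the background level that is used for the comparison. The sketch below follows that reading. The floor value of 40 dB is a purely hypothetical placeholder, as the text does not state a number.

```python
DEFAULT_BACKGROUND_FLOOR_DB = 40.0  # hypothetical value, not given in the text


def effective_background(bi_frozen_db, floor_db=DEFAULT_BACKGROUND_FLOOR_DB):
    """Background level used for Delta, never lower than the preset floor."""
    return max(bi_frozen_db, floor_db)
```

With such a floor, a normal soft voice in a silent room is compared against the floor rather than against near-zero background, so Delta stays small.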
As a second functionality, if the microprocessor of the invention has trouble meeting the latency demand because of the required processing of the signals, a possible embodiment of the feedback circuitry is an analogue mix circuit. Steered by the digital, slightly delayed Delta(x) value, the circuit injects the analogue foreground voice signal from microphone 31 with enhancement factor Delta(x) into the earplug signal.
Optionally, analogue noise cancelling can be achieved by subtracting the analogue background signal from microphone 32, with factor Delta(x), from the analogue voice signal.
As a third functionality, when switched on, the amplification of the noise-cancelled voice microphone signal towards the person on the other end of the line is automatically increased in pre-programmed steps when the running average continuous background signal stays in categorized volume ranges below the normal preset volume range of standard mobile communication. This means that the more softly speaking user in quiet places remains clearly audible on the other end of the line.
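The stepped amplification can be sketched as a lookup of the background level in a table of volume ranges. The range boundaries and gain steps below are hypothetical placeholders, since the text only states that the steps are pre-programmed.

```python
import bisect

# Hypothetical upper bounds (dB) of the quiet background ranges, ascending.
RANGE_UPPER_BOUNDS_DB = [30.0, 40.0, 50.0]
# Hypothetical extra gain (dB) per range; the last entry is the normal range.
GAIN_STEPS_DB = [9.0, 6.0, 3.0, 0.0]


def uplink_boost_db(background_db, enabled=True):
    """Extra gain for the voice signal sent to the other end of the line."""
    if not enabled:
        return 0.0
    # bisect_right finds the index of the range the background level falls in.
    index = bisect.bisect_right(RANGE_UPPER_BOUNDS_DB, background_db)
    return GAIN_STEPS_DB[index]
```

The quieter the surroundings, the larger the boost, so that a user who lowers his voice in a quiet place is still heard clearly by the other party.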
The above-mentioned functionalities one, two and three of the Application 20 can also be implemented in the headset accessory 10 that is connected with the smartphone 30 via Bluetooth. In this configuration the smartphone 30, or optionally a normal cellular phone, works with the headset, with the functionality according to the invention running on the headset microprocessor. In this wirelessly interconnected mode the ear speaker 33 and the microphones 31 and 32 of the smartphone 30 are not in use during a phone call.
The microprocessor of the headset 10 is loaded with software to control the headset 10 according to the invention during a phone call. With the push button 8 on the headset 10, the user can switch the "Voice Immersion" accurate behaviour feedback functionality according to the invention on and off.
Figure 2A shows the head 1 of a user of a headset 10 according to the invention. The headset 10 is shaped to fit around the ear 2. The user carries the headset 10 around only one of his ears 2. The headset 10 comprises an elongated, curved housing 11 and an ear speaker 12 that is connected to the housing 11 via a first rod 9. The ear speaker 12 is partly inserted in the ear canal 3 of the user. The ear speaker 12 has a shotgun sound output directionality or polar pattern 13 that is aligned with the ear canal axis A. From the ear speaker 12 a curved second rod 14 extends along the cheek towards the mouth 4 of the user.
The carrying rod 14 carries a voice sound microphone 17 at its free end and a background sound microphone 15 situated between the voice sound microphone 17 and the ear speaker 12. The voice sound microphone 17 has a shotgun sensitivity directionality or polar pattern 18 that is directed towards the mouth 4 of the user to optimally pick up his voice. The polar pattern 18 is aligned with an axis D that, when seen in top view parallel to the cranial axis of the user, is at an angle E of 0-60 degrees to the forward-backward axis B of the user. In this top view the forward-backward axis is perpendicular to the ear canal axis A. The distance C between the centre of the ear speaker 12 and the centre of the voice sound microphone 17 is typically larger than 10 centimetres to position the voice sound microphone 17 sufficiently close to the mouth 4 of the user to optimally pick up his voice. The background sound microphone 15 has an omni-directional sensitivity directionality or polar pattern 16 that faces away from the user to pick up the background noise. The headset 10 is further provided with a push button 8 that can be reached behind the ear 2.
The housing 11 comprises an inner space 20 wherein an electronic circuit has been enclosed. The electronic circuit comprises a microprocessor, a rechargeable power supply and a wireless Bluetooth transceiver that are connected to the microprocessor, an electronic connection between the push button 8 and the microprocessor, and analogue-digital converters that are connected to the microprocessor to convert a digital output signal from the microprocessor into an analogue electronic signal to power the ear speaker 12 with the voice sound signal of the caller on the other end of the line, and to provide the microprocessor with a digital voice sound signal from the voice sound microphone 17 and a digital background sound signal from the background microphone 15. The voice sound signal is proportional to an average voice sound pressure at the voice sound microphone 17 when speaking. The background sound signal is proportional to an average background sound pressure at the background sound microphone 15.
During a phone call with the attached headset, the headset's microprocessor is provided with a digital sound signal from the foreground sound microphone 17 and a digital background sound signal from the microphone 15. The microprocessor continuously and repeatedly samples the foreground and background sound signals to detect whether the user is speaking.
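The text does not state how speech is detected from the two signals. One plausible approach, sketched below as an assumption, builds on the correction factor m: when the user is silent the foreground intensity is about m times the background intensity, so a foreground intensity well above that level indicates speech. The margin parameter is hypothetical.

```python
def is_user_speaking(fi, bi, m, margin=2.0):
    """Assumed detector: speech if FI clearly exceeds the silent-user level BI * m."""
    return abs(fi) > margin * m * abs(bi)
```

In practice such a detector would probably operate on short-term averages rather than single samples, to avoid toggling on brief peaks.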
When it is detected that the user is not speaking, the absolute value of the running average background intensity is logged. As soon as it is detected that the user is speaking, the last but one value of the background sound intensity is frozen for that speaking slot.
When it is detected that the user is speaking, at the start x of the speaking slot the factor Delta(x-1), being the difference between the absolute value of the running average Foreground Signal Intensity FI(x) and the frozen Background Signal Intensity BI(x-1), is set to zero. The microprocessor needs time to acquire reliable background and foreground signal levels. To calculate the first accurate Delta, typically 1-10 seconds of sample time is required, during which the value stays zero.
While the user is speaking, the foreground signal intensity is calculated, corrected for the final acoustic specifications of the different microphones 15 and 17, and compared to the frozen BI(x-1), so that an accurate Delta(x) is calculated after every sample. To prevent Delta from becoming negative, Delta(x) is the maximum of zero and the difference between the absolute value of Foreground Signal Intensity(x) and the absolute value of Background Signal Intensity(x). In formula: Delta(x) = MAX(0, ABS(FI(x)) - ABS(BI(x))). To prevent irritating stochastic feedback to the user, the value Delta is also smoothed by a running average: Delta(x) = (Delta(x-1) + c * Delta(x))/(1+c). The value of c will be larger than the value used for the progressing averages of the background and foreground sound signals, but must be optimised in the final design.
When Delta is more than a threshold value of 3 dB(A), a personalised part of the foreground sound signal is added to the output signal for the ear speaker 12, whereby the user starts hearing his own voice in the ear speaker 12, mixed with the sound of the caller on the other end of the line. It is crucial that any latency of the user's own voice in his ear is kept minimal to prevent irritation. A typical maximum latency value is 2 msec.
The feedback of his own voice gets stronger as his voice becomes increasingly stronger compared to the average dynamic background sound. The caller on the other end of the line continues to hear the user clearly without hearing the increasing feedback.
This feedback functionality motivates the user to stop speaking too loud with respect to people in his vicinity.
The value of Delta(x-1) is set to zero at the end of each of the user's speaking slots. This means that when the user remembers the invention's feedback during his last too loud sentence and starts speaking much more softly in the next slot, he is immediately rewarded with the absence of the feedback in his ear. In case he continues his too loud conversation, within seconds the progressing average provides the proportional feedback in his ear.
As the invention contains a preset default threshold of background sound signal, the user remains able to continue speaking softly but normally in case the background sound signal falls to silence, which prevents the user from falling into whispering or receiving uncontrollable feedback.
Advantages of the headset embodiment over the "Application" embodiment are the superior acoustical characteristics of microphone 17 and the hands-free operation. Drawbacks are a potentially higher price and lower user comfort.
Figure 4 shows the human hearing sensitivity and Figure 5 shows a graph of an A-weighting curve, with the frequency on the horizontal axis and the intensity in dB on the vertical axis.
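For reference, an A-weighting curve of the kind shown in Figure 5 can be computed with the standard analytic approximation used for sound level meters. The sketch below is given for illustration only and is not taken from the description. The function name is an assumption.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting in dB at frequency f_hz, using the standard analytic formula."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalises the curve to roughly 0 dB at 1 kHz.
    return 20.0 * math.log10(ra) + 2.00
```

The curve is close to 0 dB at 1 kHz and falls off strongly towards low frequencies, which reflects the lower sensitivity of human hearing in that range.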
It is to be understood that the above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. From the above discussion, many variations will be apparent to one skilled in the art that would yet be encompassed by the spirit and scope of the present invention.
Claims (15)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL1038762A NL1038762C2 (en) | 2011-04-19 | 2011-04-19 | Voice immersion smartphone application or headset for reduction of mobile annoyance. |
PCT/NL2012/000026 WO2012144887A1 (en) | 2011-04-19 | 2012-04-13 | Voice immersion smartphone application or headset for reduction of mobile annoyance |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL1038762A NL1038762C2 (en) | 2011-04-19 | 2011-04-19 | Voice immersion smartphone application or headset for reduction of mobile annoyance. |
NL1038762 | 2011-04-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
NL1038762C2 true NL1038762C2 (en) | 2012-10-22 |
Family
ID=46319883
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL1038762A NL1038762C2 (en) | 2011-04-19 | 2011-04-19 | Voice immersion smartphone application or headset for reduction of mobile annoyance. |
Country Status (2)
Country | Link |
---|---|
NL (1) | NL1038762C2 (en) |
WO (1) | WO2012144887A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201506206D0 (en) * | 2015-04-13 | 2015-05-27 | Soundchip Sa | Audio comunnication apparatus |
US11804113B1 (en) | 2020-08-30 | 2023-10-31 | Apple Inc. | Visual indication of audibility |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070021958A1 (en) * | 2005-07-22 | 2007-01-25 | Erik Visser | Robust separation of speech signals in a noisy environment |
WO2007017810A2 (en) * | 2005-08-11 | 2007-02-15 | Koninklijke Philips Electronics N.V. | A headset, a communication device, a communication system, and a method of operating a headset |
WO2010009345A1 (en) * | 2008-07-16 | 2010-01-21 | Qualcomm Incorporated | Method and apparatus for providing audible, visual or tactile sidetone feedback notification to a user of a communication device with multiple microphones |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7142894B2 (en) | 2003-05-30 | 2006-11-28 | Nokia Corporation | Mobile phone for voice adaptation in socially sensitive environment |
KR100773542B1 (en) | 2005-07-19 | 2007-11-07 | 삼성전자주식회사 | A microfluidic device for electrochemically regulating the pH of a fluid therein and method for regulating the pH of a fluid in a microfuidic device using the same |
JP5380800B2 (en) | 2007-07-12 | 2014-01-08 | ヤマハ株式会社 | Manufacturing method of electronic parts |
2011
- 2011-04-19 NL NL1038762A patent/NL1038762C2/en not_active IP Right Cessation
2012
- 2012-04-13 WO PCT/NL2012/000026 patent/WO2012144887A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2012144887A1 (en) | 2012-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8315400B2 (en) | Method and device for acoustic management control of multiple microphones | |
US8081780B2 (en) | Method and device for acoustic management control of multiple microphones | |
ES2286017T3 (en) | PROCEDURE AND APPLIANCE TO ADJUST THE SPEAKER AND MICROPHONE PROFITS AUTOMATICALLY IN A MOBILE PHONE. | |
US8744091B2 (en) | Intelligibility control using ambient noise detection | |
JP6374529B2 (en) | Coordinated audio processing between headset and sound source | |
CN106463107A (en) | Collaboratively processing audio between headset and source | |
CN101552823B (en) | Volume management system and method | |
EP3777114B1 (en) | Dynamically adjustable sidetone generation | |
EP3038255A2 (en) | An intelligent volume control interface | |
US8923522B2 (en) | Noise level estimator | |
US20210219051A1 (en) | Method and device for in ear canal echo suppression | |
US20120076321A1 (en) | Single Microphone for Noise Rejection and Noise Measurement | |
NL1038762C2 (en) | Voice immersion smartphone application or headset for reduction of mobile annoyance. | |
US20230328461A1 (en) | Hearing aid comprising an adaptive notification unit | |
US11856375B2 (en) | Method and device for in-ear echo suppression | |
US20120076320A1 (en) | Fine/Coarse Gain Adjustment | |
JP2643877B2 (en) | Telephone | |
US20220139414A1 (en) | Communication device and sidetone volume adjusting method thereof | |
TWI425818B (en) | Volume management system and method | |
CN114446315A (en) | Communication device and method for adjusting output side tone | |
JPH1023114A (en) | Telephone set | |
JPH05110637A (en) | Telephone set | |
JPH0818647A (en) | Telephone set | |
GB2538165A (en) | Audio communication apparatus | |
JPH05235789A (en) | Voice communication terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM | Lapsed because of non-payment of the annual fee | Effective date: 20150501 |