WO2014010290A1 - Information processing system and storage medium - Google Patents
Information processing system and storage medium
- Publication number: WO2014010290A1 (application PCT/JP2013/061647)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- unit
- specific user
- signal processing
- predetermined target
Classifications
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04R1/403—Obtaining desired directional characteristic only by combining a number of identical transducers (loudspeakers)
- H04R3/005—Circuits for combining the signals of two or more microphones
- H04R3/12—Circuits for distributing signals to two or more loudspeakers
- G10L2021/02166—Noise filtering: microphone arrays; beamforming
- H04R1/406—Obtaining desired directional characteristic only by combining a number of identical transducers (microphones)
- H04R2201/405—Non-uniform arrays of transducers or a plurality of uniform arrays with different transducer spacing
- H04R2420/07—Applications of wireless loudspeakers or wireless microphones
- H04R2430/23—Direction finding using a sum-delay beam-former
- H04R5/027—Spatial or constructional arrangements of microphones, e.g. in dummy heads
- H04S2420/13—Application of wave-field synthesis in stereophonic audio systems
Definitions
- This disclosure relates to an information processing system and a storage medium.
- In Patent Document 1 below, a technique related to an M2M (Machine-to-Machine) solution is proposed.
- The remote management system described in Patent Document 1 uses the IP (Internet Protocol) Multimedia Subsystem (IMS) platform to realize publication of presence information by a device and interaction between a user (authorized user client: UC) and a device (machine client: DC).
- Patent Document 2 below describes an array speaker in which a plurality of speakers forming a common wavefront are attached to a single cabinet, and in which the delay amount and level of the sound emitted from each speaker are controlled.
- Patent Document 2 below also notes that an array microphone based on the same principle has been developed: by adjusting the level and delay amount of the output signal of each microphone, the array microphone can set its sound collection point arbitrarily, enabling efficient sound collection.
- However, Patent Documents 1 and 2 make no mention of a technique or communication method for realizing an expansion of the user's body by arranging a large number of image sensors, microphones, speakers, and the like over a wide area.
- the present disclosure proposes a new and improved information processing system and storage medium capable of interlinking the space around the user with other spaces.
- According to the present disclosure, there is proposed an information processing system including: a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes a signal acquired from a sensor around the predetermined target identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, it is localized near the position of the specific user estimated by the estimation unit.
- According to the present disclosure, there is also proposed an information processing system including: a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; and a signal processing unit that generates a signal to be output from an actuator around the specific user based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
- According to the present disclosure, there is proposed a storage medium storing a program for causing a computer to function as: a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes a signal acquired from a sensor in the vicinity of the predetermined target identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, it is localized near the position of the specific user estimated by the estimation unit.
- According to the present disclosure, there is also proposed a storage medium storing a program for causing a computer to function as: a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; and a signal processing unit that generates a signal to be output from an actuator around the specific user based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
- As described above, according to the present disclosure, the space around the user can be interlinked with other spaces.
- FIG. 1 is a diagram for describing an overview of an acoustic system according to an embodiment of the present disclosure.
- As shown in FIG. 1, the acoustic system according to this embodiment assumes a situation in which large numbers of microphones 10, image sensors (not shown), speakers 20, and other sensors and actuators are arranged throughout the world: in rooms, houses, buildings, outdoors, across regions and countries.
- As one example, a plurality of microphones 10A, as an example of a plurality of sensors, and a plurality of speakers 20A, as an example of a plurality of actuators, are arranged on roads and the like in the outdoor area "site A" where user A is currently located.
- In the indoor area "site B" where user B is currently located, a plurality of microphones 10B and a plurality of speakers 20B are arranged on the walls, floor, ceiling, and the like.
- Sites A and B may further be provided with human sensors or image sensors (not shown) as examples of sensors.
- Site A and site B can be connected via a network, and the signals input and output by the microphones and speakers of site A and the signals input and output by the microphones and speakers of site B are transmitted and received between the sites.
- The sound system according to the present embodiment reproduces sound and images corresponding to a predetermined target (person, place, building, etc.) in real time from the plurality of speakers and displays arranged around the user.
- the sound system according to the present embodiment can collect the user's voice by a plurality of microphones arranged around the user and reproduce it in real time around the predetermined target.
- the space around the user can be interlinked with other spaces.
- Further, since microphones, image sensors, and the like are arranged everywhere, the user does not need to own a smartphone or a mobile phone terminal, and can connect the space around himself or herself with the space around a predetermined target.
- the application of the acoustic system according to the present embodiment when the user A at the site A wants to talk to the user B at the site B will be briefly described.
- Data collection processing: At site A, data collection processing is continuously performed by the plurality of microphones 10A, image sensors (not shown), human sensors (not shown), and the like. Specifically, the acoustic system according to the present embodiment collects the sound picked up by the plurality of microphones 10A, the captured images captured by the image sensors, and the detection results of the human sensors, and thereby estimates the user's position.
- At this time, the acoustic system according to the present embodiment may select, based on the position information of the plurality of microphones 10A registered in advance and the estimated position of the user, a microphone group capable of sufficiently collecting the user's voice.
- the acoustic system according to the present embodiment performs microphone array processing on a stream group of audio signals collected by each selected microphone.
- Specifically, the acoustic system according to the present embodiment may perform delay-and-sum array processing in which the sound collection point is focused on the mouth of user A, thereby forming the super-directivity of an array microphone. In this way, even a voice as quiet as a murmur by user A can be collected.
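- As a rough illustration of the delay-and-sum principle described above (a minimal sketch, not the patent's actual implementation; all names and parameters are assumptions), the following Python snippet aligns several microphone signals on a chosen sound collection point:

```python
import numpy as np

def delay_and_sum(signals, mic_positions, focus_point, fs, c=343.0):
    """Steer an array of microphones toward a focus point such as the
    estimated position of user A's mouth.

    signals:       (n_mics, n_samples) array of synchronized recordings
    mic_positions: (n_mics, 3) microphone coordinates in meters
    focus_point:   (3,) coordinates of the desired sound collection point
    fs:            sampling rate in Hz; c: speed of sound in m/s
    """
    distances = np.linalg.norm(mic_positions - focus_point, axis=1)
    # Sound from the focus point arrives later at farther microphones;
    # advancing those channels makes the target signal add coherently.
    delays = (distances - distances.min()) / c
    shifts = np.round(delays * fs).astype(int)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for sig, shift in zip(signals, shifts):
        shift = min(shift, n_samples)
        out[:n_samples - shift] += sig[shift:]
    return out / n_mics
```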
- Next, the acoustic system according to the present embodiment recognizes a command based on the collected voice of user A and executes operation processing according to the command. For example, when user A at site A murmurs "I want to talk to Mr. B", "call request to user B" is recognized as a command. In this case, the acoustic system identifies the current position of user B and connects site B, where user B is, with site A, where user A is. As a result, user A can talk with user B.
- Object decomposition processing: During a call, object decomposition processing such as sound source separation (separating the noise components around user A, the conversations of people around user A, and so on), reverberation suppression, and noise/echo processing is performed on the audio signals (stream data) collected by the plurality of microphones at site A. As a result, stream data with a good S/N ratio and suppressed reverberation is sent to site B.
- The acoustic system according to the present embodiment can cope with the case where user A speaks while moving, by performing the above data collection continuously. Specifically, the acoustic system continuously collects data from the plurality of microphones, image sensors, human sensors, and the like, and keeps track of user A's movement path and orientation. The acoustic system continuously updates the selection of an appropriate microphone group arranged around the moving user A, and continuously performs array microphone processing so that the sound collection point always stays at the mouth of the moving user A.
- In addition, user A's movement path and orientation are converted into metadata and sent to site B together with the stream data.
- Object synthesis processing: The stream data sent to site B is reproduced from the speakers arranged around user B at site B.
- At this time, the acoustic system according to the present embodiment collects data from the plurality of microphones, image sensors, and human sensors at site B, estimates the position of user B based on the collected data, and selects an appropriate speaker group that surrounds user B with an acoustic closed surface.
- The stream data sent to site B is reproduced from the speaker group selected in this way, and the area inside the acoustic closed surface is controlled as an appropriate sound field.
- In this specification, a surface formed by connecting the positions of a plurality of adjacent speakers or microphones so as to surround a certain object is conceptually referred to as an "acoustic closed surface". The "acoustic closed surface" does not necessarily constitute a completely closed surface; it suffices that it roughly surrounds the object (for example, the user).
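- To make the "roughly surrounds" idea concrete, here is a hedged 2-D sketch that picks one nearby speaker or microphone per angular sector around the user and accepts the set when most sectors are covered; the sector count and range are illustrative assumptions, not values from this disclosure:

```python
import numpy as np

def select_enclosing_units(unit_positions, user_position, n_sectors=8, max_range=5.0):
    """Choose units (speakers or microphones) that approximately surround the
    user, i.e. that form an 'acoustic closed surface' in the loose sense used
    here. Works in the horizontal plane for simplicity."""
    offsets = unit_positions - user_position          # (n, 2)
    dist = np.linalg.norm(offsets, axis=1)
    angle = np.arctan2(offsets[:, 1], offsets[:, 0])  # -pi .. pi
    sector = ((angle + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    chosen = {}
    for i in np.argsort(dist):                        # consider nearest units first
        if dist[i] <= max_range and sector[i] not in chosen:
            chosen[sector[i]] = int(i)
    # A complete surface is not required: accept if most sectors are covered.
    return sorted(chosen.values()) if len(chosen) >= n_sectors - 2 else []
```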
- the sound field here may be arbitrarily selected by the user B himself / herself.
- For example, the environment of site A may be reproduced at site B. Specifically, the environment of site A is reproduced at site B based on ambient sound information collected in real time and meta information related to site A acquired in advance.
- The sound system according to the present embodiment can also control the sound image of user A by using the plurality of speakers 20B arranged around user B at site B. That is, by forming an array speaker (beam forming), the acoustic system can reproduce the voice (sound image) of user A at user B's ear or outside the acoustic closed surface. In addition, by using the metadata on user A's movement path and orientation, the acoustic system may move the sound image of user A around user B at site B in accordance with user A's actual movement.
- The outline of voice communication from site A to site B has been described above, divided into data collection processing, object decomposition processing, and object synthesis processing; naturally, the same processing is performed for voice communication from site B to site A. As a result, two-way voice communication is possible between site A and site B.
- FIG. 2 is a diagram illustrating the overall configuration of the acoustic system according to the present embodiment. As shown in FIG. 2, the acoustic system includes a signal processing device 1A, a signal processing device 1B, and a management server 3.
- The signal processing device 1A and the signal processing device 1B are connected to the network 5 by wire or wirelessly and can transmit and receive data to and from each other via the network 5. The management server 3 is also connected to the network 5, and the signal processing device 1A and the signal processing device 1B can transmit and receive data to and from the management server 3.
- the signal processing apparatus 1A processes signals input / output by the plurality of microphones 10A and the plurality of speakers 20A arranged at the site A. Further, the signal processing device 1B processes signals input / output by the plurality of microphones 10B and the plurality of speakers 20B arranged at the site B. In addition, when it is not necessary to distinguish and explain the signal processing apparatuses 1A and 1B, they are referred to as the signal processing apparatus 1.
- the management server 3 has a function of managing user authentication processing and the absolute position (current position) of the user. Furthermore, the management server 3 may manage information (IP address or the like) indicating the location or the position of the building.
- For example, the signal processing device 1 can inquire of the management server 3 about, and obtain, the connection destination information (IP address, etc.) of a predetermined target (person, place, building, etc.) designated by the user.
- FIG. 3 is a block diagram showing the configuration of the signal processing apparatus 1 according to the present embodiment.
- As shown in FIG. 3, the signal processing device 1 according to the present embodiment includes a plurality of microphones 10 (array microphone), an amplifier/ADC (analog-to-digital converter) unit 11, a signal processing unit 13, a microphone position information DB (database) 15, a user position estimation unit 16, a recognition unit 17, an identification unit 18, a communication I/F (interface) 19, a speaker position information DB 21, a DAC (digital-to-analog converter)/amplifier unit 23, and a plurality of speakers 20 (array speakers). Each component is described below.
- The plurality of microphones 10 are arranged throughout an area (site). For example, outdoors they are arranged on roads, utility poles, streetlights, and the outer walls of houses and buildings; indoors, on floors, walls, ceilings, and the like.
- the plurality of microphones 10 collect ambient sounds and output the collected sounds to the amplifier / ADC unit 11.
- The amplifier/ADC unit 11 has a function of amplifying the sound waves output from the plurality of microphones 10 and a function of converting the sound waves (analog data) into audio signals (digital data) (analog-to-digital conversion). The amplifier/ADC unit 11 outputs the converted audio signals to the signal processing unit 13.
- The signal processing unit 13 has a function of processing the audio signals collected by the microphones 10 and sent via the amplifier/ADC unit 11, and the audio signals to be reproduced from the speakers 20 via the DAC/amplifier unit 23. The signal processing unit 13 according to the present embodiment also functions as a microphone array processing unit 131, a high S/N processing unit 133, and a sound field reproduction signal processing unit 135.
- The microphone array processing unit 131 performs directivity control as microphone array processing on the plurality of audio signals output from the amplifier/ADC unit 11 so as to focus on the user's voice (so that the sound collection position becomes the user's mouth).
- At this time, based on the position of the user estimated by the user position estimation unit 16 and the positions of the microphones 10 registered in the microphone position information DB 15, the microphone array processing unit 131 may select a microphone group that is optimal for collecting the user's voice, or one that forms an acoustic closed surface containing the user. The microphone array processing unit 131 then performs directivity control on the audio signals acquired by the selected microphone group. The microphone array processing unit 131 may form the super-directivity of the array microphone by delay-and-sum array processing or null generation processing.
- The high S/N processing unit 133 has a function of processing the plurality of audio signals output from the amplifier/ADC unit 11 into a monaural signal with high clarity and a high S/N ratio. Specifically, the high S/N processing unit 133 separates sound sources and performs reverberation/noise suppression.
- The high S/N processing unit 133 may be provided downstream of the microphone array processing unit 131. The audio signal (stream data) processed by the high S/N processing unit 133 is used for speech recognition by the recognition unit 17 and is transmitted to the outside via the communication I/F 19.
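- The disclosure does not fix a particular noise suppression algorithm; as one conventional stand-in, a minimal spectral-subtraction sketch might look as follows (the frame size and the assumption of a speech-free leading segment are illustrative):

```python
import numpy as np

def spectral_subtraction(x, fs, noise_seconds=0.5, frame=1024, floor=0.05):
    """Estimate a noise magnitude spectrum from the first `noise_seconds`
    of the input (assumed to contain no speech) and subtract it frame by
    frame, keeping a small spectral floor to limit musical noise."""
    hop = frame // 2
    window = np.hanning(frame)          # 50% overlap-add with a Hann window
    out = np.zeros(len(x))
    noise_frames = []
    n_noise = max(1, int(noise_seconds * fs) // hop)
    for k in range(0, len(x) - frame, hop):
        spec = np.fft.rfft(window * x[k:k + frame])
        if len(noise_frames) < n_noise:
            noise_frames.append(np.abs(spec))   # build the noise profile
        noise_mag = np.mean(noise_frames, axis=0)
        mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
        out[k:k + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out
```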
- The sound field reproduction signal processing unit 135 performs signal processing on the audio signals to be reproduced from the plurality of speakers 20 and controls the sound field so that it is localized near the position of the user. Specifically, for example, based on the position of the user estimated by the user position estimation unit 16 and the positions of the speakers 20 registered in the speaker position information DB 21, the sound field reproduction signal processing unit 135 selects an optimal speaker group that forms an acoustic closed surface containing the user. Then, the sound field reproduction signal processing unit 135 writes the signal-processed audio signal into the output buffers of the plurality of channels corresponding to the selected speaker group.
- the sound field reproduction signal processing unit 135 controls the area inside the acoustic closed curved surface as an appropriate sound field.
- Such sound field control methods are known as, for example, the Kirchhoff-Helmholtz integral and the Rayleigh integral, and wave field synthesis (WFS), which applies them, is generally known. The sound field reproduction signal processing unit 135 may also apply the signal processing techniques described in Japanese Patent Nos. 4673505 and 4735108.
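- As a much-simplified stand-in for such sound field control (not WFS proper and not the methods of the cited patents), the following sketch derives per-speaker delays and 1/r gains so that the selected speakers together approximate a point source at a desired position:

```python
import numpy as np

def render_point_source(mono, fs, speaker_positions, source_position, c=343.0):
    """Produce one feed per speaker: a delayed, attenuated copy of a mono
    signal whose superposed wavefronts roughly mimic a point source at
    `source_position` (for example, user A's voice localized near user B)."""
    dist = np.linalg.norm(speaker_positions - source_position, axis=1)
    delays = np.round((dist - dist.min()) / c * fs).astype(int)
    gains = dist.min() / np.maximum(dist, 1e-3)       # 1/r amplitude decay
    n = len(mono) + int(delays.max())
    feeds = np.zeros((len(speaker_positions), n))
    for ch, (d, g) in enumerate(zip(delays, gains)):
        feeds[ch, d:d + len(mono)] = g * mono
    return feeds
```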
- the shape of the closed acoustic surface formed by the microphone or the speaker is not particularly limited as long as it is a three-dimensional shape surrounding the user.
- For example, as shown in FIG. 4, the acoustic closed surface may be an elliptical acoustic closed surface 40-1, a cylindrical acoustic closed surface 40-2, or a polygonal acoustic closed surface 40-3.
- FIG. 4 shows, as an example, the shapes of acoustic closed surfaces formed by the plurality of speakers 20B-1 to 20B-12 arranged around user B at site B; the same applies to the shapes of acoustic closed surfaces formed by the plurality of microphones 10.
- the microphone position information DB 15 is a storage unit that stores position information of a plurality of microphones 10 arranged on the site. The position information of the plurality of microphones 10 may be registered in advance.
- The user position estimation unit 16 has a function of estimating the position of the user. Specifically, the user position estimation unit 16 estimates the relative position of the user with respect to the plurality of microphones 10 or the plurality of speakers 20, based on the analysis result of the sound collected by the plurality of microphones 10, the analysis result of a captured image captured by an image sensor, or the detection result of a human sensor. The user position estimation unit 16 may also acquire GPS (Global Positioning System) information and estimate the absolute position (current position information) of the user.
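- As one crude illustration of estimating the user's position from microphone signals alone (the disclosure also allows image sensors, human sensors, and GPS), the following sketch weights each microphone's known position by the short-term energy it picks up:

```python
import numpy as np

def estimate_user_position(frames, mic_positions):
    """Energy-weighted centroid of the microphone positions: the estimate
    drifts toward the loudest region, presumably the talking user. Real
    systems would refine this with arrival-time differences or cameras."""
    energy = np.array([np.mean(f ** 2) for f in frames])
    weights = energy / (energy.sum() + 1e-12)
    return weights @ mic_positions    # (3,) position estimate
```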
- the recognition unit 17 analyzes the user's voice based on the audio signal collected by the plurality of microphones 10 and processed by the signal processing unit 13 to recognize the command. For example, the recognition unit 17 performs morphological analysis on the user's voice “I want to talk with Mr. B”, and recognizes the call request command based on the predetermined target “B” and the request “speak” specified by the user.
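- A minimal sketch of such command recognition, assuming simple pattern matching over a speech-recognition transcript (the disclosure leaves the concrete matching method open, so the patterns below are hypothetical):

```python
import re

# Hypothetical request patterns, e.g. registered or learned in advance.
COMMAND_PATTERNS = [
    (re.compile(r"(?:talk|speak) (?:to|with) (?:Mr\.?|Ms\.?)?\s*(?P<target>\w+)"), "CALL_REQUEST"),
    (re.compile(r"listen to (?P<target>.+)"), "PLAYBACK_REQUEST"),
]

def recognize_command(transcript):
    """Map a transcript such as 'I want to talk to Mr. B' to a
    (command, target) pair."""
    for pattern, command in COMMAND_PATTERNS:
        match = pattern.search(transcript)
        if match:
            return command, match.group("target").strip()
    return None, None

# recognize_command("I want to talk to Mr. B") -> ("CALL_REQUEST", "B")
```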
- The identification unit 18 has a function of identifying the predetermined target recognized by the recognition unit 17. Specifically, for example, the identification unit 18 may determine the connection destination information for acquiring sound or images corresponding to the predetermined target. For example, the identification unit 18 may transmit information indicating the predetermined target from the communication I/F 19 to the management server 3 and acquire the connection destination information (IP address or the like) corresponding to the predetermined target from the management server 3.
- the communication I / F 19 is a communication module for transmitting and receiving data to and from other signal processing devices and the management server 3 through the network 5.
- For example, the communication I/F 19 makes an inquiry to the management server 3 about the connection destination information corresponding to a predetermined target, and transmits the audio signal collected by the microphones 10 and processed by the signal processing unit 13 to another signal processing device that is the connection destination.
- the speaker position information DB 21 is a storage unit that stores position information of a plurality of speakers 20 arranged on the site. The position information of the plurality of speakers 20 may be registered in advance.
- The DAC/amplifier unit 23 has a function of converting the audio signals (digital data) written in the output buffers of the respective channels into sound waves (analog data) for reproduction from the plurality of speakers 20 (digital-to-analog conversion). The DAC/amplifier unit 23 also has a function of amplifying the sound waves to be reproduced from the plurality of speakers 20.
- the DAC / amplifier unit 23 performs DA conversion and amplification processing on the audio signal processed by the sound field reproduction signal processing unit 135 and outputs the result to the speaker 20.
- The plurality of speakers 20 are arranged throughout an area (site). For example, outdoors they are arranged on roads, utility poles, streetlights, and the outer walls of houses and buildings; indoors, on floors, walls, ceilings, and the like.
- the plurality of speakers 20 reproduce sound waves (sound) output from the DAC / amplifier unit 23.
- FIG. 5 is a block diagram showing the configuration of the management server 3 according to the present embodiment.
- the management server 3 includes a management unit 32, a search unit 33, a user position information DB 35, and a communication I / F 39. Each configuration will be described below.
- Based on the user ID transmitted from the signal processing device 1, the management unit 32 manages information on the location (site) where each user is currently present. For example, the management unit 32 identifies a user based on the user ID and stores the identified user's name and the like in the user position information DB 35 in association with the IP address of the transmitting signal processing device 1 as connection destination information.
- the user ID may include a name, a password, biometric information, and the like.
- the management unit 32 may perform user authentication processing based on the transmitted user ID.
- the user position information DB 35 is a storage unit that stores information related to the location where the user is currently located in accordance with management by the management unit 32. Specifically, the user location information DB 35 stores the user ID and connection destination information (such as the IP address of the signal processing device corresponding to the site where the user is located) in association with each other. Further, the current position information of each user may be updated every moment.
- In response to a connection destination (call destination) inquiry from the signal processing device 1, the search unit 33 refers to the user position information DB 35 and searches for the connection destination information. Specifically, the search unit 33 searches for and extracts the associated connection destination information from the user position information DB 35, based on the name of the target user included in the inquiry.
- the communication I / F 39 is a communication module for transmitting and receiving data to and from the signal processing device 1 through the network 5.
- For example, the communication I/F 39 receives a user ID or a connection destination inquiry from the signal processing device 1, and transmits the connection destination information of the target user in response to the inquiry.
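- The management and search functions above can be pictured as a small registry keyed by user ID; the following toy Python model (hypothetical names, with authentication and persistence omitted) shows the register/lookup cycle:

```python
from dataclasses import dataclass

@dataclass
class ConnectionInfo:
    ip_address: str   # device serving the site where the user currently is

class UserLocationRegistry:
    """Toy model of the user position information DB 35."""

    def __init__(self):
        self._locations: dict[str, ConnectionInfo] = {}

    def register(self, user_id: str, device_ip: str) -> None:
        # Called whenever a signal processing device reports a user ID,
        # so the entry always reflects the user's current site.
        self._locations[user_id] = ConnectionInfo(device_ip)

    def lookup(self, user_id: str) -> ConnectionInfo | None:
        # Answers a connection destination (call destination) inquiry.
        return self._locations.get(user_id)

# registry = UserLocationRegistry()
# registry.register("user-b", "192.0.2.20")  # site B's device reports user B
# registry.lookup("user-b")                  # -> ConnectionInfo(ip_address='192.0.2.20')
```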
- FIG. 6 is a flowchart showing basic processing of the sound system according to the present embodiment.
- First, the signal processing device 1A transmits the ID of user A at site A to the management server 3.
- The signal processing device 1A may acquire the ID of user A from a tag such as an RFID (Radio Frequency IDentification) tag owned by user A, or may recognize user A from user A's voice.
- the signal processing device 1A may read biometric information from the body (face, eyes, hands, etc.) of the user A and obtain it as an ID.
- In step S106, the signal processing device 1B likewise transmits the ID of user B at site B to the management server 3.
- In step S109, the management server 3 identifies each user based on the user ID transmitted from each signal processing device 1 and registers, as connection destination information, the identified user's name in association with, for example, the IP address of the transmitting signal processing device 1.
- Next, the signal processing device 1B estimates the position of user B within site B. Specifically, the signal processing device 1B estimates the relative position of user B with respect to the plurality of microphones arranged at site B.
- In step S115, based on the estimated relative position of user B, the signal processing device 1B performs microphone array processing on the audio signals collected by the plurality of microphones arranged at site B so that the sound collection position is focused on the mouth of user B. In this way, the signal processing device 1B is prepared for user B to say something.
- In step S118, the signal processing device 1A likewise performs microphone array processing on the audio signals collected by the plurality of microphones arranged at site A so that the sound collection position is focused on the mouth of user A.
- Next, the signal processing device 1A recognizes a command based on the voice (utterance) of user A. The command recognition processing according to the present embodiment will be described in detail in [3-2. Command recognition processing].
- In step S121, the signal processing device 1A makes a connection destination inquiry to the management server 3. Here, the signal processing device 1A inquires about the connection destination information of user B.
- In step S125, the management server 3 searches for the connection destination information of user B in response to the inquiry from the signal processing device 1A, and in the subsequent step S126 transmits the search result to the signal processing device 1A.
- In step S127, the signal processing device 1A identifies (determines) the connection destination based on the connection destination information of user B received from the management server 3.
- In step S128, based on the connection destination information of the identified user B, for example the IP address of the signal processing device 1B corresponding to site B where user B is currently located, the signal processing device 1A performs call processing toward the signal processing device 1B.
- Next, the signal processing device 1B outputs a message asking user B whether to respond to the call from user A (call notification).
- Specifically, for example, the signal processing device 1B may reproduce the message from speakers arranged around user B. The signal processing device 1B also recognizes user B's answer to the call notification based on user B's voice collected by the plurality of microphones arranged around user B.
- In step S134, the signal processing device 1B transmits user B's answer to the signal processing device 1A.
- Here, user B gives an OK answer, and two-way communication between user A (the signal processing device 1A side) and user B (the signal processing device 1B side) is started.
- In step S137, in order to start communication with the signal processing device 1B, the signal processing device 1A performs sound collection processing in which the voice of user A is collected at site A and the resulting audio stream (audio signal) is transmitted to the site B side (the signal processing device 1B side). The sound collection processing according to the present embodiment will be described in detail in [3-3. Sound collection processing].
- In step S140, the signal processing device 1B forms an acoustic closed surface containing user B with the plurality of speakers arranged around user B, and performs sound field reproduction processing based on the audio stream transmitted from the signal processing device 1A.
- The sound field reproduction processing according to the present embodiment will be described in detail later in [3-4. Sound field reproduction processing].
- In steps S137 to S140, one-way communication is shown as an example; however, since the present embodiment allows two-way communication, the signal processing device 1B may, contrary to steps S137 to S140, perform the sound collection processing and the signal processing device 1A may perform the sound field reproduction processing.
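- The sequence above can be condensed into the following illustrative orchestration sketch; every object and method name is a hypothetical stand-in for the devices and messages described in steps S103 to S140:

```python
def basic_call_flow(site_a, site_b, server, caller_id, callee_name):
    """Condensed call setup: both sites report their users, site A resolves
    the callee via the management server, then audio flows both ways."""
    server.register(caller_id, site_a.ip)             # report user IDs to the server
    command, target = site_a.recognize_command()      # S118: e.g. a call request
    if command != "CALL_REQUEST" or target != callee_name:
        return
    destination = server.lookup(target)               # S121-S126: inquiry and search
    if destination and site_b.notify_and_ask(caller_id):   # S128-S134: call + answer
        site_a.stream_to(destination.ip_address)      # S137: sound collection side
        site_b.reproduce_from(site_a.ip)              # S140: sound field reproduction
```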
- The basic processing of the acoustic system according to the present embodiment has been described above.
- Thus, without having a mobile phone terminal, a smartphone, or the like, user A can simply murmur "I want to talk to Mr. B" and make a call with user B, who is in another place, using the plurality of microphones and the plurality of speakers arranged nearby.
- Next, the command recognition processing shown in step S118 will be described in detail with reference to FIG. 7.
- FIG. 7 is a flowchart showing command recognition processing according to this embodiment.
- First, in step S203, the user position estimation unit 16 of the signal processing device 1 estimates the position of the user.
- Specifically, the user position estimation unit 16 may estimate the user's relative position, orientation, and mouth position based on the sound collected by the plurality of microphones 10, a captured image captured by an image sensor, the arrangement of the microphones stored in the microphone position information DB 15, and the like.
- In step S206, the signal processing unit 13 selects a microphone group forming an acoustic closed surface containing the user, according to the estimated relative position and orientation of the user and the position of the user's mouth.
- In step S209, the microphone array processing unit 131 of the signal processing unit 13 performs microphone array processing on the audio signals collected by the selected microphone group and controls the directivity of the microphones so as to focus on the user's mouth. Thereby, the signal processing device 1 is prepared for the user to say something.
- In step S212, the high S/N processing unit 133 performs processing such as reverberation/noise suppression on the audio signal processed by the microphone array processing unit 131 to improve the S/N ratio.
- In step S215, the recognition unit 17 performs speech recognition (speech analysis) based on the audio signal output from the high S/N processing unit 133.
- the recognition unit 17 performs a command recognition process based on the recognized voice (audio signal).
- the specific contents of the command recognition process are not particularly limited.
- the recognition unit 17 may recognize a command by comparing a recognized voice with a request pattern registered in advance (learned).
- In step S218, if a command cannot be recognized (S218/No), the signal processing device 1 repeats the processing shown in steps S203 to S215. At this time, since S203 and S206 are also repeated, the signal processing unit 13 can update the microphone group forming the acoustic closed surface containing the user in accordance with the user's movement.
- FIG. 8 is a flowchart showing sound collection processing according to the present embodiment.
- First, the microphone array processing unit 131 of the signal processing unit 13 performs microphone array processing on the audio signals collected by the selected/updated microphones and controls the directivity of the microphones so as to focus on the user's mouth.
- In step S312, the high S/N processing unit 133 performs processing such as reverberation/noise suppression on the audio signal processed by the microphone array processing unit 131 to improve the S/N ratio.
- In step S315, the communication I/F 19 transmits the audio signal output from the high S/N processing unit 133 to the connection destination indicated by the connection destination information of the target user identified in step S126 (see FIG. 6), for example the signal processing device 1B. Thereby, the voice uttered by user A at site A is collected by the plurality of microphones arranged around user A and transmitted to the site B side.
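- Combining the sketches shown earlier, one pass of this sound collection side could be pictured as follows (the composition itself is an assumption; `delay_and_sum` and `spectral_subtraction` are the illustrative functions defined above):

```python
def sound_collection_step(signals, mic_positions, mouth_position, fs, send):
    """One processing pass of the sound collection side (FIG. 8): beamform
    the selected microphones toward the user's mouth, clean the result, and
    hand it to the communication interface."""
    focused = delay_and_sum(signals, mic_positions, mouth_position, fs)  # array processing
    clean = spectral_subtraction(focused, fs)                            # S312: high S/N
    send(clean)                                                          # S315: transmit
```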
- FIG. 9 is a flowchart showing sound field reproduction processing according to the present embodiment.
- First, in step S403, the user position estimation unit 16 of the signal processing device 1 estimates the position of the user.
- Specifically, the user position estimation unit 16 may estimate the user's relative position, orientation, and ear position based on the sound collected by the plurality of microphones 10, a captured image captured by an image sensor, the arrangement of the speakers stored in the speaker position information DB 21, and the like.
- In step S406, the signal processing unit 13 selects a speaker group forming an acoustic closed surface containing the user, according to the estimated relative position and orientation of the user and the position of the user's ears. By continuously performing steps S403 and S406, the signal processing unit 13 can update the speaker group forming the acoustic closed surface containing the user in accordance with the user's movement.
- In step S409, the communication I/F 19 receives an audio signal from the calling side.
- In step S412, the sound field reproduction signal processing unit 135 of the signal processing unit 13 performs predetermined processing on the received audio signal so that an optimal sound field is formed when it is output from the selected/updated speakers.
- For example, the sound field reproduction signal processing unit 135 renders the received audio signal according to the environment of site B (here, the arrangement of the plurality of speakers 20 on the floor, walls, and ceiling of the room).
- In step S415, the signal processing device 1 outputs the audio signal processed by the sound field reproduction signal processing unit 135 from the speaker group selected/updated in step S406 via the DAC/amplifier unit 23.
- Here, when rendering the received audio signal according to the environment of site B, the sound field reproduction signal processing unit 135 may perform signal processing so as to construct the sound field of site A.
- For example, the sound field reproduction signal processing unit 135 may reproduce the sound field of site A at site B based on the ambient sound collected in real time and measurement data (a transfer function) of the impulse response at site A.
- Thereby, user B, who is indoors at site B, can obtain the feeling of being in the same outdoor space as user A, who is outdoors at site A, and can be immersed in a richer sense of reality.
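- A minimal sketch of carrying site A's acoustics over to site B, assuming a single measured impulse response is available (in practice, per-speaker transfer functions and real-time ambient sound would be combined):

```python
import numpy as np

def reproduce_remote_sound_field(stream, site_a_impulse_response):
    """Impose site A's measured reverberation on the received stream by
    convolution before playback at site B. Illustrative only."""
    return np.convolve(stream, site_a_impulse_response)[:len(stream)]
```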
- The sound field reproduction signal processing unit 135 can also control the sound image of the received audio signal (user A's voice) using the speaker group arranged around user B. For example, by forming an array speaker (beam forming) with the plurality of speakers, the sound field reproduction signal processing unit 135 can reproduce user A's voice at user B's ear, or reproduce user A's sound image outside the acoustic closed surface containing user B.
- In the above-described embodiment, commands are input by voice; however, the command input method of the acoustic system according to the present disclosure is not limited to voice input and may be another input method. Other command input methods are described below with reference to FIG. 10.
- FIG. 10 is a block diagram showing another configuration example of the signal processing apparatus according to the present embodiment.
- As shown in FIG. 10, the signal processing device 1′ includes an operation input unit 25, an imaging unit 26, and an infrared/thermal sensor 27 in addition to the components of the signal processing device 1 shown in FIG. 3.
- the operation input unit 25 has a function of detecting a user operation on each switch (not shown) arranged around the user. For example, the operation input unit 25 detects that the call request switch has been pressed by the user, and outputs the detection result to the recognition unit 17. The recognizing unit 17 recognizes the call command based on pressing of the call request switch. In this case, the operation input unit 25 can also accept the designation of the call destination (name of the target user, etc.).
- The recognition unit 17 may also analyze the user's gesture based on a captured image captured by the imaging unit 26 (image sensor) arranged around the user and the detection result of the infrared/thermal sensor 27, and recognize the gesture as a command. For example, when the user performs a gesture of making a call, the recognition unit 17 recognizes a call command. In this case, the recognition unit 17 may accept the designation of the call destination (the name of the target user, etc.) from the operation input unit 25, or may determine it based on voice analysis.
- As described above, the command input method of the acoustic system according to the present disclosure is not limited to voice input and may be, for example, switch pressing or gesture input.
- Further, even when the user makes a request other than a call request, such as "I want to listen to the radio", "I want to listen to such-and-such a song", "Is there any news?", or "I want to listen to the music concert currently being held in Vienna", the voice is picked up by the plurality of microphones 10 arranged around the user and recognized as a command by the recognition unit 17.
- the signal processing device 1 performs processing according to each command recognized by the recognition unit 17.
- For example, the signal processing device 1 may receive, from a predetermined server, an audio signal corresponding to the radio program, song, news, or music concert designated by the user, and reproduce it from the speaker group arranged around the user through the signal processing by the sound field reproduction signal processing unit 135 described above. The audio signal received by the signal processing device 1 may be one collected in real time.
- the user can acquire a desired service by speaking on the spot without having to carry or operate a terminal device such as a smartphone or a remote controller.
- When reproducing an audio signal collected in a wide space, such as an opera house, from a speaker group forming a small acoustic closed surface containing the user, the sound field reproduction signal processing unit 135 can reproduce the reverberation and sound image localization of that wide space.
- the sound field reproduction signal processing unit 135 can reproduce the sound image localization / reverberation characteristics of the sound collection environment in a reproduction environment by predetermined signal processing.
- the sound field reproduction signal processing unit 135 may use signal processing using a transfer function disclosed in Japanese Patent No. 4775487.
- Specifically, for example, a first transfer function (impulse response measurement data) is measured in the sound collection environment, and an audio signal subjected to arithmetic processing based on the first transfer function is reproduced in the reproduction environment, whereby the sound field of the sound collection environment (for example, its reverberation and sound image localization) can be reproduced.
- Thereby, the sound field reproduction signal processing unit 135 can construct a sound field in which sound image localization and reverberation effects are obtained as if the acoustic closed surface 40 containing the user in a small space were immersed in the sound field 42 of a large space.
- Specifically, an audio signal collected from the plurality of microphones 10 is processed by arithmetic operations based on a transfer function measured in the large space to be reproduced (for example, an opera house) and is reproduced from the plurality of selected speakers 20.
- the signal processing apparatus 1 according to the present embodiment can also construct an image of another space in addition to the sound field construction (sound field reproduction processing) of another space described in the above embodiment.
- For example, when the user requests viewing of a sports game, the signal processing device 1 may receive, from a predetermined server, an audio signal and video collected at the target game venue and reproduce them around the user.
- the image may be reproduced by, for example, spatial projection by hologram reproduction, or may be reproduced by a television in a room, a display, or a head mounted display worn by the user.
- Thereby, the user can obtain a sense of immersion in the game venue and experience a richer sense of presence.
- Furthermore, the user may arbitrarily select and move the position (sound collection/imaging position) at which to be immersed within the target game venue. Thereby, the user can immerse himself or herself anywhere in the game venue, for example with a sense of presence that follows a specific player, without being tied to a predetermined spectator seat.
- The system configuration of the acoustic system according to the above-described embodiment, described with reference to FIGS. 1 and 2, assumes that a plurality of microphones and speakers are arranged around the user on both the calling side (site A) and the called side (site B), and that the signal processing devices 1A and 1B perform the signal processing.
- the system configuration of the acoustic system according to the present embodiment is not limited to the configuration shown in FIGS. 1 and 2, and may be the configuration shown in FIG. 13, for example.
- FIG. 13 is a diagram showing another system configuration of the acoustic system according to the present embodiment. As shown in FIG. 13, in the acoustic system according to the present embodiment, the signal processing device 1, the communication terminal 7, and the management server 3 are connected via a network 5.
- The communication terminal 7, such as a mobile phone terminal or a smartphone, has a normal single microphone and a single speaker, and serves as a legacy interface to the highly functional interface space in which the plurality of microphones and the plurality of speakers according to the present embodiment are arranged.
- the signal processing apparatus 1 is connected to a normal communication terminal 7 and can reproduce audio received from the communication terminal 7 from a plurality of speakers arranged around the user. Further, the signal processing device 1 according to the present embodiment can transmit the user's voice collected from a plurality of microphones arranged around the user to the communication terminal 7.
- Thereby, a call can be realized between a first user who is in a space where a plurality of microphones and a plurality of speakers are arranged nearby and a second user who has a normal communication terminal 7. That is, in the configuration of the acoustic system according to the present embodiment, it suffices that only one of the calling side and the called side is the highly functional interface space in which a plurality of microphones and a plurality of speakers according to the present embodiment are arranged.
- the space around the user can be interlinked with other spaces.
- Specifically, the acoustic system according to the present embodiment can reproduce sound and images corresponding to a predetermined target (person, place, building, etc.) from the plurality of speakers and displays arranged around the user, and can pick up the user's voice with the plurality of microphones arranged around the user and reproduce it around the predetermined target.
- Further, since microphones, image sensors, and the like are arranged everywhere, the user does not need to own a smartphone or a mobile phone terminal, and can connect the space around himself or herself with the space around a predetermined target.
- the configuration of the signal processing device 1 is not limited to the configuration illustrated in FIG. 3.
- For example, the recognition unit 17 and the identification unit 18 shown in FIG. 3 may be provided not in the signal processing device 1 but on a server connected via the network. In this case, the signal processing device 1 transmits the audio signal output from the signal processing unit 13 to the server via the communication I/F 19. The server then performs command recognition and processing for identifying a predetermined target (person, place, building, program, song, etc.) based on the received audio signal, and transmits the recognition result and the connection destination information corresponding to the identified predetermined target to the signal processing device 1.
- Additionally, the present technology may also be configured as below.
- (1) An information processing system including: a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes a signal acquired from a sensor around the predetermined target identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, it is localized near the position of the specific user estimated by the estimation unit.
- (2) The information processing system according to (1), wherein the signal processing unit processes signals acquired from a plurality of sensors arranged around the predetermined target.
- (3) The information processing system according to (1) or (2), wherein the plurality of sensors arranged around the specific user are microphones, and the recognition unit recognizes the predetermined target based on an audio signal detected by the microphones.
- (5) The information processing system according to (4), wherein the sensor arranged around the specific user is a microphone, and the recognition unit recognizes a call request to the predetermined target based on an audio signal detected by the microphone.
- (6) The information processing system according to (4), wherein the sensor arranged around the specific user is a pressure sensor, and the recognition unit recognizes a call request to the predetermined target when the pressure sensor detects pressing of a specific switch.
- (7) The information processing system according to (4), wherein the sensor arranged around the specific user is an imaging sensor, and the recognition unit recognizes a call request to the predetermined target based on a captured image acquired by the imaging sensor.
- (8) The information processing system according to any one of (1) to (7), wherein the sensor around the predetermined target is a microphone, the plurality of actuators arranged around the specific user are a plurality of speakers, and the signal processing unit processes an audio signal collected by the microphone around the predetermined target, based on the positions of the plurality of speakers and the estimated position of the specific user, so as to form a sound field near the position of the specific user when output from the plurality of speakers.
- (10) A program for causing a computer to function as: a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; an estimation unit that estimates the position of the specific user according to a signal detected by any of the plurality of sensors; and a signal processing unit that processes a signal acquired from a sensor around the predetermined target identified by the identification unit so that, when output from a plurality of actuators arranged around the specific user, it is localized near the position of the specific user estimated by the estimation unit.
- (11) A program for causing a computer to function as: a recognition unit for recognizing a predetermined target based
Landscapes
- Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
1. Overview of an acoustic system according to an embodiment of the present disclosure
2. Basic configuration
2-1. System configuration
2-2. Signal processing device
2-3. Management server
3. Operation processing
3-1. Basic processing
3-2. Command recognition processing
3-3. Sound collection processing
3-4. Sound field reproduction processing
4. Supplement
5. Summary
First, an overview of an acoustic system (information processing system) according to an embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a diagram for explaining the overview of the acoustic system according to an embodiment of the present disclosure. As shown in FIG. 1, the acoustic system according to the present embodiment assumes a situation in which a large number of microphones 10, image sensors (not shown), speakers 20, and other various sensors and actuators are arranged everywhere in the world, such as in rooms, houses, buildings, outdoors, regions, and countries.
At site A, data collection processing is continuously performed by a plurality of microphones 10A, image sensors (not shown), human presence sensors (not shown), and the like. Specifically, the acoustic system according to the present embodiment collects the sound picked up by the plurality of microphones 10A, the images captured by the image sensors, and the detection results of the human presence sensors, and thereby estimates the user's position.
During a call, object decomposition processing such as sound source separation (separating noise components around user A and conversations of people around user A), dereverberation, and noise/echo processing is performed on the audio signals (stream data) picked up by the plurality of microphones at site A. As a result, stream data with a good S/N ratio and suppressed reverberation is sent to site B.
The stream data sent to site B is then reproduced from speakers arranged around user B at site B. At this time, the acoustic system according to the present embodiment collects data at site B with a plurality of microphones, image sensors, and human presence sensors, estimates user B's position based on the collected data, and selects an appropriate group of speakers that surrounds user B with an acoustic closed surface. The stream data sent to site B is reproduced from the speaker group selected in this way, and the area inside the acoustic closed surface is controlled as an appropriate sound field. In this specification, the surface formed by connecting the positions of a plurality of adjacent speakers or microphones so as to surround an object (for example, a user) is conceptually called an "acoustic closed surface". The "acoustic closed surface" does not necessarily constitute a perfectly closed surface; it suffices that it roughly surrounds the object (for example, the user).
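To make the selection of a speaker group forming an acoustic closed surface concrete, here is a minimal sketch assuming 2-D speaker coordinates and an already estimated user position; the function name and the nearest-k heuristic are illustrative assumptions, not an algorithm prescribed by the present disclosure.

```python
# A minimal sketch of "acoustic closed surface" speaker selection.
# Assumption: speaker positions and the user position are known 2-D points.
import math

def select_enclosing_speakers(user_pos, speaker_positions, k=8):
    """Pick the k speakers nearest to the user and keep them only if they
    roughly surround the user (no angular gap of 180 degrees or more)."""
    nearest = sorted(speaker_positions, key=lambda p: math.dist(p, user_pos))[:k]
    angles = sorted(math.atan2(p[1] - user_pos[1], p[0] - user_pos[0])
                    for p in nearest)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return nearest if max(gaps) < math.pi else None

speakers = [(0, 0), (4, 0), (4, 3), (0, 3), (2, -1), (2, 4)]
print(select_enclosing_speakers((2, 1.5), speakers, k=4))
```

As noted above, the surface need not be perfectly closed; the angular-gap test is one simple way to express "roughly surrounding" the user.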
[2-1. System configuration]
FIG. 2 is a diagram showing the overall configuration of the acoustic system according to the present embodiment. As shown in FIG. 2, the acoustic system includes a signal processing device 1A, a signal processing device 1B, and a management server 3.
Next, the configuration of the signal processing device 1 according to the present embodiment will be described in detail. FIG. 3 is a block diagram showing the configuration of the signal processing device 1 according to the present embodiment. As shown in FIG. 3, the signal processing device 1 according to the present embodiment includes a plurality of microphones 10 (an array microphone), an amplifier/ADC (analog-to-digital converter) unit 11, a signal processing unit 13, a microphone position information DB (database) 15, a user position estimation unit 16, a recognition unit 17, an identification unit 18, a communication I/F (interface) 19, a speaker position information DB 21, a DAC (digital-to-analog converter)/amplifier unit 23, and a plurality of speakers 20 (an array speaker). Each component is described below.
As described above, the plurality of microphones 10 are arranged throughout a certain area (site). For example, outdoors they are arranged on roads, utility poles, streetlights, and the outer walls of houses and buildings; indoors they are arranged on floors, walls, ceilings, and so on. The plurality of microphones 10 pick up ambient sound and each output it to the amplifier/ADC unit 11.
The amplifier/ADC unit 11 has a function of amplifying the sound waves output from the plurality of microphones 10 (amplifier) and a function of converting the sound waves (analog data) into audio signals (digital data) (analog-to-digital converter). The amplifier/ADC unit 11 outputs the converted audio signals to the signal processing unit 13.
The signal processing unit 13 has a function of processing the audio signals picked up by the microphones 10 and sent via the amplifier/ADC unit 11, as well as the audio signals to be reproduced from the speakers 20 via the DAC/amplifier unit 23. The signal processing unit 13 according to the present embodiment functions as a microphone array processing unit 131, a high-S/N processing unit 133, and a sound field reproduction signal processing unit 135.
As microphone array processing of the plurality of audio signals output from the amplifier/ADC unit 11, the microphone array processing unit 131 performs directivity control so as to focus on the user's voice (so that the sound pickup position is at the user's mouth).
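As one illustration of such directivity control, the sketch below implements classic delay-and-sum beamforming, a standard way to focus an array on a point such as the user's mouth. The present disclosure does not prescribe a specific beamforming method; the sampling rate, names, and geometry here are assumptions.

```python
# A minimal delay-and-sum beamforming sketch: align each microphone channel
# on the propagation delay from a focus point, then average the channels.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
FS = 16000              # assumed sampling rate in Hz

def delay_and_sum(signals, mic_positions, focus_point):
    """signals: (n_mics, n_samples); focus_point: (x, y, z) in meters."""
    mic_positions = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mic_positions - np.asarray(focus_point), axis=1)
    # Advance farther microphones so sound from the focus point lines up.
    delays = np.round((dists - dists.min()) / SPEED_OF_SOUND * FS).astype(int)
    n = signals.shape[1] - delays.max()
    aligned = np.stack([s[d:d + n] for s, d in zip(signals, delays)])
    return aligned.mean(axis=0)  # adds up constructively toward the focus

# Toy usage: the microphone 0.1 m farther away hears the source 5 samples late.
src = np.sin(2 * np.pi * 440 * np.arange(FS) / FS)
mics = np.stack([np.roll(src, 5), src])
out = delay_and_sum(mics, [(0, 0, 0), (0.1, 0, 0)], (2.0, 0, 0))
```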
The high-S/N processing unit 133 has a function of processing the plurality of audio signals output from the amplifier/ADC unit 11 into a monaural signal with high intelligibility and a good S/N ratio. Specifically, the high-S/N processing unit 133 separates sound sources and performs dereverberation and noise suppression.
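The concrete suppression algorithm is left open here; as one hedged example of what such a high-S/N stage could look like, the sketch below applies single-channel spectral subtraction, with the assumption that the leading frames contain background noise only.

```python
# A minimal spectral-subtraction sketch (one common S/N-raising technique;
# not the method prescribed by the present disclosure).
import numpy as np

def spectral_subtraction(x, frame=512, noise_frames=10):
    """Estimate a noise magnitude spectrum from the leading frames,
    subtract it everywhere, and keep the noisy phase."""
    n_frames = len(x) // frame
    X = np.fft.rfft(x[:n_frames * frame].reshape(n_frames, frame), axis=1)
    noise_mag = np.abs(X[:noise_frames]).mean(axis=0)
    mag = np.maximum(np.abs(X) - noise_mag, 0.0)   # floor magnitudes at zero
    Y = mag * np.exp(1j * np.angle(X))
    return np.fft.irfft(Y, n=frame, axis=1).reshape(-1)

rng = np.random.default_rng(0)
noisy = np.sin(0.1 * np.arange(8192)) + 0.3 * rng.standard_normal(8192)
cleaner = spectral_subtraction(noisy)
```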
The sound field reproduction signal processing unit 135 performs signal processing on the audio signals to be reproduced from the plurality of speakers 20, and controls them so that the sound field is localized near the user's position. Specifically, for example, the sound field reproduction signal processing unit 135 selects, based on the user position estimated by the user position estimation unit 16 and the positions of the speakers 20 registered in the speaker position information DB 21, an optimal group of speakers forming an acoustic closed surface that contains the user. The sound field reproduction signal processing unit 135 then writes the processed audio signals to the output buffers of the channels corresponding to the selected speaker group.
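A minimal sketch of this write-out stage is shown below: each selected speaker receives a delayed and attenuated copy of the processed signal so that the wavefronts coincide near the user. The 1/r gain model and the simple time alignment are assumptions; actual sound field reproduction (for example, wave field synthesis) is considerably more elaborate.

```python
# A minimal per-channel rendering sketch for the selected speaker group.
import numpy as np

SPEED_OF_SOUND = 343.0
FS = 16000

def render_to_speakers(signal, speaker_positions, user_pos):
    """Return one output channel per speaker, time-aligned and attenuated
    so the copies arrive together near the user position."""
    dists = [float(np.linalg.norm(np.subtract(p, user_pos)))
             for p in speaker_positions]
    channels = []
    for d in dists:
        delay = int(round((max(dists) - d) / SPEED_OF_SOUND * FS))
        gain = 1.0 / max(d, 0.1)  # crude 1/r attenuation model
        channels.append(gain * np.pad(signal, (delay, 0))[:len(signal)])
    return np.stack(channels)     # shape: (n_speakers, n_samples)

out = render_to_speakers(np.ones(1000), [(0.0, 0.0), (3.0, 0.0)], (1.0, 0.0))
```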
The microphone position information DB 15 is a storage unit that stores the position information of the plurality of microphones 10 arranged at the site. The position information of the plurality of microphones 10 may be registered in advance.
The user position estimation unit 16 has a function of estimating the user's position. Specifically, the user position estimation unit 16 estimates the user's position relative to the plurality of microphones 10 or the plurality of speakers 20, based on the analysis results of the sound picked up by the plurality of microphones 10, the analysis results of the images captured by the image sensors, or the detection results of the human presence sensors. The user position estimation unit 16 may also acquire GPS (Global Positioning System) information and estimate the user's absolute position (current position information).
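As a toy illustration of the relative position estimation, the sketch below weights each registered microphone position by the short-term energy it picks up, assuming the user is near the loudest microphones; a real implementation would fuse arrival-time differences, captured images, and human presence sensor readings as described above.

```python
# A minimal energy-weighted position estimate (illustrative only).
import numpy as np

def estimate_user_position(mic_positions, mic_signals):
    """Weight each microphone position (from the position DB) by the
    short-term energy of its signal."""
    energies = np.array([np.mean(np.square(s)) for s in mic_signals])
    weights = energies / energies.sum()
    return weights @ np.asarray(mic_positions, dtype=float)

mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
sigs = [np.ones(100) * a for a in (1.0, 0.2, 0.2)]  # loudest at mic 0
print(estimate_user_position(mics, sigs))            # biased toward (0, 0)
```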
The recognition unit 17 analyzes the user's voice based on the audio signals picked up by the plurality of microphones 10 and processed by the signal processing unit 13, and recognizes a command. For example, the recognition unit 17 performs morphological analysis on the user's utterance "I want to talk to B", and recognizes a call request command based on the predetermined target "B" and the request "talk" specified by the user.
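The recognition step can be pictured with the sketch below, which stands in for the morphological analysis described above with a simple keyword parse of an already transcribed utterance; the command label and the regular expression are illustrative assumptions.

```python
# A minimal stand-in for call-request recognition on transcribed text.
import re

def recognize_command(utterance):
    """Return a (command, target) pair, e.g. for 'I want to talk to B'."""
    m = re.search(r"(?:talk to|call)\s+(?P<target>\w+)", utterance, re.I)
    if m:
        return ("CALL_REQUEST", m.group("target"))
    return (None, None)

print(recognize_command("I want to talk to B"))  # ('CALL_REQUEST', 'B')
```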
The identification unit 18 has a function of identifying the predetermined target recognized by the recognition unit 17. Specifically, for example, the identification unit 18 may determine connection destination information for acquiring sound and images corresponding to the predetermined target. The identification unit 18 may, for example, transmit information indicating the predetermined target from the communication I/F 19 to the management server 3, and acquire the connection destination information (an IP address or the like) corresponding to the predetermined target from the management server 3.
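The identification step then amounts to a lookup against the management server 3. The sketch below assumes a hypothetical JSON-over-HTTP endpoint; the text only says that the query goes through the communication I/F 19 and that connection destination information such as an IP address comes back.

```python
# A minimal lookup sketch. The endpoint, query parameter, and response
# field are assumptions for illustration, not an API from the disclosure.
import json
import urllib.parse
import urllib.request

def identify_target(target_name, server="http://management-server.example"):
    url = f"{server}/lookup?user={urllib.parse.quote(target_name)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["connection_destination"]  # e.g. an IP address
```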
The communication I/F 19 is a communication module for transmitting and receiving data to and from other signal processing devices and the management server 3 through the network 5. For example, the communication I/F 19 according to the present embodiment queries the management server 3 for the connection destination information corresponding to a predetermined target, and transmits the audio signal picked up by the microphones 10 and processed by the signal processing unit 13 to the other signal processing device that is the connection destination.
The speaker position information DB 21 is a storage unit that stores the position information of the plurality of speakers 20 arranged at the site. The position information of the plurality of speakers 20 may be registered in advance.
The DAC/amplifier unit 23 has a function (digital-to-analog converter) of converting the audio signals (digital data) written to the output buffers of the respective channels into sound waves (analog data) for reproduction from the plurality of speakers 20. The DAC/amplifier unit 23 further has a function (amplifier) of amplifying the sound waves to be reproduced from the plurality of speakers 20.
As described above, the plurality of speakers 20 are arranged throughout a certain area (site). For example, outdoors they are arranged on roads, utility poles, streetlights, and the outer walls of houses and buildings; indoors they are arranged on floors, walls, ceilings, and so on. The plurality of speakers 20 reproduce the sound waves (sound) output from the DAC/amplifier unit 23.
FIG. 5 is a block diagram showing the configuration of the management server 3 according to the present embodiment. As shown in FIG. 5, the management server 3 includes a management unit 32, a search unit 33, a user position information DB 35, and a communication I/F 39. Each component is described below.
The management unit 32 manages information on the place (site) where a user is currently located, based on the user ID and the like transmitted from the signal processing device 1. For example, the management unit 32 identifies the user based on the user ID, and stores the identified user's name or the like in the user position information DB 35 in association with the IP address or the like of the transmitting signal processing device 1 as connection destination information. The user ID may include a name, a personal identification number, biometric information, or the like. The management unit 32 may also perform user authentication processing based on the transmitted user ID.
The user position information DB 35 is a storage unit that stores information on the place where a user is currently located, in accordance with the management by the management unit 32. Specifically, the user position information DB 35 stores user IDs and connection destination information (such as the IP address of the signal processing device corresponding to the site where the user is located) in association with each other. The current position information of each user may be updated from moment to moment.
In response to a connection destination (call destination) query from the signal processing device 1, the search unit 33 refers to the user position information DB 35 and searches for the connection destination information. Specifically, the search unit 33 searches the user position information DB 35 for the associated connection destination information, based on the name or the like of the target user included in the connection destination query, and extracts it.
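Taken together, the management unit 32, the user position information DB 35, and the search unit 33 behave like a keyed store that is updated when a user checks in at a site and queried at call setup. A minimal sketch, with an in-memory dict standing in for the user position information DB 35:

```python
# A minimal sketch of the management server's bookkeeping.
class ManagementServer:
    def __init__(self):
        self.user_location_db = {}  # user ID -> connection destination info

    def register(self, user_id, source_ip):
        """Management unit 32: record the reporting device's address as the
        user's current connection destination."""
        self.user_location_db[user_id] = source_ip

    def lookup(self, target_user_id):
        """Search unit 33: resolve a call-destination query, if known."""
        return self.user_location_db.get(target_user_id)

server = ManagementServer()
server.register("userB", "192.0.2.20")  # site B's device reports user B
print(server.lookup("userB"))           # -> '192.0.2.20'
```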
The communication I/F 39 is a communication module for transmitting and receiving data to and from the signal processing device 1 through the network 5. For example, the communication I/F 39 according to the present embodiment receives a user ID and connection destination queries from the signal processing device 1, and, in response to a connection destination query, transmits the connection destination information of the target user.
[3-1. Basic processing]
FIG. 6 is a flowchart showing the basic processing of the acoustic system according to the present embodiment. As shown in FIG. 6, first, in step S103, the signal processing device 1A transmits the ID of user A at site A to the management server 3. The signal processing device 1A may acquire user A's ID from a tag such as an RFID (Radio Frequency IDentification) tag owned by user A, or may recognize it from user A's voice. The signal processing device 1A may also read biometric information from user A's body (face, eyes, hands, etc.) and acquire it as the ID.
FIG. 7 is a flowchart showing the command recognition processing according to the present embodiment. As shown in FIG. 7, first, in step S203, the user position estimation unit 16 of the signal processing device 1 estimates the user's position. For example, the user position estimation unit 16 may estimate the user's position, orientation, and mouth position relative to each microphone, based on the sound picked up by the plurality of microphones 10, the images captured by the image sensors, the arrangement of the microphones stored in the microphone position information DB 15, and so on.
Next, the sound collection processing shown in step S137 of FIG. 6 will be described in detail with reference to FIG. 8. FIG. 8 is a flowchart showing the sound collection processing according to the present embodiment. As shown in FIG. 8, first, in step S308, the microphone array processing unit 131 of the signal processing unit 13 performs microphone array processing on the audio signals picked up by the selected/updated microphones, and controls the directivity of the microphones so as to focus on the user's mouth.
Next, the sound field reproduction processing shown in step S140 of FIG. 6 will be described in detail with reference to FIG. 9. FIG. 9 is a flowchart showing the sound field reproduction processing according to the present embodiment. As shown in FIG. 9, first, in step S403, the user position estimation unit 16 of the signal processing device 1 estimates the user's position. For example, the user position estimation unit 16 may estimate the user's position, orientation, and ear positions relative to each speaker 20, based on the sound picked up by the plurality of microphones 10, the images captured by the image sensors, the arrangement of the speakers stored in the speaker position information DB 21, and so on.
For example, the sound field reproduction signal processing unit 135 renders the received audio signal according to the environment of site B (here, the arrangement of the plurality of speakers 20 on the floor, walls, and ceiling of the room).
[4-1. Modified examples of command input]
In the embodiment described above, commands are input by voice, but the command input method of the acoustic system according to the present disclosure is not limited to voice input, and other input methods may be used. Other command input methods will be described below with reference to FIG. 10.
In the embodiment described above, a person is specified as the predetermined target and a call request (speech request) is recognized as the command. However, the commands of the acoustic system according to the present disclosure are not limited to call requests and may be other commands. For example, the recognition unit 17 of the signal processing device 1 may recognize a command for reproducing, in the space where the user is located, a place, building, program, song, or the like specified as the predetermined target.
Furthermore, in addition to the sound field construction of another space (the sound field reproduction processing) described in the above embodiment, the signal processing device 1 according to the present embodiment can also perform video construction of the other space.
In the system configuration of the acoustic system according to the embodiment described with reference to FIGS. 1 and 2, a plurality of microphones and speakers are arranged around the users on both the calling side (site A) and the called side (site B), and signal processing is performed by the signal processing devices 1A and 1B. However, the system configuration of the acoustic system according to the present embodiment is not limited to the configuration shown in FIGS. 1 and 2, and may be, for example, a configuration as shown in FIG. 13.
As described above, the acoustic system according to the present embodiment makes it possible to interlink the space around the user with other spaces. Specifically, the acoustic system according to the present embodiment can reproduce sound and images corresponding to a predetermined target (a person, place, building, etc.) from a plurality of speakers and displays arranged around the user, and can pick up the user's voice with a plurality of microphones arranged around the user and reproduce it around the predetermined target. In this way, using the microphones 10, speakers 20, image sensors, and the like arranged everywhere indoors and outdoors, the user's body, including the mouth, eyes, and ears, can be substantially extended over a wide area, and a new communication method can be realized.
The present technology may also be configured as follows.
(1)
An information processing system including:
a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user;
an identification unit that identifies the predetermined target recognized by the recognition unit;
an estimation unit that estimates a position of the specific user in accordance with a signal detected by any of the plurality of sensors; and
a signal processing unit that processes signals acquired from sensors around the predetermined target identified by the identification unit in such a manner that, when output from a plurality of actuators arranged around the specific user, the output is localized near the position of the specific user estimated by the estimation unit.
(2)
The information processing system according to (1), wherein the signal processing unit processes signals acquired from a plurality of sensors arranged around the predetermined target.
(3)
The information processing system according to (1) or (2), wherein the plurality of sensors arranged around the specific user are microphones, and
the recognition unit recognizes the predetermined target based on audio signals detected by the microphones.
(4)
The information processing system according to any one of (1) to (3), wherein the recognition unit further recognizes a request to the predetermined target based on a signal detected by a sensor arranged around the specific user.
(5)
The information processing system according to (4), wherein the sensor arranged around the specific user is a microphone, and
the recognition unit recognizes a call request to the predetermined target based on an audio signal detected by the microphone.
(6)
The information processing system according to (4), wherein the sensor arranged around the specific user is a pressure sensor, and
the recognition unit recognizes a call request to the predetermined target when the pressure sensor detects that a specific switch has been pressed.
(7)
The information processing system according to (4), wherein the sensor arranged around the specific user is an imaging sensor, and
the recognition unit recognizes a call request to the predetermined target based on a captured image acquired by the imaging sensor.
(8)
The information processing system according to any one of (1) to (7), wherein the sensors around the predetermined target are microphones,
the plurality of actuators arranged around the specific user are a plurality of speakers, and
the signal processing unit processes the audio signals picked up by the microphones around the predetermined target, based on the respective positions of the plurality of speakers and the estimated position of the specific user, so as to form a sound field near the position of the specific user when output from the plurality of speakers.
(9)
An information processing system including:
a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user;
an identification unit that identifies the predetermined target recognized by the recognition unit; and
a signal processing unit that generates a signal to be output from an actuator around the specific user, based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
(10)
A program for causing a computer to function as:
a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user;
an identification unit that identifies the predetermined target recognized by the recognition unit;
an estimation unit that estimates a position of the specific user in accordance with a signal detected by any of the plurality of sensors; and
a signal processing unit that processes signals acquired from sensors around the predetermined target identified by the identification unit in such a manner that, when output from a plurality of actuators arranged around the specific user, the output is localized near the position of the specific user estimated by the estimation unit.
(11)
A program for causing a computer to function as:
a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user;
an identification unit that identifies the predetermined target recognized by the recognition unit; and
a signal processing unit that generates a signal to be output from an actuator around the specific user, based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
3 Management server
5 Network
7 Communication terminal
10, 10A, 10B Microphone
11 Amplifier/ADC (analog-to-digital converter) unit
13 Signal processing unit
15 Microphone position information DB (database)
16 User position estimation unit
17 Recognition unit
18 Identification unit
19 Communication I/F (interface)
20, 20A, 20B Speaker
23 DAC (digital-to-analog converter)/amplifier unit
25 Operation input unit
26 Imaging unit (image sensor)
27 Infrared/heat sensor
32 Management unit
33 Search unit
40, 40-1, 40-2, 40-3 Acoustic closed surface
42 Sound field
131 Microphone array processing unit
133 High-S/N processing unit
135 Sound field reproduction signal processing unit
Claims (11)
- An information processing system comprising:
a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user;
an identification unit that identifies the predetermined target recognized by the recognition unit;
an estimation unit that estimates a position of the specific user in accordance with a signal detected by any of the plurality of sensors; and
a signal processing unit that processes signals acquired from sensors around the predetermined target identified by the identification unit in such a manner that, when output from a plurality of actuators arranged around the specific user, the output is localized near the position of the specific user estimated by the estimation unit.
- The information processing system according to claim 1, wherein the signal processing unit processes signals acquired from a plurality of sensors arranged around the predetermined target.
- The information processing system according to claim 1 or 2, wherein the plurality of sensors arranged around the specific user are microphones, and
the recognition unit recognizes the predetermined target based on audio signals detected by the microphones.
- The information processing system according to any one of claims 1 to 3, wherein the recognition unit further recognizes a request to the predetermined target based on a signal detected by a sensor arranged around the specific user.
- The information processing system according to claim 4, wherein the sensor arranged around the specific user is a microphone, and
the recognition unit recognizes a call request to the predetermined target based on an audio signal detected by the microphone.
- The information processing system according to claim 4, wherein the sensor arranged around the specific user is a pressure sensor, and
the recognition unit recognizes a call request to the predetermined target when the pressure sensor detects that a specific switch has been pressed.
- The information processing system according to claim 4, wherein the sensor arranged around the specific user is an imaging sensor, and
the recognition unit recognizes a call request to the predetermined target based on a captured image acquired by the imaging sensor.
- The information processing system according to any one of claims 1 to 7, wherein the sensors around the predetermined target are microphones,
the plurality of actuators arranged around the specific user are a plurality of speakers, and
the signal processing unit processes the audio signals picked up by the microphones around the predetermined target, based on the respective positions of the plurality of speakers and the estimated position of the specific user, so as to form a sound field near the position of the specific user when output from the plurality of speakers.
- An information processing system comprising: a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; and a signal processing unit that generates a signal to be output from an actuator around the specific user, based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
- A storage medium storing a program for causing a computer to function as: a recognition unit that recognizes a predetermined target based on signals detected by a plurality of sensors arranged around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; an estimation unit that estimates a position of the specific user in accordance with a signal detected by any of the plurality of sensors; and a signal processing unit that processes signals acquired from sensors around the predetermined target identified by the identification unit in such a manner that, when output from a plurality of actuators arranged around the specific user, the output is localized near the position of the specific user estimated by the estimation unit.
- A storage medium storing a program for causing a computer to function as: a recognition unit that recognizes a predetermined target based on a signal detected by a sensor around a specific user; an identification unit that identifies the predetermined target recognized by the recognition unit; and a signal processing unit that generates a signal to be output from an actuator around the specific user, based on signals acquired from a plurality of sensors arranged around the predetermined target identified by the identification unit.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014524672A JP6248930B2 (ja) | 2012-07-13 | 2013-04-19 | Information processing system and program |
EP13817541.9A EP2874411A4 (en) | 2012-07-13 | 2013-04-19 | INFORMATION PROCESSING SYSTEM AND STORAGE MEDIUM |
CN201380036179.XA CN104412619B (zh) | 2012-07-13 | 2013-04-19 | Information processing system |
US14/413,024 US10075801B2 (en) | 2012-07-13 | 2013-04-19 | Information processing system and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-157722 | 2012-07-13 | ||
JP2012157722 | 2012-07-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014010290A1 true WO2014010290A1 (ja) | 2014-01-16 |
Family
ID=49915766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/061647 WO2014010290A1 (ja) | 2012-07-13 | 2013-04-19 | Information processing system and storage medium |
Country Status (5)
Country | Link |
---|---|
US (1) | US10075801B2 (ja) |
EP (1) | EP2874411A4 (ja) |
JP (1) | JP6248930B2 (ja) |
CN (1) | CN104412619B (ja) |
WO (1) | WO2014010290A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018070487A1 (ja) * | 2016-10-14 | 2018-04-19 | Japan Science and Technology Agency | Spatial sound generation device, spatial sound generation system, spatial sound generation method, and spatial sound generation program |
JP2020198588A (ja) * | 2019-06-05 | 2020-12-10 | Sharp Corporation | Audio processing system, conference system, audio processing method, and audio processing program |
WO2023100560A1 (ja) | 2021-12-02 | 2023-06-08 | Sony Group Corporation | Information processing device, information processing method, and storage medium |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6102923B2 (ja) * | 2012-07-27 | 2017-03-29 | Sony Corporation | Information processing system and storage medium |
US9294839B2 (en) | 2013-03-01 | 2016-03-22 | Clearone, Inc. | Augmentation of a beamforming microphone array with non-beamforming microphones |
DE112015000640T5 (de) * | 2014-02-04 | 2017-02-09 | Tp Vision Holding B.V. | Handgerät mit Mikrofon |
EP3387523A4 (en) * | 2015-12-07 | 2019-08-21 | Creative Technology Ltd. | AUDIO SYSTEM |
US10097919B2 (en) * | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Music service selection |
US9807499B2 (en) * | 2016-03-30 | 2017-10-31 | Lenovo (Singapore) Pte. Ltd. | Systems and methods to identify device with which to participate in communication of audio data |
EP3662328A4 (en) | 2017-07-31 | 2021-05-05 | Driessen Aerospace Group N.V. | VIRTUAL CONTROL DEVICE AND SYSTEM |
CN111903143B (zh) * | 2018-03-30 | 2022-03-18 | Sony Corporation | Signal processing device and method, and computer-readable storage medium |
CN109188927A (zh) * | 2018-10-15 | 2019-01-11 | Shenzhen ORVIBO Technology Co., Ltd. | Home control method and device, gateway device, and storage medium |
US10991361B2 (en) * | 2019-01-07 | 2021-04-27 | International Business Machines Corporation | Methods and systems for managing chatbots based on topic sensitivity |
US10812921B1 (en) | 2019-04-30 | 2020-10-20 | Microsoft Technology Licensing, Llc | Audio stream processing for distributed device meeting |
WO2021021857A1 (en) * | 2019-07-30 | 2021-02-04 | Dolby Laboratories Licensing Corporation | Acoustic echo cancellation control for distributed audio devices |
CN111048081B (zh) * | 2019-12-09 | 2023-06-23 | Lenovo (Beijing) Co., Ltd. | Control method and device, electronic device, and control system |
JP7532793B2 (ja) * | 2020-02-10 | 2024-08-14 | Yamaha Corporation | Volume adjustment device and volume adjustment method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS647100A (en) * | 1987-06-30 | 1989-01-11 | Ricoh Kk | Voice recognition equipment |
JPH09261351A (ja) * | 1996-03-22 | 1997-10-03 | Nippon Telegr & Teleph Corp <Ntt> | Voice teleconferencing device |
JP2006279565A (ja) | 2005-03-29 | 2006-10-12 | Yamaha Corp | Array speaker control device and array microphone control device |
JP2008543137A (ja) | 2005-05-23 | 2008-11-27 | Siemens Societa per Azioni | Method and system for remote management of a machine via an IP link of an IP multimedia subsystem, IMS |
JP2010130411A (ja) * | 2008-11-28 | 2010-06-10 | Nippon Telegr & Teleph Corp <Ntt> | Multiple-signal section estimation device, method, and program |
JP4674505B2 (ja) | 2005-08-01 | 2011-04-20 | Sony Corporation | Audio signal processing method and sound field reproduction system |
JP4735108B2 (ja) | 2005-08-01 | 2011-07-27 | Sony Corporation | Audio signal processing method and sound field reproduction system |
JP4775487B2 (ja) | 2009-11-24 | 2011-09-21 | Sony Corporation | Audio signal processing method and audio signal processing device |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6738382B1 (en) * | 1999-02-24 | 2004-05-18 | Stsn General Holdings, Inc. | Methods and apparatus for providing high speed connectivity to a hotel environment |
GB2391741B (en) * | 2002-08-02 | 2004-10-13 | Samsung Electronics Co Ltd | Method and system for providing conference feature between internet call and telephone network call in a webphone system |
JP4096801B2 (ja) * | 2003-04-28 | 2008-06-04 | Yamaha Corporation | Simple stereo sound realization method, stereo sound system, and musical tone generation control system |
US7724885B2 (en) * | 2005-07-11 | 2010-05-25 | Nokia Corporation | Spatialization arrangement for conference call |
KR100897971B1 (ko) * | 2005-07-29 | 2009-05-18 | 하르만 인터내셔날 인더스트리즈, 인코포레이티드 | 오디오 튜닝 시스템 |
EP1923866B1 (en) * | 2005-08-11 | 2014-01-01 | Asahi Kasei Kabushiki Kaisha | Sound source separating device, speech recognizing device, portable telephone, sound source separating method, and program |
JP4873316B2 (ja) * | 2007-03-09 | 2012-02-08 | Advanced Telecommunications Research Institute International | Acoustic space sharing device |
JP5559691B2 (ja) | 2007-09-24 | 2014-07-23 | Qualcomm, Incorporated | Enhanced interface for voice and video communications |
KR20100131467A (ko) * | 2008-03-03 | 2010-12-15 | 노키아 코포레이션 | 복수의 오디오 채널들을 캡쳐하고 렌더링하는 장치 |
KR101462930B1 (ko) * | 2008-04-30 | 2014-11-19 | 엘지전자 주식회사 | 이동 단말기 및 그 화상통화 제어방법 |
JP5113647B2 (ja) | 2008-07-07 | 2013-01-09 | Hitachi, Ltd. | Train control system using wireless communication |
CN101656908A (zh) * | 2008-08-19 | 2010-02-24 | Shenzhen Huawei Communication Technologies Co., Ltd. | Method for controlling sound focusing, communication device, and communication system |
US8724829B2 (en) * | 2008-10-24 | 2014-05-13 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for coherence detection |
US8390665B2 (en) * | 2009-09-03 | 2013-03-05 | Samsung Electronics Co., Ltd. | Apparatus, system and method for video call |
CN102281425A (zh) * | 2010-06-11 | 2011-12-14 | Huawei Device Co., Ltd. | Method and device for playing audio of remote conference participants, and remote video conference system |
US8300845B2 (en) * | 2010-06-23 | 2012-10-30 | Motorola Mobility Llc | Electronic apparatus having microphones with controllable front-side gain and rear-side gain |
US9973848B2 (en) * | 2011-06-21 | 2018-05-15 | Amazon Technologies, Inc. | Signal-enhancing beamforming in an augmented reality environment |
US20130083948A1 (en) * | 2011-10-04 | 2013-04-04 | Qsound Labs, Inc. | Automatic audio sweet spot control |
-
2013
- 2013-04-19 JP JP2014524672A patent/JP6248930B2/ja not_active Expired - Fee Related
- 2013-04-19 CN CN201380036179.XA patent/CN104412619B/zh not_active Expired - Fee Related
- 2013-04-19 EP EP13817541.9A patent/EP2874411A4/en not_active Ceased
- 2013-04-19 WO PCT/JP2013/061647 patent/WO2014010290A1/ja active Application Filing
- 2013-04-19 US US14/413,024 patent/US10075801B2/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS647100A (en) * | 1987-06-30 | 1989-01-11 | Ricoh Kk | Voice recognition equipment |
JPH09261351A (ja) * | 1996-03-22 | 1997-10-03 | Nippon Telegr & Teleph Corp <Ntt> | Voice teleconferencing device |
JP2006279565A (ja) | 2005-03-29 | 2006-10-12 | Yamaha Corp | Array speaker control device and array microphone control device |
JP2008543137A (ja) | 2005-05-23 | 2008-11-27 | Siemens Societa per Azioni | Method and system for remote management of a machine via an IP link of an IP multimedia subsystem, IMS |
JP4674505B2 (ja) | 2005-08-01 | 2011-04-20 | Sony Corporation | Audio signal processing method and sound field reproduction system |
JP4735108B2 (ja) | 2005-08-01 | 2011-07-27 | Sony Corporation | Audio signal processing method and sound field reproduction system |
JP2010130411A (ja) * | 2008-11-28 | 2010-06-10 | Nippon Telegr & Teleph Corp <Ntt> | Multiple-signal section estimation device, method, and program |
JP4775487B2 (ja) | 2009-11-24 | 2011-09-21 | Sony Corporation | Audio signal processing method and audio signal processing device |
Non-Patent Citations (1)
Title |
---|
See also references of EP2874411A4 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018070487A1 (ja) * | 2016-10-14 | 2018-04-19 | Japan Science and Technology Agency | Spatial sound generation device, spatial sound generation system, spatial sound generation method, and spatial sound generation program |
US10812927B2 | 2016-10-14 | 2020-10-20 | Japan Science And Technology Agency | Spatial sound generation device, spatial sound generation system, spatial sound generation method, and spatial sound generation program |
JP2020198588A (ja) * | 2019-06-05 | 2020-12-10 | Sharp Corporation | Audio processing system, conference system, audio processing method, and audio processing program |
JP7351642B2 (ja) | 2019-06-05 | 2023-09-27 | Sharp Corporation | Audio processing system, conference system, audio processing method, and audio processing program |
WO2023100560A1 (ja) | 2021-12-02 | 2023-06-08 | Sony Group Corporation | Information processing device, information processing method, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US20150208191A1 (en) | 2015-07-23 |
CN104412619A (zh) | 2015-03-11 |
JPWO2014010290A1 (ja) | 2016-06-20 |
EP2874411A1 (en) | 2015-05-20 |
JP6248930B2 (ja) | 2017-12-20 |
CN104412619B (zh) | 2017-03-01 |
US10075801B2 (en) | 2018-09-11 |
EP2874411A4 (en) | 2016-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6248930B2 (ja) | Information processing system and program | |
JP6102923B2 (ja) | Information processing system and storage medium | |
JP2014060647A (ja) | Information processing system and program | |
CN106797512B (zh) | Method, system, and non-transitory computer-readable storage medium for multi-source noise suppression | |
US9491033B1 (en) | Automatic content transfer | |
JP2019518985A (ja) | Processing speech from distributed microphones | |
JP2016146547A (ja) | Sound collection system and sound collection method | |
US12095951B2 (en) | Systems and methods for providing headset voice control to employees in quick-service restaurants | |
WO2021244056A1 (zh) | Data processing method and apparatus, and readable medium | |
JP2022514325A (ja) | Source separation in hearing devices and related methods | |
CN109218948B (zh) | Hearing aid system, system signal processing unit, and method for generating an enhanced electrical audio signal | |
WO2022059362A1 (ja) | Information processing device, information processing method, and information processing system | |
JP2020086027A (ja) | Audio reproduction system and program | |
TW202314684A (zh) | Processing of audio signals from multiple microphones | |
JP7361460B2 (ja) | Communication device, communication program, and communication method | |
JP2004072354A (ja) | Audio conference system | |
WO2024232229A1 (ja) | Information processing device and information processing method | |
JP2019537071A (ja) | Processing speech from distributed microphones | |
US20240111482A1 (en) | Systems and methods for reducing audio quality based on acoustic environment | |
JP7293863B2 (ja) | Audio processing device, audio processing method, and program | |
JP2002304191A (ja) | Audio guide system using animal cries | |
TW202314478A (zh) | Audio event data processing | |
JP2011199764A (ja) | Speaker voice extraction system, speaker voice extraction device, and speaker voice extraction program | |
JP2005122023A (ja) | Highly realistic sound signal output device, highly realistic sound signal output program, and highly realistic sound signal output method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13817541 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2014524672 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2013817541 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14413024 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |