US20160277864A1 - Waveform Display Control of Visual Characteristics - Google Patents
- Publication number
- US20160277864A1 (application US 14/663,231)
- Authority
- US
- United States
- Prior art keywords
- sound data
- waveform
- computing device
- time intervals
- colors
- Prior art date
- Legal status
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
- H04R29/008—Visual indication of individual signal levels
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Definitions
- This representation involves a transformation from consumption of the sound by one sense (e.g., hearing) to consumption by another sense, e.g., vision.
- One technique that has been developed to provide such a representation is the use of a waveform that is displayed visually in a user interface, e.g., as part of sound editing functionality. This typically involves display of a period of time over which the sound is output, with indications of intensity (e.g., loudness) of the sound at particular points in time.
- Waveform display control techniques of visual characteristics are described.
- A method is described of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data.
- Sound data received by a computing device is partitioned to form a plurality of sound data time intervals.
- A signature is computed for each of the plurality of sound data time intervals by the computing device based on features extracted from the respective sound data time intervals.
- The computed signatures are mapped by the computing device to one or more colors.
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform has the mapped one or more colors.
- A method is also described of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data.
- Sound data received by a computing device is partitioned to form a plurality of sound data time intervals.
- One or more phonemes included in the respective time intervals are identified by the computing device.
- The one or more phonemes for the respective time intervals are mapped by the computing device to one or more colors.
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform has the mapped one or more colors, thereby identifying the respective phonemes.
- A system is also described to increase user efficiency in identification of particular sounds in a waveform display of sound data without listening to the sound data.
- The system includes a partition module implemented at least partially in hardware to partition sound data to form a plurality of sound data time intervals, and a signature computation module implemented at least partially in hardware to compute a signature for each of the plurality of sound data time intervals based on features extracted from the respective sound data time intervals.
- The system also includes a mapping module implemented at least partially in hardware to map the computed signatures to one or more visual characteristics, and a user interface module implemented at least partially in hardware to control output of a waveform in a user interface, in which the waveform represents the sound data and each of the sound data time intervals in the waveform has the mapped one or more visual characteristics.
- FIG. 1 is an illustration of an environment in an example implementation that is operable to employ visual characteristic control techniques described herein.
- FIG. 2 depicts a system in an example implementation showing a sound representation module and user interface module of FIG. 1 in greater detail as controlling output of a waveform in a user interface.
- FIG. 3 depicts an example implementation of a waveform of FIG. 2 as displayed in a user interface as differentiating speech from other sounds.
- FIG. 4 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as differentiating sounds from different musical instruments.
- FIG. 5 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the first two measures of Bach's Minuet as played by an oboe.
- FIG. 6 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing sounds originating from a drum set.
- FIG. 7 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the same sounds at different zoom levels.
- FIG. 8 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the same sounds at different recording levels.
- FIG. 9 depicts an example implementation of the waveforms of FIG. 2 as displayed in the user interface as representing sound files.
- FIG. 10 is a flow diagram depicting a procedure in an example implementation of increasing user efficiency in identifying particular sounds in a waveform display of sound data.
- FIG. 11 is a flow diagram depicting a procedure in an example implementation of increasing user efficiency in identifying phonemes in a waveform display of sound data.
- FIG. 12 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1-11 to implement embodiments of the techniques described herein.
- Waveform display control techniques involving visual characteristics are described.
- A waveform is configured based on how a human listener hears sounds.
- Visual characteristics such as colors are used to represent frequencies in a waveform that displays amplitude along one axis and time along another.
- The waveform is generated based on how human listeners hear.
- Phonemes are basic units of a phonology of human language that form meaningful units such as words or morphemes. The phonemes are mapped to colors in this example, with similar phonemes mapped to similar colors.
- The overall amplitude of the waveform is based on how a human listener perceives loudness of the sound, with another axis used to represent when and in what order the sounds are output.
- Example procedures are then described which may be performed in the example environment as well as other environments. Consequently, performance of the example procedures is not limited to the example environment and the example environment is not limited to performance of the example procedures.
- FIG. 1 is an illustration of an environment 100 in an example implementation that is operable to employ waveform display techniques described herein.
- The illustrated environment 100 includes a computing device 102 and a sound capture device 104, which are configurable in a variety of ways.
- The computing device 102 is configurable as a desktop computer, a laptop computer, a mobile device (e.g., assuming a handheld configuration such as a tablet or mobile phone), and so forth.
- The computing device 102 ranges from full-resource devices with substantial memory and processor resources (e.g., personal computers, game consoles) to low-resource devices with limited memory and/or processing resources (e.g., mobile devices).
- Although a single computing device 102 is shown, the computing device 102 is also representative of a plurality of different devices, such as multiple servers utilized by a business to perform operations "over the cloud" as further described in relation to FIG. 12.
- The sound capture device 104 is also configurable in a variety of ways. The illustrated example of one such configuration involves a standalone device, but other configurations are also contemplated, such as part of a mobile phone, video camera, tablet computer, desktop microphone, array microphone, and so on. Additionally, although the sound capture device 104 is illustrated separately from the computing device 102, the sound capture device 104 is configurable as part of the computing device 102, the sound capture device 104 may be representative of a plurality of sound capture devices, and so on.
- The sound capture device 104 is illustrated as including a sound capture module 106 that is representative of functionality to generate sound data 108.
- The sound capture device 104 may generate the sound data 108 as a recording of an environment 110 surrounding the sound capture device 104 having one or more sound sources, such as speech from a user, music, and so forth. This sound data 108 is then obtained by the computing device 102 for processing.
- The computing device 102 is also illustrated as including a sound processing module 112.
- The sound processing module 112 is representative of functionality to process the sound data 108.
- Functionality represented by the sound processing module 112 may be further divided, such as to be performed "over the cloud" by one or more servers that are accessible via a network 114 connection, further discussion of which may be found in relation to FIG. 12.
- The sound processing module 112 is represented as including a sound representation module 116 and a user interface module 118.
- The sound representation module 116 is representative of functionality to form a representation of the sound data 108 for output in a user interface 120.
- The user interface 120 may be configured to support sound editing operations to form edited sound data 122 from the sound data 108, such as source separation, enhancement, noise removal, splicing, and so forth. Accordingly, the user interface includes a visual representation of the sound data 108 with which a user may interact.
- The representation of the sound data 108 in the user interface 120 is usable to identify what sounds are captured by the sound data 108, such as to differentiate one sound file from another.
- The representation may be included as part of a representation of the sound file (e.g., an icon) which is usable to identify characteristics of the sounds captured in the sound data 108, such as whether the sound data 108 includes speech (and even what is being said), music (e.g., characteristics of instruments and sounds in the music), noise, and so forth.
- Other forms of a representation generated of the sound data 108 by the sound representation module 116 are also contemplated without departing from the spirit and scope thereof, as further described in relation to FIG. 9.
- The sound representation module 116 employs a sound data analysis module 124 and a mapping module 126 in the illustrated example.
- The sound data analysis module 124 is representative of functionality to extract features from the sound data 108 that are indicative of characteristics of the sound data 108, such as what sounds are captured in the sound data 108.
- The mapping module 126 is representative of functionality to map these features to visual characteristics that can be visually differentiated by a user to determine differences in different types of sound data 108.
- The user interface 120 includes a waveform 128 that includes a first axis 132 representing time and a second axis 134 representing intensity (e.g., loudness) of the sound data 108 at particular points in time.
- Other visual characteristics (e.g., color) are also used to represent the extracted characteristics of the sound data at these particular points in time.
- The sound data analysis module 124 extracts frequency information from the sound data 108, which is mapped to a color space by the mapping module 126.
- The coloring is independent of recording level, and sounds that are perceived as similar by a human listener are represented by colors that are also perceived as similar by the human listener.
- An audio-retrieval system can present colored waveform displays as visual "thumbnails" in a list of sound search results or within a file, and so on. Further discussion of these and other examples is described in the following and shown in corresponding figures.
- FIG. 2 depicts a system 200 in an example implementation showing the sound representation module 116 and user interface module 118 of FIG. 1 in greater detail as controlling output of a waveform in a user interface.
- The sound representation module 116 includes the sound data analysis module 124 and the mapping module 126 as described in relation to FIG. 1.
- Sound data 108 (e.g., a sequence of digital audio samples) is obtained by the sound representation module 116 for processing.
- The sound data analysis module 124 employs a partition module 202 to partition the sound data 108 into sound data time intervals 204.
- The sound data time intervals 204 form brief, consecutive intervals taken from the sound data 108, e.g., fifty milliseconds for each interval.
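The partitioning step above can be sketched as follows; this is a minimal illustration assuming NumPy, with the function name, mono input, and 50 ms default chosen for demonstration rather than taken from the patent:

```python
import numpy as np

def partition_sound(samples: np.ndarray, sample_rate: int,
                    interval_ms: float = 50.0) -> list[np.ndarray]:
    """Split a mono sample array into brief, consecutive time intervals."""
    hop = int(sample_rate * interval_ms / 1000.0)  # samples per interval
    return [samples[i:i + hop] for i in range(0, len(samples), hop)]

# One second of audio at 8 kHz yields twenty 50 ms intervals of 400 samples each.
intervals = partition_sound(np.zeros(8000), sample_rate=8000)
```

Each returned interval then becomes the unit for which a signature, and hence a color, is computed.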
- The sound data time intervals 204 are then provided to a signature computation module 206 that is representative of functionality to create signatures 208 that describe differentiating characteristics of the sound data time intervals 204.
- The signature computation module 206 may employ a feature extraction module 210 to extract frequency information from each of the sound data time intervals 204, such as by using a Fast Fourier Transform (FFT), linear prediction, wavelets, and so forth.
- The signatures 208 represent relative strengths of the frequencies while being invariant with respect to scaling and polarity. In this way, amplification or attenuation of the sound data in the sound data time intervals 204 (e.g., multiplication by a nonzero constant) does not alter the signatures 208.
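A signature with these invariances can be sketched as a normalized FFT magnitude spectrum: taking the magnitude discards polarity, and dividing by the total discards scale. This is a minimal illustration assuming NumPy, not the patent's exact computation:

```python
import numpy as np

def compute_signature(interval: np.ndarray) -> np.ndarray:
    """Relative frequency strengths, invariant to scaling and polarity."""
    spectrum = np.abs(np.fft.rfft(interval))  # magnitude discards polarity
    total = spectrum.sum()
    if total == 0.0:
        return spectrum                       # silent interval: all-zero signature
    return spectrum / total                   # normalizing discards recording level

rng = np.random.default_rng(0)
interval = rng.standard_normal(400)
# Multiplying by any nonzero constant (amplification, attenuation, or a
# polarity flip) leaves the signature unchanged.
assert np.allclose(compute_signature(interval), compute_signature(-3.5 * interval))
```

Because the signature depends only on the proportions of energy across frequencies, two recordings of the same sound at different levels map to the same signature, which is what later keeps the colors stable across recording levels.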
- The signatures 208 are then used by the mapping module 126 to map one or more visual characteristics 212 (e.g., color, shading, texture, and so on) to the sound data time intervals 204.
- The mapping module 126 applies a function that maps each of the signatures 208 to a corresponding color.
- The mapping is performed such that sounds perceived as similar by a human listener are mapped to colors that are also perceived as similar by that listener.
- The user interface module 118 uses this mapping to generate a waveform 214 in which the sound data time intervals 204 are associated with visual characteristics 212 (e.g., colors) in the user interface 120.
- Each of the sound data time intervals 204 is painted with the color derived from the signature 208 representing the interval; the intervals appear as vertical stripes in the user interface 120, as shown in FIG. 1.
- FIG. 3 depicts an example implementation 300 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as differentiating speech from other sounds.
- A sixteen-byte signature 208 is mapped to a twenty-four-bit color in a red/green/blue color space. The mapping from sound to color is performed so that similar sounds are mapped to similar colors.
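The patent does not spell out the mapping function itself. As one hypothetical many-to-one scheme, a sixteen-byte signature can be collapsed to a 24-bit RGB triple by averaging byte groups, so that nearby signatures (similar sounds) land on nearby colors:

```python
def signature_to_rgb(signature: bytes) -> tuple[int, int, int]:
    """Collapse a 16-byte signature into one 24-bit red/green/blue color."""
    assert len(signature) == 16
    groups = (signature[0:5], signature[5:10], signature[10:16])  # 5 + 5 + 6 bytes
    # Averaging is continuous: a small change in the signature shifts the
    # color only slightly, so similar sounds receive similar colors.
    return tuple(int(sum(g) / len(g)) for g in groups)

color = signature_to_rgb(bytes(range(16)))
```

Averaging illustrates the required continuity and the many-to-one character of the mapping (many distinct signatures share one color), but a production mapping would also account for perceptual color distance, as discussed below.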
- An explosion 302 waveform, scream 304 waveform, siren 306 waveform, and white noise 308 waveform are shown. Red has a connotation of alarm, and so does a scream 304, so a red component is increased in colors assigned to high-frequency sounds, i.e., the scream 304 is displayed using shades of red.
- Low-frequency sounds, such as the explosion 302 waveform, are displayed in correspondingly dark shades, such that the explosion 302 waveform both looks and sounds ominous.
- Middle to high frequencies are shaded green 310, and low to mid-range frequencies are shaded blue 312.
- The siren 306 waveform in this example has alternating bands of green and blue such that a user may differentiate between these portions.
- Noisy sounds such as the white noise 308 waveform are mapped to a gray color.
- When sounds overlap, the louder sound is given a proportionally greater weighting in the color mapping.
- In the siren 306 waveform, for instance, a blue sound commences just before the green sound has finished.
- Accordingly, that portion of the siren 306 waveform is colored by a mixture of blue and green shades.
- FIG. 4 depicts an example implementation 400 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as differentiating sounds from different musical instruments. Mapping from sound to color may be performed to take into account all the frequency information and not solely the pitch. This allows the coloring of polyphony and inharmonic sounds, for which fundamental frequency cannot be determined.
- The same note (e.g., E4) is played by a bassoon 402, clarinet 404, English horn 406, trombone 408, and violin 410, but different colors are mapped according to the harmonics of the instruments, e.g., green, purple, gray, blue/green, and blue/green striped, respectively.
- The striped patterns visible in the English horn 406 and violin 410 waveforms represent vibrato. Such subtle variations are thus made apparent through the use of color in the user interface 120.
- FIG. 5 depicts an example implementation 500 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the first two measures of Bach's Minuet as played by an oboe.
- Each note is assigned a color, e.g., pink, green, orange, light pink, gray, pink again, green again, and fading green. Subtle variations in the notes are observed at the attack and release points through variations in color.
- FIG. 6 depicts an example implementation 600 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing sounds originating from a drum set.
- Waveforms of a bass drum 602 , high hat 604 , and snare drum 606 are represented using purple, blue, and gray, respectively and thus are readily distinguishable from each other even though the amplitude and time intervals are similar.
- FIG. 7 depicts an example implementation 700 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the same sounds at different zoom levels.
- A waveform is shown as employing pink 702, gray 704, orange 706, pink 708, gray 710, green 712, orange 714, pink 716, and green 718 colors at first, second, and third levels 722, 724, 726 of zoom.
- The zooming changes the shape of the amplitude envelopes, but the correspondence between color and sound is unchanged, thereby providing a stable visual landmark.
- FIG. 8 depicts an example implementation 800 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the same sounds at different recording levels.
- First, second, and third levels 802 , 804 , 806 that are increasing are shown in the user interface 120 .
- Because the signatures 208 are invariant with respect to scaling, the colors are unaffected by the changes in recording level in this example. For example, the order of pink 808, gray 810, orange 812, pink 814, orange 816, pink 818, and green 820 colors of peaks of the sound data 108 in the corresponding sound data time intervals 204 is unchanged.
- The number of colors discernible to the human eye, however, is far smaller, e.g., approximately 100,000.
- The number of sounds represented by the signatures 208 is approximately 10^30, and so a many-to-one mapping may be performed by the mapping module 126.
- The mapping assigns each group of similar sounds to a particular RGB color.
- As a consequence, sounds dominated by very high frequencies (e.g., above 2 kHz) may be assigned colors that are also used for lower frequencies.
- In another approach, each audio recording is given a unique mapping of its sounds to the color space. While this may solve the color-shortage problem, users then learn a different correspondence between sound and color for each recording, which may make it difficult to compare color waveform displays of different recordings.
- With a consistent mapping, users are able to learn the correspondence between sound and color and develop an ability to visually "read" audio. That is, users are able to obtain an impression of how a recording will sound, without listening to it, by viewing the colored waveform display.
- FIG. 9 depicts an example implementation 900 of the waveforms 214 of FIG. 2 as displayed in the user interface 120 as representing sound files.
- The waveform displays are also usable as visual representations (e.g., "thumbnails") that represent recordings, such as in a list of search results returned by an audio-retrieval system.
- The colored waveform display is thus usable to help a user decide whether to listen to a recording retrieved by the system, e.g., for sound effects returned for a search.
- FIG. 10 depicts a procedure 1000 in an example implementation of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data.
- Sound data received by a computing device is partitioned to form a plurality of sound data time intervals (block 1002 ).
- A partition module 202, for instance, is usable to form the sound data time intervals 204 from the sound data 108 as a series of successive portions of the data in time.
- A signature is computed for each of the plurality of sound data time intervals by the computing device based on features extracted from the respective sound data time intervals (block 1004).
- The features, for instance, include frequency, harmonics, and other characteristics of the sound data 108 suitable to differentiate one or more of the sound data time intervals 204 from each other.
- Signatures 208 are then computed using these features, which may be invariant with respect to scaling and polarity of the sound data within a respective sound data time interval.
- The computed signatures are mapped by the computing device to one or more colors (block 1006).
- The signatures 208 may be computed using a frequency analysis in which perceptually-weighted averages are calculated over a plurality of frequency bands, e.g., 0-1500 Hz, 1500-4000 Hz, and 4000 Hz and up.
- The perceptual loudness in these bands is then identified with the colors red, green, and blue. From these, a color angle is formed.
- A continuous mapping is then applied to align colors to sounds. For instance, deep vowels like "u" and "o" are mapped to deep red. Fricatives such as "s" and "sh" are mapped to turquoise. Other sounds produce other colors in a smooth manner that preserves distance; that is, similar sounds map to adjacent color angles.
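The band-to-color step can be sketched as below, assuming NumPy. The band edges follow the text (0-1500 Hz, 1500-4000 Hz, 4000 Hz and up); plain spectral-energy sums stand in for the patent's perceptually-weighted averages, which are not specified in detail, and the function name is chosen for illustration:

```python
import numpy as np

def interval_color(interval: np.ndarray, sample_rate: int) -> tuple[int, int, int]:
    """Map loudness in three frequency bands to red, green, and blue."""
    spectrum = np.abs(np.fft.rfft(interval))
    freqs = np.fft.rfftfreq(len(interval), d=1.0 / sample_rate)
    bands = [(0.0, 1500.0), (1500.0, 4000.0), (4000.0, sample_rate / 2 + 1)]
    # Plain energy sums stand in for perceptually-weighted averages here.
    loudness = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in bands])
    total = loudness.sum()
    if total == 0.0:
        return (128, 128, 128)               # silence: neutral gray
    proportions = loudness / total           # invariant to recording level
    return tuple(int(round(255.0 * p)) for p in proportions)

# A 440 Hz tone falls entirely in the 0-1500 Hz band, so red dominates,
# consistent with low-frequency sounds such as deep vowels mapping toward red.
t = np.arange(400) / 8000.0
low_tone = interval_color(np.sin(2 * np.pi * 440.0 * t), sample_rate=8000)
```

Because the band proportions sum to one, the resulting color is independent of amplification, matching the recording-level invariance discussed above.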
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform have the mapped one or more colors (block 1008 ).
- In this way, a user may readily determine characteristics of sound data visually, such as in a sound editing user interface, as a representation (e.g., thumbnail), and so on, without listening to the sound data 108.
- FIG. 11 depicts a procedure 1100 in an example implementation of increasing user efficiency in identifying phonemes in a waveform display of sound data.
- Sound data received by a computing device is partitioned to form a plurality of sound data time intervals (block 1102 ).
- The sound data time intervals 204 form a consecutive series of portions of the sound data 108.
- One or more phonemes are identified by the computing device that are included in respective time intervals (block 1104 ).
- Phonemes are basic units of the phonology of a human language that form meaningful units such as words or morphemes.
- The sound data analysis module 124 is configured in this example to identify characteristics of phonemes to identify their presence in the sound data time intervals 204 of the sound data 108.
- The one or more phonemes for the respective time intervals are mapped by the computing device to one or more colors (block 1106). For example, sounds of the sound data perceived as similar by human listeners are mapped to colors that are perceived as similar by the human listeners.
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform have the mapped one or more colors thereby identifying respective phonemes (block 1108 ). In this way, a user may readily determine properties of the sound data 108 without actually listening to the sound data.
- Each phoneme is represented by a color, with similar phonemes mapped to similar colors.
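A phoneme-to-color mapping of this kind can be sketched as a lookup table. The specific RGB values here are hypothetical, anchored only to the examples given earlier (deep vowels toward deep red, fricatives toward turquoise); the gray fallback mirrors the treatment of noisy sounds:

```python
# Hypothetical table: similar phonemes are given nearby RGB values.
PHONEME_COLORS = {
    "u": (139, 0, 0),      # deep vowel -> deep red
    "o": (165, 42, 42),    # neighboring deep vowel -> nearby reddish brown
    "s": (64, 224, 208),   # fricative -> turquoise
    "sh": (72, 209, 204),  # neighboring fricative -> nearby turquoise shade
}

def phoneme_color(phoneme: str) -> tuple[int, int, int]:
    """Look up a phoneme's display color; unknown phonemes fall back to gray."""
    return PHONEME_COLORS.get(phoneme, (128, 128, 128))
```

In practice the table would cover the full phoneme inventory and be derived from the continuous sound-to-color-angle mapping rather than hand-picked, but the locality property is the same: neighboring phonemes sit at neighboring colors.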
- The overall amplitude of the display of the waveform is based on how human listeners perceive loudness of the sound data 108. Accordingly, by playing back the sound data 108 while watching the waveform, a user may be trained in how the display relates to speech or other sounds. For instance, a user is able to locate words over a certain length whenever these words occur, a repeated phrase is immediately noticeable, and so on.
- Splice points may also be automatically identified that promote seamless editing. Thus, with a few minutes of training, even a casual user can edit speech in a professional-sounding manner.
- FIG. 12 illustrates an example system generally at 1200 that includes an example computing device 1202 that is representative of one or more computing systems and/or devices that may implement the various techniques described herein. This is illustrated through inclusion of the sound processing module 112 .
- the computing device 1202 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device), an on-chip system, and/or any other suitable computing device or computing system.
- the example computing device 1202 as illustrated includes a processing system 1204 , one or more computer-readable media 1206 , and one or more I/O interface 1208 that are communicatively coupled, one to another.
- the computing device 1202 may further include a system bus or other data and command transfer system that couples the various components, one to another.
- a system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures.
- a variety of other examples are also contemplated, such as control and data lines.
- the processing system 1204 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 1204 is illustrated as including hardware element 1210 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors.
- the hardware elements 1210 are not limited by the materials from which they are formed or the processing mechanisms employed therein.
- processors may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)).
- processor-executable instructions may be electronically-executable instructions.
- the computer-readable storage media 1206 is illustrated as including memory/storage 1212 .
- the memory/storage 1212 represents memory/storage capacity associated with one or more computer-readable media.
- the memory/storage component 1212 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth).
- the memory/storage component 1212 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., Flash memory, a removable hard drive, an optical disc, and so forth).
- the computer-readable media 1206 may be configured in a variety of other ways as further described below.
- Input/output interface(s) 1208 are representative of functionality to allow a user to enter commands and information to computing device 1202 , and also allow information to be presented to the user and/or other components or devices using various input/output devices.
- input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to recognize movement as gestures that do not involve touch), and so forth.
- Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, tactile-response device, and so forth.
- the computing device 1202 may be configured in a variety of ways as further described below to support user interaction.
- modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types.
- modules generally represent software, firmware, hardware, or a combination thereof.
- the features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors.
- Computer-readable media may include a variety of media that may be accessed by the computing device 1202 .
- computer-readable media may include “computer-readable storage media” and “computer-readable signal media.”
- Computer-readable storage media may refer to media and/or devices that enable persistent and/or non-transitory storage of information in contrast to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal bearing media.
- the computer-readable storage media includes hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer readable instructions, data structures, program modules, logic elements/circuits, or other data.
- Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage device, tangible media, or article of manufacture suitable to store the desired information and which may be accessed by a computer.
- Computer-readable signal media may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the computing device 1202 , such as via a network.
- Signal media typically may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or other transport mechanism.
- Signal media also include any information delivery media.
- modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
- hardware elements 1210 and computer-readable media 1206 are representative of modules, programmable device logic and/or fixed device logic implemented in a hardware form that may be employed in some embodiments to implement at least some aspects of the techniques described herein, such as to perform one or more instructions.
- Hardware may include components of an integrated circuit or on-chip system, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware.
- hardware may operate as a processing device that performs program tasks defined by instructions and/or logic embodied by the hardware, as well as hardware utilized to store instructions for execution, e.g., the computer-readable storage media described previously.
- software, hardware, or executable modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 1210 .
- the computing device 1202 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of a module that is executable by the computing device 1202 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or hardware elements 1210 of the processing system 1204 .
- the instructions and/or functions may be executable/operable by one or more articles of manufacture (for example, one or more computing devices 1202 and/or processing systems 1204 ) to implement techniques, modules, and examples described herein.
- the techniques described herein may be supported by various configurations of the computing device 1202 and are not limited to the specific examples of the techniques described herein. This functionality may also be implemented all or in part through use of a distributed system, such as over a “cloud” 1214 via a platform 1216 as described below.
- the cloud 1214 includes and/or is representative of a platform 1216 for resources 1218 .
- the platform 1216 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 1214 .
- the resources 1218 may include applications and/or data that can be utilized while computer processing is executed on servers that are remote from the computing device 1202 .
- Resources 1218 can also include services provided over the Internet and/or through a subscriber network, such as a cellular or Wi-Fi network.
- the platform 1216 may abstract resources and functions to connect the computing device 1202 with other computing devices.
- the platform 1216 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 1218 that are implemented via the platform 1216 .
- implementation of functionality described herein may be distributed throughout the system 1200 .
- the functionality may be implemented in part on the computing device 1202 as well as via the platform 1216 that abstracts the functionality of the cloud 1214 .
Abstract
Description
- Representation of sound in a visual manner continues to provide a variety of challenges. By its very nature, this representation involves transformation from consumption of the sound by one sense (e.g., hearing) to consumption by another sense, e.g., visually. One technique that has been developed to provide such a representation is the use of a waveform that is displayed visually in a user interface, e.g., as part of sound editing functionality. This typically involves display of a period of time over which the sound is output, with indications of intensity (e.g., loudness) of the sound at particular points in time.
- However, recognition of sounds within this conventional display of the waveform typically requires significant experience on the part of a user to even guess at what sounds are being output at corresponding points in time. Consequently, conventional waveforms lack intuitiveness due to limitations in representing the sounds, often requiring users to actually listen to the sound data to locate a particular point of interest, to determine what is being represented by the waveform as a whole (e.g., to locate a particular sound file), and so forth.
- Waveform display control techniques of visual characteristics are described. In one or more examples, a method is described of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data. Sound data received by a computing device is partitioned to form a plurality of sound data time intervals. A signature is computed for each of the plurality of sound data time intervals by the computing device based on features extracted from respective sound data time intervals. The computed signatures are mapped by the computing device to one or more colors. Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform have the mapped one or more colors.
- In one or more examples, a method is described of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data. Sound data received by a computing device is partitioned to form a plurality of sound data time intervals. One or more phonemes are identified by the computing device that are included in respective time intervals. The one or more phonemes for the respective time intervals are mapped by the computing device to one or more colors. Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform have the mapped one or more colors thereby identifying respective phonemes.
- In one or more examples, a system is described to increase user efficiency in identification of particular sounds in a waveform display of sound data without listening to the sound data. The system includes a partition module implemented at least partially in hardware to partition sound data to form a plurality of sound data time intervals and a signature computation module implemented at least partially in hardware to compute a signature for each of the plurality of sound data time intervals based on features extracted from respective sound data time intervals. The system also includes a mapping module implemented at least partially in hardware to map the computed signatures to one or more visual characteristics and a user interface module implemented at least partially in hardware to control output of a waveform in a user interface, in which the waveform represents the sound data and each of the sound data time intervals in the waveform have the mapped one or more visual characteristics.
- This Summary introduces a selection of concepts in a simplified form that are further described below in the Detailed Description. As such, this Summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different instances in the description and the figures may indicate similar or identical items. Entities represented in the figures may be indicative of one or more entities and thus reference may be made interchangeably to single or plural forms of the entities in the discussion.
- FIG. 1 is an illustration of an environment in an example implementation that is operable to employ the visual characteristic control techniques described herein.
- FIG. 2 depicts a system in an example implementation showing a sound representation module and user interface module of FIG. 1 in greater detail as controlling output of a waveform in a user interface.
- FIG. 3 depicts an example implementation of a waveform of FIG. 2 as displayed in a user interface as differentiating speech from other sounds.
- FIG. 4 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as differentiating sounds from different musical instruments.
- FIG. 5 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the first two measures of Bach's Minuet as played by an oboe.
- FIG. 6 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing sounds originating from a drum set.
- FIG. 7 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the same sounds at different zoom levels.
- FIG. 8 depicts an example implementation of the waveform of FIG. 2 as displayed in the user interface as representing the same sounds at different recording levels in the user interface.
- FIG. 9 depicts an example implementation of the waveforms of FIG. 2 as displayed in the user interface as representing sound files.
- FIG. 10 is a flow diagram depicting a procedure in an example implementation of increasing user efficiency in identifying particular sounds in a waveform display of sound data.
- FIG. 11 is a flow diagram depicting a procedure in an example implementation of increasing user efficiency in identifying phonemes in a waveform display of sound data.
- FIG. 12 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1-11 to implement embodiments of the techniques described herein.
- Overview
- Conventional techniques that rely on representation of sound through use of waveforms are difficult to interpret by unpracticed users. Indeed, even seasoned users are typically forced to guess at generalities of the sounds being represented overall, such as to guess whether a particular section of the waveform includes speech or other sounds, e.g., noise and so forth.
- Waveform display control techniques involving visual characteristics are described. In one or more implementations, a waveform is configured based on how a human listener hears sounds. Visual characteristics such as colors are used to represent frequencies in a waveform that displays amplitude along one axis and time along another. For example, in the case of human speech the waveform is generated based on how human listeners hear. Phonemes are basic units of a phonology of human language that form meaningful units such as words or morphemes. The phonemes are mapped to colors in this example, with similar phonemes mapped to similar colors. The overall amplitude of the waveform is based on how a human listener perceives loudness of the sound, with another axis used to represent when and in what order the sounds are output.
- In this way, a user viewing the waveform may more readily determine characteristics of the sounds being represented. These techniques are also applicable to representations of sounds other than human speech, such as noise, music (e.g., particular instruments), and so on, further discussion of which is contained in the following sections and shown in corresponding figures.
- In the following discussion, an example environment is first described that may employ the techniques described herein. Example procedures are then described which may be performed in the example environment as well as other environments. Consequently, performance of the example procedures is not limited to the example environment and the example environment is not limited to performance of the example procedures.
- Example Environment
- FIG. 1 is an illustration of an environment 100 in an example implementation that is operable to employ the waveform display techniques described herein. The illustrated environment 100 includes a computing device 102 and a sound capture device 104, which are configurable in a variety of ways.
- The computing device 102, for instance, is configurable as a desktop computer, a laptop computer, a mobile device (e.g., assuming a handheld configuration such as a tablet or mobile phone), and so forth. Thus, the computing device 102 ranges from full resource devices with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., mobile devices). Additionally, although a single computing device 102 is shown, the computing device 102 is also representative of a plurality of different devices, such as multiple servers utilized by a business to perform operations "over the cloud" as further described in relation to FIG. 12.
- The sound capture device 104 is also configurable in a variety of ways. The illustrated example of one such configuration involves a standalone device, but other configurations are also contemplated, such as part of a mobile phone, video camera, tablet computer, part of a desktop microphone, array microphone, and so on. Additionally, although the sound capture device 104 is illustrated separately from the computing device 102, the sound capture device 104 is configurable as part of the computing device 102, the sound capture device 104 may be representative of a plurality of sound capture devices, and so on.
- The sound capture device 104 is illustrated as including a sound capture module 106 that is representative of functionality to generate sound data 108. The sound capture device 104, for instance, may generate the sound data 108 as a recording of an environment 110 surrounding the sound capture device 104 having one or more sound sources, e.g., speech from a user, music, and so forth. This sound data 108 is then obtained by the computing device 102 for processing.
- The computing device 102 is also illustrated as including a sound processing module 112. The sound processing module 112 is representative of functionality to process the sound data 108. Although illustrated as part of the computing device 102, functionality represented by the sound processing module 112 may be further divided, such as to be performed "over the cloud" by one or more servers that are accessible via a network 114 connection, further discussion of which may be found in relation to FIG. 12.
- An example of functionality of the sound processing module 112 is represented as a sound representation module 116 and a user interface module 118. The sound representation module 116 is representative of functionality to form a representation of the sound data 108 for output in a user interface 120. The user interface 120, for instance, may be configured to support sound editing operations to form edited sound data 122 from the sound data 108, such as source separation, enhancement, noise removal, splicing, and so forth. Accordingly, the user interface includes a visual representation of the sound data 108 with which a user may interact.
- In another example, the representation of the sound data 108 in the user interface 120 is usable to identify what sounds are captured by the sound data 108, such as to differentiate one sound file from another. The representation, for instance, may be included as part of a representation of the sound file (e.g., an icon) which is usable to identify characteristics of the sounds captured in the sound data 108, e.g., whether the sound data 108 includes speech (and even what is being said), music (e.g., characteristics of instruments and sounds in the music), noise, and so forth. A variety of other uses for a representation generated of the sound data 108 by the sound representation module 116 are also contemplated without departing from the spirit and scope thereof, as further described in relation to FIG. 9.
- In order to generate the representation of the sound data 108, the sound representation module 116 employs a sound data analysis module 124 and a mapping module 126 in the illustrated example. The sound data analysis module 124 is representative of functionality to extract features from the sound data 108 that are indicative of characteristics of the sound data 108, such as what sounds are captured in the sound data 108. The mapping module 126 is representative of functionality to map these features to visual characteristics that can be visually differentiated by a user to determine differences in different types of sound data 108.
- In the illustrated example, the user interface 120 includes a waveform 128 that includes a first axis 132 representing time and a second axis 134 that represents intensity (e.g., loudness) of the sound data 108 at particular points in time. Other visual characteristics (e.g., color) are also used to represent the extracted characteristics of the sound data at these particular points in time.
- The sound data analysis module 124, for instance, extracts frequency information from the sound data 108, which is mapped to a color space by the mapping module 126. In one or more implementations, the coloring is independent of recording level, and sounds that are perceived as similar by a human listener are represented by colors that are also perceived as similar by the human listener. In this way, sound editing techniques are enhanced by the improved user interface 120, an audio-retrieval system can present colored waveform displays as visual "thumbnails" in a list of sound search results or within a file, and so on. Further discussion of these and other examples is described in the following and shown in corresponding figures.
- FIG. 2 depicts a system 200 in an example implementation showing the sound representation module 116 and user interface module 118 of FIG. 1 in greater detail as controlling output of a waveform in a user interface. The sound representation module 116 includes the sound data analysis module 124 and the mapping module 126 as described in relation to FIG. 1.
- Sound data 108, e.g., a sequence of digital audio samples, is received by the sound representation module 116. The sound data analysis module 124 employs a partition module 202 to partition the sound data 108 into sound data time intervals 204. For example, the sound data time intervals 204 form brief consecutive intervals taken from the sound data 108, e.g., fifty milliseconds for each interval.
- The sound data time intervals 204 are then provided to a signature computation module 206 that is representative of functionality to create signatures 208 that describe differentiating characteristics of the sound data time intervals 204. For example, the signature computation module 206 may employ a feature extraction module 210 to extract frequency information from each of the sound data time intervals 204, such as by using a Fast Fourier Transform (FFT), linear prediction, wavelets, and so forth.
- In one or more implementations, the signatures 208 represent relative strengths of the frequencies while being invariant with respect to scaling and polarity. In this way, amplification or attenuation of the sound data in the sound data time intervals 204 (e.g., multiplication by a nonzero constant) does not alter the signatures 208.
- The signatures 208 are then used by the mapping module 126 to map one or more visual characteristics 212 (e.g., color, shading, texture, and so on) to the sound data time intervals 204. In a color example, the mapping module 126 applies a function to map each of the signatures 208 to a corresponding color. There is an endless number of possible mappings; however, in one or more implementations the mapping is performed such that sounds perceived as similar by a human listener are mapped to colors that are also perceived as similar by that listener.
- The user interface module 118 then uses this mapping to generate a waveform 214 in which the sound data time intervals 204 are associated with the visual characteristics 212, e.g., colors, in the user interface 120. Thus, within the waveform 214, each of the sound data time intervals 204 is painted by the color derived from the signature 208 representing the interval, which appears as vertical stripes in the user interface 120 as shown in FIG. 1.
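The partition, signature-computation, and color-mapping flow described above can be sketched end to end; the interval length, the naive DFT, the thirds-of-the-spectrum split, and the RGB formula below are illustrative assumptions rather than the patented implementation.

```python
import cmath

def partition(samples, interval_len=2048):
    """Split sound data into consecutive sound data time intervals."""
    return [samples[i:i + interval_len]
            for i in range(0, len(samples) - interval_len + 1, interval_len)]

def dft_magnitudes(interval):
    """Magnitude spectrum of one interval (naive DFT for clarity; a real
    implementation would use an FFT)."""
    n = len(interval)
    return [abs(sum(interval[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def signature(interval):
    """Relative frequency strengths, invariant to scaling and polarity.

    Normalizing by the total magnitude removes the recording level, and
    magnitudes ignore sign, so c * interval (c != 0) yields the same
    signature as the interval itself.
    """
    mags = dft_magnitudes(interval)
    total = sum(mags) or 1.0
    return [m / total for m in mags]

def signature_to_rgb(sig):
    """Illustrative color function: low/mid/high frequency energy drives
    the blue/green/red channels respectively."""
    n = len(sig)
    low = sum(sig[:n // 3])
    mid = sum(sig[n // 3:2 * n // 3])
    high = sum(sig[2 * n // 3:])
    return (int(255 * high), int(255 * mid), int(255 * low))
```

Under this sketch, amplifying or inverting an interval leaves its signature, and therefore its stripe color, unchanged, which is the scale invariance the text attributes to the signatures 208.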
- FIG. 3 depicts an example implementation 300 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as differentiating speech from other sounds. In this example, a sixteen-byte signature 208 is mapped to a twenty-four bit color in a red/green/blue color space. The mapping from sound to color is performed so that similar sounds are mapped to similar colors. An explosion 302 waveform, scream 304 waveform, siren 306 waveform, and white noise 308 waveform are shown. Red has a connotation of alarm and so does a scream 304, so a red component is increased in colors assigned to high-frequency sounds, i.e., the scream 304 is displayed using shades of red.
- Low-frequency sounds, such as the explosion 302 waveform, are given dark colors so the explosion 302 waveform both looks and sounds ominous. Middle to high frequencies are shaded green 310, while low to mid-range frequencies are shaded blue 312. Thus, the siren 306 waveform in this example has alternating bands of green and blue such that a user may differentiate between these portions.
- Noisy sounds such as the white noise 308 waveform are mapped to a gray color. When distinct sounds are played together, the louder sound is given a proportionally greater weighting in the color mapping. In the siren 306 waveform example, for instance, a blue sound commences just before the green sound has finished. Thus, in the brief interval when both sounds can be heard, the siren 306 waveform is colored by a mixture of blue and green shades.
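The proportional weighting described here, with louder concurrent sounds pulling the displayed color toward their own, can be sketched as a loudness-weighted average; the example colors and levels are assumptions for illustration.

```python
def blend_colors(colors_and_levels):
    """Blend the RGB colors of concurrent sounds, weighting each color by
    the loudness level of the sound it represents, so the louder sound
    contributes proportionally more to the displayed color."""
    total = sum(level for _, level in colors_and_levels)
    if total == 0:
        return (128, 128, 128)  # nothing audible: neutral gray
    return tuple(
        int(sum(color[i] * level for color, level in colors_and_levels) / total)
        for i in range(3)
    )

# Siren-style overlap: a green tone fading out as a blue tone fades in
# yields intermediate blue-green shades (colors and levels are assumed).
GREEN_TONE = (0, 200, 80)
BLUE_TONE = (0, 80, 200)
```

With equal levels the two tones blend evenly; weighting the green tone three to one pulls the mixture toward green, as in the brief overlap of the siren example.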
- FIG. 4 depicts an example implementation 400 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as differentiating sounds from different musical instruments. Mapping from sound to color may be performed to take into account all of the frequency information and not solely the pitch. This allows the coloring of polyphony and inharmonic sounds, for which a fundamental frequency cannot be determined.
- In this example, the same note (e.g., E4) is played by a bassoon 402, clarinet 404, English horn 406, trombone 408, and violin 410, but different colors are mapped according to the harmonics of the instruments, e.g., green, purple, gray, blue/green, and blue/green striped, respectively. The striped patterns visible in the English horn 406 and violin 410 represent vibrato. Such subtle variations are thus made apparent through use of color in the user interface 120.
- FIG. 5 depicts an example implementation 500 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the first two measures of Bach's Minuet as played by an oboe. In this example, each note is assigned a color, e.g., pink, green, orange, light pink, gray, pink again, green again, and fading green. Subtle variations in the notes are observed at the attack and release points through variations in color.
- FIG. 6 depicts an example implementation 600 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing sounds originating from a drum set. Waveforms of a bass drum 602, high hat 604, and snare drum 606 are represented using purple, blue, and gray, respectively, and thus are readily distinguishable from each other even though the amplitudes and time intervals are similar.
- FIG. 7 depicts an example implementation 700 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the same sounds at different zoom levels. A waveform is shown as employing pink 702, gray 704, orange 706, pink 708, gray 710, green 712, orange 714, pink 716, and green 718 colors at first, second, and third zoom levels.
- FIG. 8 depicts an example implementation 800 of the waveform 214 of FIG. 2 as displayed in the user interface 120 as representing the same sounds at different recording levels. First, second, and third recording levels are shown in the user interface 120. Because the signatures 208 are invariant with respect to scaling, the colors are unaffected by the changes in recording level in this example. For example, the order of pink 808, gray 810, orange 812, pink 814, orange 816, pink 818, and green 820 colors of peaks of the sound data 108 in the corresponding sound data time intervals 204 is unchanged.
- Although there are more than sixteen million colors available in the 24-bit color space, the number of colors discernible to the human eye is far smaller, e.g., approximately 100,000. The number of sounds represented by the signatures 208, however, is approximately 10^30, and so a many-to-one mapping may be performed by the mapping module 126. In one or more implementations, the mapping assigns similar sounds to a particular RGB color. However, due to the shortage of discernible colors, sounds dominated by very high frequencies (e.g., above 2 kHz) may be assigned colors that are also used for lower frequencies.
- In an example, rather than map the entire sonic universe to the color space, each audio recording is given a unique mapping of its sounds to the color space. While this may solve the color-shortage problem, users must then learn a different correspondence between sound and color for each recording, which may make it difficult to compare color waveform displays of different recordings. In another example, by using only a single mapping from sound to color, users are able to learn the correspondence between sound and color and develop an ability to visually read audio. That is, users are able to obtain an impression of how a recording will sound without listening to it by viewing the colored waveform display.
FIG. 9 depicts an example implementation 900 of the waveforms 214 of FIG. 2 as displayed in the user interface 120 as representing sound files. In addition to use in user interfaces 120 configured to support editing of the sound data 108, the waveform displays are also usable as visual representations (e.g., “thumbnails”) that represent recordings, such as in a list of search results returned by an audio-retrieval system. The colored waveform display is thus usable to help a user decide whether to listen to a recording retrieved by the system, e.g., for sound effects returned for a search.
- Example Procedures
- The following discussion describes waveform display control techniques that may be implemented utilizing the previously described systems and devices. Aspects of each of the procedures may be implemented in hardware, firmware, software, or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks. In portions of the following discussion, reference will be made to FIGS. 1-9.
-
FIG. 10 depicts a procedure 1000 in an example implementation of increasing user efficiency in identifying particular sounds in a waveform display of sound data without listening to the sound data. Sound data received by a computing device is partitioned to form a plurality of sound data time intervals (block 1002). A partition module 202, for instance, is usable to form sound data time intervals 204 from sound data 108 as a series of successive portions of the data in time.
- A signature is computed for each of the plurality of sound data time intervals by the computing device based on features extracted from respective sound data time intervals (block 1004). The features, for instance, include frequency, harmonics, and other characteristics of the sound data 108 suitable to differentiate one or more of the sound data time intervals 204 from each other. Signatures 208 are then computed using these features, which may be invariant with respect to scaling and polarity of the sound data within a respective sound data time interval.
- The computed signatures are mapped by the computing device to one or more colors (block 1006). Continuing with the previous example, the signatures 208 may be computed using a frequency analysis in which perceptually-weighted averages are calculated over a plurality of frequency bands, e.g., 0-1500 Hz, 1500-4000 Hz, and 4000 Hz and up. The perceptual loudness in these bands is then identified with the colors red, green, and blue. From these, a color angle is formed. A continuous mapping is then applied to align colors to sounds. For instance, deep vowels like “u” and “o” are mapped to deep red. Fricatives such as “s” and “sh” are mapped to turquoise. Other sounds produce other colors in a smooth manner that preserves distance; that is, similar sounds map to adjacent color angles.
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform has the mapped one or more colors (block 1008). In this way, a user may readily determine characteristics of sound data visually, such as in a sound editing user interface, as a representation (e.g., a thumbnail), and so on, without listening to the sound data 108.
-
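Blocks 1002-1008 can be sketched in code under stated assumptions: a naive DFT stands in for the unspecified frequency analysis, uniform rather than perceptually-weighted band averages are used, and the color-angle smoothing step is omitted. Only the band edges (0-1500 Hz, 1500-4000 Hz, 4000 Hz and up) come from the example above; normalizing by total band energy illustrates the scaling invariance attributed to the signatures 208:

```python
import math

# Band edges in Hz feeding the red, green, and blue channels, per the
# 0-1500 Hz / 1500-4000 Hz / 4000 Hz-and-up example above.
BANDS = [(0.0, 1500.0), (1500.0, 4000.0), (4000.0, None)]

def partition(samples, interval_len):
    """Block 1002 (sketch): split sound data into successive time intervals."""
    return [samples[i:i + interval_len]
            for i in range(0, len(samples), interval_len)]

def interval_color(samples, sample_rate):
    """Blocks 1004-1006 (simplified): sum spectral magnitude per band via
    a naive DFT and normalize, yielding an RGB triple in [0, 1]. Uniform
    weighting stands in for the perceptual weighting described above."""
    n = len(samples)
    energy = [0.0, 0.0, 0.0]
    for k in range(1, n // 2):  # DFT bins, skipping DC
        freq = k * sample_rate / n
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        for band, (lo, hi) in enumerate(BANDS):
            if freq >= lo and (hi is None or freq < hi):
                energy[band] += mag
    total = sum(energy) or 1.0
    # Dividing by the total makes the color independent of recording
    # level, mirroring the scale invariance of the signatures 208.
    return tuple(e / total for e in energy)

# A 200 Hz tone is dominated by the low band, so it colors mostly "red",
# and doubling the amplitude leaves the normalized color unchanged.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(256)]
quiet_color = interval_color(tone, rate)
loud_color = interval_color([2.0 * s for s in tone], rate)
```

A production implementation would use an FFT and a perceptual loudness weighting per band, but the structure of the procedure — partition, band analysis, normalize, color — is the same.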
FIG. 11 depicts a procedure 1100 in an example implementation of increasing user efficiency in identifying phonemes in a waveform display of sound data. Sound data received by a computing device is partitioned to form a plurality of sound data time intervals (block 1102). As before, the sound data time intervals 204 are formed as a consecutive series of portions of the sound data 108.
- One or more phonemes are identified by the computing device that are included in respective time intervals (block 1104). Phonemes are basic units of the phonology of a human language that form meaning units such as words or morphemes. Accordingly, the sound data analysis module 124 is configured in this example to identify characteristics of phonemes to identify their presence in the sound data time intervals 204 of the sound data 108.
- The one or more phonemes for the respective time intervals are mapped by the computing device to one or more colors (block 1106). For example, sounds of the sound data perceived as similar by human listeners are mapped to colors that are perceived as similar by those listeners.
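The phoneme identification of block 1104 is beyond a short sketch, but the coloring step it feeds can be illustrated. The phoneme-to-hue table below is hypothetical (the specification publishes no such table); it only demonstrates the stated property that similar phonemes receive similar colors, with deep vowels near red and fricatives near turquoise, as in the FIG. 10 example:

```python
import colorsys

# Hypothetical phoneme-to-hue table: similar phonemes sit at nearby hue
# angles, so they render in similar colors per block 1106. Deep vowels
# are placed near red (hue 0.0) and fricatives near turquoise (~0.5).
PHONEME_HUE = {
    "u": 0.00, "o": 0.03, "a": 0.08,   # deep vowels -> reds
    "s": 0.50, "sh": 0.53, "f": 0.56,  # fricatives  -> turquoise/cyan
}

def phoneme_color(phoneme):
    """Map an identified phoneme to the RGB triple used to color its
    sound data time interval."""
    return colorsys.hsv_to_rgb(PHONEME_HUE[phoneme], 1.0, 1.0)

def hue_distance(p, q):
    """Circular hue distance: small for perceptually similar phonemes,
    large for dissimilar ones."""
    d = abs(PHONEME_HUE[p] - PHONEME_HUE[q])
    return min(d, 1.0 - d)
```

Placing phonemes on a hue circle is one way to realize the "similar sounds, similar colors" requirement, since circular hue distance then tracks the assumed perceptual distance.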
- Output of a waveform in a user interface is controlled by the computing device, in which the waveform represents the sound data and each of the sound data time intervals in the waveform has the mapped one or more colors, thereby identifying respective phonemes (block 1108). In this way, a user may readily determine properties of the sound data 108 without actually listening to the sound data.
- For example, each phoneme is represented by a color, with similar phonemes mapped to similar colors. The overall amplitude of the display of the waveform is based on how human listeners perceive loudness of the sound data 108. Accordingly, by watching the waveform during playback of the sound data 108, a user may be trained in how the display relates to speech or other sounds. For instance, a user is able to locate words over a certain length whenever those words occur, a repeated phrase is immediately noticeable, and so on. In addition, splice points may be automatically identified that promote seamless editing. Thus, with a few minutes of training, even a casual user can edit speech in a professional-sounding manner.
- Example System and Device
-
FIG. 12 illustrates an example system generally at 1200 that includes an example computing device 1202 that is representative of one or more computing systems and/or devices that may implement the various techniques described herein. This is illustrated through inclusion of the sound processing module 112. The computing device 1202 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device), an on-chip system, and/or any other suitable computing device or computing system.
- The example computing device 1202 as illustrated includes a processing system 1204, one or more computer-readable media 1206, and one or more I/O interfaces 1208 that are communicatively coupled, one to another. Although not shown, the computing device 1202 may further include a system bus or other data and command transfer system that couples the various components, one to another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. A variety of other examples are also contemplated, such as control and data lines.
- The processing system 1204 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 1204 is illustrated as including hardware elements 1210 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application-specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 1210 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, processors may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.
- The computer-readable storage media 1206 is illustrated as including memory/storage 1212. The memory/storage 1212 represents memory/storage capacity associated with one or more computer-readable media. The memory/storage component 1212 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth). The memory/storage component 1212 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., Flash memory, a removable hard drive, an optical disc, and so forth). The computer-readable media 1206 may be configured in a variety of other ways as further described below.
- Input/output interface(s) 1208 are representative of functionality to allow a user to enter commands and information to
computing device 1202, and also allow information to be presented to the user and/or other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to recognize movement as gestures that do not involve touch), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a tactile-response device, and so forth. Thus, the computing device 1202 may be configured in a variety of ways as further described below to support user interaction.
- Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, such modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms “module,” “functionality,” and “component” as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors.
- An implementation of the described modules and techniques may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the
computing device 1202. By way of example, and not limitation, computer-readable media may include “computer-readable storage media” and “computer-readable signal media.” - “Computer-readable storage media” may refer to media and/or devices that enable persistent and/or non-transitory storage of information in contrast to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal bearing media. The computer-readable storage media includes hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage device, tangible media, or article of manufacture suitable to store the desired information and which may be accessed by a computer.
- “Computer-readable signal media” may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the
computing device 1202, such as via a network. Signal media typically may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or another transport mechanism. Signal media also include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
- As previously described, hardware elements 1210 and computer-readable media 1206 are representative of modules, programmable device logic, and/or fixed device logic implemented in a hardware form that may be employed in some embodiments to implement at least some aspects of the techniques described herein, such as to perform one or more instructions. Hardware may include components of an integrated circuit or on-chip system, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware. In this context, hardware may operate as a processing device that performs program tasks defined by instructions and/or logic embodied by the hardware, as well as hardware utilized to store instructions for execution, e.g., the computer-readable storage media described previously.
- Combinations of the foregoing may also be employed to implement various techniques described herein. Accordingly, software, hardware, or executable modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 1210. The
computing device 1202 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of a module that is executable by the computing device 1202 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or hardware elements 1210 of the processing system 1204. The instructions and/or functions may be executable/operable by one or more articles of manufacture (for example, one or more computing devices 1202 and/or processing systems 1204) to implement the techniques, modules, and examples described herein.
- The techniques described herein may be supported by various configurations of the computing device 1202 and are not limited to the specific examples of the techniques described herein. This functionality may also be implemented in whole or in part through use of a distributed system, such as over a “cloud” 1214 via a platform 1216 as described below.
- The cloud 1214 includes and/or is representative of a platform 1216 for resources 1218. The platform 1216 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 1214. The resources 1218 may include applications and/or data that can be utilized while computer processing is executed on servers that are remote from the computing device 1202. Resources 1218 can also include services provided over the Internet and/or through a subscriber network, such as a cellular or Wi-Fi network.
- The platform 1216 may abstract resources and functions to connect the computing device 1202 with other computing devices. The platform 1216 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 1218 that are implemented via the platform 1216. Accordingly, in an interconnected-device embodiment, implementation of functionality described herein may be distributed throughout the system 1200. For example, the functionality may be implemented in part on the computing device 1202 as well as via the platform 1216 that abstracts the functionality of the cloud 1214.
- Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed invention.
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/663,231 US9445210B1 (en) | 2015-03-19 | 2015-03-19 | Waveform display control of visual characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/663,231 US9445210B1 (en) | 2015-03-19 | 2015-03-19 | Waveform display control of visual characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
US9445210B1 US9445210B1 (en) | 2016-09-13 |
US20160277864A1 true US20160277864A1 (en) | 2016-09-22 |
Family
ID=56881114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/663,231 Active US9445210B1 (en) | 2015-03-19 | 2015-03-19 | Waveform display control of visual characteristics |
Country Status (1)
Country | Link |
---|---|
US (1) | US9445210B1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018129382A1 (en) * | 2017-01-09 | 2018-07-12 | Inmusic Brands, Inc. | Systems and methods for displaying graphics about a control wheel's center |
JP6430609B1 (en) * | 2017-10-20 | 2018-11-28 | EncodeRing株式会社 | Jewelery modeling system, jewelry modeling program, and jewelry modeling method |
CN112667193A (en) * | 2020-12-22 | 2021-04-16 | 北京小米移动软件有限公司 | Shell display state control method and device, electronic equipment and storage medium |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6184898B1 (en) | 1998-03-26 | 2001-02-06 | Comparisonics Corporation | Waveform display utilizing frequency-based coloring and navigation |
US7232948B2 (en) * | 2003-07-24 | 2007-06-19 | Hewlett-Packard Development Company, L.P. | System and method for automatic classification of music |
US7500190B1 (en) * | 2005-04-13 | 2009-03-03 | Apple Inc. | Visual feedback to illustrate effects of editing operations |
US20070067174A1 (en) * | 2005-09-22 | 2007-03-22 | International Business Machines Corporation | Visual comparison of speech utterance waveforms in which syllables are indicated |
HUP0600540A2 (en) * | 2006-06-27 | 2008-03-28 | Ave Fon Kft | System for and method of visualizing audio signals |
US7521622B1 (en) * | 2007-02-16 | 2009-04-21 | Hewlett-Packard Development Company, L.P. | Noise-resistant detection of harmonic segments of audio signals |
WO2008130660A1 (en) * | 2007-04-20 | 2008-10-30 | Master Key, Llc | Archiving of environmental sounds using visualization components |
US8037413B2 (en) * | 2007-09-06 | 2011-10-11 | Adobe Systems Incorporated | Brush tool for audio editing |
KR20090087394A (en) * | 2008-02-12 | 2009-08-17 | 이관영 | Apparatus and method of manufacturing goods using sound |
US8890869B2 (en) * | 2008-08-12 | 2014-11-18 | Adobe Systems Incorporated | Colorization of audio segments |
US20100198583A1 (en) * | 2009-02-04 | 2010-08-05 | Aibelive Co., Ltd. | Indicating method for speech recognition system |
US9898086B2 (en) * | 2013-09-06 | 2018-02-20 | Immersion Corporation | Systems and methods for visual processing of spectrograms to generate haptic effects |
- 2015
- 2015-03-19: US application 14/663,231 granted as US 9,445,210 B1 (status: Active)
Also Published As
Publication number | Publication date |
---|---|
US9445210B1 (en) | 2016-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109785820B (en) | Processing method, device and equipment | |
CN105632508B (en) | Audio processing method and audio processing device | |
US10559323B2 (en) | Audio and video synchronizing perceptual model | |
WO2020113733A1 (en) | Animation generation method and apparatus, electronic device, and computer-readable storage medium | |
CN113921022B (en) | Audio signal separation method, device, storage medium and electronic equipment | |
US9445210B1 (en) | Waveform display control of visual characteristics | |
JP7140221B2 (en) | Information processing method, information processing device and program | |
WO2020015411A1 (en) | Method and device for training adaptation level evaluation model, and method and device for evaluating adaptation level | |
CN113614828A (en) | Method and apparatus for fingerprinting audio signals via normalization | |
KR20160076316A (en) | Apparatus and method for producing a rhythm game, and computer program for executing the method | |
JP2023071787A (en) | Method and apparatus for extracting pitch-independent timbre attribute from medium signal | |
EP2660815B1 (en) | Methods and apparatus for audio processing | |
Felipe et al. | Acoustic scene classification using spectrograms | |
CN113287169A (en) | Apparatus, method and computer program for blind source separation and remixing | |
Marui et al. | Timbre of nonlinear distortion effects: Perceptual attributes beyond sharpness | |
CN109495786B (en) | Pre-configuration method and device of video processing parameter information and electronic equipment | |
Lagrange et al. | Semi-automatic mono to stereo up-mixing using sound source formation | |
CN114678038A (en) | Audio noise detection method, computer device and computer program product | |
Lorho | Perceptual evaluation of mobile multimedia loudspeakers | |
Rice | Frequency-based coloring of the waveform display to facilitate audio editing and retrieval | |
US7949420B2 (en) | Methods and graphical user interfaces for displaying balance and correlation information of signals | |
Benson | Toward Perceptual Searching of Room Impulse Response Libraries | |
Engeln et al. | VisualAudio-Design–towards a graphical Sounddesign | |
Freire et al. | Development of Audio Descriptors Inspired by Schaefferian Criteria: A Set of Tools for Interactive Exploration of Percussive Sounds | |
US20240249715A1 (en) | Auditory augmentation of speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ADOBE SYSTEMS INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MOORER, JAMES ANDERSON;REEL/FRAME:035265/0738 Effective date: 20150318 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: ADOBE INC., CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:ADOBE SYSTEMS INCORPORATED;REEL/FRAME:048867/0882 Effective date: 20181008 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |