CN107885845B - Audio classification method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN107885845B (application number CN201711107617.3A)
- Authority
- CN
- China
- Prior art keywords
- audio
- target entry
- entry
- classification information
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/686—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Electrophonic Musical Instruments (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an audio classification method and device, computer equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: acquiring a target entry to which audio to be classified belongs, wherein the target entry comprises audio with the same audio attribute, and the audio attribute is used for representing the characteristics of the audio; judging whether the target entry is a pure music entry; and when the target entry is a pure music entry, determining that the audio to be classified is pure music. The invention solves the problem of low reliability of audio classification in the related art. The invention is used for audio classification.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an audio classification method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of the internet and mobile communication technology, listening to music has become an important entertainment mode in people's life, and therefore music playing software needs to be configured with a music library with rich resources, so as to provide services for users. To meet the personalized needs of the user, it is usually necessary to classify the music in the music library, for example, the classification of the music in the music library may include pure music classification.
In the related art, human voice in audio may be identified by an automatic identification algorithm (e.g., a machine learning algorithm); audio in the music library that contains human voice is classified into a vocal music category, and audio that contains no human voice is classified into a pure music category.
However, the audio in the pure music category obtained in the related art is merely audio without human voice, and such audio also includes accompaniment audio. Pure music is audio written without lyrics, whereas accompaniment audio is the part of a piece of music with lyrics that remains apart from the sung lyrics, that is, the part of vocal music other than the human voice. Accompaniment audio therefore cannot be counted as pure music, and the reliability of audio classification in the related art is low.
Disclosure of Invention
The embodiment of the invention provides an audio classification method and device, computer equipment and a storage medium, which can solve the problem of low reliability of audio classification in the related art. The technical scheme is as follows:
in a first aspect, a method for audio classification is provided, the method comprising:
acquiring a target entry to which audio to be classified belongs, wherein the target entry comprises audio with the same audio attribute, and the audio attribute is used for representing the characteristics of the audio;
judging whether the target entry is a pure music entry;
and when the target entry is a pure music entry, determining that the audio to be classified is pure music.
Optionally, each audio in the target entry has classification information and a class label, where the classification information is used to characterize a coarse-grained type of the audio, the class label is used to characterize a fine-grained type of the audio, the classification information includes a suspicious human voice class or a suspicious pure music class, and the class label includes an accompaniment label;
the judging whether the target entry is a pure music entry includes:
when the classification information of none of the audios in the target entry is the suspicious human voice class and the class label of none of the audios in the target entry is the accompaniment label, judging whether a first ratio is larger than a preset ratio threshold, wherein the first ratio is the ratio of the number of audios whose classification information is the suspicious pure music class to the number of all audios in the target entry;
and when the first ratio is larger than the preset ratio threshold, determining that the target entry is a pure music entry.
Optionally, the category tag further includes a ring tone tag, and the method further includes:
when the first ratio is not larger than the preset ratio threshold, detecting whether the classification information of every audio in the target entry, other than the audios whose class label is the ring tone label, is the suspicious pure music class;
and when the classification information of the audios except the audios with the class labels being the ring tone labels in the target entry is the suspicious pure music class, determining that the target entry is a pure music entry.
Optionally, the category tag further includes an accompaniment tag, and before the detecting whether the first ratio is greater than a preset ratio threshold, the method further includes:
detecting whether the category label of each audio in the target entry is an accompaniment label;
and when the category label of any audio in the target entry is the accompaniment label, determining that the target entry is a non-pure music entry.
Optionally, the method further includes:
determining classification information of each audio in the target entry;
detecting whether the classification information of each audio in the target entry is the suspicious human voice class;
and when the classification information of any audio in the target entry is the suspicious human voice class, determining that the target entry is a non-pure music entry.
Optionally, the category label further includes a single song label, and the method further includes:
judging whether the category label of each audio in the target entry is a single song label;
when the category label of a certain audio in the target entry is a single song label, detecting whether the tone quality of the certain audio meets a preset tone quality condition;
and when the tone quality of the certain audio does not meet the preset tone quality condition, determining that the target entry is a non-pure music entry.
Optionally, after detecting whether the sound quality of the certain audio meets a preset sound quality condition, the method further includes:
when the tone quality of the certain audio meets the preset tone quality condition, judging whether the classification information of the certain audio is suspicious pure music;
and when the classification information of the certain audio is not a suspicious pure music class, determining that the target entry is a non-pure music entry.
Optionally, the classification information further includes a class to be determined, and after the detecting whether the class label of each audio in the target entry is a single song label, the method further includes:
when the category labels of all audios in the target entry are not single song labels, or every audio in the target entry whose category label is a single song label meets the preset tone quality condition and has classification information of the suspicious pure music class, detecting whether the classification information of each audio in the target entry is the class to be determined;
and when the classification information of any audio is the class to be determined and the audio name of that audio includes the words "pure music", if a first audio exists in the target entry that has the same class label as that audio and whose classification information is the suspicious pure music class, updating the classification information of that audio to the suspicious pure music class.
Optionally, the determining the classification information of each audio in the target entry includes:
and determining the classification information of each audio in the target entry by calling a musly library.
Optionally, the range of the preset ratio threshold is 0.6-0.75.
Optionally, the audio attributes being the same includes the singer names being the same and the keywords in the audio names being the same.
Optionally, the category tag includes a single song tag, a ringtone tag, an accompaniment tag, a live music tag, a pure music tag, or an electronic music tag.
In a second aspect, there is provided an audio classification apparatus, the apparatus comprising:
the acquisition module is used for acquiring a target entry to which audio to be classified belongs, wherein the target entry comprises audio with the same audio attribute, and the audio attribute is used for representing the characteristics of the audio;
the first judgment module is used for judging whether the target entry is a pure music entry;
and the first determining module is used for determining that the audio to be classified is pure music when the target entry is a pure music entry.
Optionally, each audio in the target entry has classification information and a class label, where the classification information is used to characterize a coarse-grained type of the audio, the class label is used to characterize a fine-grained type of the audio, the classification information includes a suspicious human voice class or a suspicious pure music class, and the class label includes an accompaniment label;
the first judging module comprises:
the judgment submodule is used for judging whether a first ratio is larger than a preset ratio threshold when the classification information of none of the audios in the target entry is the suspicious human voice class and the class label of none of the audios in the target entry is the accompaniment label, wherein the first ratio is the ratio of the number of audios whose classification information is the suspicious pure music class to the number of all audios in the target entry;
and the determining submodule is used for determining that the target entry is a pure music entry when the first ratio is greater than the preset ratio threshold.
Optionally, the category tag further includes a ring tag, and the apparatus further includes:
the first detection module is used for detecting whether the classification information of the audios except the audio with the class label being the ring tone label in the target entry is suspicious pure music class or not when the first ratio is not larger than the preset ratio threshold;
and the second determining module is used for determining that the target entry is a pure music entry when the classification information of the audios except the audios with the class labels being the ring tone labels in the target entry is the suspicious pure music class.
Optionally, the category tag further includes an accompaniment tag, and the apparatus further includes:
the second detection module is used for detecting whether the category label of each audio in the target entry is an accompaniment label;
and the third determining module is used for determining that the target entry is a non-pure music entry when the category label of any audio in the target entry is an accompaniment label.
Optionally, the apparatus further comprises:
the fourth determining module is used for determining the classification information of each audio in the target entry;
the third detection module is used for detecting whether the classification information of each audio in the target entry is the suspicious human voice class;
and the fifth determining module is used for determining that the target entry is a non-pure music entry when the classification information of any audio in the target entry is the suspicious human voice class.
Optionally, the category label further includes a single song label, and the apparatus further includes:
the second judgment module is used for judging whether the category label of each audio in the target entry is a single song label;
the fourth detection module is used for detecting whether the tone quality of a certain audio meets a preset tone quality condition or not when the category label of the certain audio in the target entry is a single song label;
and the sixth determining module is used for determining the target entry as a non-pure music entry when the tone quality of the certain audio does not meet the preset tone quality condition.
Optionally, the apparatus further comprises:
the third judgment module is used for judging whether the classification information of the certain audio is suspicious pure music or not when the tone quality of the certain audio meets the preset tone quality condition;
and the seventh determining module is used for determining that the target entry is a non-pure music entry when the classification information of the certain audio is not a suspicious pure music class.
Optionally, the classification information further includes a class to be determined, and the apparatus further includes:
a fifth detection module, configured to detect whether the classification information of each audio in the target entry is the class to be determined when the class labels of all the audios in the target entry are not single song labels, or when every audio in the target entry whose class label is a single song label meets the preset sound quality condition and has classification information of the suspicious pure music class;
and the updating module is used for, when the classification information of any audio is the class to be determined and the audio name of that audio includes the words "pure music", updating the classification information of that audio to the suspicious pure music class if a first audio exists in the target entry that has the same class label as that audio and whose classification information is the suspicious pure music class.
Optionally, the fourth determining module is configured to:
and determining the classification information of each audio in the target entry by calling a musly library.
Optionally, the range of the preset ratio threshold is 0.6-0.75.
Optionally, the audio attributes being the same includes the singer names being the same and the keywords in the audio names being the same.
Optionally, the category tag includes a single song tag, a ringtone tag, an accompaniment tag, a live music tag, a pure music tag, or an electronic music tag.
In a third aspect, there is provided a computer device comprising a processor and a memory,
wherein,
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the audio classification method according to any one of the first aspect.
In a fourth aspect, a storage medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the audio classification method of any of the first aspects.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the audio classification method and device, the computer device and the storage medium provided by the embodiment of the invention firstly obtain the target entry to which the audio to be classified belongs, then judge whether the target entry is a pure music entry, and when the target entry is the pure music entry, determine that the audio to be classified is pure music.
Drawings
FIG. 1 is a flowchart of an audio classification method according to an embodiment of the present invention;
FIG. 2-1 is a flowchart of another audio classification method provided by an embodiment of the present invention;
FIG. 2-2 is a schematic diagram of a first non-pure music determination process according to an embodiment of the present invention;
FIG. 2-3 is a schematic diagram of a second non-pure music determination process according to an embodiment of the present invention;
FIG. 2-4 is a schematic diagram of a third non-pure music determination process according to an embodiment of the present invention;
FIG. 3 is a flowchart of yet another audio classification method provided by an embodiment of the present invention;
FIG. 4-1 is a schematic structural diagram of an audio classification apparatus according to an embodiment of the present invention;
FIG. 4-2 is a schematic structural diagram of another audio classification apparatus provided in an embodiment of the present invention;
FIG. 4-3 is a schematic structural diagram of another audio classification apparatus provided in an embodiment of the present invention;
FIG. 4-4 is a schematic structural diagram of still another audio classification apparatus according to an embodiment of the present invention;
FIG. 4-5 is a schematic structural diagram of an audio classification apparatus according to another embodiment of the present invention;
FIG. 4-6 is a schematic structural diagram of another audio classification apparatus according to another embodiment of the present invention;
FIG. 4-7 is a schematic structural diagram of another audio classification apparatus according to another embodiment of the present invention;
FIG. 4-8 is a schematic structural diagram of another audio classification apparatus according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of an audio classification method according to an embodiment of the present invention, and as shown in fig. 1, the method may include:
Step 102, judging whether the target entry is a pure music entry.
Step 103, when the target entry is a pure music entry, determining that the audio to be classified is pure music.
In summary, the audio classification method provided in the embodiment of the present invention obtains the target entry to which the audio to be classified belongs, then determines whether the target entry is a pure music entry, and determines that the audio to be classified is pure music when the target entry is a pure music entry.
Fig. 2-1 is a flowchart of an audio classification method according to an embodiment of the present invention, and as shown in fig. 2-1, the method may include:
The target entry comprises audios with the same audio attribute, and the audio attribute is used for representing the characteristics of the audios. Optionally, the audio attributes being the same includes the singer names being the same and the keywords in the audio names being the same.
It should be noted that, in the embodiment of the present invention, the target entry is an audio set, and the target entry may also be used to indicate an audio attribute of audio in the target entry. For example, the name of the target entry may be used to indicate the audio properties of the audio in the target entry.
For example, assuming that the name of the target entry is "three actors", the audios included in the target entry are all performed by the same singer, and the names of the audios all include the keyword "actors"; that is, the target entry includes different types of audio of the same piece of music.
Further, in the embodiment of the present invention, after the target entry to which the audio to be classified belongs is obtained, multiple non-pure music entry determination processes may be performed on the target entry to exclude non-pure music entries and thereby obtain pure music entries. For example, these may include three processes, namely a first, a second, and a third non-pure music determination process; see steps 202 to 204.
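The grouping of audios into entries described above can be sketched as follows. This is a minimal illustration only: the `Audio` type, the `group_into_entries` function, the singer names, and the rule that the first matching keyword wins are all hypothetical, since the source does not say how keywords are extracted or matched.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Audio:
    name: str    # audio name, e.g. "Actor (Live)"
    singer: str  # singer name


def group_into_entries(audios, keywords):
    """Group audios by (singer name, keyword contained in the audio name)."""
    entries = defaultdict(list)
    for audio in audios:
        for keyword in keywords:
            if keyword.lower() in audio.name.lower():
                entries[(audio.singer, keyword)].append(audio)
                break  # assumption: the first matching keyword decides the entry
    return dict(entries)


library = [
    Audio("Actor", "Singer A"),
    Audio("Actor (Live)", "Singer A"),
    Audio("Actor (Accompaniment)", "Singer A"),
    Audio("Actor", "Singer B"),
]
entries = group_into_entries(library, keywords=["Actor"])
```

Here the three versions sung by "Singer A" land in one entry, while the version by "Singer B" forms a separate entry because the singer name differs.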
As shown in fig. 2-2, the first non-pure music determining process may include:
Optionally, each audio in the target entry has a category label, and the category label is used for representing a fine-grained type of the audio. The category label may be used to indicate a type of the audio; for example, the category label may include a single song label, a ringtone label, an accompaniment label, a live music label, a pure music label, or an electronic music label (electronic music including disc jockey (DJ) music), and the like. The category label of each audio may be obtained by manually classifying the audio, or may be obtained by classifying the audio in other manners, which is not limited in the embodiment of the present invention.
It should be noted that, since accompaniment audio is the audio part of music with lyrics other than the sung lyrics, when the category label of an audio is determined manually, the audio is usually labeled as accompaniment only after confirming that it contains no human voice and that the underlying music has lyrics. The accompaniment labels obtained by manual classification are therefore highly accurate. Once the target entry is determined to include accompaniment audio, the target entry can be determined to be a non-pure music entry, because accompaniment audio is not pure music.
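The first non-pure music determination described above reduces to a single scan over the category labels of the entry. The sketch below assumes labels are stored as plain strings; the label spellings and the function name `first_non_pure_check` are illustrative and not taken from the source.

```python
ACCOMPANIMENT = "accompaniment"  # hypothetical spelling of the accompaniment label


def first_non_pure_check(category_labels):
    """Return True when the entry must be treated as a non-pure music entry.

    A single audio carrying the accompaniment label is enough to exclude
    the whole entry.
    """
    return any(label == ACCOMPANIMENT for label in category_labels)


labels_with_accompaniment = ["single song", "live music", "accompaniment"]
labels_without_accompaniment = ["live music", "pure music", "ringtone"]
```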
As shown in fig. 2 to 3, the second non-pure music determining process may include:
In the embodiment of the invention, each audio in the target entry has classification information, and the classification information is used for representing the coarse-grained type of the audio. The classification information may include a suspected human voice class, a class to be determined, or a suspected pure music class.
Optionally, a musly library may be invoked to determine classification information for each audio in the target entry. The musly library is a fast, high-quality audio music similarity library written in C/C++ and released as open source under the Mozilla Public License (MPL) 2.0.
It should be noted that before the classification information of each audio in the target entry is determined, the musly library may be trained with a large number of training audios of known types. After training is completed, the musly library may be called to test a plurality of test audios of known types, and the output results corresponding to the test audios are counted to obtain the classification information corresponding to different numerical ranges. For example, if the output result for each test audio includes the 20 audios most similar to that test audio, the number of pure music audios among those 20 may be counted. Assume that, after all test audios of known types have been tested, the statistics show that for more than 90% of the pure music test audios the output result contains no fewer than 15 pure music audios, and for more than 90% of the vocal test audios the output result contains fewer than 10 pure music audios. The classification information corresponding to the numerical range 0-10 can then be defined as the suspicious human voice class, that corresponding to the range 15-20 as the suspicious pure music class, and that corresponding to the range 10-15 as the class to be determined.
Furthermore, a musly library can be called, and classification information of each audio frequency in the target entry is determined according to the output numerical value of each audio frequency in the target entry.
Optionally, based on the statistical probability, when the classification information of the audio is a suspicious human voice class, the probability that the audio is human voice is high, and therefore it can be determined that the target entry is an impure music entry.
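The mapping from the output value to classification information, and the second non-pure music determination built on it, might look like the sketch below. It does not call the musly library; it assumes the count of pure music audios among the 20 most similar audios has already been obtained. Because the example ranges in the text share their endpoints, the boundary handling here (a count of exactly 10 falls into the class to be determined, exactly 15 into the suspicious pure music class) is an assumption based on the "fewer than 10" and "no fewer than 15" statistics. All names are illustrative.

```python
SUSPICIOUS_VOICE = "suspicious human voice"
TO_BE_DETERMINED = "to be determined"
SUSPICIOUS_PURE = "suspicious pure music"


def classification_info(pure_neighbors, voice_below=10, pure_from=15):
    """Map the number of pure music audios among the similar audios to a class."""
    if pure_neighbors < voice_below:
        return SUSPICIOUS_VOICE
    if pure_neighbors >= pure_from:
        return SUSPICIOUS_PURE
    return TO_BE_DETERMINED


def second_non_pure_check(infos):
    """Return True when any audio in the entry is the suspicious human voice class."""
    return any(info == SUSPICIOUS_VOICE for info in infos)


infos = [classification_info(n) for n in (18, 12, 15)]
```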
Step 204, executing a third non-pure music judgment process.
As shown in fig. 2 to 4, the third non-pure music determining process may include:
Step 2041, judging whether the category label of each audio in the target entry is a single song label; when the category label of a certain audio in the target entry is a single song label, executing step 2042; when the category labels of all the audios in the target entry are not single song labels, executing step 2046.
Optionally, the preset tone quality condition may be that the audio is stereo rather than pseudo stereo and/or that the audio sampling rate is above a certain threshold. Pseudo stereo is a stereo signal approximated from a mono signal. Specifically, the mono signal is divided into two paths: one path is the direct sound signal, the other becomes a delayed sound signal after being delayed, and the two paths are replayed through two loudspeakers to form a pseudo stereo effect with a certain sound field. Alternatively, pseudo stereo may be a sound signal in which the waveforms of the left and right channels are substantially identical. The audio sampling rate is the number of times a recording device samples the sound signal per second; the higher the audio sampling rate, the truer and more natural the reproduced sound, that is, the better the sound quality of the audio.
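One possible form of the tone quality check is sketched below, under several assumptions that the source does not fix: channels are given as equal-length lists of float samples, "substantially identical" is approximated by a maximum per-sample difference within a `tolerance`, the two conditions are combined with "and", and the `min_rate_hz` value is a placeholder. The delayed-copy form of pseudo stereo is not detected by this simple comparison.

```python
def is_pseudo_stereo(left, right, tolerance=1e-3):
    """Treat the audio as pseudo stereo when both channels are nearly identical."""
    if not left or len(left) != len(right):
        return False
    return max(abs(l - r) for l, r in zip(left, right)) <= tolerance


def meets_quality_condition(left, right, sample_rate_hz, min_rate_hz=22050):
    """Assumed condition: not pseudo stereo AND sampling rate above the threshold."""
    return (not is_pseudo_stereo(left, right)) and sample_rate_hz > min_rate_hz


mono_copy = [0.0, 0.5, -0.5, 0.25]   # same samples used for both channels
wide_right = [0.1, 0.2, -0.7, 0.4]   # a genuinely different right channel
```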
Step 2043, when the tone quality of a certain audio does not meet the preset tone quality condition, determining that the target entry is a non-pure music entry.
Vocal audio that is pseudo stereo and/or has a low audio sampling rate is easily judged as the suspicious pure music class or the class to be determined when the musly library is called; for example, vocal audio from an old album is easily judged this way. Therefore, when the tone quality of a certain audio does not meet the preset tone quality condition, determining the target entry to be a non-pure music entry reduces the probability of misjudging pure music and improves the reliability of audio classification.
Optionally, when the category labels of all audios in the target entry are not single song labels, or every audio in the target entry whose category label is a single song label meets the preset tone quality condition and has classification information of the suspicious pure music class, detecting whether the classification information of each audio in the target entry is the class to be determined.
Step 2047, when the classification information of any audio is the class to be determined and the audio name of that audio includes the words "pure music", if a first audio exists in the target entry that has the same class label as that audio and whose classification information is the suspicious pure music class, updating the classification information of that audio to the suspicious pure music class.
For example, assuming that the audio name of an audio includes the words "pure music", for example the audio name is "actor pure music", the category label of that audio is a live music label, a first audio in the target entry also has a live music label, and the classification information of the first audio is the suspicious pure music class, the classification information of that audio may be updated to the suspicious pure music class.
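The update rule for audios in the class to be determined can be sketched as follows. The `Audio` type, the field names, and the case-insensitive substring match on "pure music" are illustrative assumptions; the source only states that the audio name includes those words.

```python
from dataclasses import dataclass

TO_BE_DETERMINED = "to be determined"
SUSPICIOUS_PURE = "suspicious pure music"


@dataclass
class Audio:
    name: str
    class_label: str
    info: str  # classification information


def promote_to_be_determined(entry, marker="pure music"):
    """Promote to-be-determined audios backed by a same-label pure music audio."""
    # Class labels that already have at least one suspicious pure music audio.
    pure_labels = {a.class_label for a in entry if a.info == SUSPICIOUS_PURE}
    for audio in entry:
        if (audio.info == TO_BE_DETERMINED
                and marker in audio.name.lower()
                and audio.class_label in pure_labels):
            audio.info = SUSPICIOUS_PURE
    return entry


entry = promote_to_be_determined([
    Audio("Actor pure music", "live music", TO_BE_DETERMINED),
    Audio("Actor (Live)", "live music", SUSPICIOUS_PURE),
    Audio("Actor remix", "electronic music", TO_BE_DETERMINED),
])
```

Only the first audio is promoted: its name contains the marker and another live music audio is already in the suspicious pure music class. The remix stays in the class to be determined.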
The first ratio is the ratio of the number of audios of which the classification information is a suspicious pure music class to the number of all audios.
Optionally, the preset ratio threshold may be in a range of 0.6 to 0.75. For example, when the target entry includes three audios, the preset ratio threshold may be 2/3, and when the target entry includes four or more audios, the preset ratio threshold may be 0.75.
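The first ratio and its comparison against the threshold can be illustrated with exact fractions. The threshold choice for three audios and for four or more follows the example above; using 0.75 for entries with fewer than three audios is an assumption, and the function names are illustrative.

```python
from fractions import Fraction

SUSPICIOUS_PURE = "suspicious pure music"
TO_BE_DETERMINED = "to be determined"


def preset_threshold(audio_count):
    """2/3 for three audios, otherwise 0.75 (assumed for counts below three)."""
    return Fraction(2, 3) if audio_count == 3 else Fraction(3, 4)


def first_ratio(infos):
    """Share of audios whose classification information is suspicious pure music."""
    return Fraction(sum(info == SUSPICIOUS_PURE for info in infos), len(infos))


def exceeds_threshold(infos):
    if not infos:
        return False
    return first_ratio(infos) > preset_threshold(len(infos))


five_audios = [SUSPICIOUS_PURE] * 4 + [TO_BE_DETERMINED]
three_audios = [SUSPICIOUS_PURE, SUSPICIOUS_PURE, TO_BE_DETERMINED]
```

Because the comparison is strict ("larger than"), two suspicious pure music audios out of three give a first ratio equal to 2/3, which does not exceed the threshold; such an entry would fall through to the ring tone check.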
Step 206, determining the target entry as a pure music entry.
Further, step 210 may be performed.
Step 207, detecting whether the classification information of every audio in the target entry, other than the audios whose class label is the ring tone label, is the suspicious pure music class.
Step 208, when the classification information of every audio in the target entry, other than the audios whose class label is the ring tone label, is the suspicious pure music class, determining that the target entry is a pure music entry.
Further, step 210 may be performed.
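The ring tone fallback of steps 207 and 208 can be sketched as a check that ignores ring tone audios. Each audio is represented as a hypothetical `(class_label, classification_info)` pair; treating an entry that contains only ring tones as not pure music is an assumption, since the source does not cover that case.

```python
RINGTONE = "ringtone"
SUSPICIOUS_PURE = "suspicious pure music"
TO_BE_DETERMINED = "to be determined"


def ringtone_fallback(entry):
    """Return True when every non-ring-tone audio is suspicious pure music."""
    others = [info for label, info in entry if label != RINGTONE]
    # Assumption: an entry with nothing but ring tones is not accepted.
    return bool(others) and all(info == SUSPICIOUS_PURE for info in others)


entry_ok = [
    (RINGTONE, TO_BE_DETERMINED),
    (RINGTONE, TO_BE_DETERMINED),
    ("live music", SUSPICIOUS_PURE),
    ("pure music", SUSPICIOUS_PURE),
]
entry_rejected = entry_ok + [("live music", TO_BE_DETERMINED)]
```

In `entry_ok` only half of the audios are in the suspicious pure music class, so the first ratio would not pass the threshold, yet the entry is still accepted because the remaining audios are ring tones.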
In the embodiment of the present invention, when the target entry is determined to be a pure music entry, it may be determined that the audio to be classified is pure music, and when the target entry is a non-pure music entry, it is not determined whether the audio to be classified is pure music.
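Putting the checks together, the overall decision might be sketched as below. This is a simplified illustration under stated assumptions, not the patented procedure itself: the `Audio` fields, the short label and class strings, the fixed 0.75 threshold, and the precomputed `good_quality` flag are all hypothetical, and the update of audios in the class to be determined is omitted for brevity.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Audio:
    class_label: str           # e.g. "single", "ringtone", "accompaniment", "live"
    info: str                  # "voice", "pure", or "tbd"
    good_quality: bool = True  # result of the preset tone quality check


def is_pure_music_entry(entry, threshold=Fraction(3, 4)):
    if not entry:
        return False
    # First check: any accompaniment label excludes the entry.
    if any(a.class_label == "accompaniment" for a in entry):
        return False
    # Second check: any suspicious human voice audio excludes the entry.
    if any(a.info == "voice" for a in entry):
        return False
    # Third check: single songs must have good quality and be suspicious pure music.
    for a in entry:
        if a.class_label == "single" and (not a.good_quality or a.info != "pure"):
            return False
    # First ratio against the preset threshold (strict comparison).
    pure = sum(a.info == "pure" for a in entry)
    if Fraction(pure, len(entry)) > threshold:
        return True
    # Fallback: ignore ring tones and require all remaining audios to be pure.
    others = [a for a in entry if a.class_label != "ringtone"]
    return bool(others) and all(a.info == "pure" for a in others)


def classify(entry) -> Optional[str]:
    """Return "pure music" for pure music entries; otherwise make no decision."""
    return "pure music" if is_pure_music_entry(entry) else None


entry_pure = [Audio("live", "pure"), Audio("single", "pure"),
              Audio("ringtone", "tbd"), Audio("pure music", "pure"),
              Audio("electronic", "pure")]
entry_accompaniment = [Audio("accompaniment", "pure"), Audio("live", "pure")]
```

Returning `None` for a non-pure music entry mirrors the statement above that, in this case, no decision is made about whether the audio to be classified is pure music.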
It should be noted that the order of the steps of the audio classification method provided in the embodiment of the present invention may be adjusted appropriately; for example, the order of step 202, step 203 and step 204 may be interchanged. Steps may also be added or removed as the situation requires; for example, step 204 may be omitted. Any variation that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention, and is therefore not described again.
In practical applications, to reduce the amount of computation and increase the speed of audio classification, the above steps may be arranged progressively. For example, fig. 3 is a flowchart of an audio classification method provided in an embodiment of the present invention. As shown in fig. 3, the method may include:
Step 304, when the category label of none of the audios in the target entry is the accompaniment label, determining the classification information of each audio in the target entry.
Step 305, detecting whether the classification information of each audio in the target entry is the suspicious human voice class.
Step 306, when the classification information of any audio in the target entry is the suspicious human voice class, determining that the target entry is a non-pure music entry.
Step 307, when the classification information of none of the audios in the target entry is the suspicious human voice class, judging whether the category label of each audio in the target entry is a single song label; when the category label of a certain audio in the target entry is a single song label, executing step 308; when the category label of none of the audios in the target entry is a single song label, executing step 312.
Step 309, when the sound quality of the certain audio does not meet the preset sound quality condition, determining that the target entry is a non-pure music entry.
Step 310, when the sound quality of the certain audio meets the preset sound quality condition, judging whether the classification information of the certain audio is the suspicious pure music class; when the classification information of the certain audio is not the suspicious pure music class, executing step 311; when the classification information of the certain audio is the suspicious pure music class, executing step 312.
Step 311, determining that the target entry is a non-pure music entry.
Step 312, detecting whether the classification information of each audio in the target entry is the class to be determined.
Step 313, when the classification information of any audio is the class to be determined and the audio name of that audio includes the words "pure music", if a first audio having the same category label as that audio exists in the target entry and the classification information of the first audio is the suspicious pure music class, updating the classification information of that audio to the suspicious pure music class.
Step 315, determining that the target entry is a pure music entry.
Further, step 319 may be performed.
Step 316, detecting whether the classification information of every audio in the target entry, other than audios whose category label is the ring tone label, is the suspicious pure music class.
Further, step 319 may be performed.
The specific implementation process of steps 301 to 319 can refer to steps 201 to 210, which are not described herein.
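The progressive arrangement of the early checks (accompaniment label, suspicious human voice class, single song sound quality) can be sketched as follows. This is an illustration under stated assumptions: `Audio`, `early_checks`, `classify` and `meets_quality` are hypothetical names; `classify` stands in for whatever classifier produces the classification information, and `meets_quality` stands in for the preset sound quality condition, which is not specified here. The point of the sketch is that the cheap label check runs before any audio is classified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

VOICE = "suspicious human voice"
PURE = "suspicious pure music"


@dataclass
class Audio:
    name: str
    label: str  # fine-grained category label


def early_checks(
    entry: List[Audio],
    classify: Callable[[Audio], str],
    meets_quality: Callable[[Audio], bool],
) -> Optional[List[str]]:
    """Return None if the entry is a non-pure music entry, otherwise
    the classification information of each audio for the later steps."""
    # Label check first: no classification is needed to reject here.
    if any(a.label == "accompaniment" for a in entry):
        return None
    # Classify only after the accompaniment check has passed.
    infos = [classify(a) for a in entry]
    if VOICE in infos:
        return None
    # Every single song must meet the sound quality condition and be
    # classified as suspicious pure music.
    for a, info in zip(entry, infos):
        if a.label == "single song":
            if not meets_quality(a) or info != PURE:
                return None
    return infos
```

An entry that survives these checks would then go through the update of the class to be determined, the first ratio comparison, and the ring tone check.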
In summary, the audio classification method provided in the embodiment of the present invention obtains the target entry to which the audio to be classified belongs, then determines whether the target entry is a pure music entry, and determines that the audio to be classified is pure music when the target entry is a pure music entry.
Fig. 4-1 is a schematic structural diagram of an audio classification apparatus 40 according to an embodiment of the present invention, and as shown in fig. 4-1, the apparatus 40 may include:
the obtaining module 401 is configured to obtain a target entry to which the audio to be classified belongs, where the target entry includes audios with the same audio attribute, and the audio attribute is used to represent characteristics of the audios.
The first judging module 402 is configured to judge whether the target entry is a pure music entry.
A first determining module 403, configured to determine that the audio to be classified is pure music when the target entry is a pure music entry.
In summary, in the audio classification apparatus provided in the embodiment of the present invention, the obtaining module first obtains the target entry to which the audio to be classified belongs, the first judging module then judges whether the target entry is a pure music entry, and when the target entry is a pure music entry, the first determining module determines that the audio to be classified is pure music. Because the target entry may include the audios in the music library that have the same audio attribute, that is, different types of audio of the same piece of music, judging whether the target entry is a pure music entry allows a comprehensive judgment of whether the audio to be classified is pure music, which improves the reliability of audio classification.
Optionally, each audio in the target entry has classification information and a category label. The classification information represents the coarse-grained type of the audio, and the category label represents the fine-grained type of the audio. The classification information includes a suspicious human voice class or a suspicious pure music class, and the category label includes an accompaniment label;
accordingly, as shown in fig. 4-2, the first judging module 402 of the apparatus 40 may include:
the judging submodule 4021, configured to judge whether a first ratio is greater than a preset ratio threshold when the classification information of none of the audios in the target entry is the suspicious human voice class and the category label of none of the audios in the target entry is the accompaniment label, where the first ratio is the ratio of the number of audios whose classification information is the suspicious pure music class to the number of all audios in the target entry.
The determining submodule 4022 is configured to determine that the target entry is a pure music entry when the first ratio is greater than the preset ratio threshold.
Optionally, the category label further includes a ring tone label. As shown in fig. 4-3, the apparatus 40 may further include:
the first detecting module 404, configured to detect, when the first ratio is not greater than the preset ratio threshold, whether the classification information of every audio in the target entry, other than audios whose category label is the ring tone label, is the suspicious pure music class.
The second determining module 405 is configured to determine that the target entry is a pure music entry when the classification information of every audio in the target entry, other than audios whose category label is the ring tone label, is the suspicious pure music class.
Optionally, the category tag further includes an accompaniment tag, as shown in fig. 4-4, the apparatus 40 may further include:
the second detecting module 406 is configured to detect whether the category tag of each audio in the target entry is an accompaniment tag.
The third determining module 407 is configured to determine that the target entry is a non-pure music entry when the category tag of any audio in the target entry is an accompaniment tag.
Optionally, as shown in fig. 4-5, the apparatus 40 may further include:
a fourth determining module 408, configured to determine classification information of each audio in the target entry.
The third detecting module 409 is configured to detect whether the classification information of each audio in the target entry is a suspicious human voice class.
A fifth determining module 410, configured to determine that the target entry is a non-pure music entry when the classification information of any audio in the target entry is a suspicious human voice class.
Optionally, the category label further includes a single song label. As shown in fig. 4-6, the apparatus 40 may further include:
the second judging module 411, configured to judge whether the category label of each audio in the target entry is a single song label.
The fourth detecting module 412 is configured to detect whether the tone quality of a certain audio satisfies a preset tone quality condition when the category tag of the certain audio in the target entry is a single song tag.
A sixth determining module 413, configured to determine that the target entry is a non-pure music entry when the sound quality of a certain audio does not meet the preset sound quality condition.
Further, as shown in fig. 4-7, the apparatus 40 may further include:
the third judging module 414, configured to judge whether the classification information of the certain audio is the suspicious pure music class when the sound quality of the certain audio satisfies the preset sound quality condition.
A seventh determining module 415, configured to determine that the target entry is a non-pure music entry when the classification information of the certain audio is not the suspicious pure music class.
Still further, the classification information may further include a class to be determined. As shown in fig. 4-8, the apparatus 40 may further include:
a fifth detecting module 416, configured to detect whether the classification information of each audio in the target entry is the class to be determined when the category label of none of the audios in the target entry is a single song label, or when every audio in the target entry whose category label is a single song label meets the preset sound quality condition and has the suspicious pure music class as its classification information;
an updating module 417, configured to, when the classification information of any audio is the class to be determined and the audio name of that audio includes the words "pure music", update the classification information of that audio to the suspicious pure music class if a first audio having the same category label as that audio exists in the target entry and the classification information of the first audio is the suspicious pure music class.
Optionally, the fourth determining module may be configured to:
determine the classification information of each audio in the target entry by calling the musly library.
Optionally, the preset ratio threshold is in a range of 0.6-0.75.
Optionally, the audio attributes being the same may include the singer name being the same and the keywords in the audio name being the same.
Optionally, the category labels may include a single song label, a ring tone label, an accompaniment label, a live music label, a pure music label, or an electronic music label.
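Grouping audios into entries by singer name and audio-name keyword can be sketched as follows. How the keyword is extracted from an audio name is not specified in this text, so the `keyword` helper below (dropping bracketed qualifiers such as "(Live)" and normalizing case) is purely an assumption, as are the function names.

```python
import re
from collections import defaultdict


def keyword(audio_name):
    """Hypothetical keyword extraction: strip bracketed qualifiers."""
    stripped = re.sub(r"[\(\[（].*?[\)\]）]", "", audio_name)
    return stripped.strip().lower()


def group_into_entries(audios):
    """`audios` is an iterable of (singer_name, audio_name) pairs.

    Audios with the same singer name and the same keyword are placed
    in the same entry.
    """
    entries = defaultdict(list)
    for singer, name in audios:
        entries[(singer.strip().lower(), keyword(name))].append(name)
    return dict(entries)


library = [("A", "River (Live)"), ("A", "River"), ("B", "River")]
entries = group_into_entries(library)
```

Here the two versions by singer "A" share one entry, while the audio by singer "B" forms its own entry even though the keyword is the same.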
In summary, in the audio classification apparatus provided in the embodiment of the present invention, the obtaining module first obtains the target entry to which the audio to be classified belongs, the first judging module then judges whether the target entry is a pure music entry, and when the target entry is a pure music entry, the first determining module determines that the audio to be classified is pure music. Because the target entry may include the audios in the music library that have the same audio attribute, that is, different types of audio of the same piece of music, judging whether the target entry is a pure music entry allows a comprehensive judgment of whether the audio to be classified is pure music, which improves the reliability of audio classification.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides a computer device. As shown in fig. 5, the computer device 01 includes a processor 12 and a memory 16, wherein:
the memory 16 is used for storing computer programs;
the processor 12 is configured to execute the program stored in the memory 16 to implement the audio classification method according to any one of the above embodiments, for example, the method may include:
acquiring a target entry to which audio to be classified belongs, wherein the target entry comprises audio with the same audio attribute, and the audio attribute is used for representing the characteristics of the audio;
judging whether the target entry is a pure music entry;
and when the target entry is a pure music entry, determining that the audio to be classified is pure music.
Specifically, the processor 12 includes one or more processing cores. The processor 12 executes various functional applications and data processing by running the computer program stored in the memory 16.
The computer program stored in the memory 16 includes software programs and units. Specifically, the memory 16 may store an operating system 162 and an application unit 164 required for at least one function. The operating system 162 may be a Real Time eXecutive (RTX) operating system, or an operating system such as LINUX, UNIX, WINDOWS, or OS X. The application unit 164 may include an acquisition unit 164a, a first judging unit 164b, and a first determining unit 164c.
The acquisition unit 164a has the same or similar functions as the obtaining module 401.
The first judging unit 164b has the same or similar function as the first judging module 402.
The first determination unit 164c has the same or similar function as the first determination module 403.
An embodiment of the present invention provides a storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the audio classification method according to the above embodiment is implemented.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The invention is not to be considered as limited to the particular embodiments shown and described, but is to be understood that various modifications, equivalents, improvements and the like can be made without departing from the spirit and scope of the invention.
Claims (26)
1. A method of audio classification, the method comprising:
obtaining a target entry to which audio to be classified belongs, wherein the target entry comprises audios having the same audio attribute, the audio attribute is used for representing characteristics of the audios, each audio in the target entry has classification information and a category label, the classification information is used for representing a coarse-grained type of the audio, the category label is used for representing a fine-grained type of the audio, the classification information comprises a suspicious human voice class or a suspicious pure music class, and the category label comprises an accompaniment label;
judging whether the target entry is a pure music entry;
and when the target entry is a pure music entry, determining that the audio to be classified is pure music.
2. The method of claim 1, wherein the judging whether the target entry is a pure music entry comprises:
when the classification information of none of the audios in the target entry is the suspicious human voice class and the category label of none of the audios in the target entry is the accompaniment label, judging whether a first ratio is larger than a preset ratio threshold, wherein the first ratio is the ratio of the number of audios whose classification information is the suspicious pure music class to the number of all audios in the target entry;
and when the first ratio is larger than the preset ratio threshold, determining that the target entry is a pure music entry.
3. The method of claim 2, wherein the category tag further comprises a ring tone tag, the method further comprising:
when the first ratio is not larger than the preset ratio threshold, detecting whether the classification information of every audio in the target entry, other than audios whose category label is the ring tone label, is the suspicious pure music class;
and when the classification information of every audio in the target entry, other than audios whose category label is the ring tone label, is the suspicious pure music class, determining that the target entry is a pure music entry.
4. The method of claim 2 or 3, wherein the category labels further comprise accompaniment labels, the method further comprising:
detecting whether the category label of each audio in the target entry is an accompaniment label;
and when the category label of any audio in the target entry is the accompaniment label, determining that the target entry is a non-pure music entry.
5. A method according to claim 2 or 3, characterized in that the method further comprises:
determining classification information of each audio in the target entry;
detecting whether the classification information of each audio in the target entry is the suspicious human voice class;
and when the classification information of any audio in the target entry is the suspicious human voice class, determining that the target entry is a non-pure music entry.
6. The method of claim 2 or 3, wherein the category labels further comprise single song labels, the method further comprising:
judging whether the category label of each audio in the target entry is a single song label;
when the category label of a certain audio in the target entry is a single song label, detecting whether the sound quality of the certain audio meets a preset sound quality condition;
and when the sound quality of the certain audio does not meet the preset sound quality condition, determining that the target entry is a non-pure music entry.
7. The method according to claim 6, wherein after said detecting whether the sound quality of the certain audio satisfies a preset sound quality condition, the method further comprises:
when the sound quality of the certain audio meets the preset sound quality condition, judging whether the classification information of the certain audio is the suspicious pure music class;
and when the classification information of the certain audio is not the suspicious pure music class, determining that the target entry is a non-pure music entry.
8. The method of claim 7, wherein the classification information further includes a class to be determined, and after the judging whether the category label of each audio in the target entry is a single song label, the method further comprises:
when the category label of none of the audios in the target entry is a single song label, or every audio in the target entry whose category label is a single song label meets the preset sound quality condition and has the suspicious pure music class as its classification information, detecting whether the classification information of each audio in the target entry is the class to be determined;
and when the classification information of any audio is the class to be determined and the audio name of that audio comprises the words "pure music", if a first audio having the same category label as that audio exists in the target entry and the classification information of the first audio is the suspicious pure music class, updating the classification information of that audio to the suspicious pure music class.
9. The method of claim 5, wherein the determining classification information for each audio in the target entry comprises:
and determining the classification information of each audio in the target entry by calling a musly library.
10. The method of claim 2,
the range of the preset ratio threshold is 0.6-0.75.
11. The method of claim 1,
the audio attributes being the same comprises the singer name being the same and the keywords in the audio name being the same.
12. The method of claim 2,
the category tags include a single song tag, a ringtone tag, an accompaniment tag, a live music tag, a pure music tag, or an electronic music tag.
13. An apparatus for audio classification, the apparatus comprising:
an acquisition module, used for acquiring a target entry to which audio to be classified belongs, wherein the target entry comprises audios having the same audio attribute, the audio attribute is used for representing characteristics of the audios, each audio in the target entry has classification information and a category label, the classification information is used for representing a coarse-grained type of the audio, the category label is used for representing a fine-grained type of the audio, the classification information comprises a suspicious human voice class or a suspicious pure music class, and the category label comprises an accompaniment label;
the first judgment module is used for judging whether the target entry is a pure music entry;
and the first determining module is used for determining that the audio to be classified is pure music when the target entry is a pure music entry.
14. The apparatus of claim 13, wherein the first judgment module comprises:
the judgment submodule is used for judging whether a first ratio is larger than a preset ratio threshold when the classification information of none of the audios in the target entry is the suspicious human voice class and the category label of none of the audios in the target entry is the accompaniment label, wherein the first ratio is the ratio of the number of audios whose classification information is the suspicious pure music class to the number of all audios in the target entry;
and the determining submodule is used for determining that the target entry is a pure music entry when the first ratio is greater than the preset ratio threshold.
15. The apparatus of claim 14, wherein the category label further comprises a ring tone label, the apparatus further comprising:
the first detection module is used for detecting whether the classification information of the audios except the audio with the class label being the ring tone label in the target entry is suspicious pure music class or not when the first ratio is not larger than the preset ratio threshold;
and the second determining module is used for determining that the target entry is a pure music entry when the classification information of the audios except the audios with the class labels being the ring tone labels in the target entry is the suspicious pure music class.
16. The apparatus of claim 14 or 15, wherein the category tag further comprises an accompaniment tag, the apparatus further comprising:
the second detection module is used for detecting whether the category label of each audio in the target entry is an accompaniment label;
and the third determining module is used for determining that the target entry is a non-pure music entry when the category label of any audio in the target entry is an accompaniment label.
17. The apparatus of claim 14 or 15, further comprising:
the fourth determining module is used for determining the classification information of each audio in the target entry;
the third detection module is used for detecting whether the classification information of each audio in the target entry is the suspicious human voice class;
and the fifth determining module is used for determining that the target entry is a non-pure music entry when the classification information of any audio in the target entry is the suspicious human voice class.
18. The apparatus of claim 14 or 15, wherein the category label further comprises a single song label, the apparatus further comprising:
the second judgment module is used for judging whether the category label of each audio in the target entry is a single song label;
the fourth detection module is used for detecting whether the sound quality of a certain audio meets a preset sound quality condition when the category label of the certain audio in the target entry is a single song label;
and the sixth determining module is used for determining that the target entry is a non-pure music entry when the sound quality of the certain audio does not meet the preset sound quality condition.
19. The apparatus of claim 18, further comprising:
the third judgment module is used for judging whether the classification information of the certain audio is the suspicious pure music class when the sound quality of the certain audio meets the preset sound quality condition;
and the seventh determining module is used for determining that the target entry is a non-pure music entry when the classification information of the certain audio is not the suspicious pure music class.
20. The apparatus of claim 19, wherein the classification information further comprises a class to be determined, the apparatus further comprising:
a fifth detection module, configured to detect whether the classification information of each audio in the target entry is the class to be determined when the category label of none of the audios in the target entry is a single song label, or when every audio in the target entry whose category label is a single song label meets the preset sound quality condition and has the suspicious pure music class as its classification information;
and the updating module is used for, when the classification information of any audio is the class to be determined and the audio name of that audio comprises the words "pure music", updating the classification information of that audio to the suspicious pure music class if a first audio having the same category label as that audio exists in the target entry and the classification information of the first audio is the suspicious pure music class.
21. The apparatus of claim 17, wherein the fourth determining module is configured to:
and determining the classification information of each audio in the target entry by calling a musly library.
22. The apparatus of claim 14,
the range of the preset ratio threshold is 0.6-0.75.
23. The apparatus of claim 13,
the audio attributes being the same comprises the singer name being the same and the keywords in the audio name being the same.
24. The apparatus of claim 14,
the category tags include a single song tag, a ringtone tag, an accompaniment tag, a live music tag, a pure music tag, or an electronic music tag.
25. A computer device comprising a processor and a memory,
wherein,
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the audio classification method according to any one of claims 1 to 12.
26. A storage medium, characterized in that the storage medium has stored therein a computer program which, when being executed by a processor, implements the audio classification method of any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711107617.3A CN107885845B (en) | 2017-11-10 | 2017-11-10 | Audio classification method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885845A CN107885845A (en) | 2018-04-06 |
CN107885845B CN107885845B (en) | 2020-11-17 |
Family
ID=61780206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711107617.3A Active CN107885845B (en) | 2017-11-10 | 2017-11-10 | Audio classification method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885845B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061909B (en) * | 2019-11-22 | 2023-11-28 | 腾讯音乐娱乐科技(深圳)有限公司 | Accompaniment classification method and accompaniment classification device |
CN111147871B (en) * | 2019-12-04 | 2021-10-12 | 北京达佳互联信息技术有限公司 | Singing recognition method and device in live broadcast room, server and storage medium |
CN111240540B (en) * | 2019-12-27 | 2023-11-10 | 咪咕视讯科技有限公司 | Video adjustment method, terminal with flexible screen and storage medium |
CN111460214B (en) * | 2020-04-02 | 2024-04-19 | 北京字节跳动网络技术有限公司 | Classification model training method, audio classification method, device, medium and equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016066228A1 (en) * | 2014-10-31 | 2016-05-06 | Longsand Limited | Focused sentiment classification |
CN105843931A (en) * | 2016-03-30 | 2016-08-10 | 广州酷狗计算机科技有限公司 | Classification method and device |
CN106548786A (en) * | 2015-09-18 | 2017-03-29 | 广州酷狗计算机科技有限公司 | A kind of detection method and system of voice data |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519685B1 (en) * | 2012-08-30 | 2016-12-13 | deviantArt, Inc. | Tag selection, clustering, and recommendation for content hosting services |
CN103294811A (en) * | 2013-06-05 | 2013-09-11 | 中国科学院自动化研究所 | Visual classifier construction method with consideration of characteristic reliability |
CN103678635B (en) * | 2013-12-19 | 2017-01-04 | 中国传媒大学 | Online music aggregation recommendation method based on label directed graph |
CN104679902B (en) * | 2015-03-20 | 2017-11-28 | 湘潭大学 | A kind of informative abstract extracting method of combination across Media Convergence |
CN107220281B (en) * | 2017-04-19 | 2020-02-21 | 北京协同创新研究院 | Music classification method and device |
Non-Patent Citations (1)
Title |
---|
"基于内容的音频与音乐分析综述";张一彬 等;《计算机学报》;20070515;第30卷(第5期);712-728 * |
Also Published As
Publication number | Publication date |
---|---|
CN107885845A (en) | 2018-04-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||