
CN112036187A - Context-based video barrage text auditing method and system - Google Patents

Context-based video barrage text auditing method and system

Info

Publication number
CN112036187A
Authority
CN
China
Prior art keywords
audit
context
text
video
auditing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010655180.2A
Other languages
Chinese (zh)
Inventor
王晓平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jilian Network Technology Co ltd
Original Assignee
Shanghai Jilian Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jilian Network Technology Co ltd filed Critical Shanghai Jilian Network Technology Co ltd
Priority to CN202010655180.2A priority Critical patent/CN112036187A/en
Publication of CN112036187A publication Critical patent/CN112036187A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a context-based video barrage (bullet-screen) text auditing method that adopts a multi-level auditing mode comprising a sensitive word expansion audit, a semantic audit, and a context audit. The method and system overcome the inability of the prior art to cope with the insufficient information content of barrage text and the low accuracy of conventional video barrage text auditing methods, so that the auditing result for video barrage text is more accurate and reliable.

Description

Context-based video barrage text auditing method and system
Technical Field
The invention relates to a text auditing method, in particular to a video barrage text auditing method combining context.
Background
In the information era, network media and social platforms such as online video sites, microblogs, WeChat, and chat communities continuously generate diverse user interaction data, including video barrages (bullet-screen comments) and ordinary comments, which makes effective information auditing and supervision challenging.
Among text data types, barrage text is short and carries little information, so the same barrage text often has completely different meanings in different contexts. Auditing this type of text is therefore considerably more difficult.
Conventional video barrage text auditing methods generally audit the barrage text directly. Because the context is missing, they cannot produce a reliable auditing result for short barrage texts.
Disclosure of Invention
The invention provides a solution for video barrage text auditing that aims to overcome the limited ability of the prior art to analyze barrage text and to make barrage text auditing results more reliable.
To achieve this object, the invention provides a context-based video barrage text auditing method comprising: acquiring a video barrage text to be audited as the target audit text; a sensitive word expansion auditing step: segmenting the target audit text with a word segmentation method to obtain a text fragment list, and matching the text fragment list against a preset sensitive word feature library to obtain a matching result, where the audit ends at this step if a match is found and continues to the next step if no match is found; a semantic auditing step: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, and determining from the semantic classification label which target audit texts need further auditing; and a context auditing step: obtaining context information of the target audit text, and detecting and analyzing the context information through a context audit to obtain an audit result.
Preferably, the context information includes: target structural information and scene classification information of the video frame corresponding to the video barrage text; event structural information of the video within a certain time range corresponding to the video barrage text; and classification information of the barrage context texts within a certain time range corresponding to the video barrage text.
Preferably, the context audit comprises a contextual video audit and a contextual text audit; the contextual video audit is performed with a deep learning method or a traditional method, and the contextual text audit is performed with a semantic classification method.
Preferably, the method for constructing the sensitive word feature library includes: establishing an original sensitive word bank; performing deformation mapping processing on each sensitive word in the original sensitive word bank to obtain various deformation mapping results; and combining the various deformation mapping results with the original sensitive word library to construct a sensitive word feature library.
Preferably, the deformation mapping processing includes mixing of pinyin and characters, homophone (harmonic) substitution, pinyin abbreviation, confusion of front and back nasal sounds and of flat-tongue and retroflex sounds, reverse reading, filler-character insertion, character omission, character splitting, substitution of visually similar characters, and synonym substitution.
Preferably, the training method of the semantic classification model includes a deep learning method and a traditional training method, and the deep learning method includes: TextCNN, TextRNN, BERT, XLNet, RoBERTa, ALBERT, etc., and the conventional training methods include logistic regression, support vector machines. Preferably, a deep learning method ALBERT is used.
Preferably, the semantic classification labels include "semantic normal", "semantic violation", and "semantic fuzzy".
The invention also discloses a context-based video barrage text auditing system comprising a sensitive word expansion auditing module, a semantic auditing module, and a context auditing module. The sensitive word expansion auditing module segments the target audit text, matches it against a preset sensitive word feature library to obtain a matching result, and performs output judgment according to that result: the audit ends if a match is found and continues if no match is found. The semantic auditing module inputs the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adds a semantic classification label to the video barrage text according to the judgment result, performs output judgment according to the semantic classification label, and determines the output mode and the target audit texts that need further auditing. The context auditing module obtains context information of the target audit text, detects and analyzes the context information through a context audit, and makes a comprehensive judgment to obtain an audit result.
Preferably, the system further comprises an audit result output module, and the audit result output module performs final audit output and display on the outputs from the sensitive word expansion audit module, the semantic audit module and the context audit module.
Preferably, for output from the sensitive word expansion auditing module, the data output and displayed by the auditing result output module include: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping of the sensitive word in the input text.
The invention also discloses an electronic device comprising a processor and a memory, wherein the memory stores an executable program and the processor is configured to execute the executable program to implement the method.
In practical applications, the modules described in the method and system disclosed by the present invention may be deployed on one server, or each module may be deployed on a different server independently, and particularly, in order to provide a stronger computing processing capability, the modules may be deployed on a cluster server as needed.
The method and system disclosed by the invention adopt a multi-stage auditing mode with diversified auditing means, and combine multiple audits of the context information of the video channel and the text channel. This overcomes the inability of the prior art to cope with the insufficient information content of the barrage text itself and the low accuracy of conventional video barrage text auditing methods, so that the auditing result for video barrage text is more accurate and reliable.
In order that the invention may be more clearly and fully understood, specific embodiments thereof are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of a context-based video barrage text auditing method in one embodiment.
Fig. 2 is a flowchart of a method for constructing a sensitive word feature library in one embodiment.
Fig. 3 is a block diagram of a context-based video barrage text auditing system in one embodiment.
Fig. 4 is a block diagram of the context auditing module in one embodiment.
Fig. 5 is a flowchart of the context auditing module in one embodiment.
Fig. 6 is an overall flowchart of audit result output.
Detailed Description
Referring to Fig. 1, which is a flowchart of a context-based video barrage text auditing method, the method comprises steps S11 to S14:
Step S11: acquire the video barrage text to be audited and take it as the target audit text.
Step S12: sensitive word expansion audit.
In this embodiment, the sensitive word expansion audit proceeds as follows: the target audit text is segmented with a word segmentation method to obtain a text fragment list, and the text fragment list is matched against a preset sensitive word feature library to obtain a matching result. If a match is found, the audit ends at this step; if no match is found, the audit continues to the next step.
First, word segmentation is performed on the target audit text. Let text denote the target audit text; the segmentation operation outputs a list $list_{seg}$ of segments arranged in their order of appearance in text:
$list_{seg} = [seg_1, seg_2, \ldots, seg_M]$
where $M$ is the number of elements in the segmentation result list.
$list_{seg}$ is then used as the target audit text.
In this embodiment, a sensitive word feature library $collection_{map}$ must also be established in advance; see the embodiment shown in Fig. 2 for the construction method.
Second, the target audit text is matched against $collection_{map}$. In this embodiment, each element of $collection_{map}$ is compared with the text in turn. As soon as a comparison succeeds, the audit result is "not passed" and the audit ends; otherwise, the text is passed to the semantic auditing module for further auditing.
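The matching in step S12 can be sketched in Python as follows. This is only an illustration under stated assumptions: `segment` is a hypothetical stand-in that splits on whitespace (a real system for Chinese barrage text would use a proper word segmenter), and the library entries are invented placeholders, not taken from the patent.

```python
from typing import List, Set


def segment(text: str) -> List[str]:
    """Stand-in segmenter (assumption): split on whitespace.

    The patent only says "a word segmentation method" is used.
    """
    return text.split()


def sensitive_word_audit(text: str, collection_map: Set[str]) -> bool:
    """Return True if any segment of the text appears in the feature library."""
    # list_seg = [seg_1, ..., seg_M], in order of appearance
    list_seg = segment(text)
    return any(seg in collection_map for seg in list_seg)


# Hypothetical feature library: one original word plus one deformed variant.
collection_map = {"badword", "b4dword"}

print(sensitive_word_audit("this is a b4dword here", collection_map))  # True
print(sensitive_word_audit("nothing to see", collection_map))          # False
```

A `True` result corresponds to the "not passed" outcome; a `False` result means the text moves on to the semantic audit.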
Step S13: semantic audit.
In this embodiment, this step comprises: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, and determining from the semantic classification label which target audit texts need further auditing.
Before auditing, the semantic classification model must be trained once, in advance, on a large amount of text data. Deep learning methods such as TextCNN, TextRNN, BERT, XLNet, RoBERTa, and ALBERT can be used to train the semantic classification model, as can traditional methods such as logistic regression and support vector machines. Preferably, ALBERT may be used.
The text is audited with the trained semantic classification model to obtain its semantic label. In this embodiment, the semantic classification labels are of three types: "semantic normal", "semantic violation", and "semantic fuzzy". Output judgment is performed according to the semantic classification result: if the label is "semantic violation", the text fails the audit; if the label is "semantic normal", the text passes the audit; if the label is "semantic fuzzy", the text is passed to the context auditing module for further auditing.
Step S14: context audit.
In this embodiment, the "semantic fuzzy" text output by the semantic auditing module is further audited on the basis of context analysis. Specifically, context information of the target audit text is obtained, and the context information is detected and analyzed through a context audit to obtain an audit result.
The context information includes: target structural information and scene classification information of the video frame corresponding to the video barrage text; event structural information of the video within a certain time range corresponding to the video barrage text; and classification information of the barrage context texts within a certain time range corresponding to the video barrage text.
In this embodiment, context detection comprises contextual video detection and contextual text detection. Contextual video detection includes detecting and locating targets in an image, classifying image scenes, and detecting events with any time-series-based event analysis method; contextual text detection includes semantic classification.
In this embodiment, contextual video detection may be implemented with various deep learning methods or conventional detection methods; preferably, YOLOv4 (You Only Look Once) may be used. Detection results are divided into normal targets and violating targets, where violating target types may include sensitive persons, sensitive objects (e.g. sensitive flags, knives, firearms), and the like.
Referring to Fig. 2, which is a flowchart of a method for constructing the sensitive word feature library, the method comprises steps S21 to S23:
Step S21: establish an original sensitive word library.
Step S22: apply deformation mapping to each sensitive word in the original sensitive word library to obtain the various deformation mapping results.
Step S23: combine the deformation mapping results with the original sensitive word library to construct the sensitive word feature library.
In this embodiment, each original sensitive word $w$ is read from the original sensitive word library. $w$ is first deformed according to all deformation rules defined for the audit, such as mixing of pinyin and characters, homophone (harmonic) substitution, pinyin abbreviation, confusion of front and back nasal sounds and of flat-tongue and retroflex sounds, reverse reading, filler-character insertion, character omission, character splitting, substitution of visually similar characters, and synonym substitution. The results are combined with the original sensitive word $w$ to form the complete matching set $collection_{map}$:
[Equation image BDA0002576503630000061 not recovered: it defines $collection_{map}$ from $w$ and the deformation results $f_y(w)$ over the $S$ deformation rules]
where $f_y(x)$ denotes deforming the word $x$ according to the defined deformation rule $y$ and returning the deformation result, and $S$ is the total number of deformation rules.
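Steps S21 to S23 can be sketched as below. The original equation survives only as an image reference, so this sketch assumes the reading that the library is the union of each original word with all of its deformation results. The three rules shown (reverse reading, filler-character insertion, character omission) are simplified, hypothetical implementations of rules the text names; the function names and the `*` filler are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set


def reverse_reading(word: str) -> List[str]:
    """Reverse-reading deformation: the word read backwards."""
    return [word[::-1]]


def filler_characters(word: str, filler: str = "*") -> List[str]:
    """Filler deformation: a filler between every pair of adjacent characters."""
    return [filler.join(word)]


def missing_character(word: str) -> List[str]:
    """Omission deformation: drop one character at a time (longer words only)."""
    if len(word) <= 2:
        return []
    return [word[:i] + word[i + 1:] for i in range(len(word))]


# Hypothetical rule table; each entry plays the role of one rule f_y.
RULES: Dict[str, Callable[[str], List[str]]] = {
    "reverse reading": reverse_reading,
    "filler characters": filler_characters,
    "missing character": missing_character,
}


def build_feature_library(original_words: Iterable[str]) -> Set[str]:
    """Assumed reading: each word w together with f_y(w) for every rule y."""
    collection_map: Set[str] = set()
    for w in original_words:
        collection_map.add(w)                 # keep the original form
        for rule in RULES.values():
            collection_map.update(rule(w))    # add every deformed variant
    return collection_map


print(sorted(build_feature_library(["abc"])))
```

Phonetic rules such as pinyin abbreviation or nasal-sound confusion need a pronunciation dictionary and are deliberately left out of this sketch.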
Referring to fig. 3, fig. 3 shows an embodiment of a structure of a video barrage text review system combined with a context, in this embodiment, the video barrage text review system includes a sensitive word expansion review module a, a semantic review module B, and a context review module C, where:
Sensitive word expansion auditing module A: segments the target audit text, matches it against a preset sensitive word feature library to obtain a matching result, and performs output judgment according to the sensitive word matching result. If a match is found, the audit ends; if no match is found, the audit continues.
In this embodiment, the sensitive word expansion auditing module A further includes a sensitive word expansion audit output judgment sub-module A1, which performs output judgment according to the sensitive word matching result. If a match is found, the text fails the audit and the result is output directly to the audit result output module; otherwise, the text is passed to the semantic auditing module for processing.
Semantic auditing module B: inputs the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adds a semantic classification label to the video barrage text according to the judgment result, performs output judgment according to the semantic classification label, and determines the output mode and the target audit texts that need further auditing.
In this embodiment, the semantic auditing module B further includes a semantic audit output judgment sub-module B1, which performs output judgment according to the semantic classification result: if the label is "semantic violation", the text fails the audit; if the label is "semantic normal", the text passes the audit; if the label is "semantic fuzzy", the text is passed to the context auditing module for further auditing.
Context auditing module C: obtains context information of the target audit text, detects and analyzes the context information through a context audit, and makes a comprehensive judgment to obtain an audit result.
Referring to fig. 4, fig. 4 shows a structure of a context auditing module C in an embodiment, where the context auditing module C specifically includes: a video context analysis submodule C1, a text context analysis submodule C2, and a context audit output determination submodule C3, wherein:
Video context analysis sub-module C1: based on video structuring technology, obtains the target structural information and scene classification information of the video frame corresponding to the barrage text, and the event structural information of the video within a certain time range, so as to provide structured video context information for the barrage text to be audited. The sub-module comprises a target structuring sub-module C11, a scene classification sub-module C12, and an event structuring sub-module C13:
Target structuring sub-module C11: detects and locates targets in the image. Target structuring may use various deep learning methods or conventional detection methods; preferably, YOLOv4 (You Only Look Once) may be used. Detection results are divided into normal targets and violating targets, where violating target types may include sensitive persons, sensitive objects (e.g. sensitive flags, knives, firearms), and the like. The object detection flag b_Object_Abnormal is defined and preset to False, and is set to True if a violating target is detected.
Scene classification sub-module C12: classifies the image scene. Scene classification may use various deep learning classification methods. Scene classification results are divided into normal scenes and violating scenes, where violating scene types may include sensitive scenes such as gore and pornography. The scene classification flag b_Scene_Abnormal is defined and preset to False, and is set to True if a violating scene is detected.
Event structuring sub-module C13: may use any method based on time-series analysis; preferably, BSN (Boundary-Sensitive Network) may be used. Event structuring results are divided into normal events and violating events, where violating event types may include fighting, burning, collisions, and the like. The event classification flag b_Action_Abnormal is defined and preset to False, and is set to True if a violating event is detected. The time range may be set empirically; preferably, it may be set to trace back 2 seconds from the current time.
Text context analysis sub-module C2: obtains the set of barrage context texts within a certain time range corresponding to the current barrage text, so as to provide context reference information for the barrage text to be audited. The time range may be set empirically; preferably, considering that a barrage usually lags behind the corresponding event (e.g. because of reaction time and typing time), it may be set to the time range of the event structuring sub-module delayed by 1 second. The sub-module classifies the texts in the barrage context text set with the semantic classification method of the semantic auditing module. The text context exception flag b_Text_Context_Abnormal is defined and preset to False, and is updated as follows: if the barrage context text set contains a text labeled "semantic violation", the flag is set to True.
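The text context window and the flag b_Text_Context_Abnormal can be sketched as follows. The source does not say exactly how the 1-second delay is applied, so this sketch assumes the 2-second look-back window is simply shifted 1 second later. `Barrage`, `context_window`, `text_context_abnormal`, and the `classify` callable are hypothetical names standing in for the real data model and the trained semantic classifier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Barrage:
    timestamp: float  # seconds from the start of the video
    text: str


def context_window(barrages: List[Barrage], current_time: float,
                   lookback: float = 2.0, delay: float = 1.0) -> List[Barrage]:
    """Select barrages in the look-back window, shifted later by `delay`.

    Assumed interpretation: window = [t - lookback + delay, t + delay].
    """
    start = current_time - lookback + delay
    end = current_time + delay
    return [b for b in barrages if start <= b.timestamp <= end]


def text_context_abnormal(context: List[Barrage],
                          classify: Callable[[str], str]) -> bool:
    """b_Text_Context_Abnormal: True if any context text is a violation."""
    return any(classify(b.text) == "semantic violation" for b in context)


history = [Barrage(7.0, "a"), Barrage(9.5, "b"),
           Barrage(10.5, "c"), Barrage(12.0, "d")]
window = context_window(history, current_time=10.0)
print([b.text for b in window])  # ['b', 'c']
```

Keeping `lookback` and `delay` as parameters reflects the statement that both values are set empirically.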
The context auditing output judgment submodule C3 is responsible for comprehensively outputting and judging the output results of the video context analysis submodule and the text context analysis submodule.
In this embodiment:
The state exception flag b_Test_Abnormal of the input barrage to be audited is defined and preset to False.
The video context exception flag b_Video_Context_Abnormal is defined and preset to False, and is then updated as follows:
b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)
The state exception flag of the input barrage to be audited is then updated:
IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
    b_Test_Abnormal = True
That is, if at least one of a violating target, a violating scene, or a violating event occurs in the context, and a violating barrage text also occurs in the context, the barrage text to be audited is judged to be a violation and the audit result is "not passed"; otherwise, the audit result is "passed". The audit then ends and the result is output to the audit result output module.
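The flag logic of sub-module C3 translates directly into a few lines of Python. The flag names follow the text; the `ContextFlags` container and the `context_audit` function are hypothetical wrappers added for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextFlags:
    # All flags are preset to False, as in the text.
    b_Object_Abnormal: bool = False
    b_Scene_Abnormal: bool = False
    b_Action_Abnormal: bool = False
    b_Text_Context_Abnormal: bool = False


def context_audit(flags: ContextFlags) -> str:
    """Combine the video and text context flags into an audit result."""
    b_Video_Context_Abnormal = (
        flags.b_Object_Abnormal
        or flags.b_Scene_Abnormal
        or flags.b_Action_Abnormal
    )
    # Both channels must be abnormal for the barrage to be flagged.
    b_Test_Abnormal = b_Video_Context_Abnormal and flags.b_Text_Context_Abnormal
    return "not passed" if b_Test_Abnormal else "passed"


print(context_audit(ContextFlags(b_Scene_Abnormal=True,
                                 b_Text_Context_Abnormal=True)))  # not passed
print(context_audit(ContextFlags(b_Scene_Abnormal=True)))         # passed
```

Note the design consequence of the AND: an abnormal video context alone, or a violating neighbouring barrage alone, does not fail the text under audit.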
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a context auditing module of an embodiment, as shown in the figure, in this embodiment, first, according to an input video barrage text, based on a video structuring technology, structured information of video frames corresponding to the barrage text is obtained, where the structured information includes an image frame sequence and a barrage text sequence, where the image frame sequence includes target structured information, scene classification information, and event structured information corresponding to a video within a certain time range; the bullet screen text sequence comprises a bullet screen context text set corresponding to the current bullet screen text within a certain time range.
Secondly, respectively inputting the image frame sequence into sub-modules of a video context analysis sub-module C1, namely a target structured sub-module C11, a scene classification sub-module C12 and an event structured sub-module C13 for detection so as to obtain video structured information; and inputting the bullet screen text sequence into a text context analysis submodule C2 to obtain a semantic classification result.
Thirdly, the detection/classification results of the video context analysis submodule C1 and the text context analysis submodule C2 are respectively input to the context audit output determination submodule C3. The context audit output judgment submodule C3 is responsible for performing comprehensive output judgment processing on the output results of the video context analysis submodule C1 and the text context analysis submodule C2.
In this embodiment:
The state exception flag b_Test_Abnormal of the input barrage to be audited is defined and preset to False.
The video context exception flag b_Video_Context_Abnormal is defined and preset to False, and is then updated as follows:
b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)
The state exception flag of the input barrage to be audited is then updated:
IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
    b_Test_Abnormal = True
That is, if at least one of a violating target, a violating scene, or a violating event occurs in the context, and a violating barrage text also occurs in the context, the barrage text to be audited is judged to be a violation and the audit result is "not passed"; otherwise, the audit result is "passed". The audit then ends and the result is output to the audit result output module.
In this embodiment, the video barrage text auditing system may further include an audit result output module, which outputs and displays the audit results of each auditing module in the auditing process. For output results from the sensitive word expansion auditing module, the audit result output data further include: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping of the sensitive word in the input text.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating an overall process of outputting an audit result, and as shown in the drawing, in this embodiment, an audit result output module D outputs and displays a final audit result for output results from a sensitive word expansion audit output judgment sub-module a1, a semantic audit output judgment sub-module B1, and a context audit output judgment sub-module C3.
In addition, for the output result from the sensitive word expansion audit output judgment sub-module A1, the audit result output data further include: the position of the matched word in the input text, the original form of the matched sensitive word, and the actual deformation mapping of the sensitive word in the input text.
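One possible shape for this extra output data is sketched below. It is an assumption-laden illustration: the `SensitiveMatch` record, the `variant_index` mapping from each deformed variant to its original word and rule name, and the use of plain substring search instead of segment matching are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SensitiveMatch:
    position: int          # index of the match in the input text
    matched_text: str      # the variant as it appears in the input text
    original_word: str     # original form of the matched sensitive word
    deformation_rule: str  # deformation mapping that produced the variant


def find_matches(text: str,
                 variant_index: Dict[str, Tuple[str, str]]) -> List[SensitiveMatch]:
    """Report every occurrence of every known variant, ordered by position."""
    matches: List[SensitiveMatch] = []
    for variant, (original, rule) in variant_index.items():
        start = text.find(variant)
        while start != -1:
            matches.append(SensitiveMatch(start, variant, original, rule))
            start = text.find(variant, start + 1)
    return sorted(matches, key=lambda m: m.position)


# Hypothetical index: variant -> (original word, rule name).
variant_index = {
    "abc": ("abc", "original"),
    "cba": ("abc", "reverse reading"),
}
for m in find_matches("xxcbayy abc", variant_index):
    print(m.position, m.matched_text, m.original_word, m.deformation_rule)
```

Storing the rule name next to each variant when the feature library is built is what makes the deformation mapping reportable at audit time.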
Overall, the output flow logic of the present embodiment is as follows:
Define the state exception flag b_Test_Abnormal of the input barrage text to be audited, preset to False
Define the object detection flag b_Object_Abnormal, preset to False
Define the scene classification flag b_Scene_Abnormal, preset to False
Define the event classification flag b_Action_Abnormal, preset to False
Define the video context exception flag b_Video_Context_Abnormal, preset to False
Define the text context exception flag b_Text_Context_Abnormal, preset to False
IF the audit result of the sensitive word expansion auditing module is "fail":
    Output the audit result and end the audit
ELSE:
    Send the text to the semantic auditing module for further auditing
    IF the audit result is "semantically normal" or "semantically violated":
        Output the audit result and end the audit
    ELSE:
        Send the text to the context auditing module for further auditing
        1) Perform target detection, scene classification, event classification, and text semantic classification in sequence, and update the corresponding flag values according to the results:
            b_Object_Abnormal
            b_Scene_Abnormal
            b_Action_Abnormal
            b_Text_Context_Abnormal
        2) Calculate the video context exception flag:
            b_Video_Context_Abnormal = (b_Object_Abnormal OR b_Scene_Abnormal OR b_Action_Abnormal)
        3) Audit the barrage text according to the video context and the text context:
            IF b_Video_Context_Abnormal AND b_Text_Context_Abnormal:
                b_Test_Abnormal = True
                The audit result is "violation"
            ELSE:
                The audit result is "normal"
        Output the audit result and end the audit.
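The output flow above can be sketched as a short cascade. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function `audit`, the `Semantic` enum, and the `ContextFlags` container are hypothetical names, the results of each stage are passed in as precomputed inputs rather than produced by real models, and the ELSE branch of the semantic stage is assumed to correspond to a "semantically fuzzy" label.

```python
from dataclasses import dataclass
from enum import Enum


class Semantic(Enum):
    NORMAL = "semantically normal"
    VIOLATED = "semantically violated"
    FUZZY = "semantically fuzzy"  # assumed label for the ELSE branch


@dataclass
class ContextFlags:
    """Flags updated by the context auditing stage; all preset to False."""
    b_Object_Abnormal: bool = False
    b_Scene_Abnormal: bool = False
    b_Action_Abnormal: bool = False
    b_Text_Context_Abnormal: bool = False


def audit(sensitive_word_failed: bool, semantic_label: Semantic, flags: ContextFlags) -> str:
    # Stage 1: a sensitive word hit ends the audit immediately.
    if sensitive_word_failed:
        return "fail"
    # Stage 2: a decisive semantic label ends the audit.
    if semantic_label is not Semantic.FUZZY:
        return semantic_label.value
    # Stage 3: the video context is abnormal if any visual flag is set.
    b_Video_Context_Abnormal = (
        flags.b_Object_Abnormal or flags.b_Scene_Abnormal or flags.b_Action_Abnormal
    )
    # The text violates only when both video and text contexts are abnormal.
    b_Test_Abnormal = b_Video_Context_Abnormal and flags.b_Text_Context_Abnormal
    return "violation" if b_Test_Abnormal else "normal"
```

Note that the AND in the last stage means an abnormal video context alone does not produce a violation; the surrounding barrage text must also be classified as abnormal.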
An embodiment of the present application further provides an electronic device comprising a processor and a memory, wherein the memory stores an executable program and, when the executable program runs on a computer, the computer executes the method, or implements the system, described in any of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include, but is not limited to: read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A context-based video barrage text auditing method, characterized by comprising the following steps:
acquiring a video barrage text to be audited as a target audit text;
sensitive word expansion auditing step: performing word segmentation on the target audit text to obtain a text fragment list of the target audit text, and comparing and matching the text fragment list against a preset sensitive word feature library to obtain a matching result; if the match succeeds, the sensitive word expansion auditing step ends, and if the match fails, the audit continues to the next step;
semantic auditing step: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, and determining, according to the semantic classification label, the target audit text that needs further auditing;
context auditing step: obtaining context information of the target audit text, and detecting and analyzing the context information based on context auditing to obtain an audit result.
2. The method of claim 1, wherein the context information comprises: target structured information and scene classification information of the video frame corresponding to the video barrage text, event structured information of the video within a certain time range corresponding to the video barrage text, and classification information of the barrage context texts within the certain time range corresponding to the video barrage text.
3. The method according to claim 1 or 2, wherein the context audit comprises a video context audit and a text context audit, the video context audit being performed based on a deep learning method or a conventional method, and the text context audit being performed based on a semantic classification method.
4. The method of claim 1, wherein the sensitive word feature library is constructed by the following steps:
establishing an original sensitive word bank;
performing deformation mapping processing on each sensitive word in the original sensitive word bank to obtain various deformation mapping results;
and combining the various deformation mapping results with the original sensitive word library to construct a sensitive word feature library.
5. The method of claim 4, wherein the deformation mapping processing comprises: mixed sound-and-character deformation, homophone deformation, pinyin abbreviation deformation, front/back nasal and flat/retroflex consonant deformation, reverse reading deformation, character filling deformation, character missing deformation, character splitting deformation, similar-shape character deformation, and synonym deformation.
6. The method of claim 1, wherein the training method of the semantic classification model comprises a deep learning method or a conventional training method; the deep learning methods include TextCNN, TextRNN, BERT, XLNet, RoBERTa, ALBERT, etc., and the conventional training methods include logistic regression and support vector machines.
7. The method of claim 1, wherein the semantic classification label comprises: "semantically normal", "semantically violated", and "semantically fuzzy".
8. A context-based video barrage text auditing system, characterized by comprising a sensitive word expansion auditing module, a semantic auditing module, and a context auditing module, wherein:
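sensitive word expansion auditing module: performing word segmentation on the target audit text to obtain a text fragment list of the target audit text, and comparing and matching the text fragment list against a preset sensitive word feature library to obtain a matching result;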
semantic auditing module: inputting the target audit text that needs further auditing into a trained semantic classification model to obtain a judgment result, adding a semantic classification label to the video barrage text according to the judgment result, judging and processing the output according to the semantic classification label, and determining the output mode and the target audit text that needs further auditing;
context auditing module: obtaining context information of the target audit text, detecting and analyzing the context information based on context auditing, and making a comprehensive judgment to obtain an audit result.
9. The system of claim 8, wherein the system further comprises an audit result output module configured to output and display the final audit result based on the outputs from the sensitive word expansion auditing module, the semantic auditing module, and the context auditing module.
10. The system of claim 9, wherein, for the output from the sensitive word expansion auditing module, the data output and displayed by the audit result output module comprises: the position of the search word in the input text, the original form of the matched sensitive word, and the actual deformation mapping information of the sensitive word in the input text.
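The feature-library construction recited in claims 4 and 5 can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, only two of the listed deformations (reverse reading and character missing) are implemented, and the length guards are arbitrary choices that are not taken from the patent.

```python
from typing import Callable, Dict, Iterable, List

Deformation = Callable[[str], List[str]]


def reverse_reading(word: str) -> List[str]:
    """Reverse reading deformation: the word read back to front."""
    return [word[::-1]] if len(word) > 1 else []


def character_missing(word: str) -> List[str]:
    """Character missing deformation: drop one character at a time.

    Assumption: words shorter than three characters are skipped so that
    single-character variants are not generated.
    """
    if len(word) < 3:
        return []
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def build_feature_library(
    original_words: Iterable[str], deformations: Iterable[Deformation]
) -> Dict[str, str]:
    """Map every variant (and each original word) back to its original form."""
    words = list(original_words)
    deformation_list = list(deformations)
    # Start from the original lexicon, then merge in the deformed variants.
    library: Dict[str, str] = {word: word for word in words}
    for word in words:
        for deform in deformation_list:
            for variant in deform(word):
                # Keep the first mapping if two words yield the same variant.
                library.setdefault(variant, word)
    return library
```

Storing the library as a variant-to-original mapping makes it straightforward to report both the original form and the actual deformation when a match is found.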
CN202010655180.2A 2020-07-09 2020-07-09 Context-based video barrage text auditing method and system Pending CN112036187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010655180.2A CN112036187A (en) 2020-07-09 2020-07-09 Context-based video barrage text auditing method and system

Publications (1)

Publication Number Publication Date
CN112036187A true CN112036187A (en) 2020-12-04

Family

ID=73578954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010655180.2A Pending CN112036187A (en) 2020-07-09 2020-07-09 Context-based video barrage text auditing method and system

Country Status (1)

Country Link
CN (1) CN112036187A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201204