CN108875769A - Data annotation method, device, system, and storage medium - Google Patents
- Publication number
- CN108875769A (CN201810064918.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
Embodiments of the present invention provide a data annotation method, device, system, and storage medium. The data annotation method includes: obtaining a first number of unlabeled data items and their pre-annotation information, where the pre-annotation information is obtained by pre-annotating the first number of unlabeled data items with an annotation model and includes pre-annotation results; displaying the first number of unlabeled data items and their pre-annotation results in a display interface; receiving first feedback information from a user regarding the first number of unlabeled data items; and determining final annotation results for the first number of unlabeled data items according to the first feedback information. With the data annotation method, device, system, and storage medium according to embodiments of the present invention, the data annotation system first pre-annotates the unlabeled data, and the unlabeled data and their pre-annotation results can be displayed in the display interface. The user only needs to correct the erroneous pre-annotation results, which can greatly improve annotation efficiency and reduce annotation cost.
Description
Technical field
The present invention relates to the field of computer technology, and more specifically to a data annotation method, device, system, and storage medium.
Background
As artificial intelligence has developed, the role of data has become increasingly prominent. Training a good neural network model usually requires data on the order of millions or even hundreds of millions of items. The period and cost of data annotation directly affect the industrial competitiveness of an artificial intelligence company.
On current data annotation platforms, data is annotated manually, one item at a time, and the annotation interface built on this process likewise presents a single data item at a time. Current data annotation platforms have the following disadvantages: the annotation mode is manual, item-by-item annotation; the annotation cost is generally proportional to the size of the dataset, so annotating a very large dataset usually requires considerable manpower and a long annotation period.
Summary of the invention
The present invention is proposed in view of the above problems. The present invention provides a data annotation method, device, system, and storage medium.
According to one aspect of the present invention, a data annotation method is provided. The data annotation method includes: obtaining a first number of unlabeled data items and their pre-annotation information, where the pre-annotation information is obtained by pre-annotating the first number of unlabeled data items with an annotation model and includes pre-annotation results; displaying the first number of unlabeled data items and their pre-annotation results in a display interface; receiving first feedback information from a user regarding the first number of unlabeled data items; and determining final annotation results for the first number of unlabeled data items according to the first feedback information.
Illustratively, the display interface includes an annotation area and a menu bar area. The first number of unlabeled data items is displayed in the annotation area, and the menu bar area includes a mode control for indicating the annotation mode of the data in the annotation area, the annotation mode being one or more of a high-probability mode, a high-similarity mode, and a boundary mode. Obtaining the first number of unlabeled data items and their pre-annotation information includes: determining the annotation mode selected by the user through the mode control; and, according to the selected annotation mode, pre-annotating a second number of unlabeled data items with the annotation model to obtain pre-annotation information for the second number of unlabeled data items, where the first number of unlabeled data items is at least part of the second number of unlabeled data items.
Illustratively, the mode control includes one or more of a high-probability control, a high-similarity control, and a boundary control arranged at different positions, which respectively indicate the high-probability mode, the high-similarity mode, and the boundary mode.
Illustratively, each of the one or more of the high-similarity control, the high-probability control, and the boundary control includes a positive-example control and a negative-example control. The positive-example control controls the display of unlabeled data belonging to the positive examples under the corresponding annotation mode, and the negative-example control controls the display of unlabeled data belonging to the negative examples under the corresponding annotation mode.
Illustratively, the mode control is a drop-down list control that provides drop-down list items corresponding to one or more of the high-probability mode, the high-similarity mode, and the boundary mode.
Illustratively, the pre-annotation information further includes a data score. Obtaining the first number of unlabeled data items and their pre-annotation information further includes: selecting, from the second number of unlabeled data items, those whose data score is greater than a first score threshold or less than a second score threshold as the first number of unlabeled data items; or selecting, from the second number of unlabeled data items, a preset number of items with the highest data scores as the first number of unlabeled data items.
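The two selection rules in the preceding paragraph can be sketched roughly as follows. This is only a minimal illustration: the `PreAnnotated` record and the names `select_by_threshold` and `select_top_n` are hypothetical and do not come from the patent, which does not prescribe a data structure or an implementation.

```python
from typing import List, NamedTuple, Optional


class PreAnnotated(NamedTuple):
    """Hypothetical record for one pre-annotated item (not from the patent)."""
    item_id: str
    label: int      # pre-annotation result, e.g. 1 = positive, 0 = negative
    score: float    # data score produced during pre-annotation


def select_by_threshold(items: List[PreAnnotated],
                        high: Optional[float] = None,
                        low: Optional[float] = None) -> List[PreAnnotated]:
    """Keep items whose score is above `high` or below `low`."""
    return [it for it in items
            if (high is not None and it.score > high)
            or (low is not None and it.score < low)]


def select_top_n(items: List[PreAnnotated], n: int) -> List[PreAnnotated]:
    """Keep the n items with the highest scores, highest first."""
    return sorted(items, key=lambda it: it.score, reverse=True)[:n]


# The "second number" of pre-annotated items in this toy example is 4.
batch = [
    PreAnnotated("a", 1, 0.97),
    PreAnnotated("b", 1, 0.55),
    PreAnnotated("c", 0, 0.08),
    PreAnnotated("d", 1, 0.81),
]
confident = select_by_threshold(batch, high=0.9, low=0.1)  # items "a" and "c"
best_two = select_top_n(batch, 2)                          # items "a" and "d"
```

Either result would then play the role of the first number of unlabeled data items that is shown to the user.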
Illustratively, the pre-annotation information further includes a data score, and in the display interface the first number of unlabeled data items is arranged according to their data scores.
Illustratively, the display interface includes a menu bar area, and the menu bar area includes a random control. The method further includes: when selection information for the random control is received, randomly selecting a third number of unlabeled data items from the unlabeled data set; displaying the third number of unlabeled data items in the display interface; receiving second feedback information from the user regarding the third number of unlabeled data items; and determining final annotation results for the third number of unlabeled data items according to the second feedback information.
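As a rough illustration of the random selection step, the sketch below draws a batch from the unlabeled set without replacement. The function name `pick_random_unlabeled` and the optional `seed` parameter are assumptions made for this example; the patent only states that the data is selected randomly.

```python
import random
from typing import List, Optional, Sequence


def pick_random_unlabeled(unlabeled_ids: Sequence[str],
                          count: int,
                          seed: Optional[int] = None) -> List[str]:
    """Randomly pick up to `count` distinct items from the unlabeled set."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    count = min(count, len(unlabeled_ids))
    return rng.sample(list(unlabeled_ids), count)


unlabeled = ["img_%03d" % i for i in range(10)]
third_batch = pick_random_unlabeled(unlabeled, 3, seed=42)
```

Capping `count` at the size of the set avoids an error when fewer unlabeled items remain than were requested.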
Illustratively, the random control includes a positive-example control and a negative-example control. The positive-example control controls the display of unlabeled data belonging to the positive examples under the random mode, and the negative-example control controls the display of unlabeled data belonging to the negative examples under the random mode.
Illustratively, the display interface includes a menu bar area, and the menu bar area includes one or more of an export control, a generate-test-set control, and an initialize-model control. The export control controls exporting at least part of the first number of unlabeled data items, together with their final annotation results, as a file in a predetermined format. The generate-test-set control controls selecting a predetermined number of labeled data items from the labeled data set to obtain a test set, which is used to test the annotation accuracy of the annotation model. The initialize-model control controls initializing the parameters of the annotation model.
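The export and test-set operations described above might look roughly like the following sketch. JSON as the "predetermined format", random sampling as the way to pick the test set, and the names `export_annotations`, `make_test_set`, and `accuracy` are all assumptions for illustration; the patent leaves the file format and the selection rule open.

```python
import json
import random
from typing import Callable, List, Tuple

Record = Tuple[str, int]   # (item id, final annotation result)


def export_annotations(records: List[Record], path: str) -> None:
    """Write items and their final annotation results to a JSON file."""
    payload = [{"id": item_id, "label": label} for item_id, label in records]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)


def make_test_set(labeled: List[Record], size: int, seed: int = 0) -> List[Record]:
    """Pick a fixed-size test set from the labeled data (random sampling assumed)."""
    rng = random.Random(seed)
    return rng.sample(labeled, min(size, len(labeled)))


def accuracy(predict: Callable[[str], int], test_set: List[Record]) -> float:
    """Fraction of test items for which the model's prediction matches the label."""
    if not test_set:
        return 0.0
    correct = sum(1 for item_id, label in test_set if predict(item_id) == label)
    return correct / len(test_set)
```

Here `predict` stands in for the annotation model; any callable that maps an item id to a label can be plugged in.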
Illustratively, the display interface further includes an information bar area, which includes areas for displaying one or more of sample information, statistical information, accuracy information, and shortcut key information. The sample information includes a sample of unlabeled data belonging to the category currently to be annotated. The statistical information includes one or more of the number of labeled data items, the number of unlabeled data items, the number of labeled data items belonging to the positive examples, and the number of labeled data items belonging to the negative examples. The accuracy information indicates the accuracy of the annotation model. The shortcut key information indicates the preset shortcut keys.
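The statistical information listed above is simple to derive from the current labels. The sketch below assumes a hypothetical encoding in which each item maps to 1 (positive example), 0 (negative example), or None (not yet labeled); neither the encoding nor the function name `summarize` comes from the patent.

```python
from collections import Counter
from typing import Dict, Optional


def summarize(labels: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Count labeled, unlabeled, positive, and negative items."""
    counts = Counter(labels.values())
    return {
        "labeled": counts[1] + counts[0],
        "unlabeled": counts[None],
        "positive": counts[1],
        "negative": counts[0],
    }


stats = summarize({"a": 1, "b": 0, "c": None, "d": 1, "e": None})
```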
Illustratively, the pre-annotation information further includes a data score, and the display interface includes one or more of an invert control, a filter control, a filter-threshold control, and a filter-number control. The invert control controls updating the current annotation results of the unlabeled data shown in the display interface from positive example to negative example, or from negative example to positive example. The filter control controls filtering out part of the unlabeled data pre-annotated with the annotation model, with the remaining unlabeled data serving as the first number of unlabeled data items for display. The filter-threshold control controls the score threshold used to filter unlabeled data out of the unlabeled data pre-annotated with the annotation model, and the filter-number control controls the number of unlabeled data items filtered out of the unlabeled data pre-annotated with the annotation model.
Illustratively, the filter-threshold control is a slider control or an input-box control.
Illustratively, receiving the first feedback information from the user regarding the first number of unlabeled data items includes: receiving a switch command for a specific unlabeled data item. Determining the final annotation results for the first number of unlabeled data items according to the first feedback information includes: updating the current annotation result of the specific unlabeled data item from positive example to negative example, or from negative example to positive example, where the final annotation results of the first number of unlabeled data items are their current annotation results at the moment annotation ends.
Illustratively, the switch command includes a single click with the left mouse button on the display area where the specific unlabeled data item is located.
Illustratively, receiving the first feedback information from the user regarding the first number of unlabeled data items includes: receiving an invalidation command for a specific unlabeled data item. Determining the final annotation results for the first number of unlabeled data items according to the first feedback information includes: marking the specific unlabeled data item as invalid data to obtain its current annotation result, where the final annotation results of the first number of unlabeled data items are their current annotation results at the moment annotation ends.
Illustratively, the invalidation command includes a double click with the left mouse button on the display area where the specific unlabeled data item is located.
Illustratively, the display interface includes a menu bar area, an information bar area, and an annotation area. The menu bar area is the upper area of the display interface, the information bar area is the left part of the lower area of the display interface, and the annotation area is the right part of the lower area.
According to another aspect of the present invention, a data annotation device is provided, including: an obtaining module for obtaining a first number of unlabeled data items and their pre-annotation information, where the pre-annotation information is obtained by pre-annotating the first number of unlabeled data items with an annotation model and includes pre-annotation results; a display module for displaying the first number of unlabeled data items and their pre-annotation results in a display interface; a receiving module for receiving first feedback information from a user regarding the first number of unlabeled data items; and a result determination module for determining final annotation results for the first number of unlabeled data items according to the first feedback information.
According to another aspect of the present invention, a data annotation system is provided, including a processor and a memory, where the memory stores computer program instructions that, when run by the processor, execute the above data annotation method.
According to another aspect of the present invention, a storage medium is provided, on which program instructions are stored, where the program instructions, when run, execute the above data annotation method.
With the data annotation method, device, system, and storage medium according to embodiments of the present invention, the data annotation system can first pre-annotate the unlabeled data, and the unlabeled data and their pre-annotation results can be displayed in the display interface. The user only needs to correct the erroneous pre-annotation results, which can greatly improve annotation efficiency and reduce annotation cost.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the more detailed description of embodiments of the present invention given in conjunction with the accompanying drawings. The drawings are provided for further understanding of the embodiments of the present invention, constitute a part of the specification, and serve to explain the present invention together with the embodiments; they do not limit the present invention. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 shows a schematic block diagram of an example electronic device for implementing the data annotation method and device according to an embodiment of the present invention;
Fig. 2 shows a schematic flowchart of a data annotation method according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of a data annotation system according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of a display interface for displaying unlabeled data and its pre-annotation results according to an embodiment of the present invention;
Fig. 5 shows a schematic block diagram of a data annotation device according to an embodiment of the present invention; and
Fig. 6 shows a schematic block diagram of a data annotation system according to an embodiment of the present invention.
Detailed description of embodiments
To make the objects, technical solutions, and advantages of the present invention more apparent, example embodiments of the present invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments obtained by those skilled in the art based on the embodiments described herein without creative effort shall fall within the scope of protection of the present invention.
To solve the above problems, embodiments of the present invention provide a data annotation method, device, system, and storage medium. In the intelligent annotation platform provided by embodiments of the present invention, the data annotation system can learn by itself and select data to pre-annotate. The annotator (also called the user) only needs to correct erroneous annotations and no longer needs to annotate all of the data one item at a time. The data annotation method and device according to embodiments of the present invention can be applied to any field in which data needs to be annotated, such as face annotation or card number annotation.
First, an example electronic device 100 for implementing the data annotation method and device according to an embodiment of the present invention is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102 and one or more storage devices 104. Optionally, the electronic device 100 may further include an input device 106, an output device 108, and a data acquisition device 110. These components are interconnected through a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are illustrative rather than restrictive, and the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functionality (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by the user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (such as images and/or sound) to the outside (for example, to the user), and may include a display. Optionally, the output device may further include a speaker or the like. Optionally, the input device 106 and the output device 108 may be integrated and implemented with a single interactive device (such as a touch screen).
The data acquisition device 110 may acquire the required data (including unlabeled data and labeled data) and store the acquired data in the storage device 104 for use by other components. Optionally, the data acquisition device 110 may be an image acquisition device such as a still camera or a video camera. Optionally, the data acquisition device 110 may be a wired or wireless communication device used to obtain the required data (including unlabeled data and labeled data) from an external device (a server or the cloud).
Illustratively, the example electronic device for implementing the data annotation method and device according to an embodiment of the present invention may be implemented on a device such as a personal computer or a remote server.
Next, a data annotation method according to an embodiment of the present invention is described with reference to Fig. 2. Fig. 2 shows a schematic flowchart of a data annotation method 200 according to an embodiment of the present invention. As shown in Fig. 2, the data annotation method 200 includes the following steps S210, S220, S230, and S240.
In step S210, a first number of unlabeled data items and their pre-annotation information are obtained, where the pre-annotation information is obtained by pre-annotating the first number of unlabeled data items with an annotation model and includes pre-annotation results.
The data described herein (including unlabeled data and labeled data) may be any type of data, including but not limited to text, images, speech, and video. It should be noted that the data described herein is countable. Illustratively, when the unlabeled data consists of images, one image may be regarded as one unlabeled data item. Illustratively, when the unlabeled data consists of video, a video segment of a specific length may be regarded as one unlabeled data item.
The first number may be any suitable number and may be set as needed; the present invention places no limitation on it. For example, in a face annotation application, 1000 face images may be chosen from the unlabeled data set as the first number of unlabeled data items (a face image here is an image relevant to face annotation and may or may not contain a face). Terms such as "first" and "second" used herein serve only to distinguish items and do not indicate an order.
Illustratively, in the face annotation application, the annotation model can mark, for each of the 1000 images, whether it contains a face. The result of whether each image contains a face is its pre-annotation result. That is, the pre-annotation result may be the classification result of the unlabeled data item, and a classification result may also be called a label. Illustratively, the pre-annotation information may further include a data score. For example, the probability that each image contains a face may be regarded as its data score.
Fig. 3 shows a schematic diagram of a data annotation system according to an embodiment of the present invention. As shown in Fig. 3, the data annotation system may consist of the following parts.
I. Data pool (Pool): contains the unlabeled data set U and the labeled data set L.
II. Agent: as the core of the system, the Agent controls the training of the annotation model, chooses unlabeled data for pre-annotation, and so on.
III. Annotation model (Model): is trained with the data pool and makes predictions on unlabeled data. The annotation model may be trained in any of three ways: supervised learning, semi-supervised learning, and unsupervised learning. The model training process may be independent of the annotation process and run continuously in the background.
In addition, the Agent communicates with an external annotator (Inspector), who checks the pre-annotation results provided by the Agent and corrects the erroneous ones. It should be noted that the Inspector may be a person or an inspection system implemented by a machine.
In Fig. 3, the workflow of the data annotation system is shown by the numbers 0 to 5, where 0 is the model training process and 1 to 5 are the annotation process.
0. The Model is trained with data provided by the Agent; this data comes from the Pool.
1. The Agent chooses a batch of unlabeled data from the Pool.
2. The Agent hands the chosen data to the Model for prediction. Example 1: the Model predicts the category of the unlabeled data, for example by outputting the probabilities that the unlabeled data belongs to each of the predetermined categories; the Agent can then pre-annotate the unlabeled data with the category that has the highest probability. Example 2: the Model extracts data features from the unlabeled data; the Agent can then compute the similarity between the unlabeled data and multiple labeled data items and pre-annotate the unlabeled data with the category of the labeled data item most similar to it.
The result provided by the Model may be any output that helps with data classification, such as the probability distribution output by the Model's last layer or the output of an intermediate layer.
3. After obtaining the Model's output, the Agent pre-annotates this batch of unlabeled data using one of the strategies below and may at the same time give a data score for each unlabeled data item. It then chooses the part of the data with higher data scores and outputs it, together with its pre-annotation results, to the Inspector.
The Agent may use the following strategies:
High-probability strategy: based on the probability distribution output by the Model, the Agent takes the category with the highest probability as the pre-annotation result of the unlabeled data item and the probability of that category as the data score.
High-similarity strategy: based on the output of an intermediate layer of the Model, the Agent computes the distance from the unlabeled data item to each labeled data item, takes the category of the labeled data item at the smallest distance as the pre-annotation result of the unlabeled data item, and takes the negative of this smallest distance as the data score.
Boundary strategy: the Agent scores the accuracy of the data classification using strategies from active learning. These strategies include:
Uncertainty sampling: compute the classification uncertainty of each sample (for example, the entropy of the probability distribution P output by the Model) and use it as the data score.
Query by committee: train several different sub-models at the same time and let them "vote" on the category of an unlabeled data item; the Agent measures the degree of disagreement among these votes with some criterion and uses it as the data score.
4. The Inspector corrects the annotation errors and feeds the result back to the Agent.
5. The Agent puts the final annotation results into the Pool and updates U and L.
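The scoring strategies described in step 3 of the workflow above can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions: Euclidean distance for the high-similarity strategy and vote entropy as the committee disagreement measure are choices made here, since the text only speaks of "distance" and "some criterion", and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def high_probability(probs: Sequence[float]) -> Tuple[int, float]:
    """Label = most probable class index; score = that class's probability."""
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label, probs[label]


def high_similarity(feature: Sequence[float],
                    labeled: List[Tuple[Sequence[float], Hashable]]
                    ) -> Tuple[Hashable, float]:
    """Label = class of the nearest labeled item; score = negative distance.

    Euclidean distance is an assumption made for this sketch.
    """
    nearest_feature, nearest_label = min(
        labeled, key=lambda pair: math.dist(feature, pair[0]))
    return nearest_label, -math.dist(feature, nearest_feature)


def entropy_score(probs: Sequence[float]) -> float:
    """Uncertainty sampling: entropy of the predicted distribution P."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def committee_disagreement(votes: Sequence[Hashable]) -> float:
    """Query by committee: vote entropy (one possible disagreement measure)."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(votes).values())


label, score = high_probability([0.1, 0.7, 0.2])
neighbor_label, neighbor_score = high_similarity(
    (0.0, 0.0), [((3.0, 4.0), "cat"), ((1.0, 0.0), "dog")])
```

With these definitions, a confident prediction gets a high score under the high-probability strategy, while under the boundary strategies a high score marks an item near the decision boundary that the model is unsure about.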
Illustratively, the annotation model may be any suitable neural network model, such as a conventional convolutional neural network. Illustratively, the annotation model may be used to perform class prediction or feature extraction on each unlabeled data item, and the pre-annotation result of each unlabeled data item may be determined from the class prediction result or the extracted features.
In step S220, the first number of unlabeled data items and their pre-annotation results are displayed in the display interface.
The display interface is shown by a display device. Illustratively, the display device may be any of various displays, such as a liquid crystal display, an organic light-emitting display, or a cathode-ray tube (CRT) display.
In the display interface, the first number of unlabeled data items may be displayed all at once or in batches. When the first number of unlabeled data items is displayed in batches, or when other unlabeled data still needs to be annotated after the first number of unlabeled data items, the remaining unlabeled data can be preloaded while the user checks the unlabeled data currently shown in the display interface. This avoids making the user wait when moving on to subsequent unlabeled data and further speeds up annotation.
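One way to realize the preloading described above is to load the next batch on a background thread while the current batch is being reviewed. The sketch below is a minimal illustration using Python's standard `concurrent.futures` module; the generator name `iter_batches_with_preload` and the `load_batch` callback are hypothetical, and the patent does not specify how preloading is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches_with_preload(item_ids: Sequence[str],
                              batch_size: int,
                              load_batch: Callable[[Sequence[str]], List[T]]
                              ) -> Iterator[List[T]]:
    """Yield loaded batches, loading the next one while the caller reviews the current one."""
    chunks = [item_ids[i:i + batch_size]
              for i in range(0, len(item_ids), batch_size)]
    if not chunks:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_batch, chunks[0])
        for next_chunk in chunks[1:]:
            current = pending.result()
            # Start loading the next batch before handing over the current one.
            pending = pool.submit(load_batch, next_chunk)
            yield current
        yield pending.result()


def fake_loader(ids: Sequence[str]) -> List[str]:
    """Stand-in for reading images from disk or the network."""
    return ["loaded:" + i for i in ids]


batches = list(iter_batches_with_preload(["a", "b", "c", "d", "e"], 2, fake_loader))
```

In an interactive tool the caller would display each yielded batch and wait for the user's feedback, during which the background thread finishes loading the following batch.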
In the display interface, the pre-annotation results of the first number of unlabeled data items may be displayed in full or in part. In one example, the pre-annotation results of the first number of unlabeled data items are not all the same, for example some are the digit 1 and some are the digit 2; in this case, the pre-annotation result of each unlabeled data item may be displayed. In another example, the pre-annotation results of the first number of unlabeled data items are all the same, for example all are the digit 1, so a single pre-annotation result (for example, text such as "1") can be displayed in the display interface without displaying a pre-annotation result next to each unlabeled data item. In addition, the pre-annotation results of the first number of unlabeled data items may be displayed in the display interface either directly or indirectly. For example, if the user wants to annotate the digit 1, the user can click a selection control related to the digit 1 shown on the display interface (for example, a button marked "digit 1"), and the display interface will then show the unlabeled data whose pre-annotation result is the digit 1. In this case, the selection control related to the digit 1 can be understood as an indirect way of displaying the pre-annotation results of the first number of unlabeled data items.
Fig. 4 shows a schematic diagram of a display interface for displaying unlabeled data and its pre-annotation results according to an embodiment of the present invention. As shown in Fig. 4, the display interface may include three parts: a menu bar area, an information bar area, and an annotation area. The menu bar area is the upper area of the display interface, the information bar area is the left part of the lower area of the display interface, and the annotation area is the right part of the lower area.
It should be noted that the layout of the display interface described herein can be set as needed and is not limited to the layout shown in Fig. 4. That is, the display interface need not be divided in the way shown in Fig. 4, and the positions of the menu bar area, information bar area, and annotation area, as well as the content shown in each area, need not be consistent with Fig. 4.
In the embodiment shown in Fig. 4, the first number of unlabeled data items is displayed in the annotation area. The first number of unlabeled data items consists of several images, and the purpose of annotation is to judge whether each image contains the digit 1.
In step S230, first feedback information from the user regarding the first number of unlabeled data items is received.
The first number of unlabeled data items can be shown in the display interface, for example 1000 images pre-annotated as containing the digit 1. The user can check these 1000 images. For an image whose pre-annotation result is wrong, for example because the digit it contains is not 1 but 7, the user can interact with the data annotation system (for example, a system implemented by the electronic device 100 described above) through an interactive device such as a mouse, keyboard, or touch screen to correct the erroneous annotation. For example, the user can click the wrongly pre-annotated image with the left mouse button to invert its pre-annotation result, so that the inverted annotation result indicates that the image does not contain the digit 1. Of course, the user can also input the annotation result of the image directly through the interactive device, for example by changing the annotation result to the digit 7.
The information that the user inputs to the data annotation system through the interactive device is feedback information (including the first feedback information and the second feedback information described below), which includes but is not limited to the error correction information described above. For example, if the user considers the current annotation results of the first number of unlabeled data items to be correct, the user can click a selection control related to submission (for example, a button marked "submit"). In this case, the first feedback information may include information on the click operation on the submission-related selection control.
In step S240, the final annotation results of the unlabeled data of the first number are determined according to the first feedback information.
If error correction information input by the user is received, the annotation result of the corresponding unlabeled data can be corrected. It can be understood that the correction of each unlabeled data item may be repeated multiple times, and a new annotation result is obtained after each correction. For convenience of description, the annotation result of an unlabeled data item at the current moment is referred to as its current annotation result. The current annotation result of an unlabeled data item may be its pre-annotation result, or a new annotation result obtained after one or more corrections. Finally, when the user confirms that annotation of the unlabeled data of the first number is complete (for example, by clicking the "Submit" control), the current annotation result of each unlabeled data item at that moment is determined to be the final annotation result of that item.
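The correct-then-submit flow of steps S230 and S240 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described by the embodiment: the `AnnotationSession` class and its method names are hypothetical, and annotation results are simplified to booleans (True meaning "contains the digit 1").

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnnotationSession:
    """Hypothetical holder for the current annotation results of one batch."""

    # item id -> current annotation result (starts as the pre-annotation result)
    current: Dict[str, bool]
    final: Optional[Dict[str, bool]] = field(default=None)

    def toggle(self, item_id: str) -> None:
        """Error correction: invert the current result of one item (e.g. a left click)."""
        if self.final is not None:
            raise RuntimeError("batch already submitted")
        self.current[item_id] = not self.current[item_id]

    def submit(self) -> Dict[str, bool]:
        """'Submit' control: freeze the current results as the final results."""
        self.final = dict(self.current)
        return self.final


session = AnnotationSession(current={"img_001": True, "img_002": True, "img_003": True})
session.toggle("img_002")  # the user spots a 7 that was pre-annotated as 1
session.toggle("img_003")  # corrected by mistake ...
session.toggle("img_003")  # ... and corrected back; corrections may repeat
final_results = session.submit()
```

Items the user never touches keep their pre-annotation result, which is what makes checking faster than annotating from scratch.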
With the data annotation method according to an embodiment of the present invention, the data annotation system first pre-annotates the unlabeled data, and the unlabeled data and its pre-annotation results can be displayed in the display interface. The user only needs to change the wrong pre-annotation results, which can greatly improve annotation efficiency and reduce annotation cost.
Illustratively, the data annotation method according to an embodiment of the present invention can be implemented in a device, apparatus or system having a memory and a processor.
The data annotation method according to an embodiment of the present invention can be deployed at a personal terminal, such as a smartphone, tablet computer or personal computer.
Alternatively, the data annotation method according to an embodiment of the present invention can be deployed in a distributed manner at a server side and a client. For example, the data to be annotated can be obtained at the client (for example, face images are acquired at an image acquisition end), the client transmits the obtained data to the server side (or cloud), and the server side (or cloud) performs the data annotation.
According to another embodiment of the present invention, the display interface may include an annotation area and a menu bar region. The unlabeled data of the first number may be displayed in the annotation area, and the menu bar region may include a mode control for indicating the annotation mode of the data in the annotation area, the annotation mode being one or more of a high probability mode, a high similarity mode and a boundary mode. Step S210 may include: determining the annotation mode selected by the user through the mode control; and, according to the selected annotation mode, pre-annotating the unlabeled data of a second number using the annotation model to obtain pre-annotation information of the unlabeled data of the second number, where the unlabeled data of the first number is at least part of the unlabeled data of the second number.
The unlabeled data of the second number is the data initially obtained from the unlabeled data set. At least part of the unlabeled data (i.e. the unlabeled data of the first number) can be selected from the unlabeled data of the second number, and the selected unlabeled data and its pre-annotation results are output to a display device for display. Where the display interface includes the filter control described below, the filter control can be used to select at least part of the unlabeled data of the second number as the unlabeled data of the first number. The selection may be based on the category indicated by the pre-annotation result of each unlabeled data item and/or on its data score. For example, in a digit annotation application, the unlabeled data of the second number may be 10000 images, each containing one digit, which may be any of 0 to 9. Of course, some images may contain no digit. Pre-annotation yields the pre-annotation result of each image, i.e. which digit each image contains. If the digit currently to be annotated is 1, the images whose pre-annotation result is the digit 1 can be selected from the 10000 images and output to the display device for display. Assuming that 900 images are pre-annotated as containing the digit 1, the first number is 900.
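Selecting the unlabeled data of the first number by pre-annotated category can be illustrated with a short sketch. The function name `select_for_display` and the tuple representation are assumptions made for illustration; `None` stands for an image pre-annotated as containing no digit.

```python
from typing import List, Optional, Tuple

# (image id, pre-annotation result); None stands for "no digit in the image"
PreAnnotated = Tuple[str, Optional[int]]


def select_for_display(batch: List[PreAnnotated], target: int) -> List[str]:
    """Keep the items whose pre-annotation result equals the category being annotated."""
    return [item_id for item_id, label in batch if label == target]


# A tiny stand-in for the unlabeled data of the second number.
second_number_batch: List[PreAnnotated] = [
    ("a", 1), ("b", 7), ("c", 1), ("d", None), ("e", 0),
]
# The unlabeled data of the first number: items pre-annotated as the digit 1.
first_number_batch = select_for_display(second_number_batch, target=1)
```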
The mode control may be a button control. As shown in Fig. 4, the left half of the menu bar region includes four kinds of controls, "Random", "High similarity", "High probability" and "Boundary", each of which is further divided into "Positive" and "Negative". "Random", "High similarity", "High probability" and "Boundary" correspond to four different annotation modes, referred to herein as the random mode, high similarity mode, high probability mode and boundary mode. In Fig. 4, the currently selected control is the positive example control under the high probability mode.
Among the above four annotation modes, the high similarity mode, high probability mode and boundary mode correspond to different pre-annotation strategies. When the user clicks a mode control to select the corresponding annotation mode, the data annotation system can pre-annotate the unlabeled data according to the annotation mode selected by the user.
According to another embodiment of the present invention, pre-annotating the unlabeled data of the second number using the annotation model according to the selected annotation mode, to obtain the pre-annotation information of the unlabeled data of the second number, may include: where the annotation mode selected by the user is the high probability mode or the boundary mode, for each unlabeled data item in the unlabeled data of the second number, inputting the item into the annotation model for category prediction, the output of the annotation model indicating the probability that the item belongs to each of at least one predetermined category; and determining the predetermined category with the highest probability among the at least one predetermined category as the pre-annotation result of the item.
Illustratively, the annotation model can be used to predict the category of the input unlabeled data. At its last layer (i.e. the output layer), the annotation model can output the probabilities that the unlabeled data belongs to several different predetermined categories (i.e. a probability distribution); in this case, the predetermined category with the highest probability can be taken as the pre-annotation result of the unlabeled data. For example, in a digit annotation application, when an image is input into the annotation model, the model can output an 11-dimensional vector whose 11 values respectively represent the probabilities that the image contains 0 to 9 and the "other" category (i.e. a category other than 0 to 9). The category with the highest probability for each unlabeled data item can be determined from the output of the annotation model. For example, if after an image is input into the annotation model the output indicates that the image most probably contains the digit 5, the pre-annotation result of the image can be determined to be the digit 5. The above digit annotation example is a multi-class classification problem. Although some embodiments herein are described using a multi-class problem as an example, it can be understood that binary classification problems are equally applicable. In addition, those skilled in the art will understand that a multi-class problem can itself be decomposed into multiple binary classification problems. For example, in a digit annotation application, the classification of 0 to 9 can be divided into 10 separate binary classification problems. The annotation model may then include multiple submodels, where the first submodel mainly judges whether the image contains the digit 1, the second submodel mainly judges whether the image contains the digit 2, and so on. In this case, the output of the first submodel may only indicate the probability that the image contains the digit 1, the output of the second submodel may only indicate the probability that the image contains the digit 2, and so on.
Where the annotation mode is the high probability mode or the boundary mode, the pre-annotation result of the unlabeled data can be determined in the manner described in the present embodiment.
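Taking the most probable category as the pre-annotation result can be sketched as below. The 11 probabilities are made-up values standing in for the output of the annotation model, and the names `CATEGORIES` and `pre_annotate` are hypothetical.

```python
from typing import List, Sequence

# Ten digit categories plus one "other" category, matching the 11-dimensional output.
CATEGORIES: List[str] = [str(digit) for digit in range(10)] + ["other"]


def pre_annotate(probabilities: Sequence[float]) -> str:
    """Return the predetermined category with the highest predicted probability."""
    if len(probabilities) != len(CATEGORIES):
        raise ValueError("expected one probability per predetermined category")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CATEGORIES[best_index]


# Illustrative model output for one image; index 5 carries the largest probability.
model_output = [0.01, 0.02, 0.01, 0.03, 0.02, 0.80, 0.03, 0.02, 0.02, 0.02, 0.02]
pre_annotation = pre_annotate(model_output)
```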
According to another embodiment of the present invention, pre-annotating the unlabeled data of the second number using the annotation model according to the selected annotation mode, to obtain the pre-annotation information of the unlabeled data of the second number, may include: where the annotation mode selected by the user is the high similarity mode, for each unlabeled data item in the unlabeled data of the second number, inputting the item into the annotation model to extract the data feature of the item; calculating the similarity between the item and at least one labeled data item in the labeled data set according to the data feature of the item and the data feature of the at least one labeled data item; and determining the category of the labeled data item with the greatest similarity to the unlabeled data item as the pre-annotation result of the unlabeled data item.
Illustratively, the annotation model can be used to calculate the similarity between the input unlabeled data item and multiple labeled data items; in this case, the category of the labeled data item with the highest similarity to the unlabeled data item can be taken as the pre-annotation result of the unlabeled data item.
In the present embodiment, the annotation model can be used to predict the category of the input unlabeled data. In this case, the data feature may be the output of an intermediate layer of the annotation model (such as the layer preceding the softmax layer). For example, assuming the unlabeled data is an image, the data feature may be the feature map output by the last convolutional layer of the annotation model.
The similarity between two data items can be measured with a distance such as the Euclidean distance. The smaller the distance between the data features of two data items, the greater the similarity between them. The similarity between an unlabeled data item and each labeled data item can be calculated from the distance between the data feature of the unlabeled data item and the data feature of each labeled data item. Those skilled in the art will understand how similarity is calculated, so it is not described in detail here.
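A minimal sketch of pre-annotation in the high similarity mode follows. The text only requires that a smaller feature distance mean a greater similarity; the mapping `1 / (1 + distance)` used here is one assumed choice among many, and the two-dimensional feature vectors are toy values rather than real model features.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pre_annotate_by_similarity(
    feature: Sequence[float],
    labeled: List[Tuple[Sequence[float], str]],
) -> Tuple[str, float]:
    """Return the category of the most similar labeled item and that similarity."""
    if not labeled:
        raise ValueError("at least one labeled data item is required")
    best_feature, best_category = min(
        labeled, key=lambda pair: euclidean_distance(feature, pair[0])
    )
    distance = euclidean_distance(feature, best_feature)
    similarity = 1.0 / (1.0 + distance)  # assumed mapping: smaller distance, larger similarity
    return best_category, similarity


labeled_set = [([0.0, 0.0], "1"), ([3.0, 4.0], "7")]
category, similarity = pre_annotate_by_similarity([0.0, 1.0], labeled_set)
```

The returned similarity can also serve as the data score of the item in the high similarity mode.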
According to another embodiment of the present invention, the mode control may include one or more of a high probability control, a high similarity control and a boundary control arranged at different positions, which are respectively used to indicate the high probability mode, the high similarity mode and the boundary mode.
Referring to Fig. 4, the random control, high probability control, high similarity control and boundary control are shown as four different button controls located at different positions, each controlling the selection of the corresponding annotation mode.
According to another embodiment of the present invention, each of the one or more of the high similarity control, the high probability control and the boundary control may include a positive example control and a negative example control. The positive example control is used to control the display of the unlabeled data belonging to the positive example under the corresponding annotation mode, and the negative example control is used to control the display of the unlabeled data belonging to the negative example under the corresponding annotation mode.
As shown in Fig. 4, the random control, high similarity control, high probability control and boundary control are each divided into two controls, namely a positive example control and a negative example control, so the display interface shown in Fig. 4 includes eight button controls related to the annotation mode in total. However, the types and number of mode controls shown in Fig. 4 are only exemplary and do not limit the present invention. For example, any of the random control, high similarity control, high probability control and boundary control may include only a positive example control, without being further divided into two controls.
A positive example (Positive) as described herein refers to the case where the annotation result is the specified category, and a negative example (Negative) refers to the case where the annotation result is not the specified category. Positive and negative examples can themselves be understood as two different categories. For example, in a face annotation application, if the user selects the positive example control, the data annotation system can output the unlabeled data (such as face images) whose pre-annotation result is "contains a face"; conversely, if the user selects the negative example control, the data annotation system can output the unlabeled data whose pre-annotation result is "does not contain a face". Further, if the user selects the positive example control under the high probability mode, the data annotation system can pre-annotate the unlabeled data of the second number according to the high probability mode (see the description above) to judge whether each unlabeled data item contains a face. In addition, illustratively, the data annotation system can display the unlabeled data of the first number in descending order of the probability of containing a face. Conversely, if the user selects the negative example control under the high probability mode, the data annotation system can likewise pre-annotate the unlabeled data of the second number according to the high probability mode to judge whether each unlabeled data item contains a face, and, illustratively, can display the unlabeled data of the first number in descending order of the probability of not containing a face (that is, in ascending order of the probability of containing a face).
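The display order under the positive and negative example controls of the high probability mode can be sketched as follows. Splitting positive from negative at a probability of 0.5 is an assumption made here for the binary face example, and `order_for_display` is a hypothetical name.

```python
from typing import List, Tuple


def order_for_display(
    scored: List[Tuple[str, float]], positive: bool, threshold: float = 0.5
) -> List[str]:
    """Order item ids for display from (item id, probability of containing a face)."""
    if positive:
        # Positive example control: most probable faces first.
        kept = [(item_id, p) for item_id, p in scored if p >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
    else:
        # Negative example control: least probable faces first, i.e. descending
        # order of the probability of NOT containing a face.
        kept = [(item_id, p) for item_id, p in scored if p < threshold]
        kept.sort(key=lambda pair: pair[1])
    return [item_id for item_id, _ in kept]


face_probabilities = [("a", 0.9), ("b", 0.2), ("c", 0.6), ("d", 0.4)]
positive_view = order_for_display(face_probabilities, positive=True)
negative_view = order_for_display(face_probabilities, positive=False)
```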
According to another embodiment of the present invention, the mode control may be a drop-down list control that provides drop-down list items corresponding to one or more of the high probability mode, the high similarity mode and the boundary mode.
The implementation of the mode control shown in Fig. 4 is only an example and not a limitation. The mode control can have other suitable implementations, for example a drop-down list control. When the user clicks the drop-down list control, it can expand downward and display multiple drop-down list items. The user can then click any drop-down list item to select the required annotation mode.
According to another embodiment of the present invention, the pre-annotation information may further include a data score, and step S210 may further include: selecting, from the unlabeled data of the second number, the unlabeled data whose data score is greater than a first score threshold or less than a second score threshold as the unlabeled data of the first number; or selecting, from the unlabeled data of the second number, a preset number of unlabeled data items with the highest data scores as the unlabeled data of the first number.
Illustratively, where the annotation mode selected by the user is the high probability mode, for each unlabeled data item in the unlabeled data of the first number, the probability that the item belongs to the category currently to be annotated is the data score of the item. For example, in a digit annotation application, if the digit 1 currently needs to be annotated and the user has selected the positive example control under the high probability mode, the category currently to be annotated is the positive example of the digit 1, i.e. the category "contains the digit 1". Conversely, if the digit 1 currently needs to be annotated and the user has selected the negative example control under the high probability mode, the category currently to be annotated is the negative example of the digit 1, i.e. the category "does not contain the digit 1".
Illustratively, where the annotation mode selected by the user is the high similarity mode, for each unlabeled data item in the unlabeled data of the first number, the similarity between the item and the labeled data belonging to the category currently to be annotated is the data score of the item.
For example, the data score can be used to measure the accuracy of the pre-annotation result of the corresponding unlabeled data. The data score of an unlabeled data item may be the probability that the item belongs to a certain category, or its similarity to the labeled data belonging to a certain category. For example, assuming the probability that an image contains the digit 1 is 0.7, the probability that it contains the digit 7 is 0.2, and the total probability that it contains any other digit is 0.1, the data score of this image can be taken as 0.7.
Illustratively, where the annotation mode selected by the user is the boundary mode, for each unlabeled data item in the unlabeled data of the first number, the classification uncertainty or classification disagreement of the item is the data score of the item.
For example, the data score can be used to measure the annotation value of the corresponding unlabeled data. The data score of an unlabeled data item may be its classification uncertainty or classification disagreement. The higher the classification uncertainty or classification disagreement of an unlabeled data item, the harder the item is to classify. Such an item can be understood as lying on a classification boundary and being difficult to assign to a predetermined category. For the user, the pre-annotation results of such data are less accurate, so compared with sorting by high probability or high similarity, the user needs to spend more time checking the correctness of the annotations. However, data that is harder to classify often carries more information and is very helpful for training the annotation model. Therefore, the higher the classification uncertainty or classification disagreement of an unlabeled data item, the greater its annotation value can be considered to be. Illustratively, the unlabeled data can be sorted in descending order of classification uncertainty or classification disagreement and displayed in the sorted order.
Illustratively, the classification uncertainty of an unlabeled data item may be the entropy of the probabilities that the item belongs to the at least one predetermined category. The way the annotation model performs category prediction on unlabeled data has been described above and is not repeated here. As described above, the annotation model can output a probability distribution, such as the above-mentioned 11-dimensional vector. The entropy of these probabilities can be calculated as the classification uncertainty of the unlabeled data item.
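The entropy-based classification uncertainty can be computed as in the sketch below. The text does not fix the base of the logarithm; the natural logarithm is assumed here, and the probability vectors are illustrative.

```python
import math
from typing import Sequence


def classification_uncertainty(probabilities: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predicted probability distribution."""
    # Terms with zero probability contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


confident = classification_uncertainty([0.97, 0.01, 0.01, 0.01])
ambiguous = classification_uncertainty([0.25, 0.25, 0.25, 0.25])
```

A near one-hot distribution gives an entropy close to zero, while a uniform distribution over n categories gives the maximum value log(n), so items near a classification boundary rank first when sorted in descending order of this score.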
Illustratively, the annotation model may include multiple submodels. Several different submodels can be trained at the same time, and these submodels "vote" on the category of an unlabeled data item. A certain criterion can be used to measure the disagreement (Disagreement) among these votes, which serves as the data score.
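The text leaves the disagreement criterion open. One common choice, assumed here purely for illustration, is vote entropy: the entropy of the distribution of categories voted for by the submodels.

```python
import math
from collections import Counter
from typing import Sequence


def vote_disagreement(votes: Sequence[str]) -> float:
    """Vote entropy: 0 when all submodels agree, larger when votes are split."""
    if not votes:
        raise ValueError("at least one vote is required")
    total = len(votes)
    counts = Counter(votes)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


unanimous = vote_disagreement(["1", "1", "1"])
split = vote_disagreement(["1", "7", "1"])
```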
When selecting the unlabeled data of the first number, the data whose data score is greater than a predetermined threshold (i.e. the first score threshold) may be selected, or the data whose data score is less than a predetermined threshold (i.e. the second score threshold) may be selected. Assume that the data score of an unlabeled data item is the probability that the item belongs to a certain category. For example, in a face annotation application, the images whose probability of containing a face is greater than (or less than) 0.6 can be selected from the unlabeled data of the second number and displayed as the unlabeled data of the first number. As another example, the 200 images with the highest (or lowest) probability of containing a face can be selected from the unlabeled data of the second number and displayed as the unlabeled data of the first number.
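The threshold-based and top-k selections described above can be sketched as follows. The function names and the small score dictionary are hypothetical; the value 0.6 is the threshold used in the face example.

```python
from typing import Dict, List


def select_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Items whose data score is greater than the (first) score threshold."""
    return [item_id for item_id, score in scores.items() if score > threshold]


def select_below(scores: Dict[str, float], threshold: float) -> List[str]:
    """Items whose data score is less than the (second) score threshold."""
    return [item_id for item_id, score in scores.items() if score < threshold]


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """The k items with the highest data scores, best first."""
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)[:k]


face_scores = {"a": 0.9, "b": 0.55, "c": 0.7, "d": 0.1}
likely_faces = select_above(face_scores, 0.6)
unlikely_faces = select_below(face_scores, 0.6)
top_two = select_top_k(face_scores, 2)
```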
According to another embodiment of the present invention, the pre-annotation information further includes a data score, and in the display interface the unlabeled data of the first number is arranged according to its data scores.
The calculation of the data score has been described above and is not repeated here. Illustratively, if the user wants to annotate the digit 1, the images can be displayed on the display device in descending order of the probability of containing the digit 1. The images at the front all have high probabilities, so their pre-annotation results are less likely to be wrong; those toward the back have lower probabilities, so their pre-annotation results are more likely to be wrong. The user can focus on checking whether the later images really contain the digit 1 and correct them if not. Sorting by probability thus helps the user check quickly. It can be understood that the images can also be displayed in ascending order of the probability of containing the digit 1. That is, when unlabeled data is sorted by data score and displayed, either the data with high scores or the data with low scores can be displayed first, as needed.
According to an embodiment of the present invention, the display interface may include a menu bar region, and the menu bar region may include a random control. The data annotation method 200 may further include: when selection information for the random control is received, randomly selecting unlabeled data of a third number from the unlabeled data set; displaying the unlabeled data of the third number in the display interface; receiving second feedback information from the user regarding the unlabeled data of the third number; and determining the final annotation results of the unlabeled data of the third number according to the second feedback information.
If the annotation mode selected by the user is the random mode, some unlabeled data can be randomly selected from the unlabeled data set and displayed for the user to annotate. For example, if the user instructs the data annotation system through the interactive device to perform face annotation and has selected the positive example control under the random mode, several (e.g. 1000) images can be randomly selected from the set of unannotated images and displayed in the display interface. The user can pick out the images that do not contain a face and input error correction information to the data annotation system through the interactive device, so that the annotation results of these images are corrected to "does not contain a face". The remaining images, which the user did not correct, are treated by the data annotation system as containing a face by default.
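The random mode with a default label can be sketched as follows. The helper names are hypothetical, the fixed seed is used only to make the example repeatable, and the default label True corresponds to the positive example control ("contains a face").

```python
import random
from typing import Dict, List, Optional, Set


def random_batch(unlabeled_ids: List[str], count: int, seed: Optional[int] = None) -> List[str]:
    """Randomly pick up to `count` distinct items from the unlabeled data set."""
    rng = random.Random(seed)
    return rng.sample(unlabeled_ids, min(count, len(unlabeled_ids)))


def finalize_random_batch(
    batch: List[str], corrected_ids: Set[str], default_label: bool = True
) -> Dict[str, bool]:
    """Corrected items get the opposite of the default; the rest keep the default."""
    return {
        item_id: (not default_label) if item_id in corrected_ids else default_label
        for item_id in batch
    }


unlabeled_pool = [f"img_{n:03d}" for n in range(10)]
batch = random_batch(unlabeled_pool, count=4, seed=0)
# Suppose the user marks the first displayed image as not containing a face.
final = finalize_random_batch(batch, corrected_ids={batch[0]})
```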
According to an embodiment of the present invention, the random control may include a positive example control and a negative example control. The positive example control is used to control the display of the unlabeled data belonging to the positive example under the random mode, and the negative example control is used to control the display of the unlabeled data belonging to the negative example under the random mode.
The functions of the positive example control and the negative example control have been described above; the present embodiment can be understood with reference to that description, and details are not repeated here.
According to another embodiment of the present invention, the display interface may include a menu bar region, and the menu bar region may include one or more of an export control, a generate-test-set control and an initialize-model control. The export control is used to control exporting at least part of the unlabeled data of the first number, together with its final annotation results, as a file in a predetermined format. The generate-test-set control is used to control selecting a predetermined number of labeled data items from the labeled data set to obtain a test set, which is used to test the annotation accuracy of the annotation model. The initialize-model control is used to control initializing the parameters of the annotation model.
For example, the data annotation method 200 may further include: when selection information for the export control is received, exporting the unlabeled data of a fourth number and its final annotation results as a file in a predetermined format, where the unlabeled data of the fourth number is at least part of the unlabeled data of the first number.
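Exporting final annotation results can be sketched as below. CSV is assumed here as the "predetermined format" purely for illustration, the column names are hypothetical, and the `positives_only` switch models the option of exporting only the data whose final annotation result is a positive example.

```python
import csv
import io
from typing import Dict, TextIO


def export_annotations(
    final_results: Dict[str, bool], stream: TextIO, positives_only: bool = True
) -> int:
    """Write (item id, label) rows as CSV and return the number of exported items."""
    writer = csv.writer(stream)
    writer.writerow(["item_id", "label"])
    exported = 0
    for item_id, label in final_results.items():
        if positives_only and not label:
            continue  # negative examples are skipped in this export mode
        writer.writerow([item_id, int(label)])
        exported += 1
    return exported


final_results = {"img_001": True, "img_002": False, "img_003": True}
buffer = io.StringIO()  # stands in for a file opened with newline=""
exported_count = export_annotations(final_results, buffer)
exported_lines = buffer.getvalue().splitlines()
```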
For example, when selection information for the generate-test-set control is received, a predetermined number of labeled data items are selected from the labeled data set to obtain a test set, which is used to test the annotation accuracy of the annotation model.
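Generating a test set and measuring annotation accuracy can be sketched as follows. Random sampling is assumed as the selection rule (the text only says that a predetermined number of items is selected), and the "model" is any callable that maps an item to a label; both toy models below are stand-ins, not real annotation models.

```python
import random
from typing import Callable, List, Optional, Tuple

LabeledItem = Tuple[int, bool]  # (item, actual annotation result)


def generate_test_set(
    labeled: List[LabeledItem], count: int, seed: Optional[int] = None
) -> List[LabeledItem]:
    """Select up to `count` labeled items (here at random) to form the test set."""
    rng = random.Random(seed)
    return rng.sample(labeled, min(count, len(labeled)))


def annotation_accuracy(model: Callable[[int], bool], test_set: List[LabeledItem]) -> float:
    """Fraction of test items whose predicted label equals the actual label."""
    if not test_set:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for item, actual in test_set if model(item) == actual)
    return correct / len(test_set)


# Toy labeled data set: the actual label is "the item is an even number".
labeled_set = [(n, n % 2 == 0) for n in range(100)]
test_set = generate_test_set(labeled_set, count=100, seed=0)
perfect_accuracy = annotation_accuracy(lambda n: n % 2 == 0, test_set)
constant_accuracy = annotation_accuracy(lambda n: True, test_set)
```

The resulting accuracy can then be compared with a preset threshold to decide whether the pre-annotation results still need to be checked by the user.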
For example, when selection information for the initialize-model control is received, the parameters of the annotation model are initialized.
As shown in Fig. 4, three controls, "Export", "Generate test set" and "Initialize model", are displayed in the right area of the menu bar region. These three controls are shown as button controls in Fig. 4, which is only an example and not a limitation.
The "Export" control is used to control the export of the unlabeled data of the fourth number and its final annotation results. The unlabeled data of the fourth number may be the unlabeled data displayed in the display interface at the current moment, or may be the unlabeled data of the first number whose final annotation results indicate that it belongs to the positive example (e.g. contains the digit 1). Invalid data and/or unlabeled data belonging to the negative example (e.g. not containing the digit 1), together with its final annotation results, need not be exported. Of course, optionally, all of the unlabeled data of the first number and its final annotation results can be exported. The unlabeled data and its final annotation results can be exported as a file in any suitable format, such as a text document or a spreadsheet document.
The "Generate test set" control is used to control the generation of the test set. When the user clicks this control through the interactive device, the data annotation system can automatically generate a test set. The test set contains several labeled data items whose actual annotation results are known. The annotation model can be used to annotate the data in the test set to obtain test annotation results. By comparing the test annotation results with the actual annotation results, the annotation accuracy of the annotation model can be determined. When the annotation accuracy of the annotation model exceeds a preset threshold (such as 98%), the annotation model can be used to annotate the unlabeled data automatically. That is, in this case, the annotation results that the annotation model produces for the unlabeled data (i.e. the above-mentioned pre-annotation results) no longer need to be handed to the user for checking. This can further improve annotation efficiency and save annotation cost.
The annotation model is trained continuously, and the accuracy of pre-annotation using the annotation model can keep improving as annotation proceeds. Therefore, once annotation has progressed to a certain extent, the pre-annotation results obtained usually contain only a few wrong annotations.
The "Initialize model" control is used to control the initialization of the annotation model. During annotation of the unlabeled data, the annotation model can be trained using the labeled data set and/or the unlabeled data set in the data pool. However, various problems such as overfitting or underfitting may occur during training, which may cause the annotation model to become worse and worse and the pre-annotation error rate to become higher and higher. The user can use the "Initialize model" control at any time to initialize the parameters of the annotation model, returning the model to its initial state so that training can start again.
According to another embodiment of the present invention, the display interface may further include an information bar region, which may include regions for displaying one or more of sample information, statistical information, accuracy information and shortcut key information. The sample information may include samples of the unlabeled data belonging to the category currently to be annotated; the statistical information may include one or more of the number of labeled data items, the number of unlabeled data items, the number of labeled data items belonging to the positive example and the number of labeled data items belonging to the negative example; the accuracy information is used to indicate the accuracy of the annotation model; and the shortcut key information is used to indicate preset shortcut keys.
Referring to Fig. 4, four samples (four images) are shown at the top of the information bar region on the left. The four samples shown in the information bar region are samples of the unlabeled data belonging to the category currently to be annotated. Since the selected mode control is the positive example control under the high probability mode, the category currently to be annotated is the positive example of the digit 1. The user can refer to the sample information to judge which unlabeled data belongs to the category currently to be annotated and which does not.
Statistical information is shown in the middle of the information bar region. As shown in Fig. 4, the statistical information includes four items, "Not yet labeled", "Labeled", "Positive example count" and "Negative example count", which respectively indicate, at the current moment, the number of unlabeled data items, the number of labeled data items, the number of labeled data items belonging to the positive example and the number of labeled data items belonging to the negative example.
Shortcut key information is shown at the bottom of the information bar region. For example, the submit function can be performed with the space bar, and the jump-up and jump-down functions can be performed with the w key and the x key respectively. Jumping up means moving the cursor from the current unlabeled data item to the previous item in the same column, and jumping down means moving the cursor from the current item to the next item in the same column. The shortcut keys are not introduced one by one here; those skilled in the art can understand the shortcut key information by reading the description herein together with Fig. 4.
In Fig. 4, the accuracy information is shown in the menu bar region. It can be understood that Fig. 4 is only an example, and the accuracy information can also be shown in the information bar region. Referring to Fig. 4, two items of information, "training error rate" and "Top1% error rate", are shown in the right area of the menu bar region, and they can reflect the accuracy of the annotation model. The lower the "training error rate" and the "Top1% error rate", the higher the accuracy of the annotation model.
Illustratively, the display interface may include a reversion control. The reversion control is used
to control updating the current annotation results of the unlabeled data shown in the display
interface from positive example to negative example or from negative example to positive example.
For example, step S240 may include: when selection information for the reversion control is
received, updating the current annotation results of all unlabeled data shown in the display
interface from positive example to negative example or from negative example to positive example,
wherein the final annotation results of the first number of unlabeled data are the current
annotation results of the first number of unlabeled data at the moment annotation ends. Referring to
Fig. 4, a "reversion" control is shown in the upper-left corner of the tab area. Clicking this
control reverses the current annotation results of the unlabeled data in the entire display
interface, for example from "contains the digit 1" to "does not contain the digit 1". Sometimes,
owing to user carelessness, systematic error or other reasons, a large number of unlabeled data
receive annotation results opposite to their actual categories. Reversing the annotation results of
many unlabeled data at once with the reversion control handles such mislabeling quickly, without the
user correcting the items one by one.
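The effect of the reversion control can be sketched, under stated assumptions, as a single pass over
the displayed items. The function name `invert_all` and the string labels are hypothetical; items
carrying any other label (for example, invalid data) are left unchanged in this sketch.

```python
def invert_all(current_results):
    """Flip every positive/negative annotation result shown in the interface."""
    flip = {"positive": "negative", "negative": "positive"}
    # Labels other than positive/negative are passed through unchanged.
    return [flip.get(result, result) for result in current_results]
```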
Illustratively, the display interface may include one or more of a filter control, a filtering
threshold control and a filtering number control. The filter control is used to control filtering
out part of the unlabeled data pre-annotated with the marking model, so that the remaining unlabeled
data are displayed as the first number of unlabeled data. The filtering threshold control is used to
control the score threshold for filtering unlabeled data out of the unlabeled data pre-annotated
with the marking model. The filtering number control is used to control the number of unlabeled data
filtered out of the unlabeled data pre-annotated with the marking model.
For example, step S210 may include: if the filter control is in the folded state, selecting, from
the unlabeled data pre-annotated with the marking model, unlabeled data whose data score is greater
than a third score threshold or less than a fourth score threshold as the first number of unlabeled
data, or selecting, from the unlabeled data pre-annotated with the marking model, the unlabeled data
other than a predetermined number of unlabeled data with the highest or lowest data scores as the
first number of unlabeled data; and if the filter control is in the unfolded state, taking the
unlabeled data pre-annotated with the marking model as the first number of unlabeled data.
For example, the data annotation method 200 may further include: when operation information for the
filtering threshold control is received, determining the third score threshold or the fourth score
threshold according to the operating state of the filtering threshold control.
For example, the data annotation method 200 may further include: when operation information for the
filtering number control is received, determining the predetermined number according to the
operating state of the filtering number control.
The marking model may be used to pre-annotate a number of unlabeled data (for example, the second
number of unlabeled data described above). According to the indications of the filter control
together with the filtering threshold control or the filtering number control, at least part of the
pre-annotated unlabeled data is selected and displayed as the first number of unlabeled data, while
the remaining unlabeled data are filtered out and not displayed.
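The two filtering options described above, filtering by score threshold and filtering out a
predetermined number of extreme-scoring items, might be sketched as follows. The function names
`select_by_threshold` and `hide_extremes` and their parameters are assumptions for illustration
only; both functions return the indices of the items that remain displayed.

```python
def select_by_threshold(scores, above=None, below=None):
    """Keep items whose data score is greater than `above` (third score
    threshold) or less than `below` (fourth score threshold)."""
    return [
        i for i, score in enumerate(scores)
        if (above is not None and score > above)
        or (below is not None and score < below)
    ]

def hide_extremes(scores, count, highest=True):
    """Keep all items except the `count` items with the highest (or lowest)
    data scores; `count` plays the role of the predetermined number."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=highest)
    hidden = set(order[:count])
    return [i for i in range(len(scores)) if i not in hidden]
```

When the filter control is in the unfolded state, neither function would be applied and every
pre-annotated item would be displayed.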
Illustratively, the filter control may be a fold/unfold control: it is in the folded state when it
shows "+" (as in Fig. 4) and in the unfolded state when it shows "-". For example, the user can
change its state by clicking the filter control with the left mouse button. Illustratively, the
filtering threshold control may be a slider control or an input box control. The user can drag the
slider control to change the filtering threshold (that is, the third score threshold or the fourth
score threshold), or can type a value directly into the input box control as the filtering
threshold. Referring to Fig. 4, a "+" control is shown in the upper-left corner of the tab area;
this control is the filter control. The "1" control shown to the right of the filter control is the
filtering threshold control. The current filtering threshold shown in Fig. 4 is "1" and the high
probability mode is active, which means that unlabeled data whose probability of belonging to the
positive example (for example, containing the digit 1) exceeds 1 would be filtered out. Because the
probability that unlabeled data belong to any category is essentially below 1, all unlabeled data
are in practice displayed.
In addition, referring to Fig. 4, a "hide x" control is also shown in the upper-left corner of the
tab area, where the number x to hide can be typed into an input box control. To the right of the
input box control for the hide number, a "submit and hide" button control is shown. After the user
types a number x into the input box and clicks the "submit and hide" control, the data labeling
system takes x as the hide number, that is, the predetermined number used when the filter control
filters. The filtering number control may include the input box control for the hide number and the
"submit and hide" control to its right. For example, when annotating for the digit 1 in the high
probability mode, if the user enters 100, the data labeling system can filter out the 100 images
with the highest probability of containing the digit 1 and display only the remaining images. In the
high probability mode or the high similarity mode, unlabeled data with high data scores can be
filtered out, because the pre-annotation results of such data are highly accurate and may not need
checking by the user. In the boundary mode, unlabeled data with low data scores can be filtered out,
because the higher the data score, the higher the annotation value.
Illustratively, the display interface may also include a submit control. For example, the data
annotation method 200 may further include: when selection information for the submit control is
received, determining that the current annotation results of the first number of unlabeled data are
the final annotation results of the first number of unlabeled data. Referring to Fig. 4, a "submit"
control is shown in the upper-left corner of the tab area. After the user clicks the submit control,
the data labeling system can take the current annotation results of the first number of unlabeled
data as the final annotation results, store at least part of the first number of unlabeled data (a
fifth number of unlabeled data) together with their final annotation results in the labeled data
set, and remove the fifth number of unlabeled data from the unlabeled data set, thereby updating the
data pool.
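A minimal sketch of the submit operation follows, assuming the unlabeled data set is a Python set and
the labeled data set is a dictionary from item to final annotation result. The function name
`submit` is hypothetical, and for simplicity the sketch stores every displayed item, whereas the
embodiment allows storing only part of them (the fifth number of unlabeled data).

```python
def submit(displayed, current_results, unlabeled_set, labeled_set):
    """Freeze current results as final and move the items between data sets."""
    for item in displayed:
        # The current annotation result at submit time becomes the final one.
        labeled_set[item] = current_results[item]
        # The item no longer belongs to the unlabeled data set.
        unlabeled_set.discard(item)
    return labeled_set
```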
Illustratively, the display interface may also include a display number control. For example, the
data annotation method 200 may further include: when operation information for the display number
control is received, determining, according to the operating state of the display number control,
the number of unlabeled data shown in the display interface at the current time. Illustratively, the
display number control is a slider control or an input box control. The user can drag the slider
control to change the display number, or can type a value directly into the input box control as the
display number. Referring to Fig. 4, a slider control is shown in the upper part of the tab area, to
the right of the "submit and hide" control; this slider control is the display number control.
Illustratively, the display interface may also include a page scroll control. For example, the data
annotation method 200 may further include: when operation information for the page scroll control
is received, updating the unlabeled data shown in the display interface according to the operating
state of the page scroll control. Referring to Fig. 4, a scroll bar control is shown at the far
right of the tab area; this scroll bar control is the page scroll control. When the first number of
unlabeled data is too large to be shown in the display interface all at once, the page scroll
control can be used to control which unlabeled data are displayed.
Illustratively, the display interface may also include a previous wave control and/or a latter wave
control. For example, step S220 may include: when selection information for the previous wave
control is received, showing in the display interface the unlabeled data that were shown during the
previous annotation process; and/or, when selection information for the latter wave control is
received, showing in the display interface the unlabeled data to be shown in the next annotation
process. Referring to Fig. 4, "previous wave" and "latter wave" controls are shown in the
upper-right corner of the tab area, both as button controls. When the user clicks the previous wave
control, the display interface can show the previous batch of annotated data, which had been stored
in the labeled data set as labeled data. After the user selects the previous wave control, this
batch can be removed from the labeled data set and shown again in the display interface as
unlabeled data, together with the final annotation results the batch obtained in the previous
annotation process, for the user to review. An annotation process as described herein is the process
from the moment unlabeled data are shown in the display interface to the moment annotation ends (for
example, the moment the user clicks the submit control). When the user clicks the latter wave
control, the display interface can show the next batch of unlabeled data to be annotated.
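The behaviour of the previous wave control can be sketched as follows, assuming that the system keeps
a history list of submitted batches and that the labeled data set is a dictionary from item to final
annotation result. The name `previous_wave` and the history list are assumptions introduced for this
example.

```python
def previous_wave(history, labeled_set):
    """Take the last submitted batch back out of the labeled data set.

    Returns a dict mapping each item of that batch to the final annotation
    result it received, so the interface can redisplay it for review.
    """
    batch = history.pop()  # most recently submitted batch of items
    return {item: labeled_set.pop(item) for item in batch}
```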
It should be noted that, in the various embodiments of the present invention, when the user operates
a control, the selection information or operation information of the user for that control can be
received. For example, the user can operate or select a control using a mouse, a touch screen, a
keyboard, a voice instruction or the like.
According to another embodiment of the present invention, step S230 may include: receiving a
reversion instruction for specific unlabeled data; and step S240 may include: updating the current
annotation result of the specific unlabeled data from positive example to negative example or from
negative example to positive example, wherein the final annotation results of the first number of
unlabeled data are the current annotation results of the first number of unlabeled data at the
moment annotation ends.
Illustratively, the reversion instruction may include a left mouse button click operation on the
display area where the specific unlabeled data is located.
The user can click any unlabeled data (for example, any image in Fig. 4) with the left mouse button.
If the current annotation result of that unlabeled data is positive example (for example, the image
is annotated as containing the digit 1), the current annotation result can be updated to negative
example (for example, the image is annotated as not containing the digit 1), and conversely.
The reversion instruction may also be another instruction. For example, referring to Fig. 4, the
shortcut key information in the information bar region indicates that the s key on the keyboard
implements the reversion function. Therefore, if the user rests the cursor on any unlabeled data and
presses the s key on the keyboard, the current annotation result of that unlabeled data is likewise
reversed.
According to another embodiment of the present invention, step S230 may include: receiving an
invalid instruction for specific unlabeled data; and step S240 may include: marking the specific
unlabeled data as invalid data to obtain the current annotation result of the specific unlabeled
data, wherein the final annotation results of the first number of unlabeled data are the current
annotation results of the first number of unlabeled data at the moment annotation ends.
Invalid data are data that are not suitable for the current classification. For example, suppose the
digit 1 is currently being annotated but a face image is mixed into the image set for digit
annotation. The face image can then be marked as invalid data: it is entirely unrelated to digit
annotation, and no further digit-related annotation need be performed on it.
Illustratively, the invalid instruction includes a left mouse button double-click operation on the
display area where the specific unlabeled data is located.
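The reversion instruction and the invalid instruction for a specific item can be illustrated with the
following sketch. The function name `apply_instruction`, the instruction strings "reversion" and
"invalid", and the label strings are assumptions for this example; in the embodiment the
instructions are issued by a single click (or the s key) and a double-click, respectively.

```python
def apply_instruction(current_results, item, instruction):
    """Update the current annotation result of one specific item in place."""
    if instruction == "reversion":
        flip = {"positive": "negative", "negative": "positive"}
        # Items that are neither positive nor negative are left unchanged.
        current_results[item] = flip.get(current_results[item], current_results[item])
    elif instruction == "invalid":
        current_results[item] = "invalid"
    else:
        raise ValueError(f"unknown instruction: {instruction}")
    return current_results[item]
```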
According to another aspect of the present invention, a data annotation device is provided. Fig. 5
shows a schematic block diagram of a data annotation device 500 according to an embodiment of the
present invention.
As shown in Fig. 5, the data annotation device 500 according to an embodiment of the present
invention includes an obtaining module 510, a display module 520, a receiving module 530 and a
result determining module 540. Optionally, the device 500 may also include a display device. The
modules can respectively execute the steps/functions of the data annotation method described above
in conjunction with Figs. 2-4. Only the main functions of the components of the data annotation
device 500 are described below; details already described above are omitted.
The obtaining module 510 is used to obtain a first number of unlabeled data and their pre-annotation
information, the pre-annotation information being obtained by pre-annotating the first number of
unlabeled data with a marking model and including pre-annotation results. The obtaining module 510
can be implemented by the processor 102 in the electronic device shown in Fig. 1 running program
instructions stored in the storage device 104.
The display module 520 is used to show the first number of unlabeled data and their pre-annotation
results in a display interface. The display module 520 can be implemented by the processor 102 in
the electronic device shown in Fig. 1 running program instructions stored in the storage device 104.
The receiving module 530 is used to receive first feedback information of a user on the first number
of unlabeled data. The receiving module 530 can be implemented by the processor 102 in the
electronic device shown in Fig. 1 running program instructions stored in the storage device 104.
The result determining module 540 is used to determine the final annotation results of the first
number of unlabeled data according to the first feedback information. The result determining module
540 can be implemented by the processor 102 in the electronic device shown in Fig. 1 running program
instructions stored in the storage device 104.
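The cooperation of the obtaining module and the result determining module can be sketched in Python
as below; the display and receiving modules are user-interface components and are omitted. The
function names `obtain` and `determine`, the assumption that the marking model returns a probability
of the positive example, and the cut-off of 0.5 are all hypothetical and serve only to make the
sketch concrete.

```python
def obtain(data, model, cutoff=0.5):
    """Obtaining module: pre-annotate each item with a result and a score."""
    pre_annotation = {}
    for item in data:
        score = model(item)  # assumed: probability of the positive example
        result = "positive" if score >= cutoff else "negative"
        pre_annotation[item] = (result, score)
    return pre_annotation

def determine(pre_annotation, feedback):
    """Result determining module: user feedback overrides pre-annotation."""
    return {
        item: feedback.get(item, result)
        for item, (result, _score) in pre_annotation.items()
    }
```

In this sketch the user only supplies feedback for items whose pre-annotation result is wrong; every
other item keeps its pre-annotation result as its final annotation result.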
Illustratively, the display interface includes a tab area and a menu bar region, the first number of
unlabeled data are shown in the tab area, the menu bar region includes a mode control for indicating
the annotation mode of the data in the tab area, and the annotation mode is one or more of a high
probability mode, a high similarity mode and a boundary mode. The obtaining module 510 is
specifically used to: determine the annotation mode selected by the user through the mode control;
and, according to the selected annotation mode, pre-annotate a second number of unlabeled data with
the marking model to obtain the pre-annotation information of the second number of unlabeled data,
the first number of unlabeled data being at least part of the second number of unlabeled data.
Illustratively, the mode control includes one or more of a high probability control, a high
similarity control and a boundary control arranged at different positions, which respectively
indicate the high probability mode, the high similarity mode and the boundary mode.
Illustratively, each of the one or more of the high similarity control, the high probability control
and the boundary control includes a positive example control and a negative example control. The
positive example control is used to control the display of unlabeled data belonging to positive
examples under the corresponding annotation mode, and the negative example control is used to
control the display of unlabeled data belonging to negative examples under the corresponding
annotation mode.
Illustratively, the mode control is a drop-down list control, and the drop-down list control provides
drop-down list items corresponding to one or more of the high probability mode, the high similarity
mode and the boundary mode.
Illustratively, the pre-annotation information further includes data scores, and the obtaining
module 510 is further specifically used to: select, from the second number of unlabeled data,
unlabeled data whose data score is greater than a first score threshold or less than a second score
threshold as the first number of unlabeled data, or select, from the second number of unlabeled
data, a preset number of unlabeled data with the highest data scores as the first number of
unlabeled data.
Illustratively, the pre-annotation information further includes data scores, and in the display
interface the first number of unlabeled data are arranged according to their data scores.
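Selecting a preset number of highest-scoring items and arranging them by data score can be sketched
with the standard-library function `heapq.nlargest`. The function name `top_scoring` is
hypothetical, and arranging the result in descending order of score is an assumption; the embodiment
states only that the data are arranged according to their data scores.

```python
import heapq

def top_scoring(scores, preset_number):
    """Return indices of the highest-scoring items, highest score first."""
    return heapq.nlargest(preset_number, range(len(scores)), key=lambda i: scores[i])
```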
Illustratively, the display interface includes a menu bar region, and the menu bar region includes a
random control. The data annotation device 500 further includes a selecting module (not shown) for
randomly selecting a third number of unlabeled data from the unlabeled data set when selection
information for the random control is received. The display module 520 is also used to show the
third number of unlabeled data in the display interface; the receiving module 530 is also used to
receive second feedback information of the user on the third number of unlabeled data; and the
result determining module 540 is also used to determine the final annotation results of the third
number of unlabeled data according to the second feedback information.
Illustratively, the random control includes a positive example control and a negative example
control. The positive example control is used to control the display of unlabeled data belonging to
positive examples under the random mode, and the negative example control is used to control the
display of unlabeled data belonging to negative examples under the random mode.
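The random selection performed by the selecting module can be sketched with the standard-library
`random` module. The function name `pick_random`, the optional `seed` parameter and the behaviour
when fewer items remain than requested are assumptions for illustration.

```python
import random

def pick_random(unlabeled_set, third_number, seed=None):
    """Randomly choose up to `third_number` distinct items to display."""
    rng = random.Random(seed)
    pool = sorted(unlabeled_set)  # fixed order so a given seed is repeatable
    # Assumed: if fewer items remain than requested, return all of them.
    return rng.sample(pool, min(third_number, len(pool)))
```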
Illustratively, the display interface includes a menu bar region, and the menu bar region includes
one or more of an export control, a test set generation control and a model initialization control.
The export control is used to control exporting at least part of the first number of unlabeled data,
together with the final annotation results of that part, as a file in a predetermined format. The
test set generation control is used to control selecting a predetermined number of labeled data from
the labeled data set to obtain a test set, the test set being used to test the annotation accuracy
of the marking model. The model initialization control is used to control initializing the
parameters of the marking model.
Illustratively, the display interface further includes an information bar region, and the
information bar region includes regions for showing one or more of sample information, statistical
information, accuracy rate information and shortcut key information. The sample information includes
samples of unlabeled data belonging to the current category to be marked; the statistical
information includes one or more of the number of labeled data, the number of unlabeled data, the
number of labeled data belonging to positive examples and the number of labeled data belonging to
negative examples; the accuracy rate information indicates the accuracy of the marking model; and
the shortcut key information indicates the preset shortcut keys.
Illustratively, the pre-annotation information further includes data scores, and the display
interface includes one or more of a reversion control, a filter control, a filtering threshold
control and a filtering number control. The reversion control is used to control updating the
current annotation results of the unlabeled data shown in the display interface from positive
example to negative example or from negative example to positive example. The filter control is used
to control filtering out part of the unlabeled data pre-annotated with the marking model, so that
the remaining unlabeled data are displayed as the first number of unlabeled data. The filtering
threshold control is used to control the score threshold for filtering unlabeled data out of the
unlabeled data pre-annotated with the marking model. The filtering number control is used to control
the number of unlabeled data filtered out of the unlabeled data pre-annotated with the marking
model.
Illustratively, the filtering threshold control is a slider control or an input box control.
Illustratively, the receiving module 530 is specifically used to receive a reversion instruction for
specific unlabeled data, and the result determining module 540 is specifically used to update the
current annotation result of the specific unlabeled data from positive example to negative example
or from negative example to positive example, wherein the final annotation results of the first
number of unlabeled data are the current annotation results of the first number of unlabeled data at
the moment annotation ends.
Illustratively, the reversion instruction includes a left mouse button click operation on the
display area where the specific unlabeled data is located.
Illustratively, the receiving module 530 is specifically used to receive an invalid instruction for
specific unlabeled data, and the result determining module 540 is specifically used to mark the
specific unlabeled data as invalid data to obtain the current annotation result of the specific
unlabeled data, wherein the final annotation results of the first number of unlabeled data are the
current annotation results of the first number of unlabeled data at the moment annotation ends.
Illustratively, the invalid instruction includes a left mouse button double-click operation on the
display area where the specific unlabeled data is located.
Illustratively, the display interface includes a menu bar region, an information bar region and a
tab area. The menu bar region is the upper area of the display interface, the information bar region
is the left part of the lower area of the display interface, and the tab area is the right part of
the lower area.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the
examples described in conjunction with the embodiments disclosed herein can be implemented in
electronic hardware or in a combination of computer software and electronic hardware. Whether these
functions are performed in hardware or software depends on the specific application and the design
constraints of the technical solution. Skilled artisans may implement the described functions in
different ways for each specific application, but such implementations should not be considered
beyond the scope of the present invention.
Fig. 6 shows a schematic block diagram of a data labeling system 600 according to an embodiment of
the present invention. The data labeling system 600 includes a display device 610, a storage device
620 and a processor 630.
The display device 610 is used to show unlabeled data, the pre-annotation results of the unlabeled
data and other information.
The storage device 620 stores computer program instructions for implementing the corresponding steps
of the data annotation method according to an embodiment of the present invention.
The processor 630 is used to run the computer program instructions stored in the storage device 620,
so as to execute the corresponding steps of the data annotation method according to an embodiment of
the present invention.
In one embodiment, the computer program instructions, when run by the processor 630, are used to
execute the following steps: obtaining a first number of unlabeled data and their pre-annotation
information, the pre-annotation information being obtained by pre-annotating the first number of
unlabeled data with a marking model and including pre-annotation results; showing the first number
of unlabeled data and their pre-annotation results in a display interface; receiving first feedback
information of a user on the first number of unlabeled data; and determining the final annotation
results of the first number of unlabeled data according to the first feedback information.
Illustratively, the display interface includes a tab area and a menu bar region, the first number of
unlabeled data are shown in the tab area, the menu bar region includes a mode control for indicating
the annotation mode of the data in the tab area, and the annotation mode is one or more of a high
probability mode, a high similarity mode and a boundary mode. The step of obtaining the first number
of unlabeled data and their pre-annotation information, executed when the computer program
instructions are run by the processor 630, includes: determining the annotation mode selected by the
user through the mode control; and, according to the selected annotation mode, pre-annotating a
second number of unlabeled data with the marking model to obtain the pre-annotation information of
the second number of unlabeled data, the first number of unlabeled data being at least part of the
second number of unlabeled data.
Illustratively, the mode control includes one or more of a high probability control, a high
similarity control and a boundary control arranged at different positions, which respectively
indicate the high probability mode, the high similarity mode and the boundary mode.
Illustratively, each of the one or more of the high similarity control, the high probability control
and the boundary control includes a positive example control and a negative example control. The
positive example control is used to control the display of unlabeled data belonging to positive
examples under the corresponding annotation mode, and the negative example control is used to
control the display of unlabeled data belonging to negative examples under the corresponding
annotation mode.
Illustratively, the mode control is a drop-down list control, and the drop-down list control provides
drop-down list items corresponding to one or more of the high probability mode, the high similarity
mode and the boundary mode.
Illustratively, the pre-annotation information further includes data scores, and the step of
obtaining the first number of unlabeled data and their pre-annotation information, executed when the
computer program instructions are run by the processor 630, further includes: selecting, from the
second number of unlabeled data, unlabeled data whose data score is greater than a first score
threshold or less than a second score threshold as the first number of unlabeled data, or selecting,
from the second number of unlabeled data, a preset number of unlabeled data with the highest data
scores as the first number of unlabeled data.
Illustratively, the pre-annotation information further includes data scores, and in the display
interface the first number of unlabeled data are arranged according to their data scores.
Illustratively, the display interface includes a menu bar region, and the menu bar region includes a
random control. The computer program instructions, when run by the processor 630, are also used to
execute the following steps: when selection information for the random control is received,
randomly selecting a third number of unlabeled data from the unlabeled data set; showing the third
number of unlabeled data in the display interface; receiving second feedback information of the user
on the third number of unlabeled data; and determining the final annotation results of the third
number of unlabeled data according to the second feedback information.
Illustratively, the random control includes a positive example control and a negative example
control. The positive example control is used to control the display of unlabeled data belonging to
positive examples under the random mode, and the negative example control is used to control the
display of unlabeled data belonging to negative examples under the random mode.
Illustratively, the display interface includes a menu bar region, and the menu bar region includes
one or more of an export control, a test set generation control and a model initialization control.
The export control is used to control exporting at least part of the first number of unlabeled data,
together with the final annotation results of that part, as a file in a predetermined format. The
test set generation control is used to control selecting a predetermined number of labeled data from
the labeled data set to obtain a test set, the test set being used to test the annotation accuracy
of the marking model. The model initialization control is used to control initializing the
parameters of the marking model.
Illustratively, the display interface further includes an information bar region, and the
information bar region includes regions for showing one or more of sample information, statistical
information, accuracy rate information and shortcut key information. The sample information includes
samples of unlabeled data belonging to the current category to be marked; the statistical
information includes one or more of the number of labeled data, the number of unlabeled data, the
number of labeled data belonging to positive examples and the number of labeled data belonging to
negative examples; the accuracy rate information indicates the accuracy of the marking model; and
the shortcut key information indicates the preset shortcut keys.
Illustratively, the pre-annotation information further includes data scores, and the display
interface includes one or more of a reversion control, a filter control, a filtering threshold
control and a filtering number control. The reversion control is used to control updating the
current annotation results of the unlabeled data shown in the display interface from positive
example to negative example or from negative example to positive example. The filter control is used
to control filtering out part of the unlabeled data pre-annotated with the marking model, so that
the remaining unlabeled data are displayed as the first number of unlabeled data. The filtering
threshold control is used to control the score threshold for filtering unlabeled data out of the
unlabeled data pre-annotated with the marking model. The filtering number control is used to control
the number of unlabeled data filtered out of the unlabeled data pre-annotated with the marking
model.
Illustratively, the filtering threshold control is a slider control or an input box control.
Illustratively, the step of receiving the user's first feedback information on the first number of unlabeled data, executed when the computer program instructions are run by the processor 630, includes: receiving a toggle instruction for a specific item of unlabeled data. The step of determining the final annotation results of the first number of unlabeled data according to the first feedback information, executed when the computer program instructions are run by the processor 630, includes: updating the current annotation result of the specific item from positive example to negative example, or from negative example to positive example. The final annotation results of the first number of unlabeled data are their current annotation results at the moment annotation ends.
Illustratively, the toggle instruction includes a single click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
Illustratively, the step of receiving the user's first feedback information on the first number of unlabeled data, executed when the computer program instructions are run by the processor 630, includes: receiving an invalid instruction for a specific item of unlabeled data. The step of determining the final annotation results of the first number of unlabeled data according to the first feedback information, executed when the computer program instructions are run by the processor 630, includes: marking the specific item as invalid data to obtain its current annotation result. The final annotation results of the first number of unlabeled data are their current annotation results at the moment annotation ends.
Illustratively, the invalid instruction includes a double click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
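The toggle and invalid instructions described above can be sketched as a small state holder. This is only an illustration under assumptions: the class, the label names, and the item ids are hypothetical, and the choice to leave an already-invalid item unchanged on toggle is not stated in the source.

```python
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INVALID = "invalid"


@dataclass
class AnnotationSession:
    # Maps a hypothetical item id to its current annotation result.
    current: dict = field(default_factory=dict)

    def toggle(self, item_id):
        """Toggle instruction (e.g. single left click): flip positive <-> negative."""
        label = self.current[item_id]
        if label is Label.POSITIVE:
            self.current[item_id] = Label.NEGATIVE
        elif label is Label.NEGATIVE:
            self.current[item_id] = Label.POSITIVE
        # Assumption: toggling an item already marked invalid leaves it unchanged.

    def invalidate(self, item_id):
        """Invalid instruction (e.g. double left click): mark the item as invalid data."""
        self.current[item_id] = Label.INVALID

    def finish(self):
        """Final results are the current results at the moment annotation ends."""
        return dict(self.current)


# Pre-annotation results as they might come from the model.
session = AnnotationSession(
    {"img_1": Label.POSITIVE, "img_2": Label.NEGATIVE, "img_3": Label.POSITIVE}
)
session.toggle("img_1")      # user corrects a wrong positive
session.invalidate("img_3")  # user discards an unusable item
final = session.finish()
```

Items the user never touches keep their pre-annotation result, which is what lets the user correct only the wrong ones.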
Illustratively, the display interface includes a menu bar region, an information bar region, and an annotation area. The menu bar region is the upper region of the display interface, the information bar region is the left part of the lower region of the display interface, and the annotation area is the right part of that lower region.
In addition, according to an embodiment of the present invention, a storage medium is provided on which program instructions are stored. When run by a computer or processor, the program instructions execute the corresponding steps of the data annotation method of the embodiment of the present invention and implement the corresponding modules of the data annotation apparatus according to the embodiment of the present invention. The storage medium may include, for example, the memory card of a smartphone, the storage unit of a tablet computer, the hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of these storage media.
In one embodiment, the program instructions, when run by a computer or processor, may cause the computer or processor to implement the functional modules of the data annotation apparatus according to the embodiment of the present invention and/or to execute the data annotation method according to the embodiment of the present invention.
In one embodiment, the program instructions, when run, execute the following steps: obtaining a first number of unlabeled data and pre-annotation information thereof, where the pre-annotation information is obtained by pre-annotating the first number of unlabeled data with an annotation model and includes pre-annotation results; displaying the first number of unlabeled data and the pre-annotation results thereof on a display interface; receiving first feedback information from a user on the first number of unlabeled data; and determining final annotation results of the first number of unlabeled data according to the first feedback information.
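The four steps above can be sketched as a single function. This is a minimal illustration, not the patented implementation: `annotate_batch`, its parameters, and the toy threshold model are all hypothetical stand-ins, and the feedback is assumed to arrive as a mapping from item id to corrected label.

```python
def annotate_batch(items, model, show, get_feedback):
    """Run one pre-annotate / display / correct / finalize pass.

    items:        dict mapping a hypothetical item id to its raw data
    model:        callable standing in for the annotation model
    show:         callable standing in for the display interface
    get_feedback: callable returning {item_id: corrected_label} from the user
    """
    # Step 1: pre-annotate every item with the model.
    pre_results = {item_id: model(data) for item_id, data in items.items()}
    # Step 2: display the items together with their pre-annotation results.
    show(items, pre_results)
    # Step 3: receive the user's feedback (only wrong results need correcting).
    feedback = get_feedback()
    # Step 4: final results are the pre-annotation results with corrections applied.
    final_results = dict(pre_results)
    final_results.update(feedback)
    return final_results


# Stand-ins for illustration only.
toy_model = lambda value: "positive" if value >= 0.5 else "negative"
shown = []
final = annotate_batch(
    items={"a": 0.9, "b": 0.6, "c": 0.1},
    model=toy_model,
    show=lambda items, pre: shown.append(dict(pre)),
    get_feedback=lambda: {"b": "negative"},  # the user corrects one wrong result
)
```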
Illustratively, the display interface includes an annotation area and a menu bar region, and the first number of unlabeled data is displayed in the annotation area. The menu bar region includes a mode control for indicating the annotation mode of the data in the annotation area, and the annotation mode is one or more of a high-probability mode, a high-similarity mode, and a boundary mode. The step of obtaining the first number of unlabeled data and pre-annotation information thereof, executed when the program instructions are run, includes: determining the annotation mode selected by the user through the mode control; and pre-annotating a second number of unlabeled data with the annotation model according to the selected annotation mode, to obtain pre-annotation information of the second number of unlabeled data, where the first number of unlabeled data is at least part of the second number of unlabeled data.
Illustratively, the mode control includes one or more of a high-probability control, a high-similarity control, and a boundary control arranged at different positions, which indicate the high-probability mode, the high-similarity mode, and the boundary mode, respectively.
Illustratively, each of the one or more of the high-similarity control, the high-probability control, and the boundary control includes a positive example control and a negative example control. The positive example control controls the display of the unlabeled data belonging to positive examples under the corresponding annotation mode, and the negative example control controls the display of the unlabeled data belonging to negative examples under the corresponding annotation mode.
Illustratively, the mode control is a drop-down list control that provides drop-down list items corresponding to one or more of the high-probability mode, the high-similarity mode, and the boundary mode.
Illustratively, the pre-annotation information further includes a data score, and the step of obtaining the first number of unlabeled data and pre-annotation information thereof, executed when the program instructions are run, further includes: selecting, from the second number of unlabeled data, the unlabeled data whose data score is greater than a first score threshold or less than a second score threshold as the first number of unlabeled data; or selecting, from the second number of unlabeled data, a preset number of unlabeled data with the highest data scores as the first number of unlabeled data.
Illustratively, the pre-annotation information further includes a data score, and on the display interface the first number of unlabeled data is arranged according to the data scores of the first number of unlabeled data.
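The score-based selection and ordering just described can be sketched as follows. The function names, the `(item_id, score)` tuple layout, and the example threshold values are assumptions for illustration; the source does not say whether the display order is ascending or descending, so the direction is left as a parameter.

```python
def select_by_thresholds(scored, first_threshold, second_threshold):
    """Keep items scoring above the first threshold or below the second."""
    return [
        (item_id, score)
        for item_id, score in scored
        if score > first_threshold or score < second_threshold
    ]


def select_top_n(scored, preset_number):
    """Keep the preset number of items with the highest data scores."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:preset_number]


def order_for_display(scored, descending=True):
    """Arrange items by data score (direction is an assumption)."""
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)


# The "second number" of pre-annotated items, as (hypothetical id, data score).
scored = [("a", 0.95), ("b", 0.55), ("c", 0.05), ("d", 0.80)]

confident = select_by_thresholds(scored, first_threshold=0.9, second_threshold=0.1)
top_two = select_top_n(scored, preset_number=2)
display_order = [item_id for item_id, _ in order_for_display(scored)]
```

With these inputs, the threshold rule keeps only the two items the model is most certain about in either direction, while the top-N rule keeps the two highest scores.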
Illustratively, the display interface includes a menu bar region, and the menu bar region includes a random control. The program instructions, when run, further execute the following steps: when selection information for the random control is received, randomly selecting a third number of unlabeled data from an unlabeled data set; displaying the third number of unlabeled data on the display interface; receiving second feedback information from the user on the third number of unlabeled data; and determining final annotation results of the third number of unlabeled data according to the second feedback information.
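The random selection step can be sketched with the standard library. The function name and the optional seed are hypothetical; capping the count at the pool size is an assumption added so the sketch does not fail on a small data set.

```python
import random


def pick_random_batch(unlabeled_ids, third_number, seed=None):
    """Randomly select up to `third_number` ids from the unlabeled data set."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for testing
    pool = list(unlabeled_ids)
    count = min(third_number, len(pool))  # assumption: never ask for more than exist
    return rng.sample(pool, count)  # sampling without replacement


pool = ["img_%d" % i for i in range(10)]
batch = pick_random_batch(pool, third_number=3, seed=0)
```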
Illustratively, the random control includes a positive example control and a negative example control. The positive example control controls the display of the unlabeled data belonging to positive examples under the random mode, and the negative example control controls the display of the unlabeled data belonging to negative examples under the random mode.
Illustratively, the display interface includes a menu bar region, and the menu bar region includes one or more of an export control, a generate-test-set control, and an initialize-model control. The export control is used to export at least part of the first number of unlabeled data, together with the final annotation results of that part, as a file in a predetermined format. The generate-test-set control is used to select a predetermined number of labeled data from a labeled data set to obtain a test set, which is used to test the annotation accuracy of the annotation model. The initialize-model control is used to initialize the parameters of the annotation model.
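The export and test-set controls can be sketched as plain functions. Everything here is an assumption made for illustration: the source only says "a predetermined format", so JSON is a stand-in, and the function names, the random sampling of the test set, and the toy model are hypothetical.

```python
import json
import random


def export_annotations(final_results, path):
    """Write {item_id: label} to a file; JSON stands in for the predetermined format."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(final_results, handle, ensure_ascii=False, indent=2)


def generate_test_set(labeled, predetermined_number, seed=None):
    """Select a predetermined number of labeled items to serve as the test set."""
    rng = random.Random(seed)
    count = min(predetermined_number, len(labeled))
    chosen = rng.sample(sorted(labeled), count)  # sorted() yields a list of the ids
    return {item_id: labeled[item_id] for item_id in chosen}


def annotation_accuracy(model, test_set, data):
    """Fraction of test-set items whose model prediction matches the stored label."""
    if not test_set:
        return 0.0
    correct = sum(1 for item_id, label in test_set.items() if model(data[item_id]) == label)
    return correct / len(test_set)


# Toy data: raw values and the labels confirmed by the user.
data = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}
labeled = {"a": "positive", "b": "negative", "c": "negative", "d": "negative"}
toy_model = lambda value: "positive" if value >= 0.5 else "negative"

test_set = generate_test_set(labeled, predetermined_number=4, seed=0)
accuracy = annotation_accuracy(toy_model, test_set, data)
```

In this toy run the model disagrees with the confirmed label on one of four items, which is the kind of figure the accuracy display would report.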
Illustratively, the display interface further includes an information bar region, which includes one or more regions for displaying sample information, statistical information, accuracy information, and shortcut key information. The sample information includes samples of the unlabeled data belonging to the category currently being annotated; the statistical information includes one or more of the number of labeled data items, the number of unlabeled data items, the number of labeled data items that are positive examples, and the number of labeled data items that are negative examples; the accuracy information indicates the accuracy of the annotation model; and the shortcut key information indicates the preset shortcut keys.
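The four counts that make up the statistical information can be computed in one pass. This is a sketch under assumptions: the function name is hypothetical, labels are represented as the strings "positive" and "negative", and `None` marks an item that has not yet been labeled.

```python
from collections import Counter


def annotation_statistics(results):
    """Summarize {item_id: label}, where label is "positive", "negative", or None."""
    counts = Counter(results.values())
    unlabeled = counts[None]  # Counter returns 0 for keys it has not seen
    return {
        "labeled": len(results) - unlabeled,
        "unlabeled": unlabeled,
        "positive": counts["positive"],
        "negative": counts["negative"],
    }


stats = annotation_statistics(
    {"a": "positive", "b": "negative", "c": None, "d": "positive", "e": None}
)
```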
Illustratively, the pre-annotation information further includes a data score, and the display interface includes one or more of an inversion control, a filter control, a filter threshold control, and a filter count control. The inversion control is used to switch the current annotation results of the unlabeled data shown on the display interface from positive example to negative example, or from negative example to positive example. The filter control is used to filter out part of the unlabeled data that has been pre-annotated by the annotation model, so that the remaining unlabeled data is displayed as the first number of unlabeled data. The filter threshold control sets the score threshold used when filtering the pre-annotated unlabeled data, and the filter count control sets the number of pre-annotated unlabeled data items to filter.
Illustratively, the filter threshold control is a slider control or an input box.
Illustratively, the step of receiving the user's first feedback information on the first number of unlabeled data, executed when the program instructions are run, includes: receiving a toggle instruction for a specific item of unlabeled data. The step of determining the final annotation results of the first number of unlabeled data according to the first feedback information, executed when the program instructions are run, includes: updating the current annotation result of the specific item from positive example to negative example, or from negative example to positive example. The final annotation results of the first number of unlabeled data are their current annotation results at the moment annotation ends.
Illustratively, the toggle instruction includes a single click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
Illustratively, the step of receiving the user's first feedback information on the first number of unlabeled data, executed when the program instructions are run, includes: receiving an invalid instruction for a specific item of unlabeled data. The step of determining the final annotation results of the first number of unlabeled data according to the first feedback information, executed when the program instructions are run, includes: marking the specific item as invalid data to obtain its current annotation result. The final annotation results of the first number of unlabeled data are their current annotation results at the moment annotation ends.
Illustratively, the invalid instruction includes a double click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
Illustratively, the display interface includes a menu bar region, an information bar region, and an annotation area. The menu bar region is the upper region of the display interface, the information bar region is the left part of the lower region of the display interface, and the annotation area is the right part of that lower region.
Each module in the data annotation system according to an embodiment of the present invention may be implemented by the processor of an electronic device that performs data annotation according to an embodiment of the present invention running computer program instructions stored in a memory, or may be implemented when computer instructions stored in the computer-readable storage medium of a computer program product according to an embodiment of the present invention are run by a computer.
Although example embodiments have been described here with reference to the accompanying drawings, it should be understood that the above example embodiments are merely illustrative and are not intended to limit the scope of the invention to them. Those of ordinary skill in the art can make various changes and modifications to them without departing from the scope and spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as claimed in the appended claims.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the invention. However, this manner of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive point lies in that the corresponding technical problem can be solved with fewer than all of the features of a single disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that, except where features are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules in the data annotation apparatus according to an embodiment of the present invention. The present invention may also be implemented as programs for a device (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
The above is merely a specific embodiment of the present invention, or a description of a specific embodiment, and the protection scope of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (21)
1. A data annotation method, comprising:
obtaining a first number of unlabeled data and pre-annotation information thereof, wherein the pre-annotation information is obtained by pre-annotating the first number of unlabeled data with an annotation model, and the pre-annotation information includes pre-annotation results;
displaying the first number of unlabeled data and the pre-annotation results thereof on a display interface;
receiving first feedback information from a user on the first number of unlabeled data; and
determining final annotation results of the first number of unlabeled data according to the first feedback information.
2. The method of claim 1, wherein the display interface includes an annotation area and a menu bar region, the first number of unlabeled data is displayed in the annotation area, the menu bar region includes a mode control for indicating the annotation mode of the data in the annotation area, and the annotation mode is one or more of a high-probability mode, a high-similarity mode, and a boundary mode,
the obtaining a first number of unlabeled data and pre-annotation information thereof comprises:
determining the annotation mode selected by the user through the mode control;
pre-annotating a second number of unlabeled data with the annotation model according to the selected annotation mode, to obtain pre-annotation information of the second number of unlabeled data, wherein the first number of unlabeled data is at least part of the second number of unlabeled data.
3. The method of claim 2, wherein the mode control includes one or more of a high-probability control, a high-similarity control, and a boundary control arranged at different positions, and the high-probability control, the high-similarity control, and the boundary control are used to indicate the high-probability mode, the high-similarity mode, and the boundary mode, respectively.
4. The method of claim 3, wherein each of the one or more of the high-similarity control, the high-probability control, and the boundary control includes a positive example control and a negative example control, the positive example control is used to control the display of the unlabeled data belonging to positive examples under the corresponding annotation mode, and the negative example control is used to control the display of the unlabeled data belonging to negative examples under the corresponding annotation mode.
5. The method of claim 2, wherein the mode control is a drop-down list control, and the drop-down list control provides drop-down list items corresponding to one or more of the high-probability mode, the high-similarity mode, and the boundary mode.
6. The method of claim 2, wherein the pre-annotation information further includes a data score, and the obtaining a first number of unlabeled data and pre-annotation information thereof further comprises:
selecting, from the second number of unlabeled data, the unlabeled data whose data score is greater than a first score threshold or less than a second score threshold as the first number of unlabeled data, or selecting, from the second number of unlabeled data, a preset number of unlabeled data with the highest data scores as the first number of unlabeled data.
7. The method of claim 1, wherein the pre-annotation information further includes a data score, and on the display interface the first number of unlabeled data is arranged according to the data scores of the first number of unlabeled data.
8. The method of claim 1, wherein the display interface includes a menu bar region, and the menu bar region includes a random control;
the method further comprises:
when selection information for the random control is received, randomly selecting a third number of unlabeled data from an unlabeled data set;
displaying the third number of unlabeled data on the display interface;
receiving second feedback information from the user on the third number of unlabeled data; and
determining final annotation results of the third number of unlabeled data according to the second feedback information.
9. The method of claim 8, wherein the random control includes a positive example control and a negative example control, the positive example control is used to control the display of the unlabeled data belonging to positive examples under the random mode, and the negative example control is used to control the display of the unlabeled data belonging to negative examples under the random mode.
10. The method of claim 1, wherein the display interface includes a menu bar region, and the menu bar region includes one or more of an export control, a generate-test-set control, and an initialize-model control, wherein
the export control is used to export at least part of the first number of unlabeled data and the final annotation results of the at least part of the unlabeled data as a file in a predetermined format,
the generate-test-set control is used to select a predetermined number of labeled data from a labeled data set to obtain a test set, the test set being used to test the annotation accuracy of the annotation model,
the initialize-model control is used to initialize the parameters of the annotation model.
11. The method of claim 1, wherein the display interface further includes an information bar region, and the information bar region includes one or more regions for displaying sample information, statistical information, accuracy information, and shortcut key information, wherein
the sample information includes samples of the unlabeled data belonging to the category currently being annotated;
the statistical information includes one or more of the number of labeled data, the number of unlabeled data, the number of labeled data belonging to positive examples, and the number of labeled data belonging to negative examples;
the accuracy information is used to indicate the accuracy of the annotation model;
the shortcut key information is used to indicate preset shortcut keys.
12. The method of claim 1, wherein the pre-annotation information further includes a data score, and the display interface includes one or more of an inversion control, a filter control, a filter threshold control, and a filter count control, wherein
the inversion control is used to update the current annotation results of the unlabeled data shown on the display interface from positive example to negative example or from negative example to positive example,
the filter control is used to filter out part of the unlabeled data pre-annotated by the annotation model, so that the remaining unlabeled data is displayed as the first number of unlabeled data,
the filter threshold control is used to set the score threshold for filtering unlabeled data from the unlabeled data pre-annotated by the annotation model,
the filter count control is used to set the number of unlabeled data to filter from the unlabeled data pre-annotated by the annotation model.
13. The method of claim 12, wherein the filter threshold control is a slider control or an input box.
14. The method of claim 1, wherein
the receiving first feedback information from a user on the first number of unlabeled data comprises:
receiving a toggle instruction for a specific item of unlabeled data;
the determining final annotation results of the first number of unlabeled data according to the first feedback information comprises:
updating the current annotation result of the specific item of unlabeled data from positive example to negative example or from negative example to positive example, wherein the final annotation results of the first number of unlabeled data are the current annotation results of the first number of unlabeled data at the moment annotation ends.
15. The method of claim 14, wherein the toggle instruction includes a single click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
16. The method of claim 1, wherein
the receiving first feedback information from a user on the first number of unlabeled data comprises:
receiving an invalid instruction for a specific item of unlabeled data;
the determining final annotation results of the first number of unlabeled data according to the first feedback information comprises:
marking the specific item of unlabeled data as invalid data to obtain the current annotation result of the specific item of unlabeled data, wherein the final annotation results of the first number of unlabeled data are the current annotation results of the first number of unlabeled data at the moment annotation ends.
17. The method of claim 16, wherein the invalid instruction includes a double click of the left mouse button on the display area in which the specific item of unlabeled data is shown.
18. The method of any one of claims 1 to 17, wherein the display interface includes a menu bar region, an information bar region, and an annotation area, the menu bar region is the upper region of the display interface, the information bar region is the left part of the lower region of the display interface, and the annotation area is the right part of the lower region.
19. A data annotation apparatus, comprising:
an obtaining module, configured to obtain a first number of unlabeled data and pre-annotation information thereof, wherein the pre-annotation information is obtained by pre-annotating the first number of unlabeled data with an annotation model, and the pre-annotation information includes pre-annotation results;
a display module, configured to display the first number of unlabeled data and the pre-annotation results thereof on a display interface;
a receiving module, configured to receive first feedback information from a user on the first number of unlabeled data; and
a result determining module, configured to determine final annotation results of the first number of unlabeled data according to the first feedback information.
20. A data annotation system, comprising a display device, a processor, and a memory, wherein the memory stores computer program instructions which, when run by the processor, execute the data annotation method of any one of claims 1 to 18.
21. A storage medium on which program instructions are stored, wherein the program instructions, when run, execute the data annotation method of any one of claims 1 to 18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810064918.0A CN108875769A (en) | 2018-01-23 | 2018-01-23 | Data mask method, device and system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108875769A true CN108875769A (en) | 2018-11-23 |
Family
ID=64326003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810064918.0A Pending CN108875769A (en) | 2018-01-23 | 2018-01-23 | Data mask method, device and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875769A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090076989A1 (en) * | 2007-09-14 | 2009-03-19 | Accenture Global Service Gmbh | Automated classification algorithm comprising at least one input-invariant part |
CN104850832A (en) * | 2015-05-06 | 2015-08-19 | 中国科学院信息工程研究所 | Hierarchical iteration-based large-scale image sample marking method and system |
CN107067025A (en) * | 2017-02-15 | 2017-08-18 | 重庆邮电大学 | A kind of data automatic marking method based on Active Learning |
CN107153822A (en) * | 2017-05-19 | 2017-09-12 | 北京航空航天大学 | A kind of smart mask method of the semi-automatic image based on deep learning |
CN107492135A (en) * | 2017-08-21 | 2017-12-19 | 维沃移动通信有限公司 | A kind of image segmentation mask method, device and computer-readable recording medium |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339325A (en) * | 2018-12-19 | 2020-06-26 | 财团法人工业技术研究院 | Data marking system and data marking method |
CN109710933A (en) * | 2018-12-25 | 2019-05-03 | 广州天鹏计算机科技有限公司 | Acquisition methods, device, computer equipment and the storage medium of training corpus |
CN110263853A (en) * | 2019-06-20 | 2019-09-20 | 杭州睿琪软件有限公司 | The method and device of artificial client state is checked using error sample |
WO2020253741A1 (en) * | 2019-06-20 | 2020-12-24 | 杭州睿琪软件有限公司 | Method and device for checking status of manual client by using error samples |
CN110378396A (en) * | 2019-06-26 | 2019-10-25 | 北京百度网讯科技有限公司 | Sample data mask method, device, computer equipment and storage medium |
CN112446404A (en) * | 2019-09-04 | 2021-03-05 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Online image sample labeling system based on active learning, labeling method and application thereof |
CN113704650A (en) * | 2020-05-21 | 2021-11-26 | 阿里巴巴集团控股有限公司 | Information display method, device, system, equipment and storage medium |
CN111859872A (en) * | 2020-07-07 | 2020-10-30 | 中国建设银行股份有限公司 | Text labeling method and device |
CN112163132A (en) * | 2020-09-21 | 2021-01-01 | 中国建设银行股份有限公司 | Data labeling method and device, storage medium and electronic equipment |
CN112163132B (en) * | 2020-09-21 | 2024-05-10 | 中国建设银行股份有限公司 | Data labeling method and device, storage medium and electronic equipment |
CN113839953A (en) * | 2021-09-27 | 2021-12-24 | 上海商汤科技开发有限公司 | Labeling method and device, electronic equipment and storage medium |
CN115712745A (en) * | 2023-01-09 | 2023-02-24 | 荣耀终端有限公司 | User annotation data acquisition method and system and electronic equipment |
CN116385459A (en) * | 2023-03-08 | 2023-07-04 | 阿里巴巴(中国)有限公司 | Image segmentation method and device |
CN116385459B (en) * | 2023-03-08 | 2024-01-09 | 阿里巴巴(中国)有限公司 | Image segmentation method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108875769A (en) | Data mask method, device and system and storage medium | |
CN108875768A (en) | Data mask method, device and system and storage medium | |
CN110750959B (en) | Text information processing method, model training method and related device | |
CN108363790A (en) | For the method, apparatus, equipment and storage medium to being assessed | |
CN107111608A (en) | Automatic generation of N-grams and concept relationships from linguistic input data | |
CN108647205A (en) | Fine granularity sentiment analysis model building method, equipment and readable storage medium storing program for executing | |
CN103870001B (en) | A kind of method and electronic device for generating candidates of input method | |
CN109461157A (en) | Image, semantic dividing method based on multi-stage characteristics fusion and Gauss conditions random field | |
CN109471945A (en) | Medical file classification method, device and storage medium based on deep learning | |
CN110249341A (en) | Classifier training | |
CN109101469A (en) | The information that can search for is extracted from digitized document | |
CN109657204A (en) | Use the automatic matching font of asymmetric metric learning | |
CN103534697B (en) | For providing the method and system of statistics dialog manager training | |
CN107818491A (en) | Electronic installation, Products Show method and storage medium based on user's Internet data | |
CN107609563A (en) | Picture semantic describes method and device | |
Jamalpur et al. | Machine learning intersections and challenges in deep learning | |
CN109816438A (en) | Information-pushing method and device | |
Ma et al. | UniTranSeR: A unified transformer semantic representation framework for multimodal task-oriented dialog system | |
CN109154945A (en) | New connection based on data attribute is recommended | |
CN108536784A (en) | Comment information sentiment analysis method, apparatus, computer storage media and server | |
CN109740515A (en) | One kind reading and appraising method and device | |
CN110309114A (en) | Processing method, device, storage medium and the electronic device of media information | |
CN110399547A (en) | For updating the method, apparatus, equipment and storage medium of model parameter | |
CN112837466B (en) | Bill recognition method, device, equipment and storage medium | |
CN110490237A (en) | Data processing method, device, storage medium and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181123 |