CN109977822A - Data supply method, model training method, device, system, equipment and medium - Google Patents
- Publication number
- CN109977822A (application CN201910197522.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- video
- training
- model
- distributed storage
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/192—Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
- G06V30/194—References adjustable by an adaptive method, e.g. learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention discloses a data supply method, a model training method, and a corresponding device, system, equipment and medium. The data supply method includes: obtaining a training request for a video model, the training request including a preset batch processing mechanism and the data identifiers for the current training; obtaining matched target video data from a distributed storage data set according to the data identifiers, the distributed storage data set including video data of all types; and processing the target video data according to the batch processing mechanism to obtain the training data for the video model. The technical solution provided by the embodiments of the present invention trains directly on video data, so the required storage space is small, less time is spent reading video data, and the training efficiency of the video model is improved.
Description
Technical field
The embodiments of the present invention relate to the field of video, and in particular to a data supply method, a model training method, and a corresponding device, system, equipment and medium.
Background art
Current training of video models generally proceeds as follows: video data is first obtained and parsed into the corresponding pictures (i.e. video frames) and audio information, and the pictures and the audio information are stored separately as data files. During training of the video model, at least one of the following is then performed: reading pictures from the picture data files for training, and reading audio information from the audio data files for training.
This existing way of supplying training data has two drawbacks. First, storage: storing video data directly already occupies a large amount of space, and the picture data files and audio data files occupy even more space than the corresponding video data, so more of the training equipment's storage is consumed. Second, speed: because the picture and audio data files are very large, more time must be spent during training reading the pictures and/or the audio information, which lowers training efficiency.
Summary of the invention
The embodiments of the present invention provide a data supply method, a model training method, and a corresponding device, system, equipment and medium, which improve the training efficiency of video models.
In a first aspect, an embodiment of the present invention provides a data supply method, the method including:
obtaining a training request for a video model, the training request including a preset batch processing mechanism and the data identifiers for the current training;
obtaining matched target video data from a distributed storage data set according to the data identifiers, the distributed storage data set including video data of all types; and
processing the target video data according to the batch processing mechanism to obtain the training data for the video model.
In a second aspect, an embodiment of the present invention provides a model training method, the method including:
obtaining the training data for a video model according to the data supply method of the first aspect; and
inputting the training data into the video model to obtain a trained video model.
In a third aspect, an embodiment of the present invention provides a data supply device, the device including:
a training request obtaining module, configured to obtain a training request for a video model, the training request including a preset batch processing mechanism and the data identifiers for the current training;
a target data obtaining module, configured to obtain matched target video data from a distributed storage data set according to the data identifiers, the distributed storage data set including video data of all types; and
a training data determining module, configured to process the target video data according to the batch processing mechanism to obtain the training data for the video model.
In a fourth aspect, an embodiment of the present invention provides a model training apparatus, the apparatus including:
a training data obtaining module, configured to obtain the training data for a video model according to the data supply method of the first aspect; and
a video model training module, configured to input the training data into the video model to obtain a trained video model.
In a fifth aspect, an embodiment of the present invention provides a data supply system, the system including: a distributed data storage end, a batch loading end, and a data supply end connected to both the distributed data storage end and the batch loading end. The distributed data storage end stores the data set in a distributed manner; the batch loading end stores batch processing mechanisms and generates training requests; and the data supply end is provided with the data supply device of the third aspect.
In a sixth aspect, an embodiment of the present invention provides a model training system, the system including: a distributed data storage end, a batch loading end, and a model training end connected to both the distributed data storage end and the batch loading end. The distributed data storage end stores the data set in a distributed manner; the batch loading end stores batch processing mechanisms and generates training requests; and the model training end is provided with the model training apparatus of the fourth aspect.
In a seventh aspect, an embodiment of the present invention provides equipment, the equipment including:
one or more processors; and
a storage device configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data supply method of the first aspect of the present invention, or the model training method of the second aspect of the present invention.
In an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data supply method of the first aspect of the present invention, or the model training method of the second aspect of the present invention.
In the data supply method, model training method, device, system, equipment and medium provided by the embodiments of the present invention, matched target video data is obtained from the distributed storage data set using the data identifiers in the training request, and the target video data is processed according to the preset batch processing mechanism. The training data for the video model to be trained is thus obtained without spending a large amount of time building data processing functions. Compared with prior-art methods that train on extracted pictures or audio information, the technical solution of the embodiments trains directly on video data: the required storage space is small, less time is spent reading video data, and the training efficiency of the video model is improved.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1A is a flowchart of a data supply method provided by Embodiment one of the present invention;
Fig. 1B is a schematic block diagram of the data supply process provided by Embodiment one of the present invention;
Fig. 2A is a flowchart of a data supply method provided by Embodiment two of the present invention;
Fig. 2B is a schematic diagram of a data supply process provided by Embodiment two of the present invention;
Fig. 3A is a flowchart of a model training method provided by Embodiment three of the present invention;
Fig. 3B is a schematic diagram of the model training process provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a data supply device provided by Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of a model training apparatus provided by Embodiment five of the present invention;
Fig. 6 is a schematic diagram of a data supply system provided by Embodiment six of the present invention;
Fig. 7 is a schematic diagram of a model training system provided by Embodiment seven of the present invention;
Fig. 8 is a schematic structural diagram of equipment provided by Embodiment eight of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure. In addition, where there is no conflict, the embodiments of the present invention and the features in them may be combined with each other.
Embodiment one
Fig. 1A is a flowchart of a data supply method provided by Embodiment one of the present invention. The data supply method of this embodiment can be executed by the data supply device provided in the embodiments of the present invention. The device can be implemented in software and/or hardware and integrated into the equipment that executes the method; the equipment can be any intelligent terminal with the corresponding data processing capability.
Specifically, referring to Fig. 1A, the method may include the following steps:
S110: obtain a training request for a video model.
The training request includes a preset batch processing mechanism and the data identifiers for the current training. Specifically, with the wide application of neural network models in data processing, deep learning techniques simulate the behavior of neural networks to process information for video processing purposes, producing video models with self-learning and adaptive capabilities for various video processing functions. The video model in this embodiment can be any neural network model that performs a recognition or classification function on a video through its network parameters and neuron structure, for example a video review model that judges whether a video contains non-compliant content. Building a video model requires a large amount of training data to iteratively train an initially configured neural network, so that the trained model can reliably achieve its video processing purpose on any video. The training request in this embodiment therefore indicates that, for the video model to be trained, the training data for the subsequent training process must be obtained in advance.
Optionally, because video models have different training requirements, the training request includes two items. The first is the batch processing mechanism, preset for each training task, which specifies the data composition that the training data of each batch must satisfy in every training iteration; this composition differs according to the training task of the video model to be trained. The second is the data identifiers of the training data for the current training. A data identifier uniquely indicates the training data needed for the current training. In this embodiment it can be the Uniform Resource Locator (URL) of the video data, which indicates the address where the video data is stored locally or on the network and also carries information such as the protocol and path to be used when obtaining the video data.
Specifically, when a video model needs to be trained to give it a certain video processing function, the user generates the corresponding training request by performing a training operation: selecting the data identifiers of the training data participating in the current training, which produces an identification list, and setting the batch processing mechanism the current training must satisfy. A training request for the video model to be trained is then generated from the identification list and the batch processing mechanism, so that the training data for the current training can be obtained later.
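As a rough illustration of what such a request might carry, the following minimal sketch models it as an immutable record holding the identification list and the chosen batch processing mechanism. The class name `TrainRequest`, the function `build_train_request` and the mechanism labels are hypothetical; the patent does not specify field names or a serialization format.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TrainRequest:
    # Name of the batch processing mechanism, e.g. "balanced", "pairwise" or
    # "triplet" (hypothetical labels; the patent does not define any).
    batch_mechanism: str
    # Identification list: data identifiers such as video URLs or packet IDs.
    data_ids: Tuple[str, ...]


def build_train_request(selected_ids: Iterable[str], batch_mechanism: str) -> TrainRequest:
    """Build a training request from the identifiers selected on the batch loading end."""
    # Drop duplicate selections while keeping the user's selection order.
    unique_ids = tuple(dict.fromkeys(selected_ids))
    if not unique_ids:
        raise ValueError("at least one data identifier must be selected")
    return TrainRequest(batch_mechanism=batch_mechanism, data_ids=unique_ids)
```

Keeping the request immutable is one possible design choice: the data supply side can pass it around or log it without worrying that the identification list changes mid-training.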
Illustratively, the "batch" in the batch processing mechanism of this embodiment is a batch in the machine learning sense, that is, all the training data participating in one training iteration. Different training tasks impose different requirements on the composition of the training data within a batch. For example, ordinary video classification training requires the number of videos carrying each label in a batch to be as balanced as possible; pair-wise video training requires the videos within a batch to appear in pairs; training with a triplet loss function (triplet-loss) requires the videos within a batch to appear in groups of three. Some training tasks also constrain the order in which training data is loaded, while others, such as hard example mining, need to adjust the composition of the training set dynamically according to training results. All of these requirements can be expressed in the batch processing mechanism of this embodiment.
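The three batch compositions named above can be sketched as small batch composers over (video identifier, label) pairs. This is an illustrative sketch only: the function names are hypothetical, and the choice that a "pair" means two videos sharing a label (and a triplet means anchor, positive, negative) is an assumption, since the text only requires that videos appear in pairs or groups of three.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (video identifier, label)


def _group_by_label(samples: Sequence[Sample]) -> Dict[str, List[Sample]]:
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for video_id, label in samples:
        groups[label].append((video_id, label))
    return groups


def balanced_batch(samples: Sequence[Sample], per_label: int, rng: random.Random) -> List[Sample]:
    """Classification: the same number of videos for every label."""
    groups = _group_by_label(samples)
    batch: List[Sample] = []
    for label in sorted(groups):
        batch.extend(rng.sample(groups[label], per_label))
    return batch


def pairwise_batch(samples: Sequence[Sample], num_pairs: int,
                   rng: random.Random) -> List[Tuple[Sample, Sample]]:
    """Pair-wise training: videos appear in pairs (assumed: two videos sharing a label)."""
    groups = _group_by_label(samples)
    eligible = [label for label in sorted(groups) if len(groups[label]) >= 2]
    pairs = []
    for _ in range(num_pairs):
        first, second = rng.sample(groups[rng.choice(eligible)], 2)
        pairs.append((first, second))
    return pairs


def triplet_batch(samples: Sequence[Sample], num_triplets: int,
                  rng: random.Random) -> List[Tuple[Sample, Sample, Sample]]:
    """Triplet loss: (anchor, positive with the same label, negative with another label)."""
    groups = _group_by_label(samples)
    labels = sorted(groups)
    eligible = [label for label in labels if len(groups[label]) >= 2]
    triplets = []
    for _ in range(num_triplets):
        pos_label = rng.choice(eligible)
        neg_label = rng.choice([label for label in labels if label != pos_label])
        anchor, positive = rng.sample(groups[pos_label], 2)
        negative = rng.choice(groups[neg_label])
        triplets.append((anchor, positive, negative))
    return triplets
```

Passing an explicit `random.Random` makes batch composition reproducible across runs. Order-constrained loading or hard example mining would need additional state, for example per-sample losses fed back from training, which this sketch does not model.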
The functional block diagram of data supply in this embodiment is shown in Fig. 1B. On the batch loading end, the user performs the trigger operation for model training: selecting the data identifiers of the video data participating in the current training, which generates the corresponding identification list, and selecting the batch processing mechanism for the current training. Together these produce the corresponding training request, which the data supply end can then obtain for the video model to be trained.
S120: obtain matched target video data from the distributed storage data set according to the data identifiers.
The distributed storage data set includes video data of all types. Specifically, to make data supply more flexible, the distributed storage data set can support a variety of ways of storing and reading video data. In this embodiment it can include: video data stored as individual files on a local disk; video data permitted for training that has been packed into a training data packet and stored on a local disk; video data stored across multiple ends using a distributed data storage protocol (such as single videos or video data packets stored on the Hadoop Distributed File System (HDFS)); and video data stored at arbitrary network addresses and accessed using a network data protocol (such as video data accessed through a URL at any network address, which may include video data published on the Internet, video data cached on a Content Distribution Network (CDN), video data uploaded to the open-source distributed file system FastDFS (Fast Distributed File System), and video data shared by servers in the same local area network that expose a Hypertext Transfer Protocol (HTTP) service).
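One way to support these heterogeneous sources behind a single identifier is to route each data identifier to a storage backend by its URL scheme. The sketch below is an assumption for illustration: the scheme table and the backend names are hypothetical, and the `fastdfs://` scheme in particular is invented, since the patent names the storage types but no URI conventions.

```python
from urllib.parse import urlparse

# Hypothetical routing table from URL scheme to storage backend.
# "fastdfs" is an invented scheme used only for illustration.
BACKENDS = {
    "": "local_file",      # plain path such as /data/videos/a.mp4
    "file": "local_file",
    "hdfs": "hdfs",
    "fastdfs": "fastdfs",
    "http": "network",     # Internet, CDN, or LAN servers exposing HTTP
    "https": "network",
}


def resolve_backend(data_id: str) -> str:
    """Return the storage backend that should serve this data identifier."""
    scheme = urlparse(data_id).scheme  # urlparse already lower-cases the scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme!r}") from None
```

For example, `resolve_backend("hdfs://namenode:8020/videos/a.mp4")` would route to the HDFS reader, while a CDN URL would go to the generic network reader.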
Optionally, since different video models have different training requirements and therefore need training data in different formats, the distributed storage data set of this embodiment provides two data storage strategies. One accesses data as single video samples (including single videos stored on a local disk or accessed by URL on the network); the other packs multiple video samples and accesses them together (including video data packets stored on a local disk or in an HDFS/FastDFS distributed file system). For single videos, the distributed storage data set can store associated content such as the URL, filename, data label and other additional information of each video, so that all the relevant information of the video data participating in training is available when training data is later supplied to the video model. This embodiment also provides a packing program for video data: if video data needs to be packed when stored, the packing program packs each video together with its associated content, and the resulting video data packet is stored at the corresponding location in the distributed storage data set.
The two strategies have different advantages. Single-sample access supports random access to video data, which suits dynamically generated data or cases that require a highly random data order. Packed access reads data quickly and avoids the input/output latency that random access to the distributed storage data set causes. In this embodiment, when the user generates a training request through the trigger operation on the batch loading end, video data stored under either strategy can be selected according to the training task of the video model to be trained, which makes obtaining training data more flexible. Accordingly, a data identifier in this embodiment can be the identifier of a single video or the packet identifier of a packed video data packet.
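The patent does not describe the packet format its packing program produces. As a minimal sketch under that caveat, the following uses a plain tar archive (Python's standard `tarfile` module) holding each video plus an `index.json` with the associated content (filename, label, extra information). The function names, the sample dictionary layout and the reserved `index.json` member are assumptions.

```python
import io
import json
import tarfile
from typing import Dict, List

INDEX_NAME = "index.json"  # hypothetical reserved member holding the metadata


def pack_videos(samples: List[Dict], out_path: str) -> None:
    """Pack videos and their associated content into one packet.

    Each sample is {"name": str, "label": str, "data": bytes, "extra": dict (optional)}.
    """
    index = []
    with tarfile.open(out_path, "w") as tar:
        for s in samples:
            if s["name"] == INDEX_NAME:
                raise ValueError(f"{INDEX_NAME} is reserved for the packet index")
            info = tarfile.TarInfo(name=s["name"])
            info.size = len(s["data"])
            tar.addfile(info, io.BytesIO(s["data"]))
            index.append({"name": s["name"], "label": s["label"], "extra": s.get("extra", {})})
        meta = json.dumps(index).encode("utf-8")
        info = tarfile.TarInfo(name=INDEX_NAME)
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))


def read_pack(path: str) -> List[Dict]:
    """Read a whole packet back as samples with their associated content."""
    with tarfile.open(path, "r") as tar:
        index = json.load(tar.extractfile(INDEX_NAME))
        return [dict(entry, data=tar.extractfile(entry["name"]).read()) for entry in index]
```

Reading a packet in one pass is what gives packed access its speed advantage: one large sequential read replaces many small random reads against the storage cluster.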
Specifically, when a training request for the video model to be trained is received, this embodiment parses it to obtain the preset batch processing mechanism suited to the current training task and the corresponding data identifiers. Then, using the data identifiers of all the video data participating in the current training, matched target video data is obtained from the distributed storage data set. The target video data is the video data participating in the current training, together with its stored associated content such as the filename, label and other additional information.
Illustratively, as shown in Fig. 1B, the storage locations in the distributed storage data set of this embodiment can include a local file system, a CDN cluster, an HDFS cluster and a FastDFS cluster. The CDN, FastDFS and HDFS clusters are each deployed on the same servers as the training server cluster of the video model to be trained. Every server has hundreds of GB of memory, several graphics cards that support model training, a Redundant Array of Independent Disks (RAID) with tens of TB of capacity, and a Central Processing Unit (CPU) with tens of cores, and the servers are interconnected by 10-gigabit networking. Frequently accessed data can therefore be cached in memory, which minimizes hard disk input/output during training, increases the reading speed of video data, and makes effective use of the servers' memory and disk resources. At the same time, storing video data centrally in a distributed manner means that distributed training does not need to copy training data in advance, which shortens the data preparation stage of model training.
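The text says frequently accessed data is cached in memory but does not name an eviction policy. The sketch below assumes a byte-bounded least-recently-used (LRU) policy placed in front of a slower reader; the class name and the `read_fn` callback are hypothetical stand-ins for the disk or network read.

```python
from collections import OrderedDict
from typing import Callable


class VideoMemoryCache:
    """Byte-bounded LRU cache in front of a slower reader (disk or network).

    LRU eviction is an assumption; the source only says that frequently
    accessed data can be cached in memory.
    """

    def __init__(self, capacity_bytes: int, read_fn: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.read_fn = read_fn
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.hits = 0
        self.misses = 0

    def get(self, data_id: str) -> bytes:
        if data_id in self._items:
            self._items.move_to_end(data_id)  # mark as most recently used
            self.hits += 1
            return self._items[data_id]
        self.misses += 1
        data = self.read_fn(data_id)
        if len(data) <= self.capacity:  # objects larger than the cache are not stored
            self._items[data_id] = data
            self._size += len(data)
            while self._size > self.capacity:
                _, evicted = self._items.popitem(last=False)  # evict least recently used
                self._size -= len(evicted)
        return data
```

Bounding the cache by bytes rather than by entry count matters for video, where individual objects range from small clips to multi-gigabyte packets.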
S130: process the target video data according to the batch processing mechanism to obtain the training data for the video model.
Specifically, once the matched target video data has been obtained from the distributed storage data set using the data identifiers for the current training, it can be batch-processed according to the preset batch processing mechanism carried in the training request. In this embodiment, the associated content of the target video data in each batch can be received through an open Transmission Control Protocol (TCP) port; the target video data in that batch is grouped as the training task requires and loaded into memory, yielding the training data for the video model to be trained. During subsequent training, the training data is decoded and preprocessed to convert it into the format the video model expects.
With the scheme of this embodiment, the target video data for training a video model can be obtained simply from the data identifiers of the current training. Video data does not need to be downloaded and packed in advance, and for distributed training it does not need to be copied to every participating machine, which greatly reduces the time spent preparing training data. Because the video data stored in the distributed storage data set is already compressed, downloading target video data into memory by data identifier consumes little of the training equipment's bandwidth and none of its local disk input/output. Subsequent decoding and preprocessing of the training data mainly occupy the equipment's CPU, so they can run in parallel with video model training on the graphics processing unit (GPU). No extra time is spent waiting for data preprocessing, which shortens the training time of the video model itself and greatly improves the hardware utilization of the training equipment. In addition, the time-consuming data preparation and preprocessing are standardized, and a flexible, customizable batch generation interface is provided, so algorithm engineers can concentrate on improving the video model or the training method instead of spending much time on data processing. End-to-end training also lets the video model fit the characteristics of the business data more closely, so a better model can be trained and video model training becomes more flexible.
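The overlap of CPU-side decoding with GPU training described above is a standard prefetching pattern. The minimal sketch below uses a background thread and a bounded queue; `prefetch` and the `preprocess` callback are hypothetical names, not the patent's interface. Note that with a background thread, decoding only truly overlaps training when the decoder releases Python's GIL (as native video decoders typically do); otherwise a process pool would be needed.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # end-of-stream marker


def prefetch(batches: Iterable[List[bytes]], preprocess: Callable[[bytes], T],
             depth: int = 4) -> Iterator[List[T]]:
    """Decode/preprocess batches on a background thread while the caller trains.

    At most `depth` preprocessed batches are buffered, which bounds memory use.
    """
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for batch in batches:
                buffer.put([preprocess(item) for item in batch])
            buffer.put(_DONE)
        except Exception as exc:  # hand the error to the consuming thread
            buffer.put(exc)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

A training loop would then read `for batch in prefetch(compressed_batches, decode): train_step(batch)`, where `decode` and `train_step` are placeholders for the real video decoder and GPU step.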
In the technical solution of this embodiment, matched target video data is obtained from the distributed storage data set through the data identifiers in the training request and processed according to the preset batch processing mechanism, so the training data for the video model to be trained is obtained without spending a large amount of time building data processing functions. Training directly on video data requires little storage space and little time for reading video data, which improves the training efficiency of the video model.
Embodiment two
Fig. 2A is a flowchart of a data supply method provided by Embodiment two of the present invention, and Fig. 2B is a schematic diagram of a data supply process provided by Embodiment two of the present invention. This embodiment refines the technical solution of the above embodiment; specifically, it explains in detail how target video data is obtained from the distributed storage data set.
Optionally, as shown in Figure 2 A, it may include steps of in the present embodiment:
S210, obtain be directed to Video Model train request, the train request include preset batch processing mechanism and
This trains corresponding Data Identification.
S220 determines the type of Data Identification, if Data Identification is single video mark, is being divided according to single video mark
Cloth storing data collection obtains matched single video data;If Data Identification is to be packaged video identifier, regarded according to being packaged
Frequency marking, which is known, obtains matched packing video data in distributed storage data set.
Specifically, the distributed storage data set has two data storage strategies, single-sample access and packed access, and data identifiers of video data stored under either strategy can be selected according to the training task. Therefore, when the training request is parsed to obtain the data identifiers of the video data participating in the current training, the type of each data identifier must first be determined. If it is a single-video identifier, the matched single video data (stored on a local disk or anywhere on the network) is obtained from the distributed storage data set directly according to that identifier. If it is a packed-video identifier, the matched packed video data (such as video data packets stored on a local disk or in an HDFS/FastDFS distributed file system) is obtained from the distributed storage data set directly according to that identifier. This yields the matched target video data for the current training.
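Step S220 amounts to a type check followed by a dispatch. In the sketch below, the rule that a packed-video identifier names a `.tar` packet is purely an assumed convention for illustration, since the patent does not say how the two identifier types are distinguished; `fetch_single` and `fetch_packed` are hypothetical callbacks standing in for the real storage readers.

```python
from typing import Callable, Iterable, List, TypeVar
from urllib.parse import urlparse

V = TypeVar("V")


def is_packed_id(data_id: str) -> bool:
    # Hypothetical convention: a packed-video identifier names a ".tar" packet.
    # Only the URL path is checked, so query strings do not affect the result.
    return urlparse(data_id).path.lower().endswith(".tar")


def fetch_target_videos(data_ids: Iterable[str],
                        fetch_single: Callable[[str], V],
                        fetch_packed: Callable[[str], List[V]]) -> List[V]:
    """S220 sketch: branch on the identifier type, then fetch from storage."""
    videos: List[V] = []
    for data_id in data_ids:
        if is_packed_id(data_id):
            videos.extend(fetch_packed(data_id))  # one packet -> many samples
        else:
            videos.append(fetch_single(data_id))  # one identifier -> one sample
    return videos
```

A real system would more likely record the identifier type explicitly in the training request or in a storage index rather than infer it from a file extension.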
Optionally, the video data stored centrally in the distributed storage data set includes internal video resources stored on local disks or in distributed file systems, as well as external video resources stored anywhere on the network. Obtaining matched target video data from the distributed storage data set according to the data identifier therefore includes: if the target video data is internal video data, obtaining it from the distributed storage data set according to the data identifier; if the target video data is external video data, first letting the distributed storage data set fetch it from the external network according to the data identifier, and then obtaining it from the distributed storage data set.
Specifically, when target video data is requested, it is first determined from the data identifier whether the matched target video data is internal, by querying whether it exists in the local file system, CDN cluster, HDFS cluster or FastDFS cluster of the distributed storage data set. If it exists, the target video data is internal and is obtained directly from the distributed storage data set. If it does not exist, the target video data is external. To avoid the external network traffic cost of repeatedly accessing external videos and to increase access speed, as shown in Fig. 2B, the distributed storage data set then fetches the matched target video data from the corresponding external network according to the data identifier, stores it, and sends it to the corresponding data supply end for download. In other words, once the distributed storage data set has fetched the matched target video data from the external network, the data supply end obtains it from the distributed storage data set according to the data identifier. Specifically, in this embodiment the external video data is stored in the cache of the distributed storage data set.
Illustratively, when the target video data is video data on the public network, the first access to a given external video causes the distributed storage data set to fetch it from its source address and store it in its CDN cache. Later accesses to the same video can read it directly from that CDN cache, which saves download traffic and greatly increases the download speed of target video data. In addition, video data that has been downloaded to a local disk in this embodiment can be uploaded to a dedicated FastDFS in the distributed storage data set, so that during distributed training every training machine obtains target video data from the same FastDFS. This avoids the overhead of copying large amounts of video data to each training machine.
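The internal/external logic above behaves like a read-through cache: internal videos are served directly, and an external video is fetched from its source once and then served from the cache. In the minimal sketch below, in-process dictionaries stand in for the storage clusters and the CDN cache, and `fetch_external` is a hypothetical callback for the download from the source address.

```python
from typing import Callable, Dict


class DistributedStore:
    """Read-through store sketch (dicts stand in for the clusters and CDN cache)."""

    def __init__(self, internal: Dict[str, bytes], fetch_external: Callable[[str], bytes]):
        self.internal = internal                     # local FS / HDFS / FastDFS contents
        self.external_cache: Dict[str, bytes] = {}  # stands in for the CDN cache
        self.fetch_external = fetch_external         # download from the source address

    def get(self, data_id: str) -> bytes:
        if data_id in self.internal:                 # internal video data: serve directly
            return self.internal[data_id]
        if data_id not in self.external_cache:       # first access: fetch once and cache
            self.external_cache[data_id] = self.fetch_external(data_id)
        return self.external_cache[data_id]          # later accesses hit the cache
```

Because every data supply end reads through the same store, an external video is downloaded from the public network at most once, no matter how many training machines request it.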
Optionally, data storage in this embodiment is not limited to HDFS, FastDFS, and NFS; any storage that satisfies a distributed data storage protocol or a network data protocol can be used. Likewise, the data caching network is not limited to a CDN; any protocol that provides data caching and load balancing can be used.
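Because the storage protocol is interchangeable, the data supply code only needs a narrow read interface. A minimal sketch, assuming a hypothetical `VideoStorage` protocol with a single `read` method; `MountedFileStorage` covers anything exposed as a file system mount (such as an NFS mount point), and `InMemoryStorage` is a test double, not a real FastDFS client.

```python
import os
import tempfile
from typing import Dict, List, Protocol


class VideoStorage(Protocol):
    """Hypothetical read interface the data supply side depends on."""

    def read(self, data_id: str) -> bytes: ...


class MountedFileStorage:
    """Backend for any store exposed as a file system mount (e.g. an NFS mount point)."""

    def __init__(self, root: str):
        self.root = root

    def read(self, data_id: str) -> bytes:
        with open(os.path.join(self.root, data_id), "rb") as f:
            return f.read()


class InMemoryStorage:
    """Test double standing in for an object store such as FastDFS."""

    def __init__(self, blobs: Dict[str, bytes]):
        self.blobs = blobs

    def read(self, data_id: str) -> bytes:
        return self.blobs[data_id]


def load_videos(storage: VideoStorage, ids: List[str]) -> List[bytes]:
    # The caller never needs to know which protocol sits behind `storage`.
    return [storage.read(i) for i in ids]


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.mp4"), "wb") as f:
        f.write(b"abc")
    print(load_videos(MountedFileStorage(d), ["a.mp4"]))  # [b'abc']
```

Swapping HDFS or FastDFS in would then mean adding another class with the same `read` signature.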
S230: Process the target video data according to the batch processing mechanism to obtain the training data corresponding to the video model.
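Step S230 can be pictured as grouping the fetched videos according to settings carried in the train request. In this sketch `BatchMechanism`, its fields (`batch_size`, `shuffle`, `drop_last`, `seed`), and `make_batches` are hypothetical; the text only says the mechanism is preset and includes a grouping mode for the target video data.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BatchMechanism:
    """Hypothetical batch settings; the text only says they include a grouping mode."""
    batch_size: int
    shuffle: bool = True
    drop_last: bool = False
    seed: Optional[int] = None


@dataclass
class TrainRequest:
    batch: BatchMechanism
    data_ids: List[str]  # identifiers of the videos used in this training run


def make_batches(request: TrainRequest, videos: List[bytes]) -> List[List[bytes]]:
    """Group fetched target videos into batches according to the request's mechanism."""
    order = list(range(len(videos)))
    if request.batch.shuffle:
        random.Random(request.batch.seed).shuffle(order)
    size = request.batch.batch_size
    batches = [[videos[i] for i in order[k:k + size]] for k in range(0, len(order), size)]
    # Optionally discard a trailing batch that is smaller than batch_size.
    if request.batch.drop_last and batches and len(batches[-1]) < size:
        batches.pop()
    return batches


req = TrainRequest(BatchMechanism(batch_size=2, shuffle=False), ["a", "b", "c"])
print(make_batches(req, [b"a", b"b", b"c"]))  # [[b'a', b'b'], [b'c']]
```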
In the technical solution provided by this embodiment, matching target video data is obtained from the distributed storage data set using two different kinds of identifiers, a single video identifier and a packed video identifier. This adapts to different training tasks and makes training data acquisition more flexible. At the same time, caching external video data in the distributed storage data set increases the download speed of the target video data and improves the training efficiency of the video model.
Embodiment three
Fig. 3 A is a kind of flow chart for model training method that the embodiment of the present invention three provides, and the present embodiment can be applied to appoint
In the case where a kind of pair of Video Model is trained.A kind of model training method provided in this embodiment can be implemented by the present invention
The model training apparatus that example provides executes, which can be realized by way of software and/or hardware, and be integrated in and hold
In the equipment of row this method, which can be any intelligent terminal for carrying corresponding data-handling capacity.
Optionally, this embodiment may include the following steps:
S310: Obtain the training data corresponding to the video model according to the data supply method described above.
Specifically, the data supply method above is the data supply method provided in any other embodiment of the present invention. By using that data supply method, this embodiment obtains the training data corresponding to the video model to be trained, with the same beneficial effects as the data supply method in the embodiments above.
S320: Input the training data into the video model to obtain the trained video model.
Optionally, once the training data corresponding to the video model has been obtained, it can be input directly into the video model to be trained, and the model can be trained with an existing neural network training method. The trained video model can then accurately achieve its intended video processing purpose on any video data.
Illustratively, as shown in Fig. 3B, inputting the training data into the video model in this embodiment may specifically include: decoding the training data using multiple threads; preprocessing the decoded training data; and inputting the preprocessed training data into the video model.
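The decode, preprocess, and input steps can be overlapped with training by producing batches in the background. A minimal sketch using the standard library's `ThreadPoolExecutor` and `queue.Queue`; `decode` and `preprocess` are placeholder callables (a real pipeline would call a video decoder such as FFmpeg), and `prefetch_batches`, `num_workers`, and `max_prefetch` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

_SENTINEL = object()


def prefetch_batches(
    batches: Iterable[List[bytes]],
    decode: Callable[[bytes], object],
    preprocess: Callable[[object], object],
    num_workers: int = 4,
    max_prefetch: int = 2,
) -> Iterator[List[object]]:
    """Decode each batch with a thread pool and preprocess it in a background thread.

    Up to `max_prefetch` batches are prepared ahead of the consumer, so the
    training loop does not wait for decoding between steps.
    """
    out: "queue.Queue" = queue.Queue(maxsize=max_prefetch)
    errors: List[BaseException] = []

    def producer() -> None:
        try:
            with ThreadPoolExecutor(max_workers=num_workers) as pool:
                for batch in batches:
                    decoded = list(pool.map(decode, batch))  # multithreaded, order preserved
                    out.put([preprocess(x) for x in decoded])
        except BaseException as exc:  # hand errors to the consumer instead of losing them
            errors.append(exc)
        finally:
            out.put(_SENTINEL)  # always signal the end of the stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = out.get()
        if item is _SENTINEL:
            if errors:
                raise errors[0]
            return
        yield item


for b in prefetch_batches([[b"1", b"2"], [b"3"]], decode=lambda raw: int(raw), preprocess=lambda x: x * 10):
    print(b)  # [10, 20], then [30]
```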
Specifically, after the training data corresponding to the video model is obtained, this embodiment can load it into the memory of the training machine and use multiple threads to decode it into the specified format that matches the video model, for subsequent training. The decoded training data is then preprocessed, which may include applying data augmentation and converting it into the format the video model requires. Finally, the preprocessed training data is input into the video model to be trained, yielding the trained video model.

The decoding in this embodiment supports all common video and audio file formats and can run on either the CPU or the GPU. Decoding can start at any position in the video data, decoded frames can be output at a specified frame rate (frames per second, FPS), and aligned video and audio streams can be output together. Besides the video frames and audio stream, decoding also provides information such as the original FPS of the video data, frame width, frame height, playback duration, bit rate, whether an audio stream is present, and the presentation time stamp (PTS) of every frame.

Preprocessing after decoding applies data augmentation to the RGB images and the pulse code modulation (PCM) audio stream contained in the decoded training data, and then converts them into the format the video model requires. Video frame preprocessing may include common operations such as random cropping, random brightness, random contrast, and random scaling. Audio preprocessing may include random gain, conversion to a log spectrum, conversion to a mel spectrum, random segment selection, and superimposing two audio clips at a specified energy ratio. Decoding also supports random FPS conversion, and after augmentation the data can be transposed into a specified dimension order before output, yielding training data that meets the training requirements of the video model.

Further, after decoding and preprocessing, this embodiment can output the processed training data as NumPy arrays, meeting the input requirements of mainstream video models. For video models that expose a GPU operation interface (such as MXNet), this embodiment also supports loading the processed training data directly into the GPU in the format the model needs. This loading runs in parallel with training, which removes the time the video model would otherwise spend waiting for training data to be loaded into the GPU. All computation and input/output in this embodiment run in parallel, making full use of the training machine's resources and maximizing processing speed.
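Two of the video-side operations above, output at a specified FPS from an arbitrary start position and random crop/brightness/contrast followed by a dimension transpose, can be sketched with NumPy. This is an illustration under assumptions, not the embodiment's decoder: `sample_frame_indices` and `augment_clip` are hypothetical, PTS values are taken to be in seconds, and clips are assumed to arrive as uint8 arrays in (T, H, W, C) order. A random FPS conversion could simply draw `target_fps` at random before calling the first function.

```python
import numpy as np


def sample_frame_indices(pts, start: float, target_fps: float, num_frames: int):
    """Pick frames so the output runs at `target_fps`, starting at `start` seconds.

    `pts` holds the presentation time stamp (seconds) of every decoded frame;
    each output slot takes the last frame whose PTS is <= the slot's time.
    """
    pts = np.asarray(pts, dtype=np.float64)
    times = start + np.arange(num_frames) / target_fps
    idx = np.searchsorted(pts, times, side="right") - 1
    return np.clip(idx, 0, len(pts) - 1)


def augment_clip(clip, crop_hw, rng, brightness=0.2, contrast=0.2, layout="CTHW"):
    """Random crop plus random brightness/contrast on a uint8 clip shaped (T, H, W, C).

    The same crop and jitter are applied to every frame so motion stays consistent.
    """
    _, h, w, _ = clip.shape
    ch, cw = crop_hw
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    out = clip[:, y:y + ch, x:x + cw].astype(np.float32) / 255.0
    out = out + rng.uniform(-brightness, brightness)  # brightness shift
    mean = out.mean()
    out = (out - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean  # contrast scale
    out = np.clip(out, 0.0, 1.0)
    # Transpose from decode order "THWC" to the dimension order the model expects.
    return np.transpose(out, ["THWC".index(a) for a in layout])


pts = np.arange(320) / 32  # synthetic 10 s stream at 32 fps
print(sample_frame_indices(pts, start=2.0, target_fps=8, num_frames=4))  # [64 68 72 76]
```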
This embodiment can process the video frames and the audio stream contained in the video data at the same time, so a video model with better metrics and performance can be trained on synchronized video frame and audio information, which simplifies the training of multimodal video models. The audio processing in this embodiment also supports many common audio preprocessing methods and is compatible with most publicly available audio processing approaches: by specifying only a few parameters, the data supply module can output data that meets the input requirements of an open-source model, which greatly simplifies verifying the performance of open-source models.
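Three of the listed audio operations (random gain, superimposing two clips at a specified energy ratio, and a log spectrum) reduce to a few lines of NumPy on float PCM. A sketch under stated assumptions: mono float PCM, a 512-point FFT with a 160-sample hop (10 ms at an assumed 16 kHz), and hypothetical function names. A mel spectrum would additionally multiply `power` by a mel filterbank (for example one built with librosa's `librosa.filters.mel`) before taking the log.

```python
import numpy as np


def random_gain(pcm, rng, min_db=-6.0, max_db=6.0):
    """Scale a float PCM signal by a gain drawn uniformly in decibels."""
    return pcm * 10.0 ** (rng.uniform(min_db, max_db) / 20.0)


def mix_at_energy_ratio(a, b, ratio_db):
    """Superimpose `b` on `a` so that energy(a) / energy(scaled b) equals `ratio_db` dB."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ea, eb = np.sum(a ** 2), np.sum(b ** 2)
    if eb == 0:
        return a.copy()
    # Choose scale so that scale**2 * eb == ea / 10**(ratio_db / 10).
    scale = np.sqrt(ea / (eb * 10.0 ** (ratio_db / 10.0)))
    return a + scale * b


def log_spectrogram(pcm, n_fft=512, hop=160, eps=1e-10):
    """Log power spectrum of a Hann-windowed STFT; shape (frames, n_fft // 2 + 1)."""
    if len(pcm) < n_fft:
        raise ValueError("signal shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(pcm) - n_fft) // hop
    frames = np.stack([pcm[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)
```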
In the technical solution provided by this embodiment, the training data corresponding to the video model is obtained through the data supply method provided in the embodiments above and is input into the video model to be trained, which ensures the training efficiency of the video model and improves its performance.
Embodiment four
Fig. 4 is a schematic structural diagram of a data supply device provided by Embodiment four of the present invention. Specifically, as shown in Fig. 4, the device may include:
A train request obtaining module 410, configured to obtain a train request for a video model, the train request including a preset batch processing mechanism and the data identifier corresponding to this training run;
A target data obtaining module 420, configured to obtain matching target video data from a distributed storage data set according to the data identifier, the distributed storage data set including all types of video data;
A training data determining module 430, configured to process the target video data according to the batch processing mechanism to obtain the training data corresponding to the video model.
In the technical solution provided by this embodiment, matching target video data is obtained from the distributed storage data set using the data identifier in the train request and is processed according to the preset batch processing mechanism, so the training data for the video model to be trained is obtained without spending a large amount of time setting up data processing functions. Training directly on video data requires little storage space and little time to read the video data, which improves the training efficiency of the video model.
Further, the target data obtaining module 420 may be specifically configured to:
if the data identifier is a single video identifier, obtain the matching single video data from the distributed storage data set according to the single video identifier;
if the data identifier is a packed video identifier, obtain the matching packed video data from the distributed storage data set according to the packed video identifier.
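The two identifier kinds handled by module 420 can be modeled as distinct types and dispatched on. The text does not define how a packed video is laid out, so this sketch assumes, purely for illustration, that a packed identifier names a manifest listing the member video IDs; `SingleVideoId`, `PackedVideoId`, `resolve`, and the two dicts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class SingleVideoId:
    video_id: str


@dataclass(frozen=True)
class PackedVideoId:
    pack_id: str


DataId = Union[SingleVideoId, PackedVideoId]


def resolve(data_id: DataId, videos: Dict[str, bytes], packs: Dict[str, List[str]]) -> List[bytes]:
    """Return the target video data for either kind of identifier.

    `videos` maps video ids to stored bytes; `packs` maps a packed id to the ids
    of the videos bundled under it (an assumed manifest layout).
    """
    if isinstance(data_id, SingleVideoId):
        return [videos[data_id.video_id]]
    if isinstance(data_id, PackedVideoId):
        return [videos[v] for v in packs[data_id.pack_id]]
    raise TypeError(f"unsupported data identifier: {data_id!r}")


videos = {"a": b"A", "b": b"B", "c": b"C"}
packs = {"p1": ["a", "c"]}
print(resolve(PackedVideoId("p1"), videos, packs))  # [b'A', b'C']
```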
Further, the target data obtaining module 420 may also be specifically configured to:
if the target video data is internal video data, obtain the matching target video data from the distributed storage data set according to the data identifier;
if the target video data is external video data, obtain the matching target video data from the distributed storage data set according to the data identifier after the distributed storage data set has obtained it from the external network.
Further, the external video data is stored in the cache of the distributed storage data set.
Further, the batch processing mechanism includes a grouping mode for the target video data.
The data supply device provided by this embodiment can execute the data supply method provided by any of the embodiments above and has the corresponding functions and beneficial effects.
Embodiment five
Fig. 5 is a schematic structural diagram of a model training apparatus provided by Embodiment five of the present invention. Specifically, as shown in Fig. 5, the apparatus may include:
A training data obtaining module 510, configured to obtain the training data corresponding to a video model according to the data supply method of any embodiment of the present invention;
A video model training module 520, configured to input the training data into the video model to obtain the trained video model.
In the technical solution provided by this embodiment, the training data corresponding to the video model is obtained through the data supply method provided in the embodiments above and is input into the video model to be trained, which ensures the training efficiency of the video model and improves its performance.
Further, the video model training module 520 may be specifically configured to:
decode the training data using multiple threads;
preprocess the decoded training data;
input the preprocessed training data into the video model.
The model training apparatus provided by this embodiment can execute the model training method provided by any of the embodiments above and has the corresponding functions and beneficial effects.
Embodiment six
Fig. 6 is a schematic diagram of the principle of a data supply system provided by Embodiment six of the present invention. This embodiment mainly describes the training data supply process of the video model in detail. Referring to Fig. 6, the data supply system 60 of this embodiment may include a distributed data storage end 610, a batch loading end 620, and a data supply end 630 connected to both the distributed data storage end 610 and the batch loading end 620.
The distributed data storage end 610 stores the distributed storage data set; the batch loading end 620 stores the batch processing mechanism and generates the train request; and the data supply end 630 is provided with the data supply device provided by any embodiment of the present invention.
Specifically, for the construction principles of the distributed data storage end 610, batch loading end 620, and data supply end 630 included in the data supply system 60, refer to the description of the data supply method provided by the embodiments of the present invention; they are not described in detail here.
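The wiring of Fig. 6 (storage end and batch loading end both feeding the data supply end) can be sketched as three small components. Everything here is hypothetical: the class names mirror the figure's labels, the storage end is an in-memory dict, and the batch mechanism is reduced to a single `batch_size`.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class DistributedStorageEnd:
    """Holds the distributed storage data set (an in-memory dict here)."""
    dataset: Dict[str, bytes]


@dataclass
class TrainRequest:
    batch_size: int
    data_ids: List[str]


@dataclass
class BatchLoadingEnd:
    """Stores the batch mechanism and issues train requests."""
    batch_size: int

    def make_request(self, data_ids: List[str]) -> TrainRequest:
        return TrainRequest(self.batch_size, list(data_ids))


@dataclass
class DataSupplyEnd:
    """Connected to both ends: fetches target videos and yields training batches."""
    storage: DistributedStorageEnd

    def supply(self, request: TrainRequest) -> Iterator[List[bytes]]:
        videos = [self.storage.dataset[i] for i in request.data_ids]
        for k in range(0, len(videos), request.batch_size):
            yield videos[k:k + request.batch_size]


storage = DistributedStorageEnd({"v1": b"1", "v2": b"2", "v3": b"3"})
loader = BatchLoadingEnd(batch_size=2)
supplier = DataSupplyEnd(storage)
print(list(supplier.supply(loader.make_request(["v1", "v2", "v3"]))))  # [[b'1', b'2'], [b'3']]
```

Under the same assumptions, the model training system of Fig. 7 would replace `DataSupplyEnd` with a component that also feeds each batch to the model.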
Embodiment seven
Fig. 7 is a schematic diagram of the principle of a model training system provided by Embodiment seven of the present invention. This embodiment mainly describes the training data supply process of the video model in detail. Referring to Fig. 7, the model training system 70 of this embodiment may include a distributed data storage end 710, a batch loading end 720, and a model training end 730 connected to both the distributed data storage end 710 and the batch loading end 720.
The distributed data storage end 710 stores the distributed storage data set; the batch loading end 720 stores the batch processing mechanism and generates the train request; and the model training end 730 is provided with the model training apparatus provided by any embodiment of the present invention.
Specifically, for the construction principles of the distributed data storage end 710, batch loading end 720, and model training end 730 included in the model training system 70, refer to the description of the model training method provided by the embodiments of the present invention; they are not described in detail here.
Embodiment eight
Fig. 8 is a schematic structural diagram of a device provided by Embodiment eight of the present invention. As shown in Fig. 8, the device includes a processor 80, a storage device 81, and a communication device 82. The device may have one or more processors 80; Fig. 8 takes one processor 80 as an example. The processor 80, storage device 81, and communication device 82 in the device may be connected by a bus or in other ways; Fig. 8 takes a bus connection as an example.
As a computer-readable storage medium, the storage device 81 can store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the data supply method or model training method provided in the embodiments of the present invention. By running the software programs, instructions, and modules stored in the storage device 81, the processor 80 executes the various functional applications and data processing of the device, that is, implements the data supply method or model training method described above.
The storage device 81 may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the application programs required by at least one function, and the data storage area can store data created through use of the terminal. In addition, the storage device 81 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 81 may further include memory located remotely from the processor 80, which can be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication device 82 can be used to implement network connections or mobile data connections between devices.
The device provided by this embodiment can be used to execute the data supply method or model training method provided by any of the embodiments above and has the corresponding functions and beneficial effects.
Embodiment nine
Embodiment nine of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data supply method of any of the embodiments above. The method may specifically include:
obtaining a train request for a video model, the train request including a preset batch processing mechanism and the data identifier corresponding to this training run;
obtaining matching target video data from a distributed storage data set according to the data identifier, the distributed storage data set including all types of video data;
processing the target video data according to the batch processing mechanism to obtain the training data corresponding to the video model.
Alternatively, the program implements the model training method of any of the embodiments above, which may specifically include:
obtaining the training data corresponding to a video model according to the data supply method of any embodiment of the present invention;
inputting the training data into the video model to obtain the trained video model.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the data supply method or model training method provided by any embodiment of the present invention.
From the description of the embodiments above, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that the units and modules included in the embodiments of the data supply device or model training apparatus above are divided only according to functional logic and are not limited to that division, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (13)
1. a kind of data supply method characterized by comprising
The train request for being directed to Video Model is obtained, the train request includes preset batch processing mechanism and this training
Corresponding Data Identification;
Matched target video data, the distributed storage number are obtained in distributed storage data set according to the Data Identification
It include all types of video datas according to collection;
The target video data is handled according to the batch processing mechanism, obtains the corresponding trained number of the Video Model
According to.
2. The method according to claim 1, characterized in that obtaining matching target video data from a distributed storage data set according to the data identifier comprises:
if the data identifier is a single video identifier, obtaining matching single video data from the distributed storage data set according to the single video identifier;
if the data identifier is a packed video identifier, obtaining matching packed video data from the distributed storage data set according to the packed video identifier.
3. The method according to claim 1 or 2, characterized in that obtaining matching target video data from a distributed storage data set according to the data identifier comprises:
if the target video data is internal video data, obtaining matching target video data from the distributed storage data set according to the data identifier;
if the target video data is external video data, after the distributed storage data set obtains the matching target video data from an external network, obtaining the matching target video data from the distributed storage data set according to the data identifier.
4. The method according to claim 3, characterized in that the external video data is stored in the cache of the distributed storage data set.
5. The method according to claim 1 or 2, characterized in that the batch processing mechanism includes a grouping mode for the target video data.
6. A model training method, characterized by comprising:
obtaining the training data corresponding to a video model according to the data supply method of any one of claims 1 to 5;
inputting the training data into the video model to obtain the trained video model.
7. The method according to claim 6, characterized in that inputting the training data into the video model comprises:
decoding the training data using multiple threads;
preprocessing the decoded training data;
inputting the preprocessed training data into the video model.
8. A data supply device, characterized by comprising:
a train request obtaining module, configured to obtain a train request for a video model, the train request including a preset batch processing mechanism and the data identifier corresponding to this training run;
a target data obtaining module, configured to obtain matching target video data from a distributed storage data set according to the data identifier, the distributed storage data set including all types of video data;
a training data determining module, configured to process the target video data according to the batch processing mechanism to obtain the training data corresponding to the video model.
9. A model training apparatus, characterized by comprising:
a training data obtaining module, configured to obtain the training data corresponding to a video model according to the data supply method of any one of claims 1 to 5;
a video model training module, configured to input the training data into the video model to obtain the trained video model.
10. A data supply system, characterized by comprising: a distributed data storage end, a batch loading end, and a data supply end connected to both the distributed data storage end and the batch loading end;
wherein the distributed data storage end stores the distributed storage data set; the batch loading end stores the batch processing mechanism and generates the train request; and the data supply end is provided with the data supply device according to claim 8.
11. A model training system, characterized by comprising: a distributed data storage end, a batch loading end, and a model training end connected to both the distributed data storage end and the batch loading end;
wherein the distributed data storage end stores the distributed storage data set; the batch loading end stores the batch processing mechanism and generates the train request; and the model training end is provided with the model training apparatus according to claim 9.
12. A device, characterized in that the device comprises:
one or more processors;
a storage device configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data supply method according to any one of claims 1 to 5, or the model training method according to claim 6 or 7.
13. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the data supply method according to any one of claims 1 to 5, or the model training method according to claim 6 or 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910197522.8A CN109977822B (en) | 2019-03-15 | 2019-03-15 | Data supply method, model training method, device, system, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977822A true CN109977822A (en) | 2019-07-05 |
CN109977822B CN109977822B (en) | 2023-05-09 |
Family
ID=67079035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910197522.8A Active CN109977822B (en) | 2019-03-15 | 2019-03-15 | Data supply method, model training method, device, system, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977822B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040001143A1 (en) * | 2002-06-27 | 2004-01-01 | Beal Matthew James | Speaker detection and tracking using audiovisual data |
CN102222213A (en) * | 2010-07-29 | 2011-10-19 | 郑文明 | Distributed vision computing method based on open type Web Service framework |
US20130027568A1 (en) * | 2011-07-29 | 2013-01-31 | Dekun Zou | Support vector regression based video quality prediction |
US9204103B1 (en) * | 2011-12-30 | 2015-12-01 | Emc Corporation | Technique for parallel, distributed video processing |
CN107741899A (en) * | 2017-10-16 | 2018-02-27 | 北京小米移动软件有限公司 | The method, apparatus and system of processing terminal data |
CN108108754A (en) * | 2017-12-15 | 2018-06-01 | 北京迈格威科技有限公司 | The training of identification network, again recognition methods, device and system again |
CN108829518A (en) * | 2018-05-31 | 2018-11-16 | 北京百度网讯科技有限公司 | Method and apparatus for pushed information |
CN108876166A (en) * | 2018-06-27 | 2018-11-23 | 平安科技(深圳)有限公司 | Financial risk authentication processing method, device, computer equipment and storage medium |
US20190005069A1 (en) * | 2017-06-28 | 2019-01-03 | Google Inc. | Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors |
CN109284417A (en) * | 2018-08-27 | 2019-01-29 | 广州飞磨科技有限公司 | Video pushing method, device, computer equipment and storage medium |
Timeline: 2019-03-15, application CN201910197522.8A filed in CN (published as CN109977822B; status: Active)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427998A (en) * | 2019-07-26 | 2019-11-08 | 上海商汤智能科技有限公司 | Model training, object detection method and device, electronic equipment, storage medium |
CN112395070A (en) * | 2019-08-12 | 2021-02-23 | 阿里巴巴集团控股有限公司 | Data processing system and method |
CN110912926A (en) * | 2019-12-04 | 2020-03-24 | 湖南快乐阳光互动娱乐传媒有限公司 | Data resource back-source method and device |
CN110912926B (en) * | 2019-12-04 | 2022-03-25 | 湖南快乐阳光互动娱乐传媒有限公司 | Data resource back-source method and device |
WO2021190715A1 (en) * | 2020-03-27 | 2021-09-30 | Continental Automotive Gmbh | Computer-implemented method and distributed storage system for providing reliable data objects |
DE102020204033A1 (en) | 2020-03-27 | 2021-09-30 | Continental Automotive Gmbh | Computer implemented method and distributed storage system for providing trusted data objects |
CN114697682A (en) * | 2020-12-29 | 2022-07-01 | 阿里巴巴集团控股有限公司 | Video processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109977822B (en) | 2023-05-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |