CN110348369A - A kind of video scene classification method, device, mobile terminal and storage medium - Google Patents
- Publication number
- CN110348369A CN110348369A CN201910612133.7A CN201910612133A CN110348369A CN 110348369 A CN110348369 A CN 110348369A CN 201910612133 A CN201910612133 A CN 201910612133A CN 110348369 A CN110348369 A CN 110348369A
- Authority
- CN
- China
- Prior art keywords
- video
- image
- frame
- group
- video object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
Embodiments of the present disclosure disclose a video scene classification method and apparatus, a mobile terminal, and a storage medium. The method includes: obtaining a first group of video objects and a second group of video objects, where the second group of video objects includes the video objects in the first group; determining, from the image regions of the first group of video objects in the first and second video frames, the movement velocity of the first group of video objects in the video image plane; determining, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video, and segmenting each unsegmented frame according to the expected image region to obtain the corresponding video objects; and determining the classification result of the target video from the video objects corresponding to each frame of the current video. Embodiments of the present disclosure improve the segmentation accuracy of continuously moving video objects and can classify video scenes accurately.
Description
Technical field
Embodiments of the present disclosure relate to the field of video processing technology, and in particular to a video scene classification method and apparatus, a mobile terminal, and a storage medium.
Background
With the popularization of mobile terminals, users can shoot video with a mobile terminal in a variety of scenes. Such videos are usually classified by scene to obtain a scene category for each video; the user's videos can then be stored in an album by scene category, making them easier to share.
In the prior art, the video objects in each video frame are generally segmented independently, and the scene category of the video is then determined from the segmented video objects.
The drawback of the prior art is that the segmentation accuracy for continuously moving video objects in a video is hard to guarantee, which may cause the video's scene to be misclassified. For example, as a continuously moving video object recedes farther and farther from the lens, its image region in the video frames becomes smaller and smaller, and the object is no longer segmented in later frames, leading to a scene classification error.
Summary of the invention
The present disclosure provides a video scene classification method and apparatus, a mobile terminal, and a storage medium, so as to classify video scenes accurately.
In a first aspect, an embodiment of the present disclosure provides a video scene classification method, comprising:
inputting the first and second video frames of a current video into a preset image segmentation model to obtain a first group of video objects corresponding to the first frame and a second group of video objects corresponding to the second frame, where the second group of video objects includes the video objects in the first group;
determining, from the image regions of the first group of video objects in the first and second frames, the movement velocity of the first group of video objects in the video image plane;
determining, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video, and segmenting each unsegmented frame according to the expected image region to obtain the corresponding video objects; and
determining the classification result of the target video from the video objects corresponding to each frame of the current video.
In a second aspect, an embodiment of the present disclosure further provides a video scene classification apparatus, comprising:
a first image segmentation module, configured to input the first and second video frames of a current video into a preset image segmentation model to obtain a first group of video objects corresponding to the first frame and a second group of video objects corresponding to the second frame, where the second group of video objects includes the video objects in the first group;
a movement velocity determination module, configured to determine, from the image regions of the first group of video objects in the first and second frames, the movement velocity of the first group of video objects in the video image plane;
a second image segmentation module, configured to determine, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video, and to segment each unsegmented frame according to the expected image region to obtain the corresponding video objects; and
a video classification module, configured to determine the classification result of the target video from the video objects corresponding to each frame of the current video.
In a third aspect, an embodiment of the present disclosure further provides a mobile terminal, comprising:
one or more processing units; and
a storage device configured to store one or more programs,
where the one or more programs, when executed by the one or more processing units, cause the one or more processing units to implement the video scene classification method described in the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the video scene classification method described in the embodiments of the present disclosure.
Embodiments of the present disclosure determine, from the image regions of a first group of video objects in the first and second video frames, the movement velocity of the first group of video objects in the video image plane; then determine, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video; segment each unsegmented frame according to the expected image region to obtain the corresponding video objects; and determine the classification result of the target video from the video objects corresponding to each frame of the current video. This solves the prior-art problem that the segmentation accuracy for continuously moving video objects is hard to guarantee and may cause the video's scene to be misclassified: by determining a video object's movement velocity in the image plane from its image regions in the first and second frames, and segmenting the object in each unsegmented frame according to that velocity, the segmentation accuracy of continuously moving video objects is improved and video scenes can be classified accurately.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the accompanying drawings and the following detailed description. Throughout the drawings, identical or similar reference numerals denote identical or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
Fig. 1 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a video scene classification apparatus provided by an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a mobile terminal provided by an embodiment of the present disclosure.
Detailed description
Embodiments of the present disclosure are described more fully below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the disclosure.
It should be understood that the steps recited in the method embodiments of the disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit steps that are shown; the scope of the disclosure is not limited in this respect.
The term "comprising" and its variants as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these devices, modules, or units.
It should be noted that the modifiers "a"/"an" and "a plurality of" mentioned in the disclosure are illustrative rather than restrictive; those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be construed as "one or more".
The names of the messages or information exchanged between devices in the embodiments of the disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure. This embodiment is applicable to classifying the scene of a video. The method may be performed by a video scene classification apparatus, which may be implemented in software and/or hardware and may be configured in a mobile terminal. As shown in Fig. 1, the method may include the following steps.
Step 101: input the first and second video frames of the current video into a preset image segmentation model to obtain a first group of video objects corresponding to the first frame and a second group of video objects corresponding to the second frame, where the second group of video objects includes the video objects in the first group.
The current video may be a video shot by the user through the camera of the mobile terminal, and consists of multiple video frames.
Image segmentation is the technique and process of dividing an image into a number of specific regions with distinctive properties. Existing image segmentation methods fall broadly into the following classes: threshold-based methods, region-based methods, edge-based methods, and methods based on specific theories. From a mathematical point of view, image segmentation is the process of dividing a digital image into mutually disjoint regions; it is also a labeling process. Segmentation of an image is usually achieved by determining the class to which each of its pixels belongs.
The first and second video frames of the current video are respectively input into the preset image segmentation model. The model analyzes the first frame, obtains the probability that each pixel in the first frame belongs to each video object class, and selects the class with the highest probability as the class of that pixel, thereby obtaining the first group of video objects corresponding to the first frame. Each video object comprises all the pixels in the first frame that belong to that object class. Optionally, the preset video object classes may include person objects and thing objects; for example, a video object class may be person, desk, or car. The model likewise analyzes the second frame, obtains the probability that each pixel in the second frame belongs to each video object class, and selects the class with the highest probability as the class of that pixel, thereby obtaining the second group of video objects corresponding to the second frame. Each video object comprises all the pixels in the second frame that belong to that object class.
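The per-pixel argmax selection described above can be sketched as follows. This is a minimal illustration rather than the patent's actual segmentation model; the class list and the probability map are invented for the example:

```python
import numpy as np

CLASSES = ["background", "person", "desk", "car"]  # hypothetical class list

def segment(prob_map: np.ndarray) -> dict:
    """prob_map: (H, W, C) per-pixel class probabilities from a segmentation model.
    Returns a mapping from class name to the list of pixels assigned to that class."""
    labels = prob_map.argmax(axis=-1)          # class with the highest probability per pixel
    objects = {}
    for idx, name in enumerate(CLASSES):
        ys, xs = np.nonzero(labels == idx)     # all pixels belonging to this class
        if idx > 0 and len(ys) > 0:            # skip background; an object = its pixel set
            objects[name] = list(zip(ys.tolist(), xs.tolist()))
    return objects
```

Each entry of the returned dictionary plays the role of one "video object": the set of all pixels in the frame that belong to that object class.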
The second group of video objects includes the video objects in the first group. Since the time interval between the first and second frames of the current video is short, the two frames contain the same video objects. For example, the first group of video objects corresponding to the first frame includes all the pixels belonging to a person in the first frame and all the pixels belonging to a desk in the first frame; the second group of video objects corresponding to the second frame includes all the pixels belonging to the person in the second frame and all the pixels belonging to the desk in the second frame.
Step 102: determine, from the image regions of the first group of video objects in the first and second frames, the movement velocity of the first group of video objects in the video image plane.
The image region of a video object of the first group in the first frame is the region formed by all the pixels in the first frame that belong to the corresponding object class; its image region in the second frame is the region formed by all the pixels in the second frame that belong to that class.
Each video object of the first group is located in its image regions in the first and second frames, the displacement of each object's image region is determined, and the movement velocity of each object in the video image plane is then computed from that displacement and the time interval between the first and second frames. The movement velocity comprises a direction of motion and a speed.
For example, the first group of video objects includes all the pixels belonging to a person and all the pixels belonging to a desk. Their image regions in the first and second frames are: the region formed by the person pixels in the first frame, the region formed by the desk pixels in the first frame, the region formed by the person pixels in the second frame, and the region formed by the desk pixels in the second frame. The person pixels are located in their image regions in the first and second frames, and the movement velocity of the person pixels in the video image plane is then computed from the displacement of the corresponding image region and the time interval between the two frames. The desk pixels are handled in the same way: their image regions in the two frames are located, and their movement velocity in the video image plane is computed from the displacement of the corresponding image region and the time interval between the two frames.
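The velocity computation above can be sketched as follows. The patent does not specify how the region displacement is measured, so this sketch assumes centroid displacement as one illustrative choice:

```python
import numpy as np

def region_centroid(pixels):
    """Centroid (y, x) of an image region given as a list of pixel coordinates."""
    arr = np.asarray(pixels, dtype=float)
    return arr.mean(axis=0)

def movement_velocity(region_f1, region_f2, dt):
    """Movement velocity (pixels/second) of an object's image region between two
    frames, measured here as centroid displacement over the inter-frame interval dt.
    The returned 2-vector encodes both the direction of motion and the speed."""
    disp = region_centroid(region_f2) - region_centroid(region_f1)
    return disp / dt
```

For instance, a region whose centroid moves 4 pixels to the right between frames 0.04 s apart has a velocity of 100 px/s along x.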
Step 103: determine, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video, and segment each unsegmented frame according to the expected image region to obtain the corresponding video objects.
The expected image region is the image region of an unsegmented frame that is predicted, from the object's motion trend, to contain the video object. From the movement velocity of each video object of the first group in the video image plane and the time interval between frames, the image region each object occupies at the corresponding point in time is determined, i.e., the expected image region of each object in each unsegmented frame of the current video.
For each unsegmented frame, an expected image matching each video object of the first group is cropped from the unsegmented frame according to the object's expected image region in that frame; each expected image contains one matched video object. The expected image matched to each object is then input into the preset image segmentation model to obtain the video objects corresponding to the unsegmented frame.
Step 104: determine the classification result of the target video from the video objects corresponding to each frame of the current video.
In one specific example, correspondence rules between video objects and scene categories are preset. Such a rule determines the scene category of a video frame from the classes of the video objects the frame contains, the areas of their image regions, and the positional relationships between them.
For example, if the video objects corresponding to each frame of the current video include all the pixels belonging to a person and all the pixels belonging to a car, the scene category of the frames is determined to be driving. If the video objects corresponding to each frame include all the pixels belonging to a desk, and the area of the desk's image region exceeds a preset area threshold, the scene category is determined to be desk display. If the video objects corresponding to each frame include all the pixels belonging to a person, all the pixels belonging to a desk, and all the pixels belonging to food, and the relative distances between the person pixels, the desk pixels, and the food pixels are less than a preset distance threshold, the scene category is determined to be dining.
According to the preset correspondence rules between video objects and scene categories, and the video objects corresponding to each frame, the scene category corresponding to each frame is determined. The number of frames corresponding to each scene category is then counted, and the scene category with the most frames is taken as the classification result of the target video.
For example, the target video includes 100 frames. According to the preset correspondence rules between video objects and scene categories, and the video objects corresponding to each frame, a scene category is determined for each frame. The number of frames per category is then counted: "desk display" corresponds to 16 frames, "dining" to 57 frames, and "food display" to 27 frames. The category with the most frames, "dining", is taken as the classification result of the target video.
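The per-frame majority vote above can be sketched in a few lines; the category names are taken from the example:

```python
from collections import Counter

def classify_video(per_frame_categories):
    """Classification result of a video: the scene category assigned to the most frames."""
    counts = Counter(per_frame_categories)
    return counts.most_common(1)[0][0]   # category with the highest frame count
```

With 16 "desk display" frames, 57 "dining" frames, and 27 "food display" frames, this returns "dining".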
In the technical solution of this embodiment, the movement velocity of a first group of video objects in the video image plane is determined from their image regions in the first and second video frames; the expected image region of the first group of video objects in each unsegmented frame of the current video is then determined from the movement velocity; each unsegmented frame is segmented according to the expected image region to obtain the corresponding video objects; and the classification result of the target video is determined from the video objects corresponding to each frame of the current video. This solves the prior-art problem that the segmentation accuracy for continuously moving video objects is hard to guarantee and may cause the video's scene to be misclassified: determining a video object's movement velocity in the image plane from its image regions in the first and second frames, and segmenting the object in each unsegmented frame according to that velocity, improves the segmentation accuracy of continuously moving video objects and allows video scenes to be classified accurately.
Fig. 2 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure. This embodiment may be combined with any of the optional solutions of the one or more embodiments above. In this embodiment, determining the movement velocity of the first group of video objects in the video image plane from their image regions in the first and second frames may include: using an optical flow method to determine, from the image regions of the first group of video objects in the first and second frames, the movement velocity of the first group of video objects in the video image plane.
Determining, from the movement velocity, the expected image region of the first group of video objects in each unsegmented frame of the current video, and segmenting each unsegmented frame according to the expected image region to obtain the corresponding video objects, may include: sequentially obtaining an unsegmented frame as the currently processed frame; determining, from the movement velocity, the frame rate of the current video, and the image regions of the first group of video objects in the previous frame, the expected image region of the first group of video objects in the currently processed frame; using an image box matched to each video object of the first group, cropping from the currently processed frame, according to the expected image region, an expected image matched to each object; inputting the expected image matched to each object into the preset image segmentation model to obtain the video objects corresponding to the currently processed frame; and returning to the operation of sequentially obtaining an unsegmented frame as the currently processed frame until all unsegmented frames of the current video have been processed.
Determining the classification result of the target video from the video objects corresponding to each frame of the current video may include: determining the scene category corresponding to each frame according to the preset correspondence rules between video objects and scene categories and the video objects corresponding to each frame; counting the number of frames corresponding to each scene category; and taking the scene category with the most frames as the classification result of the target video.
As shown in Fig. 2, the method may include the following steps.
Step 201: input the first and second video frames of the current video into a preset image segmentation model to obtain a first group of video objects corresponding to the first frame and a second group of video objects corresponding to the second frame, where the second group of video objects includes the video objects in the first group.
Step 202: using an optical flow method, determine, from the image regions of the first group of video objects in the first and second frames, the movement velocity of the first group of video objects in the video image plane.
The core of an optical flow method is solving for the optical flow, i.e., the velocity, of a moving target. By theoretical basis and mathematical approach, optical flow computation techniques divide into four kinds: gradient-based methods, matching-based methods, energy-based methods, and phase-based methods. Matching-based optical flow computation includes feature-based and region-based variants.
A feature-based method continuously locates and tracks the main features of the target. A region-based method first locates similar regions and then computes the optical flow from the displacement of those regions. The embodiment of the present disclosure uses the region-based method: each video object of the first group is located in its image regions in the first and second frames, the displacement of each object's image region is determined, and the optical flow of each object in the video image plane, i.e., its movement velocity, is then computed from that displacement and the time interval between the first and second frames.
Step 203: sequentially obtain an unsegmented frame as the currently processed frame.
The frames of the target video are arranged in chronological order.
Step 204: determine, from the movement velocity, the frame rate of the current video, and the image regions of the first group of video objects in the previous frame, the expected image region of the first group of video objects in the currently processed frame.
Optionally, this may include: determining, from the frame rate of the current video, the time interval between the previous frame and the currently processed frame; determining, from that interval and the movement velocity, the displacement of the first group of video objects over the interval; and determining, from the image regions of the first group of video objects in the previous frame and that displacement, the expected image region of the first group of video objects in the currently processed frame.
The frame rate of a video measures how many frames are displayed; its unit is frames displayed per second (frames per second, fps). The time interval between successive frames of the current video is determined from its frame rate. For example, if the frame rate of the target video is 25 fps, i.e., 25 frames are displayed per second, the interval between successive frames of the current video is 0.04 seconds.
From the movement velocity of each video object of the first group in the video image plane and the time interval between frames, the displacement of each object over the interval is computed; it equals the product of the object's movement velocity in the image plane and the interval. Then, from the image regions of the first group of video objects in the previous frame and the computed displacement of each object over the interval, the image region each object reaches after the interval is determined, i.e., the expected image region of the first group of video objects in the currently processed frame.
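The region prediction above can be sketched as follows. As an illustrative simplification the image region is represented by its bounding box, whereas the patent operates on full pixel sets:

```python
def expected_region(prev_box, velocity, fps):
    """Predict where an object's image region will be in the next frame.
    prev_box: (y0, x0, y1, x1) bounding box in the previous frame.
    velocity: (vy, vx) movement velocity in pixels/second; fps: video frame rate."""
    dt = 1.0 / fps                                  # inter-frame interval, e.g. 0.04 s at 25 fps
    dy, dx = velocity[0] * dt, velocity[1] * dt     # displacement = velocity * interval
    y0, x0, y1, x1 = prev_box
    return (y0 + dy, x0 + dx, y1 + dy, x1 + dx)     # region shifted by the displacement
```

For instance, an object moving at 100 px/s to the right in a 25 fps video is expected 4 pixels to the right of its previous region in the next frame.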
Step 205: using the image frame matched with each video object in the first group of video objects, crop the expected image matched with each video object out of the currently processed video image according to the expected image region.
The size of an image frame is determined by the size of the matched video object's image region, and the image frame is larger than that image region. For each video object, the image frame is aligned with the expected image region so that the expected image region is contained within the image frame, and the expected image matched with that video object is then cropped from the currently processed video image along the image frame.
Optionally, the image frame is aligned with the expected image region so that the expected image region lies at the center of the image frame, and the expected image matched with each video object is then cropped from the currently processed video image along the image frame.
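Step 205 can be sketched on a toy 2-D grid as below. The fixed `margin` standing in for "the image frame is larger than the object's region" and the clamping to the image borders are assumptions for illustration.

```python
# Sketch of step 205: center an image frame (larger than the object's
# region) on the expected region, clamp it to the image, and crop.

def crop_expected_image(image, region, margin=8):
    """Crop the expected image around `region` from `image`.

    image:  2-D list of pixel rows (a stand-in for a video frame)
    region: (x, y, w, h) expected image region
    margin: how much larger the image frame is than the region
    """
    x, y, w, h = region
    height, width = len(image), len(image[0])
    # Image frame centered on the expected region, clamped to the image.
    left = max(0, int(x) - margin)
    top = max(0, int(y) - margin)
    right = min(width, int(x + w) + margin)
    bottom = min(height, int(y + h) + margin)
    return [row[left:right] for row in image[top:bottom]]
```

Cropping a 4x4 region at (5, 5) with a margin of 2 from a 20x20 frame yields an 8x8 expected image whose top-left pixel comes from position (3, 3) of the frame.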
Step 206: input the expected image matched with each video object into the preset image segmentation model to obtain the video objects corresponding to the currently processed video image.
Specifically, the expected image matched with each video object is input into the preset image segmentation model separately, and the image segmentation model segments the corresponding video object out of the expected image.
Step 207: return to the operation of successively obtaining one unsegmented video image as the currently processed video image, until all unsegmented video images of the current video have been processed.
Step 208: according to the preset correspondence rule between video objects and scene categories, and the video objects corresponding to each frame video image, determine the scene category corresponding to each frame video image.
Step 209: count the number of video image frames corresponding to each scene category.
Step 210: take the scene category with the largest number of video image frames as the classification result of the target video.
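Steps 208-210 amount to a per-frame lookup followed by a majority vote. The sketch below assumes an invented correspondence rule and category names; the disclosure does not specify the rule's contents or how multiple objects in one frame are reconciled, so taking the first recognized object per frame is an assumption here.

```python
# Sketch of steps 208-210: map each frame's video objects to a scene
# category via a correspondence rule, then take the category that
# covers the most frames. Rule and categories are invented examples.
from collections import Counter

CORRESPONDENCE_RULE = {  # video object -> scene category (assumed)
    "ball": "sports",
    "player": "sports",
    "anchor": "news",
}

def classify_video(per_frame_objects):
    per_frame_categories = []
    for objects in per_frame_objects:
        # Use the first object with a known category for this frame.
        category = next(
            (CORRESPONDENCE_RULE[o] for o in objects if o in CORRESPONDENCE_RULE),
            "unknown",
        )
        per_frame_categories.append(category)
    # Steps 209/210: count frames per category, pick the most frequent.
    return Counter(per_frame_categories).most_common(1)[0][0]
```

With two "sports" frames and one "news" frame, the classification result is "sports".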
In the technical solution of this embodiment, an optical flow method is used to determine the movement velocity of the first group of video objects in the video image plane from their image regions in the first frame video image and the second frame video image. Unsegmented video images are then obtained one frame at a time as the currently processed video image; the expected image region of the first group of video objects in the currently processed video image is determined from the movement velocity, and image segmentation is performed on each unsegmented video image according to the expected image region to obtain the corresponding video objects, until all unsegmented video images of the current video have been processed. Afterwards, the scene category corresponding to each frame video image is determined according to the preset correspondence rule between video objects and scene categories and the video objects corresponding to each frame video image, and the scene category with the largest number of video image frames is taken as the classification result of the target video. The movement velocity of a video object in the video image plane can be determined by the optical flow method, the expected image region of the video object in the currently processed video image can be determined from that velocity, and the expected image corresponding to each video object can be cropped out, so that each video object in the currently processed video image can be segmented accurately on the basis of the expected image containing it.
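The disclosure determines the velocity with an optical flow method; as a much simpler stand-in that conveys the idea, the sketch below estimates it from the displacement of an object's region centroid between the first two frames. This simplification, and the (x, y, w, h) box layout, are assumptions made here, not the patented technique itself.

```python
# Simplified stand-in for the optical-flow velocity estimate: the
# object's centroid displacement between frame 1 and frame 2, divided
# by the inter-frame interval, gives (vx, vy) in pixels per second.

def estimate_velocity(region_frame1, region_frame2, fps):
    """Approximate image-plane velocity from two boxes of one object.

    region_frame1/2: (x, y, w, h) regions of the same video object in
                     the first and second frame video images
    fps:             frame rate of the current video
    """
    def center(region):
        x, y, w, h = region
        return (x + w / 2.0, y + h / 2.0)

    (cx1, cy1), (cx2, cy2) = center(region_frame1), center(region_frame2)
    interval = 1.0 / fps
    return ((cx2 - cx1) / interval, (cy2 - cy1) / interval)
```

For a dense per-pixel estimate, as an optical flow method actually produces, something like OpenCV's `calcOpticalFlowFarneback` would be used instead of this centroid shortcut.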
Fig. 3 is a flowchart of a video scene classification method provided by an embodiment of the present disclosure. This embodiment may be combined with each of the optional schemes in the one or more embodiments described above. In this embodiment, before the first frame video image and the second frame video image of the current video are respectively input into the preset image segmentation model, the method may further include: obtaining a training sample set corresponding to each scene category, the training sample set containing a set number of images corresponding to the scene category; and training a neural network model using the training sample sets to obtain the preset image segmentation model.
As shown in Fig. 3, the method may include the following steps:
Step 301: obtain a training sample set corresponding to each scene category, the training sample set containing a set number of images corresponding to the scene category.
Specifically, the set number of images corresponding to each scene category is collected in advance, and the images are saved into the training sample set corresponding to that scene category. The set number may be configured according to business requirements. For example, for each scene category, 2000 images corresponding to the scene category are collected, and the 2000 collected images are saved into the training sample set corresponding to the scene category.
Optionally, the set number of images corresponding to a scene category are: original images, and images obtained by processing the original images according to a preset image processing rule.
The original images are images collected in advance. The preset image processing rule may be: move a video object in the original image to a preset position in a predetermined manner, overwrite the original pixels at the preset position, and fill the original position of the video object using an image inpainting technique, obtaining the processed image. For example, the predetermined manner may be translating upward by a set distance, translating downward by a set distance, translating leftward by a set distance, translating rightward by a set distance, or translating along an arbitrary direction by a set distance.
The original images and the images obtained by processing them according to the preset image processing rule are then saved into the training sample set corresponding to the scene. In this way, the sample size of each training sample set can be increased, realizing sample augmentation for each training sample set.
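The augmentation rule above can be sketched on a toy 2-D grid: shift the object's pixels by an offset, overwrite the destination, and fill the vacated pixels. Real image inpainting is replaced here by a constant fill value, which, together with the 0/1 mask representation, is an assumption made purely for illustration.

```python
# Sketch of the preset image processing rule: translate the masked
# video object by (dx, dy), overwrite the destination pixels, and
# "inpaint" the vacated original position with a fill value.

def translate_object(image, mask, dx, dy, fill=0):
    """Return a copy of `image` with masked pixels shifted by (dx, dy).

    image: 2-D list of pixel values
    mask:  2-D list of 0/1 flags marking the video object's pixels
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    # Fill the object's original position (stand-in for inpainting).
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                out[r][c] = fill
    # Overwrite the destination with the object's pixels.
    for r in range(h):
        for c in range(w):
            if mask[r][c] and 0 <= r + dy < h and 0 <= c + dx < w:
                out[r + dy][c + dx] = image[r][c]
    return out
```

A production version would use a proper inpainting routine (e.g., OpenCV's `inpaint`) for the vacated region rather than a constant fill.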
Step 302: train a neural network model using the training sample sets to obtain the preset image segmentation model.
Specifically, the neural network model is trained using the training sample set corresponding to each scene category to obtain the preset image segmentation model. The preset image segmentation model receives a video image and outputs the segmentation result of the video image, i.e., each video object in the video image. Each video object comprises all pixels in the video image that belong to that video object's category.
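The notion that "a video object comprises all pixels belonging to its category" can be made concrete by representing the model's output as a per-pixel label map. The label-map representation and category names below are assumptions for illustration, not the disclosure's actual output format.

```python
# Sketch of the segmentation output described above: given a per-pixel
# label map, a video object is the set of all pixels of its category.

def object_pixels(label_map, category):
    """Collect (row, col) positions labelled with `category`."""
    return [
        (r, c)
        for r, row in enumerate(label_map)
        for c, label in enumerate(row)
        if label == category
    ]
```

On a 2x2 label map with three "cat" pixels, the "cat" video object is exactly those three pixel positions.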
Step 303: input the first frame video image and the second frame video image of the current video respectively into the preset image segmentation model to obtain a first group of video objects corresponding to the first frame video image and a second group of video objects corresponding to the second frame video image; wherein the second group of video objects includes the video objects in the first group of video objects.
Step 304: according to the image regions of the first group of video objects in the first frame video image and the second frame video image, determine the movement velocity of the first group of video objects in the video image plane.
Step 305: according to the movement velocity, determine the expected image region of the first group of video objects in each unsegmented video image of the current video, and, according to the expected image region, perform image segmentation on each unsegmented video image to obtain the corresponding video objects.
Step 306: according to the video objects corresponding to each frame video image of the current video, determine the classification result of the target video.
In the technical solution of this embodiment, a training sample set corresponding to each scene category is obtained, the training sample set containing a set number of images corresponding to the scene category, and a neural network model is trained using the training sample sets to obtain the preset image segmentation model. An image segmentation model that receives a video image and outputs its segmentation result can thus be trained, and scene classification of the video can be performed according to the segmentation results output by the preset image segmentation model.
Fig. 4 is a schematic structural diagram of a video scene classification apparatus provided by an embodiment of the present disclosure. This embodiment is applicable to performing scene classification on a video. The apparatus may be implemented in software and/or hardware and may be configured on a mobile terminal. As shown in Fig. 4, the apparatus may include: a first image segmentation module 401, a movement velocity determining module 402, a second image segmentation module 403, and a video classification module 404.
The first image segmentation module 401 is configured to input the first frame video image and the second frame video image of the current video respectively into the preset image segmentation model, obtaining a first group of video objects corresponding to the first frame video image and a second group of video objects corresponding to the second frame video image, wherein the second group of video objects includes the video objects in the first group of video objects. The movement velocity determining module 402 is configured to determine the movement velocity of the first group of video objects in the video image plane according to the image regions of the first group of video objects in the first frame video image and the second frame video image. The second image segmentation module 403 is configured to determine, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image of the current video and, according to the expected image region, perform image segmentation on each unsegmented video image to obtain the corresponding video objects. The video classification module 404 is configured to determine the classification result of the target video according to the video objects corresponding to each frame video image of the current video.
In the technical solution of this embodiment, the movement velocity of the first group of video objects in the video image plane is determined according to their image regions in the first frame video image and the second frame video image; then, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image of the current video is determined and, according to the expected image region, image segmentation is performed on each unsegmented video image to obtain the corresponding video objects; then, according to the video objects corresponding to each frame video image of the current video, the classification result of the target video is determined. This solves the prior-art problem that the segmentation accuracy of continuously moving video objects in a video is difficult to guarantee, which may cause scene classification errors for the video: the movement velocity of a video object in the video image plane can be determined from its image regions in the first frame video image and the second frame video image, and image segmentation can be performed on the video object in each unsegmented video image according to that velocity, so the segmentation accuracy of continuously moving video objects can be improved and scene classification can be performed on the video accurately.
Optionally, on the basis of the above technical solution, the movement velocity determining module 402 may include: a velocity determining unit configured to determine, using an optical flow method, the movement velocity of the first group of video objects in the video image plane according to their image regions in the first frame video image and the second frame video image.
Optionally, on the basis of the above technical solution, the second image segmentation module 403 may include: an image obtaining unit configured to successively obtain one unsegmented video image as the currently processed video image; a region determining unit configured to determine the expected image region of the first group of video objects in the currently processed video image according to the movement velocity, the frame rate of the current video, and the image region of the first group of video objects in the previous frame video image; an image cropping unit configured to crop, using the image frame matched with each video object in the first group of video objects and according to the expected image region, the expected image matched with each video object out of the currently processed video image; an object obtaining unit configured to input the expected image matched with each video object into the preset image segmentation model to obtain the video objects corresponding to the currently processed video image; and an image processing unit configured to return to the operation of successively obtaining one unsegmented video image as the currently processed video image, until all unsegmented video images of the current video have been processed.
Optionally, on the basis of the above technical solution, the region determining unit may include: a time determining subunit configured to determine the interval time between the previous frame video image and the currently processed video image according to the frame rate of the current video; a displacement determining subunit configured to determine the displacement of the first group of video objects during the interval time according to the interval time and the movement velocity; and a region determining subunit configured to determine the expected image region of the first group of video objects in the currently processed video image according to the image region of the first group of video objects in the previous frame video image and the displacement.
Optionally, on the basis of the above technical solution, the video classification module 404 may include: a scene category determining unit configured to determine the scene category corresponding to each frame video image according to the preset correspondence rule between video objects and scene categories and the video objects corresponding to each frame video image; a frame counting unit configured to count the number of video image frames corresponding to each scene category; and a classification result determining unit configured to take the scene category with the largest number of video image frames as the classification result of the target video.
Optionally, on the basis of the above technical solution, the apparatus may further include: a set obtaining module configured to obtain a training sample set corresponding to each scene category, the training sample set containing a set number of images corresponding to the scene category; and a model training module configured to train a neural network model using the training sample sets to obtain the preset image segmentation model.
Optionally, on the basis of the above technical solution, the set number of images corresponding to a scene category are: original images, and images obtained by processing the original images according to a preset image processing rule.
The video scene classification apparatus provided by the embodiments of the present disclosure can perform the video scene classification method provided by the embodiments of the present disclosure, and has the corresponding functional modules and beneficial effects for performing the method.
Referring now to Fig. 5, it shows a schematic structural diagram of a mobile terminal 500 suitable for implementing the embodiments of the present disclosure. The mobile terminal in the embodiments of the present disclosure may include, but is not limited to, a mobile phone, a laptop, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable media player), a vehicle-mounted terminal (such as a vehicle navigation terminal), and the like. The mobile terminal shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 5, the mobile terminal 500 may include a processing device (such as a central processing unit, a graphics processor, etc.) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the mobile terminal 500 are also stored in the RAM 503. The processing device 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509. The communication device 509 can allow the mobile terminal 500 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 5 shows the mobile terminal 500 with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the above-mentioned computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any appropriate combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any appropriate combination of the above.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), a peer-to-peer network (for example, an ad hoc peer-to-peer network), and any currently known or future-developed network.
The above-mentioned computer-readable medium may be included in the above-mentioned mobile terminal, or may exist independently without being assembled into the mobile terminal.
The above-mentioned computer-readable medium carries one or more programs which, when executed by the mobile terminal, cause the mobile terminal to: input the first frame video image and the second frame video image of the current video respectively into a preset image segmentation model, obtaining a first group of video objects corresponding to the first frame video image and a second group of video objects corresponding to the second frame video image, wherein the second group of video objects includes the video objects in the first group of video objects; determine, according to the image regions of the first group of video objects in the first frame video image and the second frame video image, the movement velocity of the first group of video objects in the video image plane; determine, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image of the current video and, according to the expected image region, perform image segmentation on each unsegmented video image to obtain the corresponding video objects; and determine, according to the video objects corresponding to each frame video image of the current video, the classification result of the target video.
The computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the methods, apparatuses, mobile terminals, and computer program products according to various embodiments of the present disclosure. In this regard, each box in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in a different order from that indicated in the drawings. For example, two successively indicated boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The modules, units, and subunits involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module, unit, or subunit does not, under certain circumstances, constitute a limitation on the module, unit, or subunit itself. For example, the video classification module may also be described as "a module that determines the classification result of the target video according to the video objects corresponding to each frame video image of the current video", the image obtaining unit may also be described as "a unit that successively obtains one unsegmented video image as the currently processed video image", and the time determining subunit may also be described as "a subunit that determines the interval time between the previous frame video image and the currently processed video image according to the frame rate of the current video".
The functions described herein may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above.
According to one or more embodiments of the present disclosure, example one provides a video scene classification method, comprising:
inputting the first frame video image and the second frame video image of the current video respectively into a preset image segmentation model, obtaining a first group of video objects corresponding to the first frame video image and a second group of video objects corresponding to the second frame video image; wherein the second group of video objects includes the video objects in the first group of video objects;
determining, according to the image regions of the first group of video objects in the first frame video image and the second frame video image, the movement velocity of the first group of video objects in the video image plane;
determining, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image of the current video and, according to the expected image region, performing image segmentation on each unsegmented video image to obtain the corresponding video objects;
determining, according to the video objects corresponding to each frame video image of the current video, the classification result of the target video.
According to one or more embodiments of the present disclosure, example two provides a video scene classification method. On the basis of the video scene classification method of example one, the determining, according to the image regions of the first group of video objects in the first frame video image and the second frame video image, the movement velocity of the first group of video objects in the video image plane comprises:
determining, using an optical flow method and according to the image regions of the first group of video objects in the first frame video image and the second frame video image, the movement velocity of the first group of video objects in the video image plane.
According to one or more embodiments of the present disclosure, example three provides a video scene classification method. On the basis of the video scene classification method of example one, the determining, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image of the current video and, according to the expected image region, performing image segmentation on each unsegmented video image to obtain the corresponding video objects comprises:
successively obtaining one unsegmented video image as the currently processed video image;
determining, according to the movement velocity, the frame rate of the current video, and the image region of the first group of video objects in the previous frame video image, the expected image region of the first group of video objects in the currently processed video image;
cropping, using the image frame matched with each video object in the first group of video objects and according to the expected image region, the expected image matched with each video object out of the currently processed video image;
inputting the expected image matched with each video object into the preset image segmentation model to obtain the video objects corresponding to the currently processed video image;
returning to the operation of successively obtaining one unsegmented video image as the currently processed video image, until all unsegmented video images of the current video have been processed.
According to one or more embodiments of the present disclosure, example four provides a video scene classification method. On the basis of the video scene classification method of example three, the determining, according to the movement velocity, the frame rate of the current video, and the image region of the first group of video objects in the previous frame video image, the expected image region of the first group of video objects in the currently processed video image comprises:
determining, according to the frame rate of the current video, the interval time between the previous frame video image and the currently processed video image;
determining, according to the interval time and the movement velocity, the displacement of the first group of video objects during the interval time;
determining, according to the image region of the first group of video objects in the previous frame video image and the displacement, the expected image region of the first group of video objects in the currently processed video image.
According to one or more embodiments of the present disclosure, example five provides a video scene classification method. On the basis of the video scene classification method of example one, determining the classification result of the target video according to the video objects corresponding to each video image frame of the current video comprises:
determining the scene category corresponding to each video image frame according to a preset correspondence rule between video objects and scene categories, and the video objects corresponding to each video image frame;
counting the number of video image frames corresponding to each scene category;
taking the scene category with the largest number of video image frames as the classification result of the target video.
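The frame-counting step of example five reduces to a majority vote over per-frame scene labels, sketched here (the function name is an assumption):

```python
from collections import Counter

def classify_video(frame_scene_labels):
    """Count how many frames each scene category covers and return the
    category with the most frames as the video's classification result."""
    counts = Counter(frame_scene_labels)
    return counts.most_common(1)[0][0]
```

For example, a clip whose frames are labelled beach, beach, city, beach would be classified as a beach scene.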
According to one or more embodiments of the present disclosure, example six provides a video scene classification method. On the basis of the video scene classification method of example one, before the first video image frame and the second video image frame of the current video are respectively input to the preset image segmentation model, the method further comprises:
acquiring a training sample set corresponding to each scene category, the training sample set comprising a set quantity of images corresponding to each scene category;
training a neural network model with the training sample set to obtain the preset image segmentation model.
According to one or more embodiments of the present disclosure, example seven provides a video scene classification method. On the basis of the video scene classification method of example six, the set quantity of images corresponding to a scene category comprises original images and images obtained by processing those original images according to preset image processing rules.
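The "preset image processing rules" of example seven are left unspecified by the disclosure; a horizontal flip and a simple brightness shift are assumed below purely for illustration of how original images could be expanded into a larger training set.

```python
def augment(image_rows):
    """Produce processed variants of an original image, represented here as
    a nested list of pixel rows with values in 0..255. The two rules
    (horizontal flip, +20 brightness clamp) are illustrative assumptions,
    not rules stated in the patent."""
    flipped = [row[::-1] for row in image_rows]                 # mirror each row
    brighter = [[min(255, p + 20) for p in row] for row in image_rows]
    return [flipped, brighter]
```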
According to one or more embodiments of the present disclosure, example eight provides a video scene classification apparatus, comprising:
a first image segmentation module, configured to input a first video image frame and a second video image frame of a current video respectively into a preset image segmentation model, to obtain a first group of video objects corresponding to the first video image frame and a second group of video objects corresponding to the second video image frame, wherein the second group of video objects includes video objects from the first group of video objects;
a movement velocity determining module, configured to determine a movement velocity of the first group of video objects in the video image plane according to image regions of the first group of video objects in the first video image frame and the second video image frame;
a second image segmentation module, configured to determine, according to the movement velocity, an expected image region of the first group of video objects in each unsegmented video image frame of the current video, and to perform image segmentation on each unsegmented video image frame according to the expected image region, to obtain corresponding video objects;
a video classification module, configured to determine a classification result of the target video according to the video objects corresponding to each video image frame of the current video.
According to one or more embodiments of the present disclosure, example nine provides a mobile terminal, comprising:
one or more processing units; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processing units, the one or more processing units implement the video scene classification method of any one of examples one to seven.
According to one or more embodiments of the present disclosure, example ten provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video scene classification method of any one of examples one to seven.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, solutions in which the above features are interchanged with (but not limited to) technical features with similar functions disclosed in the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in that particular order or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely exemplary forms of implementing the claims.
Claims (10)
1. A video scene classification method, comprising:
inputting a first video image frame and a second video image frame of a current video respectively into a preset image segmentation model, to obtain a first group of video objects corresponding to the first video image frame and a second group of video objects corresponding to the second video image frame, wherein the second group of video objects includes video objects from the first group of video objects;
determining a movement velocity of the first group of video objects in the video image plane according to image regions of the first group of video objects in the first video image frame and the second video image frame;
determining, according to the movement velocity, an expected image region of the first group of video objects in each unsegmented video image frame of the current video, and performing image segmentation on each unsegmented video image frame according to the expected image region, to obtain corresponding video objects;
determining a classification result of the target video according to the video objects corresponding to each video image frame of the current video.
2. The method according to claim 1, wherein determining the movement velocity of the first group of video objects in the video image plane according to the image regions of the first group of video objects in the first video image frame and the second video image frame comprises:
determining, using an optical flow method, the movement velocity of the first group of video objects in the video image plane according to the image regions of the first group of video objects in the first video image frame and the second video image frame.
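As an illustration of claim 2, per-object velocity can be derived from the object's regions in the two frames. The claim specifies an optical-flow method (e.g. dense flow averaged over the object's region, as provided by libraries such as OpenCV); the self-contained sketch below substitutes a simpler centroid-shift proxy, which is an assumption rather than the claimed technique.

```python
def estimate_velocity(region_f1, region_f2, fps):
    """Estimate an object's image-plane velocity in pixels per second from
    its (x, y, w, h) regions in the first and second video image frames.
    The centroid shift between the two regions stands in for optical flow.
    """
    cx1 = region_f1[0] + region_f1[2] / 2.0   # centroid in frame 1
    cy1 = region_f1[1] + region_f1[3] / 2.0
    cx2 = region_f2[0] + region_f2[2] / 2.0   # centroid in frame 2
    cy2 = region_f2[1] + region_f2[3] / 2.0
    # multiplying by fps divides by the inter-frame interval (1 / fps)
    return ((cx2 - cx1) * fps, (cy2 - cy1) * fps)
```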
3. The method according to claim 1, wherein determining, according to the movement velocity, the expected image region of the first group of video objects in each unsegmented video image frame of the current video, and performing image segmentation on each unsegmented video image frame according to the expected image region to obtain the corresponding video objects, comprises:
sequentially acquiring one unsegmented video image frame as the currently processed video image;
determining the expected image region of the first group of video objects in the currently processed video image according to the movement velocity, the frame rate of the current video, and the image region of the first group of video objects in the previous video image frame;
cropping, according to the expected image region, an expected image matching each video object in the first group of video objects from the currently processed video image;
inputting the expected image matching each video object into the preset image segmentation model, to obtain the video objects corresponding to the currently processed video image;
returning to the operation of sequentially acquiring one unsegmented video image frame as the currently processed video image, until all unsegmented video images of the current video have been processed.
4. The method according to claim 3, wherein determining the expected image region of the first group of video objects in the currently processed video image according to the movement velocity, the frame rate of the current video, and the image region of the first group of video objects in the previous video image frame comprises:
determining, according to the frame rate of the current video, the interval time between the previous video image frame and the currently processed video image;
determining, according to the interval time and the movement velocity, the displacement of the first group of video objects during the interval time;
determining the expected image region of the first group of video objects in the currently processed video image according to the image region of the first group of video objects in the previous video image frame and the displacement.
5. The method according to claim 1, wherein determining the classification result of the target video according to the video objects corresponding to each video image frame of the current video comprises:
determining the scene category corresponding to each video image frame according to a preset correspondence rule between video objects and scene categories, and the video objects corresponding to each video image frame;
counting the number of video image frames corresponding to each scene category;
taking the scene category with the largest number of video image frames as the classification result of the target video.
6. The method according to claim 1, wherein before the first video image frame and the second video image frame of the current video are respectively input to the preset image segmentation model, the method further comprises:
acquiring a training sample set corresponding to each scene category, the training sample set comprising a set quantity of images corresponding to each scene category;
training a neural network model with the training sample set to obtain the preset image segmentation model.
7. The method according to claim 6, wherein the set quantity of images corresponding to a scene category comprises original images and images obtained by processing those original images according to preset image processing rules.
8. A video scene classification apparatus, comprising:
a first image segmentation module, configured to input a first video image frame and a second video image frame of a current video respectively into a preset image segmentation model, to obtain a first group of video objects corresponding to the first video image frame and a second group of video objects corresponding to the second video image frame, wherein the second group of video objects includes video objects from the first group of video objects;
a movement velocity determining module, configured to determine a movement velocity of the first group of video objects in the video image plane according to image regions of the first group of video objects in the first video image frame and the second video image frame;
a second image segmentation module, configured to determine, according to the movement velocity, an expected image region of the first group of video objects in each unsegmented video image frame of the current video, and to perform image segmentation on each unsegmented video image frame according to the expected image region, to obtain corresponding video objects;
a video classification module, configured to determine a classification result of the target video according to the video objects corresponding to each video image frame of the current video.
9. A mobile terminal, comprising:
one or more processing units; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processing units, the one or more processing units implement the video scene classification method according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video scene classification method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910612133.7A CN110348369B (en) | 2019-07-08 | 2019-07-08 | Video scene classification method and device, mobile terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348369A true CN110348369A (en) | 2019-10-18 |
CN110348369B CN110348369B (en) | 2021-07-06 |
Family
ID=68178493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910612133.7A Active CN110348369B (en) | 2019-07-08 | 2019-07-08 | Video scene classification method and device, mobile terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348369B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6691126B1 (en) * | 2000-06-14 | 2004-02-10 | International Business Machines Corporation | Method and apparatus for locating multi-region objects in an image or video database |
US20100021009A1 (en) * | 2007-01-25 | 2010-01-28 | Wei Yao | Method for moving targets tracking and number counting |
CN106897742A (en) * | 2017-02-21 | 2017-06-27 | 北京市商汤科技开发有限公司 | Method, device and electronic equipment for detecting object in video |
CN107392917A (en) * | 2017-06-09 | 2017-11-24 | 深圳大学 | A kind of saliency detection method and system based on space-time restriction |
CN108154086A (en) * | 2017-12-06 | 2018-06-12 | 北京奇艺世纪科技有限公司 | A kind of image extraction method, device and electronic equipment |
CN108805898A (en) * | 2018-05-31 | 2018-11-13 | 北京字节跳动网络技术有限公司 | Method of video image processing and device |
CN108875619A (en) * | 2018-06-08 | 2018-11-23 | Oppo广东移动通信有限公司 | Method for processing video frequency and device, electronic equipment, computer readable storage medium |
CN109145840A (en) * | 2018-08-29 | 2019-01-04 | 北京字节跳动网络技术有限公司 | video scene classification method, device, equipment and storage medium |
CN109215037A (en) * | 2018-09-18 | 2019-01-15 | Oppo广东移动通信有限公司 | Destination image partition method, device and terminal device |
CN109272509A (en) * | 2018-09-06 | 2019-01-25 | 郑州云海信息技术有限公司 | A kind of object detection method of consecutive image, device, equipment and storage medium |
WO2019040214A1 (en) * | 2017-08-22 | 2019-02-28 | Northrop Grumman Systems Corporation | System and method for distributive training and weight distribution in a neural network |
CN109492608A (en) * | 2018-11-27 | 2019-03-19 | 腾讯科技(深圳)有限公司 | Image partition method, device, computer equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
YUAN-TING HU 等: "VideoMatch: Matching based Video Object Segmentation", 《ARXIV》 * |
姚积欢: "无监督视频对象分割方法的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
罗冰: "语义对象分割方法研究", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112804578A (en) * | 2021-01-28 | 2021-05-14 | 广州虎牙科技有限公司 | Atmosphere special effect generation method and device, electronic equipment and storage medium |
CN113014831A (en) * | 2021-03-05 | 2021-06-22 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for acquiring scenes of sports video |
CN113014831B (en) * | 2021-03-05 | 2024-03-12 | 上海明略人工智能(集团)有限公司 | Method, device and equipment for scene acquisition of sports video |
Also Published As
Publication number | Publication date |
---|---|
CN110348369B (en) | 2021-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184738B (en) | Image segmentation method, device, equipment and storage medium | |
CN107644209A (en) | Method for detecting human face and device | |
CN110378264A (en) | Method for tracking target and device | |
CN108335322A (en) | Depth estimation method and device, electronic equipment, program and medium | |
CN109584276A (en) | Critical point detection method, apparatus, equipment and readable medium | |
CN109410242A (en) | Method for tracking target, system, equipment and medium based on double-current convolutional neural networks | |
CN105144236A (en) | Real time stereo matching | |
CN110335334A (en) | Avatars drive display methods, device, electronic equipment and storage medium | |
CN109191514A (en) | Method and apparatus for generating depth detection model | |
TW202034270A (en) | Vehicle accident identification method and apparatus, and electronic device | |
CN109754464B (en) | Method and apparatus for generating information | |
CN109583391A (en) | Critical point detection method, apparatus, equipment and readable medium | |
EP4178194A1 (en) | Video generation method and apparatus, and readable medium and electronic device | |
CN107749310A (en) | Motion data processing method and device, storage medium and processor | |
WO2023142550A1 (en) | Abnormal event detection method and apparatus, computer device, storage medium, computer program, and computer program product | |
WO2021103474A1 (en) | Image processing method and apparatus, storage medium and electronic apparatus | |
CN110348369A (en) | A kind of video scene classification method, device, mobile terminal and storage medium | |
US20240320807A1 (en) | Image processing method and apparatus, device, and storage medium | |
CN110062157A (en) | Render method, apparatus, electronic equipment and the computer readable storage medium of image | |
CN111246196B (en) | Video processing method and device, electronic equipment and computer readable storage medium | |
CN110334650A (en) | Object detecting method, device, electronic equipment and storage medium | |
CN109697393A (en) | Person tracking method, device, electronic device and computer-readable medium | |
CN107479715A (en) | The method and apparatus that virtual reality interaction is realized using gesture control | |
CN109816791B (en) | Method and apparatus for generating information | |
CN113850160A (en) | Method and device for counting repeated actions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||