WO2020224127A1 - Video stream interception method, device and storage medium - Google Patents
Video stream interception method, device and storage medium
- Publication number
- WO2020224127A1 (PCT/CN2019/103615)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- picture
- video stream
- customer
- identity
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/13—Sensors therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1365—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a video stream interception method, device, and computer-readable storage medium.
- In current payment systems, identity confirmation is completed by a set of back-end image processing software that turns on the camera, obtains the video stream, and intercepts pictures. Because this series of processing runs entirely in the back end, it is time-consuming and labor-intensive, and places relatively high demands on computer performance.
- This application provides a video stream interception method, electronic device, and computer-readable storage medium.
- The main purpose of the method is to open the camera device at the html front end to obtain the video, use the canvas method to intercept pictures from the video, and then perform recognition in the background, thereby solving the problem that current image processing must be done entirely in the background, saving development workload, and reducing manpower and material resources.
- the present application provides an electronic device, which includes a memory, a processor, and a camera device, the memory includes a video stream interception program, and the video stream interception program is executed by the processor as follows step:
- the video stream data includes video stream information of each frame of image, and pictures are intercepted in the video stream information
- the intercepted picture is transmitted to the background, and the picture is recognized in the background to determine the identity of the customer.
- this application also provides a video stream interception method, the method includes:
- the video stream data includes video stream information of each frame of image, and pictures are intercepted in the video stream information
- the intercepted picture is transmitted to the background, and the background recognizes the palm print in the picture to determine the identity of the customer.
- the present application also provides a computer-readable storage medium, the computer-readable storage medium includes a video stream interception program, and when the video stream interception program is executed by a processor, the following steps are implemented:
- the video stream data includes video stream information of each frame of image, and pictures are intercepted in the video stream information
- the intercepted picture is transmitted to the background, and the picture is recognized in the background to determine the identity of the customer.
- The video stream interception method, device, and computer-readable storage medium proposed in this application open the camera device at the html front end and use it to obtain the video of the customer whose identity is to be confirmed; the video is processed into pictures, and the canvas is used to parse the video to generate video stream data.
- the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information.
- FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a video stream interception method of this application;
- FIG. 2 is a schematic diagram of modules of a preferred embodiment of the video stream interception program in FIG. 1;
- FIG. 3 is a flowchart of a preferred embodiment of a method for intercepting a video stream of this application.
- This application provides a video stream interception method, which is applied to an electronic device 1.
- FIG. 1 it is a schematic diagram of an application environment of a preferred embodiment of a video stream interception method of this application.
- the electronic device 1 may be a terminal device with arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
- the electronic device 1 includes a processor 12, a memory 11, a camera device 13, a network interface 14, and a communication bus 15.
- the memory 11 includes at least one type of readable storage medium.
- the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory 11, and the like.
- the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
- the readable storage medium may also be the external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 1.
- the readable storage medium of the memory 11 is usually used to store the video stream interception program 10 installed in the electronic device 1 and the like.
- the memory 11 can also be used to temporarily store data that has been output or will be output.
- the processor 12 may be a central processing unit (CPU), microprocessor, or other data processing chip, used to run program code or process data stored in the memory 11, for example, to execute the video stream interception program 10.
- the camera device 13 may be a part of the electronic device 1 or may be independent of the electronic device 1.
- the electronic device 1 is a terminal device with a camera such as a smart phone, a tablet computer, or a portable computer, and the camera device 13 is the camera of the electronic device 1.
- the electronic device 1 may be a server, and the camera device 13 is independent of the electronic device 1 and is connected to the electronic device 1 via a network.
- the camera device 13 is installed in a specific place, such as an office or a monitoring area; it shoots the target entering that place in real time to obtain real-time images, and transmits the captured real-time images to the processor 12 through the network.
- the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
- the communication bus 15 is used to realize the connection and communication between these components.
- FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
- the electronic device 1 may also include a user interface.
- the user interface may include an input unit such as a keyboard, a voice input device such as a microphone or other devices with voice recognition functions, and a voice output device such as speakers or earphones.
- the user interface may also include a standard wired interface and a wireless interface.
- the electronic device 1 may also include a display, and the display may also be called a display screen or a display unit.
- the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch device.
- the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
- the electronic device 1 further includes a touch sensor.
- the area provided by the touch sensor for the user to perform a touch operation is called a touch area.
- the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
- the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
- the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
- the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor.
- the display and the touch sensor are stacked to form a touch display screen. The device detects the touch operation triggered by the user based on the touch screen.
- the electronic device 1 may also include a radio frequency (RF) circuit, sensors, an audio circuit, etc., which will not be repeated here.
- the memory 11, as a computer storage medium, may include an operating system and a video stream interception program 10; when the processor 12 executes the video stream interception program 10 stored in the memory 11, the following steps are implemented:
- the video stream data includes video stream information of each frame of image, and a picture is intercepted in the video stream information
- the intercepted picture is transmitted to the background, and the picture is recognized in the background to determine the identity of the customer.
- the video is a palm print video or a face video; in this application, the palm print video or the face video can be obtained through the camera device 13.
- When the palm print video is obtained, the customer places the palm, with the required gesture and position, within the range that the camera device can capture, so that the camera device 13 obtains a valid palm print video of the customer; when the customer's face video is obtained, the customer stands within the capturable range in front of the camera device 13 according to the specified requirements, so that the camera device can capture a valid face video of the customer.
- the picture intercepted in the video stream information is a palm print picture
- the intercepted palmprint pictures are transmitted to the background, where the palmprint pictures are matched with the standard palmprint pictures in the background database to determine the identity of the customer.
- the picture intercepted in the video stream information is a face picture
- the intercepted face picture is transmitted to the backstage, where the face picture is matched with the standard face picture in the backstage database to determine the identity of the customer.
- The html front end uses code to open the camera; the underlying functions are integrated into the navigator object, that is, the html front end uses navigator and video to open the camera. The Navigator object contains information about the browser, and all browsers support this object; specifically, the attributes of the Navigator object describe the browser being used, and these attributes can be used for platform-specific configuration.
- Although this object is obviously named after Netscape's Navigator browser, other browsers that implement JavaScript also support it.
- the instance of the Navigator object is unique, and it can be referenced using the navigator property of the Window object.
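The description above says the html front end opens the camera through the navigator object and a video element but gives no code. The following is a minimal sketch under that reading, assuming a page with a `<video>` element; `navigator.mediaDevices.getUserMedia` is the standard browser API for this purpose, though the patent does not name the exact call:

```javascript
// Constraints matching the 800*600 picture size described in this application.
const constraints = { video: { width: 800, height: 600 }, audio: false };

// Ask the browser for camera access and play the resulting MediaStream
// in the given <video> element. Returns a promise that resolves once
// playback starts.
function openCamera(videoElement) {
  return navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
    videoElement.srcObject = stream;
    return videoElement.play();
  });
}
```

In a page this would be called as `openCamera(document.querySelector('video'))`; the element selector is an illustrative assumption.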
- When the camera device 13 captures a video, it sends the captured video to the processor 12.
- After the processor 12 receives the video, it first buffers or parses the captured video stream, and parses the video stream data according to the canvas format to generate the video stream data of each frame; then a picture is intercepted from the video stream every 300 ms (0.3 seconds), and each intercepted picture is converted to base64 and transmitted to the background.
- the specific process is as follows:
- one picture is taken every 300ms (0.3 seconds). In this application, a total of 10 pictures are taken, and more pictures can be taken as needed.
- The purpose of the above step is to set the size of the intercepted picture and the time interval of the interception; in this application, the size of the intercepted picture is 800*600.
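The capture loop described above (one canvas draw every 300 ms, an 800*600 picture, up to 10 pictures, base64 output) can be sketched as follows; the element names and the callback are illustrative assumptions, not from the patent:

```javascript
const WIDTH = 800;        // intercepted picture width, per the description
const HEIGHT = 600;       // intercepted picture height
const INTERVAL_MS = 300;  // one picture every 300 ms (0.3 seconds)
const MAX_SHOTS = 10;     // a total of 10 pictures in this application

// Draw the current video frame onto the canvas on a timer and hand each
// base64-encoded JPEG data URL to onPicture (e.g. to POST to the background).
function captureFrames(videoElement, canvas, onPicture) {
  canvas.width = WIDTH;
  canvas.height = HEIGHT;
  const ctx = canvas.getContext('2d');
  let shots = 0;
  const timer = setInterval(() => {
    ctx.drawImage(videoElement, 0, 0, WIDTH, HEIGHT); // scale frame to 800*600
    onPicture(canvas.toDataURL('image/jpeg'));        // base64 data URL
    if (++shots >= MAX_SHOTS) clearInterval(timer);   // stop after 10 shots
  }, INTERVAL_MS);
}
```

`canvas.toDataURL` yields a `data:image/jpeg;base64,...` string, which matches the base64 transmission step described above.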
- The intercepted pictures are fed to a machine learning model, which is trained to determine the identity of the customer; the machine learning model includes a convolutional neural network and a long short-term memory network.
- The long short-term memory network performs image analysis on the input palm prints of supermarket shopping customers, analyzing whether the customer's palm print matches the standard palm print of that customer in the back-end database, so as to determine the identity of the customer through the palm print information; or it performs image analysis on the input face picture of the customer, analyzing whether the face picture matches the standard face picture of that customer in the back-end database, so as to determine the identity of the customer through the face information.
- The machine learning model does not need to be a specific one; in this application, deep learning models are used.
- Deep learning builds a network, also referred to as a deep learning neural network model, and can generally be summarized into the following three steps:
- The first step is to design a neural network model, which is a complex function composed of simple functions; a computer is then used to train it to obtain parameters from the given training data. These parameters ensure that the model reaches the expected effect on the test set and has generalization ability.
- The second step is to define a cost function according to the training data; the validity of the parameters can be evaluated through the cost function, and the definition of the cost function is designed according to the specific task and the actual training data.
- The third step is to find the best function based on the results of the previous two steps, for example, by using gradient descent.
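The three steps above can be illustrated with a toy gradient-descent run; the model (a one-parameter linear function), the squared-error cost, and the learning rate are illustrative assumptions, not the patent's actual network:

```javascript
// Step 1: the model is y = w * x (a trivial "network" with one parameter).
// Step 2: the cost is the mean squared error over the training data.
// Step 3: gradient descent repeatedly moves w against the cost gradient.
function gradientDescent(data, lr = 0.1, steps = 200) {
  let w = 0; // initial parameter
  for (let i = 0; i < steps; i++) {
    // dCost/dw for cost = mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
    let grad = 0;
    for (const [x, y] of data) grad += 2 * (w * x - y) * x;
    grad /= data.length;
    w -= lr * grad; // descend along the gradient
  }
  return w;
}

// Training data generated from y = 3x, so w should converge toward 3.
const w = gradientDescent([[1, 3], [2, 6], [3, 9]]);
```

The same loop generalizes to many parameters; real deep learning frameworks only automate the gradient computation and scale it up.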
- the deep learning model in this application may be CNN (Convolutional Neural Network, convolutional neural network) and LSTM (Long Short-Term Memory, long short-term memory network).
- The convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to a part of the surrounding units within the coverage area; it performs excellently on large-scale image processing and includes convolutional layers and pooling layers.
- The basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship with other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal.
- The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature mapping has displacement invariance.
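The two CNN building blocks named above, feature extraction by convolution and pooling, can be sketched on plain 2D arrays; this is a generic illustration of the operations, not the patent's actual network:

```javascript
// "Valid" 2D convolution: slide the kernel over the image and sum the
// element-wise products at each position (the feature extraction step).
function conv2d(img, kernel) {
  const kh = kernel.length, kw = kernel[0].length;
  const out = [];
  for (let i = 0; i + kh <= img.length; i++) {
    const row = [];
    for (let j = 0; j + kw <= img[0].length; j++) {
      let s = 0;
      for (let u = 0; u < kh; u++)
        for (let v = 0; v < kw; v++) s += img[i + u][j + v] * kernel[u][v];
      row.push(s);
    }
    out.push(row);
  }
  return out;
}

// 2x2 max pooling: keep the largest value in each 2x2 block, halving each
// dimension and giving a small amount of displacement invariance.
function maxPool2x2(img) {
  const out = [];
  for (let i = 0; i + 1 < img.length; i += 2) {
    const row = [];
    for (let j = 0; j + 1 < img[0].length; j += 2)
      row.push(Math.max(img[i][j], img[i][j + 1],
                        img[i + 1][j], img[i + 1][j + 1]));
    out.push(row);
  }
  return out;
}
```

A real CNN stacks many such convolution and pooling layers, with learned kernels and a nonlinearity (such as the sigmoid mentioned above) between them.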
- LSTM, the long short-term memory network, is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
- LSTM-based systems can learn tasks such as translating languages, controlling robots, image analysis, document summarization, speech recognition, image recognition, handwriting recognition, controlling chatbots, predicting diseases, click-through rates and stocks, and synthesizing music.
- In this application, the long short-term memory network is used to analyze the input palm print pictures or face pictures of supermarket shopping customers, determining whether the customer's palm print matches the standard palm print of that customer in the database, or whether the customer's face picture matches the standard face picture of that customer in the database, so as to determine the identity of the customer.
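The matching step above can be sketched as a nearest-neighbor comparison of feature vectors; the cosine-similarity measure, the 0.9 threshold, and the database shape are illustrative assumptions, since the patent does not specify how the trained network's output is compared against the standard pictures:

```javascript
// Cosine similarity between two feature vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the id of the database entry whose standard feature vector is
// closest to the captured one, or null if no entry clears the threshold.
function identifyCustomer(captured, database, threshold = 0.9) {
  let bestId = null, bestScore = threshold;
  for (const [id, standard] of Object.entries(database)) {
    const score = cosineSimilarity(captured, standard);
    if (score >= bestScore) { bestScore = score; bestId = id; }
  }
  return bestId;
}
```

Here `captured` stands in for the network's embedding of the intercepted palm print or face picture, and `database` maps customer ids to the embeddings of their standard pictures.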
- The electronic device 1 proposed in the above embodiment opens the camera device from the HTML front end and uses it to obtain the video of the customer whose identity is to be confirmed; the video is processed into pictures, and the canvas is used to parse the video to generate video stream data, which includes the video stream information of each frame of image; pictures are intercepted from the video stream information. Combining machine learning models to train on and learn from the intercepted pictures to determine the identity of customers effectively improves the efficiency of research and development and reduces manpower, material resources, and computer performance requirements.
- the video stream interception program 10 may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by the processor 12 to complete the application.
- the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
- FIG. 2 it is a program module diagram of a preferred embodiment of the video stream interception program 10 in FIG. 1.
- the video stream interception program 10 can be divided into: a video acquisition module 110, a picture interception module 120, and a picture recognition module 130.
- The functions or operation steps implemented by the modules 110-130 are similar to the above and will not be described in detail here. For example:
- the video acquisition module 110 is configured to open a camera device at the html front end, and use the camera device to obtain a video of a customer whose identity is to be confirmed;
- the picture interception module 120 is configured to process the video into pictures, wherein the canvas is used to parse the video to generate video stream data; the video stream data includes the video stream information of each frame of image, and pictures are intercepted from that information;
- the picture recognition module 130 is configured to transmit the intercepted picture to the background, and identify the picture in the background to determine the identity of the customer.
- this application also provides a video stream interception method.
- FIG. 3 it is a flowchart of a preferred embodiment of a method for intercepting a video stream of this application.
- the method can be executed by a device, and the device can be implemented by software and/or hardware.
- the video stream interception method includes: step S110-step S130.
- Step S110: the camera device is turned on at the html front end, and the camera device is used to obtain the video of the customer whose identity is to be confirmed.
- the html front end uses navigator and video to turn on the camera device.
- the html front end uses code to open the camera, and the underlying functions are integrated into the navigator object.
- the Navigator object contains information about the browser, and all browsers support this object.
- the video is a palm print video or a face video; in this application, a palm print video or a face video can be obtained through a camera device.
- When the palm print video is obtained, the customer places the palm, with the required gesture and position, within the range that the camera device can capture, so that the camera device captures a valid palm print video of the customer; when the customer's face video is captured, the customer stands within the capturable range in front of the camera device according to the specified requirements, so that the camera device captures a valid face video of the customer.
- the picture intercepted in the video stream information is a palm print picture
- the intercepted palmprint pictures are transmitted to the background, where the palmprint pictures are matched with the standard palmprint pictures in the background database to determine the identity of the customer.
- the picture intercepted in the video stream information is a face picture
- the intercepted face picture is transmitted to the backstage, where the face picture is matched with the standard face picture in the backstage database to determine the identity of the customer.
- Step S120: the video is processed into pictures, wherein the canvas is used to parse the video to generate video stream data; the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information.
- the canvas method is used to process the video captured by the camera into a picture, in which one picture is taken every 300ms.
- The captured video stream is first buffered or parsed, and the video stream data is parsed according to the canvas format to generate the video stream data of each frame; then a picture is intercepted from the video stream every 300 ms (0.3 seconds).
- the step of processing the video into a picture includes:
- one picture is taken every 300ms (0.3 seconds). In this application, a total of 10 pictures are taken, and more pictures can be taken as needed.
- The purpose of the above step is to set the size of the intercepted picture and the time interval of the interception; in this application, the size of the intercepted picture is 800*600.
- Step S130: the intercepted picture is transmitted to the background, the picture is recognized in the background, and the identity of the customer is determined.
- The intercepted picture is fed to a machine learning model, which is trained to determine the identity of the customer; the machine learning model includes a convolutional neural network and a long short-term memory network.
- The long short-term memory network performs image analysis on the input palm prints of supermarket shopping customers, analyzing whether the customer's palm print matches the standard palm print of that customer in the back-end database, so as to determine the identity of the customer through the palm print information; or it performs image analysis on the input face picture of the customer, analyzing whether the face picture matches the standard face picture of that customer in the back-end database, so as to determine the identity of the customer through the face information.
- The machine learning model does not need to be a specific one; in this application, deep learning models are used.
- Deep learning builds a network, also referred to as a deep learning neural network model, and can generally be summarized into the following three steps:
- The first step is to design a neural network model, which is a complex function composed of simple functions; a computer is then used to train it to obtain parameters from the given training data. These parameters ensure that the model reaches the expected effect on the test set and has generalization ability.
- The second step is to define a cost function according to the training data; the validity of the parameters can be evaluated through the cost function, and the definition of the cost function is designed according to the specific task and the actual training data.
- the third step is to find the best function based on the results of the previous two steps, such as using gradient descent to find the best function.
- the deep learning model in this application may be CNN (Convolutional Neural Network, convolutional neural network) and LSTM (Long Short-Term Memory, long short-term memory network).
- The convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to a part of the surrounding units within the coverage area; it performs excellently on large-scale image processing and includes convolutional layers and pooling layers.
- The basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship with other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal.
- The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature mapping has displacement invariance.
- LSTM, the long short-term memory network, is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
- LSTM-based systems can learn tasks such as translating languages, controlling robots, image analysis, document summarization, speech recognition, image recognition, handwriting recognition, controlling chatbots, predicting diseases, click-through rates and stocks, and synthesizing music.
- In this application, the long short-term memory network is used to analyze the input palm print pictures or face pictures of supermarket shopping customers, determining whether the customer's palm print matches the standard palm print of that customer in the database, or whether the customer's face picture matches the standard face picture of that customer in the database, so as to determine the identity of the customer.
- the video stream interception method proposed in the above embodiment opens a camera device at the HTML front end and uses it to obtain a video of the customer whose identity is to be confirmed; the video is processed into pictures, and canvas is used to parse the video to generate video stream data.
- the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information. By training a machine learning model on the intercepted pictures to determine the identity of customers, research and development efficiency is effectively improved, and manpower, material resources and computer performance requirements are reduced.
- an embodiment of the present application also proposes a computer-readable storage medium, the computer-readable storage medium includes a video stream interception program, and when the video stream interception program is executed by a processor, the following operations are implemented:
- the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;
- the intercepted picture is transmitted to the background, and the picture is recognized in the background to determine the identity of the customer.
- the video is a palm print video or a face video; wherein,
- the picture intercepted in the video stream information is a palm print picture
- the intercepted palmprint pictures are transmitted to the background, where the palmprint pictures are matched with the standard palmprint pictures in the background database to determine the identity of the customer.
- the picture intercepted in the video stream information is a face picture
- the intercepted face picture is transmitted to the background, where the face picture is matched with the standard face picture in the background database to determine the identity of the customer.
- the step of processing the video into a picture includes:
- the drawn and intercepted pictures are converted into base64 and transmitted to the background.
- a machine learning model is trained on the picture to determine the identity of the customer, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
This application relates to the technical field of artificial intelligence, and proposes a video stream interception method, device and storage medium. The method includes: opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed; processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information; and transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity. By opening the camera device at the HTML front end to obtain the video, intercepting pictures from the video by means of canvas, and then recognizing the pictures in the background, this application solves the problem that image processing currently must also be done in the background, saves development effort, and reduces manpower and material resources.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on May 5, 2019, with application number 201910367384.3 and invention title "Video stream interception method, device and storage medium", the entire content of which is incorporated herein by reference.

This application relates to the technical field of artificial intelligence, and in particular to a video stream interception method, device and computer-readable storage medium.

At present, when paying for goods in an unmanned supermarket, video information needs to be collected for identity matching: pictures are intercepted from the collected video stream, the intercepted pictures are processed and then matched against standard pictures in a database, and after a successful match the payment due is made directly to the unmanned supermarket. This payment method is more convenient and faster.

However, this payment method is currently implemented by a set of image processing software written for the background to turn on the camera, obtain the video stream and intercept pictures. Because this all runs as a background application, the series of processing steps is time-consuming and labor-intensive, and places relatively high demands on computer performance.

To solve the above problems, a new video stream interception method is urgently needed.

Summary of the invention

This application provides a video stream interception method, electronic device and computer-readable storage medium. Its main purpose is to obtain a video by opening a camera device at the HTML front end, intercept pictures from the video by means of canvas, and then recognize the intercepted pictures in the background, thereby solving the problem that image processing currently must also be done in the background, saving development effort, and reducing manpower and material resources.

To achieve the above purpose, this application provides an electronic device, which includes a memory, a processor and a camera device. The memory contains a video stream interception program, and when the video stream interception program is executed by the processor, the following steps are implemented:

opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed;

processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.
In addition, to achieve the above purpose, this application also provides a video stream interception method, which includes:

opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed;

processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

In addition, to achieve the above purpose, this application also provides a computer-readable storage medium, the computer-readable storage medium contains a video stream interception program, and when the video stream interception program is executed by a processor, the following steps are implemented:

opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed;

processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

The video stream interception method, device and computer-readable storage medium proposed in this application open a camera device at the HTML front end and use it to obtain a video of a customer whose identity is to be confirmed; the video is processed into pictures, canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information. By training a machine learning model on the intercepted pictures to determine the customer's identity, research and development efficiency is effectively improved, and manpower, material resources and computer performance requirements are reduced.
Fig. 1 is a schematic diagram of the application environment of a preferred embodiment of the video stream interception method of this application;

Fig. 2 is a block diagram of the modules of a preferred embodiment of the video stream interception program in Fig. 1;

Fig. 3 is a flowchart of a preferred embodiment of the video stream interception method of this application.

The realization of the purpose, functional characteristics and advantages of this application will be further explained with reference to the accompanying drawings in conjunction with the embodiments.

It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides a video stream interception method applied to an electronic device 1. Referring to Fig. 1, it is a schematic diagram of the application environment of a preferred embodiment of the video stream interception method of this application.

In this embodiment, the electronic device 1 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer or a desktop computer.

The electronic device 1 includes a processor 12, a memory 11, a camera device 13, a network interface 14 and a communication bus 15.

The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 1.

In this embodiment, the readable storage medium of the memory 11 is generally used to store the video stream interception program 10 installed in the electronic device 1, and the like. The memory 11 may also be used to temporarily store data that has been output or is about to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the video stream interception program 10.

The camera device 13 may be part of the electronic device 1 or independent of it. In some embodiments, the electronic device 1 is a terminal device with a camera, such as a smartphone, tablet computer or portable computer, and the camera device 13 is then the camera of the electronic device 1. In other embodiments, the electronic device 1 may be a server, and the camera device 13 is independent of the electronic device 1 and connected to it through a network; for example, the camera device 13 is installed in a specific place, such as an office or a monitored area, captures real-time images of targets entering that place, and transmits the captured real-time images to the processor 12 through the network.

The network interface 14 may optionally include a standard wired interface or a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.

The communication bus 15 is used to realize connection and communication between these components.

Fig. 1 only shows the electronic device 1 with components 11-15, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.

Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or other equipment with voice recognition functions, and a voice output device such as a speaker or headphones. Optionally, the user interface may also include a standard wired interface or a wireless interface.

Optionally, the electronic device 1 may also include a display, which may also be called a display screen or display unit. In some embodiments, it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display is used to show the information processed in the electronic device 1 and to display a visualized user interface.

Optionally, the electronic device 1 also includes a touch sensor. The area provided by the touch sensor for the user's touch operations is called the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. Moreover, the touch sensor includes not only contact-type touch sensors but also proximity-type touch sensors. In addition, the touch sensor may be a single sensor or multiple sensors arranged, for example, in an array.

In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen, on the basis of which the device detects touch operations triggered by the user.

Optionally, the electronic device 1 may also include a radio frequency (RF) circuit, sensors, audio circuits and so on, which will not be described in detail here.
In the device embodiment shown in Fig. 1, the memory 11, as a computer storage medium, may contain an operating system and the video stream interception program 10; when the processor 12 executes the video stream interception program 10 stored in the memory 11, the following steps are implemented:

opening a camera device 13 at the HTML front end, and using the camera device 13 to obtain a video of a customer whose identity is to be confirmed;

processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

The video is a palmprint video or a face video. In this application, a palmprint video or a face video can be obtained through the camera device 13. When obtaining a palmprint video, the customer places his or her palm, in the required gesture and position, within the range that the camera device can capture, and the camera device 13 obtains a valid palmprint video of the customer. When obtaining a face video, the customer stands within the capturable range in front of the camera device 13 as required, so that the camera device captures a valid face video of the customer.

When the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures;

the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity.

When the video is a face video, the pictures intercepted from the video stream information are face pictures;

the intercepted face pictures are transmitted to the background, where they are matched against the standard face pictures in the background database to determine the customer's identity.

In this application, the camera is opened by code at the HTML front end, with the underlying functionality integrated into the navigator object; that is, at the HTML front end, navigator and video are used to open the camera device. The Navigator object contains information about the browser and is supported by all browsers. Specifically, the properties of the Navigator object describe the browser in use and can be used for platform-specific configuration. Although the object is obviously named after Netscape's Navigator browser, other browsers that implement JavaScript also support it. The instance of the Navigator object is unique and can be referenced through the navigator property of the Window object.

When the camera device 13 captures a video, it sends the video to the processor 12. After receiving the video, the processor 12 first buffers or parses the captured video stream, parses the video stream data in the canvas format, and generates the video stream data of each frame of image corresponding to the video stream; then one picture is intercepted from the video stream every 300 ms (0.3 seconds), and the intercepted pictures are converted into base64 form and transmitted to the background.
Specifically, the video processing proceeds as follows:

First, canvas.getContext('2d') is used to create a canvas. Creating the canvas prepares for the subsequent picture interception: the intercepted picture is displayed on the canvas and then converted into the corresponding format (png or jpg).

Then context.drawImage(video, 0, 0, 800, 600) is used to draw the picture currently displayed in the video element, with a picture size of 800*600.

One picture is intercepted every 300 ms (0.3 seconds). In this application, 10 pictures are intercepted in total; more pictures can be intercepted as needed.

The purpose of this step is to set the size specification of the intercepted pictures, which in the video stream is always 800*600, as well as the interception time interval.
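Putting the canvas steps above together, a minimal sketch of intercepting one 800*600 picture could look as follows. The function and helper names are illustrative assumptions, not the application's actual code; the browser-only calls are kept inside captureFrame, while dataUrlToBase64 is a pure helper that extracts the base64 payload transmitted to the background.

```javascript
// Intercept one 800*600 picture from a <video> element via canvas.
// Browser-only: uses document, canvas.getContext('2d') and drawImage.
function captureFrame(videoElement) {
  var canvas = document.createElement('canvas');
  canvas.width = 800;
  canvas.height = 600;
  var context = canvas.getContext('2d');
  // Draw the picture currently shown in the video element at 800*600.
  context.drawImage(videoElement, 0, 0, 800, 600);
  // Export in the corresponding format ('image/jpeg' would also work).
  return canvas.toDataURL('image/png');
}

// toDataURL returns "data:image/png;base64,...."; the background usually
// only needs the base64 payload after the comma.
function dataUrlToBase64(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}
```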
In this application, a machine learning model is trained on the intercepted pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.

Specifically, a long short-term memory network is used to perform image analysis on the input palmprints of supermarket shopping customers, analyzing whether the customer's palmprint matches the customer's standard palmprint in the background database, so as to determine the customer's identity through palmprint information; or image analysis is performed on the input face pictures of the customer, analyzing whether the obtained face picture matches the customer's standard face picture in the background database, so as to determine the customer's identity through face information.
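The application does not spell out how the background comparison against the standard picture is computed. As a purely illustrative sketch, assuming each picture has already been reduced to a feature vector by the model, the database match could be scored by cosine similarity; the matchCustomer helper and the 0.9 threshold are hypothetical, not part of this application.

```javascript
// Cosine similarity between two feature vectors of equal length.
function cosineSimilarity(a, b) {
  var dot = 0, na = 0, nb = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the id of the best-matching enrolled customer, or null when no
// standard feature vector reaches the similarity threshold.
function matchCustomer(features, database, threshold) {
  var bestId = null, bestScore = threshold;
  for (var id in database) {
    var score = cosineSimilarity(features, database[id]);
    if (score >= bestScore) { bestScore = score; bestId = id; }
  }
  return bestId;
}
```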
The palmprint pictures or face pictures are learned through a machine learning model. The machine learning model is not restricted to a specific one; at present a deep learning model is used.

Deep learning is about building a network, namely a deep learning neural network model. Deep learning can generally be summarized in the following three steps:

In the first step, the neural network model is a complex function composed of simple functions. Usually a neural network model is designed, and then a computer is used to train it on given training data to obtain parameters that ensure the model achieves the designed effect on the test set and has generalization ability.

In the second step, a cost function is defined according to the training data; the validity of the parameters can be evaluated through the cost function, and the cost function is designed according to the specific task and the actual training data.

In the third step, the best function is found based on the results of the previous two steps, for example by using gradient descent.

The deep learning model in this application may be a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory network).

The convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area. It performs excellently for large-scale image processing and includes convolutional layers and pooling layers.

The basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature has been extracted, its positional relationship with other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature mapping has displacement invariance.

LSTM is a long short-term memory network, a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in time series. LSTM-based systems can learn tasks such as translating languages, controlling robots, image analysis, document summarization, speech recognition, image recognition, handwriting recognition, controlling chatbots, predicting diseases, click-through rates and stocks, and synthesizing music.

It should be noted that in the above embodiment, a long short-term memory network is used to analyze the input palmprint pictures or face pictures of supermarket shopping customers, to determine whether the customer's palmprint matches the customer's standard palmprint in the database, or whether the customer's face picture matches the customer's standard face picture in the database, so as to determine the customer's identity.

The electronic device 1 proposed in the above embodiment opens a camera device at the HTML front end and uses it to obtain a video of a customer whose identity is to be confirmed; the video is processed into pictures, canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information. By training a machine learning model on the intercepted pictures to determine the customer's identity, research and development efficiency is effectively improved, and manpower, material resources and computer performance requirements are reduced.
In other embodiments, the video stream interception program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to complete this application. The modules referred to in this application are a series of computer program instruction segments capable of completing specific functions. Referring to Fig. 2, it is a program module diagram of a preferred embodiment of the video stream interception program 10 in Fig. 1. The video stream interception program 10 may be divided into a video acquisition module 110, a picture interception module 120 and a picture recognition module 130. The functions or operation steps implemented by modules 110-130 are all similar to those described above and will not be detailed here; by way of example:

the video acquisition module 110 is used to open a camera device at the HTML front end and use the camera device to obtain a video of a customer whose identity is to be confirmed;

the picture interception module 120 is used to process the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

the picture recognition module 130 is used to transmit the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

In addition, this application also provides a video stream interception method. Referring to Fig. 3, it is a flowchart of a preferred embodiment of the video stream interception method of this application. The method may be executed by a device, and the device may be implemented by software and/or hardware.

In this embodiment, the video stream interception method includes steps S110 to S130.

Step S110: open a camera device at the HTML front end, and use the camera device to obtain a video of a customer whose identity is to be confirmed.

At the HTML front end, navigator and video are used to open the camera device. In this application, the camera is opened by code at the HTML front end, with the underlying functionality integrated into the navigator object. The Navigator object contains information about the browser and is supported by all browsers.

The specific code used is as follows:

Through the above code, at the HTML front end, navigator and video are used to open the camera device.
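The code listing referred to above is not reproduced in this text. As a hedged illustration only, the snippet below sketches one common way to open the camera from an HTML front end via the navigator object; the getUserMedia call, the buildVideoConstraints helper and the openCamera name are assumptions, not the application's actual code.

```javascript
// Pure helper: media constraints asking for video only, no audio.
function buildVideoConstraints() {
  return { video: true, audio: false };
}

// Browser-only: request the camera through the navigator object and play
// the resulting stream in a <video> element. Kept inside a function so
// nothing browser-specific runs at load time.
function openCamera(videoElement) {
  return navigator.mediaDevices
    .getUserMedia(buildVideoConstraints())
    .then(function (stream) {
      videoElement.srcObject = stream; // hand the camera stream to <video>
      return videoElement.play();
    });
}
```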
The video is a palmprint video or a face video. In this application, a palmprint video or a face video can be obtained through the camera device. When obtaining a palmprint video, the customer places his or her palm, in the required gesture and position, within the range that the camera device can capture, and the camera device obtains a valid palmprint video of the customer. When obtaining a face video, the customer stands within the capturable range in front of the camera device as required, so that the camera device captures a valid face video of the customer.

When the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures;

the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity.

When the video is a face video, the pictures intercepted from the video stream information are face pictures;

the intercepted face pictures are transmitted to the background, where they are matched against the standard face pictures in the background database to determine the customer's identity.

Step S120: process the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information.

In this application, the video captured by the camera is processed into pictures by means of canvas, with one picture intercepted every 300 ms.

During video processing, the captured video stream is first buffered or parsed, and the video stream data is parsed in the canvas format to generate the video stream data of each frame of image corresponding to the video stream; then one picture is intercepted from the video stream every 300 ms (0.3 seconds).
The step of processing the video into pictures includes:

First, canvas.getContext('2d') is used to create a canvas. Creating the canvas prepares for the subsequent picture interception: the intercepted picture is displayed on the canvas and then converted into the corresponding format (png or jpg).

Then context.drawImage(video, 0, 0, 800, 600) is used to draw the picture currently displayed in the video element, with a picture size of 800*600; that is, context.drawImage is used to draw, according to the preset specification, the intercepted picture currently displayed on the canvas.

One picture is intercepted every 300 ms (0.3 seconds). In this application, 10 pictures are intercepted in total; more pictures can be intercepted as needed.

The purpose of this step is to set the size specification of the intercepted pictures, which in the video stream is always 800*600, as well as the interception time interval.

The following code is used to intercept 10 pictures:

Finally, the intercepted pictures are converted into base64 form and transmitted to the background.
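The 10-picture capture listing referred to above is likewise not reproduced here. A hedged sketch of the timing logic follows, with captureFn and sendFn standing in for the actual capture and background-upload routines (both are assumptions); the captureSchedule helper just makes the 300 ms / 10-picture schedule explicit.

```javascript
// Pure helper: the capture offsets in milliseconds (0, 300, ..., 2700).
function captureSchedule(count, intervalMs) {
  var times = [];
  for (var i = 0; i < count; i++) times.push(i * intervalMs);
  return times;
}

// Driver: every 300 ms, intercept one picture and transmit it, stopping
// once 10 pictures have been taken. captureFn and sendFn are injected so
// the loop itself has no browser dependency.
function runCaptureLoop(captureFn, sendFn) {
  var taken = 0;
  var timer = setInterval(function () {
    sendFn(captureFn());            // intercept one picture and upload it
    taken += 1;
    if (taken === 10) clearInterval(timer); // stop after 10 pictures
  }, 300);
}
```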
Step S130: transmit the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

In this application, a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.

Specifically, a long short-term memory network is used to perform image analysis on the input palmprints of supermarket shopping customers, analyzing whether the customer's palmprint matches the customer's standard palmprint in the background database, so as to determine the customer's identity through palmprint information; or image analysis is performed on the input face pictures of the customer, analyzing whether the obtained face picture matches the customer's standard face picture in the background database, so as to determine the customer's identity through face information.

The palmprint pictures and face pictures are learned through a machine learning model. The machine learning model is not restricted to a specific one; at present a deep learning model is used.

Deep learning is about building a network, namely a deep learning neural network model. Deep learning can generally be summarized in the following three steps:

In the first step, the neural network model is a complex function composed of simple functions. Usually a neural network model is designed, and then a computer is used to train it on given training data to obtain parameters that ensure the model achieves the designed effect on the test set and has generalization ability.

In the second step, a cost function is defined according to the training data; the validity of the parameters can be evaluated through the cost function, and the cost function is designed according to the specific task and the actual training data.

In the third step, the best function is found based on the results of the previous two steps, for example by using gradient descent.

The deep learning model in this application may be a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory network).

The convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within part of their coverage area. It performs excellently for large-scale image processing and includes convolutional layers and pooling layers.

The basic structure of a CNN includes two layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature has been extracted, its positional relationship with other features is also determined. The second is the feature mapping layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature mapping has displacement invariance.

LSTM is a long short-term memory network, a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in time series. LSTM-based systems can learn tasks such as translating languages, controlling robots, image analysis, document summarization, speech recognition, image recognition, handwriting recognition, controlling chatbots, predicting diseases, click-through rates and stocks, and synthesizing music.

It should be noted that in the above embodiment, a long short-term memory network is used to analyze the input palmprint pictures or face pictures of supermarket shopping customers, to determine whether the customer's palmprint matches the customer's standard palmprint in the database, or whether the customer's face picture matches the customer's standard face picture in the database, so as to determine the customer's identity.

The video stream interception method proposed in the above embodiment opens a camera device at the HTML front end and uses it to obtain a video of a customer whose identity is to be confirmed; the video is processed into pictures, canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information. By training a machine learning model on the intercepted pictures to determine the customer's identity, research and development efficiency is effectively improved, and manpower, material resources and computer performance requirements are reduced.
In addition, an embodiment of this application also proposes a computer-readable storage medium, which contains a video stream interception program; when the video stream interception program is executed by a processor, the following operations are implemented:

opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed;

processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information;

transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.

Preferably, the video is a palmprint video or a face video; wherein,

when the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures;

the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity.

Preferably, when the video is a face video, the pictures intercepted from the video stream information are face pictures;

the intercepted face pictures are transmitted to the background, where they are matched against the standard face pictures in the background database to determine the customer's identity.

Preferably, the step of processing the video into pictures includes:

using canvas.getContext to create a canvas, and displaying the intercepted pictures on the canvas;

using context.drawImage to draw, according to the preset specification, the intercepted picture currently displayed on the canvas;

converting the drawn and intercepted pictures into base64 form and transmitting them to the background.

Preferably, a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.

The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the above video stream interception method and electronic device, and will not be repeated here.

It should be noted that in this document, the terms "comprise", "include" or any of their other variants are intended to cover non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the existence of other identical elements in the process, device, article or method that includes that element.

The serial numbers of the above embodiments of this application are for description only and do not represent the merits of the embodiments. Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disk) and includes several instructions to enable a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods described in the various embodiments of this application.

The above are only preferred embodiments of this application and do not therefore limit its patent scope. Any equivalent structural or flow transformation made using the content of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.
Claims (20)
- A video stream interception method applied to an electronic device, characterized in that the method comprises: opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed; processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information; and transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.
- The video stream interception method according to claim 1, characterized in that the video is a palmprint video or a face video; wherein, when the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures; and the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity.
- The video stream interception method according to claim 2, characterized in that when the video is a face video, the pictures intercepted from the video stream information are face pictures; and the intercepted face pictures are transmitted to the background, where they are matched against the standard face pictures in the background database to determine the customer's identity.
- The video stream interception method according to claim 1, characterized in that the step of processing the video into pictures comprises: using canvas.getContext to create a canvas, and displaying the intercepted pictures on the canvas; using context.drawImage to draw, according to the preset specification, the intercepted picture currently displayed on the canvas; and converting the drawn and intercepted pictures into base64 form and transmitting them to the background.
- The video stream interception method according to claim 1, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The video stream interception method according to claim 2, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The video stream interception method according to claim 3, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The video stream interception method according to claim 4, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- An electronic device, characterized in that the electronic device comprises a memory, a processor and a camera device, the memory contains a video stream interception program, and when the video stream interception program is executed by the processor, the following steps are implemented: opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed; processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information; and transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.
- The electronic device according to claim 9, characterized in that the video is a palmprint video or a face video; wherein, when the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures; and the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity.
- The electronic device according to claim 9, characterized in that the step of processing the video into pictures comprises: using canvas.getContext to create a canvas, and displaying the intercepted pictures on the canvas; using context.drawImage to draw, according to the preset specification, the intercepted picture currently displayed on the canvas; and converting the drawn and intercepted pictures into base64 form and transmitting them to the background.
- The electronic device according to claim 9, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The electronic device according to claim 10, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The electronic device according to claim 11, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- A computer-readable storage medium, characterized in that the computer-readable storage medium contains a video stream interception program, and when the video stream interception program is executed by a processor, the following steps are implemented: opening a camera device at the HTML front end, and using the camera device to obtain a video of a customer whose identity is to be confirmed; processing the video into pictures, wherein canvas is used to parse the video to generate video stream data, the video stream data includes the video stream information of each frame of image, and pictures are intercepted from the video stream information; and transmitting the intercepted pictures to the background, where the pictures are recognized to determine the customer's identity.
- The computer-readable storage medium according to claim 15, characterized in that the video is a palmprint video or a face video; wherein, when the video is a palmprint video, the pictures intercepted from the video stream information are palmprint pictures, and the intercepted palmprint pictures are transmitted to the background, where they are matched against the standard palmprint pictures in the background database to determine the customer's identity; and when the video is a face video, the pictures intercepted from the video stream information are face pictures, and the intercepted face pictures are transmitted to the background, where they are matched against the standard face pictures in the background database to determine the customer's identity.
- The computer-readable storage medium according to claim 16, characterized in that the step of processing the video into pictures comprises: using canvas.getContext to create a canvas, and displaying the intercepted pictures on the canvas; using context.drawImage to draw, according to the preset specification, the intercepted picture currently displayed on the canvas; and converting the drawn and intercepted pictures into base64 form and transmitting them to the background.
- The computer-readable storage medium according to claim 15, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The computer-readable storage medium according to claim 16, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
- The computer-readable storage medium according to claim 17, characterized in that a machine learning model is trained on the pictures to determine the customer's identity, wherein the machine learning model includes a convolutional neural network and a long short-term memory network.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910367384.3 | 2019-05-05 | ||
CN201910367384.3A CN110267095A (zh) | 2019-05-05 | 2019-05-05 | 视频流截取方法、装置及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020224127A1 true WO2020224127A1 (zh) | 2020-11-12 |
Family
ID=67914144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/103615 WO2020224127A1 (zh) | 2019-05-05 | 2019-08-30 | 视频流截取方法、装置及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110267095A (zh) |
WO (1) | WO2020224127A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113763671A (zh) * | 2021-09-08 | 2021-12-07 | 升维科技有限公司 | 一种建筑监控系统、方法、计算机设备及存储介质 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111901630A (zh) * | 2020-06-17 | 2020-11-06 | 视联动力信息技术股份有限公司 | 一种数据传输方法、装置、终端设备和存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140049593A1 (en) * | 2012-08-14 | 2014-02-20 | Avaya Inc. | Protecting Privacy of a Customer and an Agent Using Face Recognition in a Video Contact Center Environment |
CN108320345A (zh) * | 2018-05-04 | 2018-07-24 | 珠海横琴盛达兆业科技投资有限公司 | 一种基于百度人脸识别api的bs架构实现智能人脸考勤的方法 |
CN108345454A (zh) * | 2018-04-16 | 2018-07-31 | 珠海横琴盛达兆业科技投资有限公司 | 基于clmtrackr的药店管理系统调用html5视频实时自动采集人脸图像数据的方法 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107067429A (zh) * | 2017-03-17 | 2017-08-18 | 徐迪 | 基于深度学习的人脸三维重建和人脸替换的视频编辑系统及方法 |
-
2019
- 2019-05-05 CN CN201910367384.3A patent/CN110267095A/zh active Pending
- 2019-08-30 WO PCT/CN2019/103615 patent/WO2020224127A1/zh active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140049593A1 (en) * | 2012-08-14 | 2014-02-20 | Avaya Inc. | Protecting Privacy of a Customer and an Agent Using Face Recognition in a Video Contact Center Environment |
CN108345454A (zh) * | 2018-04-16 | 2018-07-31 | 珠海横琴盛达兆业科技投资有限公司 | 基于clmtrackr的药店管理系统调用html5视频实时自动采集人脸图像数据的方法 |
CN108320345A (zh) * | 2018-05-04 | 2018-07-24 | 珠海横琴盛达兆业科技投资有限公司 | 一种基于百度人脸识别api的bs架构实现智能人脸考勤的方法 |
Non-Patent Citations (1)
Title |
---|
WU, CHUANWEN: "The research and application of face recognition technology in the entrance examination system", CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 October 2015 (2015-10-15), pages 1 - 79, XP055751901 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113763671A (zh) * | 2021-09-08 | 2021-12-07 | 升维科技有限公司 | 一种建筑监控系统、方法、计算机设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN110267095A (zh) | 2019-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109961009B (zh) | 基于深度学习的行人检测方法、系统、装置及存储介质 | |
US11908239B2 (en) | Image recognition network model training method, image recognition method and apparatus | |
US9721156B2 (en) | Gift card recognition using a camera | |
WO2021012494A1 (zh) | 基于深度学习的人脸识别方法、装置及计算机可读存储介质 | |
US9098888B1 (en) | Collaborative text detection and recognition | |
WO2019033571A1 (zh) | 面部特征点检测方法、装置及存储介质 | |
US11062124B2 (en) | Face pose detection method, device and storage medium | |
TWI712980B (zh) | 理賠資訊提取方法和裝置、電子設備 | |
US10380164B2 (en) | System and method for using on-image gestures and multimedia content elements as search queries | |
CN111539412B (zh) | 一种基于ocr的图像分析方法、系统、设备及介质 | |
WO2021012493A1 (zh) | 短视频关键词提取方法、装置及存储介质 | |
WO2021047587A1 (zh) | 手势识别方法、电子设备、计算机可读存储介质和芯片 | |
CN107766403B (zh) | 一种相册处理方法、移动终端以及计算机可读存储介质 | |
US20220092353A1 (en) | Method and device for training image recognition model, equipment and medium | |
CN112395979A (zh) | 基于图像的健康状态识别方法、装置、设备及存储介质 | |
US11164028B2 (en) | License plate detection system | |
CN113177133B (zh) | 一种图像检索方法、装置、设备及存储介质 | |
KR100648161B1 (ko) | 협력적인 수기 입력을 위한 시스템 및 방법 | |
US10133955B2 (en) | Systems and methods for object recognition based on human visual pathway | |
WO2021128846A1 (zh) | 电子文件的控制方法、装置、计算机设备及存储介质 | |
WO2021051602A1 (zh) | 基于唇语密码的人脸识别方法、系统、装置及存储介质 | |
WO2020224127A1 (zh) | 视频流截取方法、装置及存储介质 | |
EP4244830A1 (en) | Semantic segmentation for stroke classification in inking application | |
WO2022089020A1 (zh) | 事件展示方法及装置、存储介质及电子设备 | |
US20210195095A1 (en) | Systems and methods for guiding image sensor angle settings in different environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19927969 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19927969 Country of ref document: EP Kind code of ref document: A1 |