CN114924645A - Interaction method and system based on gesture recognition
- Publication number: CN114924645A (application number CN202210542825.0A)
- Authority: CN (China)
- Prior art keywords: human, gesture, area, speaker, detection
- Legal status: Pending
Classifications
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06V20/40 — Scenes; Scene-specific elements in video content
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
- H04N7/15 — Conference systems
Abstract
The invention provides an interaction method and system based on gesture recognition. The interaction method based on gesture recognition comprises the following steps: acquiring a video data stream of a monitoring area, and acquiring image frames from the video data stream; performing human-shape detection on the image frames to determine the human-shaped regions in the image frames; and performing gesture detection on the human-shaped regions and determining a focus area according to the gesture detection result. Because the focus area is determined according to the gesture detection result, the target area in the conference is focused in real time. If the target area is set to the human-shaped area of the speaker or the participants, the speaker or the participants in the conference can be effectively focused in real time.
Description
Technical Field
The invention relates to the technical field of artificial intelligence recognition interaction, in particular to an interaction method and system based on gesture recognition.
Background
With the rapid development of artificial intelligence technology, the field of computer vision has made major breakthroughs, and visual algorithm technologies such as face recognition, target detection, and target tracking are widely applied across industries. Among conference interaction modes, intelligent conference interaction is the trend of future development. The pure-voice and pure-video interaction modes of the traditional online conference are too monotonous; moreover, when the displayed picture contains too much background, the participants cannot be effectively focused, and the speaker cannot be highlighted in a multi-person conference scene.
Therefore, the invention provides an interaction method and system based on gesture recognition, so as to effectively focus on a speaker and participants in a conference.
Disclosure of Invention
The invention provides an interaction method and an interaction system based on gesture recognition, which are used for effectively focusing a speaker and participants in a conference.
In a first aspect, the present invention provides an interaction method based on gesture recognition, including: acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream; performing human shape detection on the image frame to determine human shape regions in the image frame; and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
The beneficial effects are that: because the focus area is determined according to the gesture detection result, the target area in the conference is focused in real time. If the target area is set to the human-shaped area of the speaker or the participants, the speaker or the participants in the conference can be effectively focused in real time.
Optionally, the performing gesture detection on the humanoid region includes: performing gesture detection on the human-shaped areas; if a first gesture is detected, determining that the human-shaped area containing the first gesture is the human-shaped area of the speaker and the human-shaped areas not containing the first gesture are the human-shaped areas of the participants, and performing real-time focusing processing on the human-shaped area of the speaker; and if a second gesture is detected, performing focusing processing on the human-shaped areas of the participants. The beneficial effects are that: according to the first and second gestures made by the speaker, the focus area can be switched freely, achieving a better online-conference effect.
Further optionally, the performing gesture detection on the humanoid region further includes: and if the first gesture and the second gesture are not detected, executing real-time focusing processing on the human-shaped area of the participant. The beneficial effects are that: by performing real-time focusing on the human-shaped areas of the participants, the privacy of the participants can be protected, and some useless information or interference information can be effectively shielded.
Optionally, the performing real-time focusing processing on the human-shaped area of the speaker comprises: performing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the face detection result; performing face recognition on the image frame based on the facial features of the speaker when the human-shaped area of the speaker is not detected; and determining the human-shaped area containing the speaker based on the face recognition result, and performing real-time focusing processing on that area. The beneficial effects are that: when human-shape detection of the speaker fails due to occlusion or other reasons, the speaker's human-shaped area can be re-determined from the face recognition result and focused in real time, preventing the loss of the focus target.
Further optionally, the performing face recognition on the image frame based on the facial features of the speaker comprises: if the face of the speaker is not recognized, quitting the real-time focusing processing of the speaker's human-shaped area, and performing real-time focusing processing on the human-shaped areas of the participants. The beneficial effects are that: if the face of the speaker is not detected, indicating that the speaker may have temporarily left the conference, the focus is switched to the human-shaped areas of the participants.
Optionally, the interaction method based on gesture recognition further includes: arranging anti-shake areas around the human-shaped areas of the speaker and the participants; if the human shape position of the speaker exceeds the anti-shake area, human shape detection is carried out on the image frame again, and the human shape area of the speaker is determined again according to a detection result; and if the position of the human figure of the participant exceeds the anti-shake area, re-performing human figure detection on the image frame, and re-determining the human figure area of the participant according to the detection result. The beneficial effects are that: since the position of the person may change, such as lowering the head to take notes, or holding a cup, or suddenly standing or sitting down, it is necessary to set the anti-shake range to control the human figure region within a reasonable range and reduce the number of times of human figure re-detection.
Further optionally, if the position of the speaker does not exceed the anti-shake area, locking the human-shaped area of the speaker; and if the position of the participant does not exceed the anti-shake area, locking the human-shaped area of the participant.
Optionally, the performing real-time focusing on the humanoid region of the speaker comprises: and performing feature extraction on the humanoid area of the speaker, and predicting the action track of the speaker in the next frame based on the detected first gesture so as to realize real-time focusing processing on the humanoid area of the speaker.
In a second aspect, the invention is directed to an interaction system based on gesture recognition, configured to perform the interaction method based on gesture recognition according to any one of the first aspect, the system comprising modules/units for performing the method according to any one of the possible designs of the first aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
As for the advantageous effects of the above second aspect, reference may be made to the description of the above first aspect.
Drawings
FIG. 1 is a flowchart of an embodiment of an interaction method based on gesture recognition according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of an interactive system based on gesture recognition according to the present invention;
FIG. 3 is a schematic diagram of a screenshot of an online conference provided by the present invention.
Detailed Description
The technical solution in the embodiments of the present application is described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, the terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more. The term "and/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise. The term "coupled" includes direct coupling and indirect coupling, unless otherwise noted. "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In the embodiments of the present application, the words "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The invention provides an interaction method based on gesture recognition, the flow of which is shown in FIG. 1, comprising the following steps:
s101: acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream;
s102: performing human shape detection on the image frame to determine human shape regions in the image frame;
s103: and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
In a possible embodiment, the performing gesture detection on the humanoid region comprises: performing gesture detection on the human-shaped areas; if a first gesture is detected, determining that the human-shaped area containing the first gesture is the speaker's human-shaped area and the areas not containing it are the participants' human-shaped areas, and performing real-time focusing processing on the speaker's human-shaped area; and if a second gesture is detected, performing focusing processing on the participants' human-shaped areas. In this embodiment, the focus area can be switched freely according to the first and second gestures made by the speaker, achieving a better online-conference effect.
In another possible embodiment, the performing gesture detection on the humanoid region further includes: and if the first gesture and the second gesture are not detected, executing real-time focusing processing on the human-shaped area of the participant. In the embodiment, the real-time focusing processing is performed on the human-shaped area of the participant, so that the privacy of the participant can be protected, and useless information or interference information can be effectively shielded.
Illustratively, the picture captured by a monocular camera is transmitted to a display to present the video-conference picture, and the image is fed into a human-shape and gesture detection model to identify the human shapes and gestures in the picture. If no first gesture is found, a dynamic real-time focused picture of the participants is displayed; if the first gesture is found, the picture dynamically focuses on the gesture maker, i.e., the speaker. Only when the speaker makes the second gesture is the speaker focus mode closed and the view switched back to the dynamic real-time focused picture of the participants.
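A minimal sketch of this focus-switching logic, under the assumption that an upstream model yields (label, box) pairs with labels "first" and "second"; the class and its interface are illustrative only.

```python
class FocusController:
    def __init__(self):
        self.speaker_box = None          # None => focus on all participants

    def update(self, gestures):
        """gestures: list of (gesture_label, humanoid_box) found in this frame."""
        for label, box in gestures:
            if label == "first":         # first gesture: lock focus onto the gesture maker
                self.speaker_box = box
            elif label == "second" and self.speaker_box is not None:
                self.speaker_box = None  # second gesture: return to participant view
        return self.speaker_box          # None means show the participants' regions
```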
In a further possible embodiment, the performing a real-time focusing process on the humanoid region of the speaker comprises: performing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the result of the face detection; performing face recognition on the image frame based on facial features of the speaker when a human-shaped region of the speaker is not detected; and determining a human-shaped area containing the speaker based on the face recognition result, and performing real-time focusing processing on the human-shaped area containing the speaker. In this embodiment, when the human shape detection of the speaker fails due to a shielding or the like, the human shape area of the speaker may be determined again according to the result of the face recognition, and real-time focusing may be performed to prevent a focused target from being lost.
In one possible embodiment, the performing face recognition on the image frames based on the facial features of the speaker comprises: if the face of the speaker is not recognized, exiting the real-time focusing processing on the speaker's human-shaped area and performing real-time focusing processing on the participants' human-shaped areas. In this embodiment, if the face of the speaker is not detected, indicating that the speaker may have temporarily left the conference, the focus is switched to the participants.
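The fallback chain of these two embodiments — human-shape detection first, face recognition second, participant focus last — might look like the following sketch; every detector method name here is a hypothetical placeholder.

```python
def choose_focus(frame, person_det, face_rec, speaker_features, participant_boxes):
    """Return the region(s) to focus on for this frame."""
    speaker_box = person_det.detect_speaker(frame)
    if speaker_box is not None:
        return speaker_box                            # human-shape detection succeeded
    face_box = face_rec.match(frame, speaker_features)
    if face_box is not None:
        return person_det.expand_to_body(face_box)    # re-derive the speaker region from the face
    return participant_boxes                          # speaker absent: focus the participants
```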
In one possible embodiment, an anti-shake area is provided around the human-shaped area of the speaker and the human-shaped area of the participant; if the human shape position of the speaker exceeds the anti-shake area, human shape detection is carried out on the image frame again, and the human shape area of the speaker is determined again according to the detection result; and if the human shape position of the participant exceeds the anti-shake area, carrying out human shape detection on the image frame again, and determining the human shape area of the participant again according to the detection result. In this embodiment, since the position of the human may change, such as lowering the head to take a note, or taking a cup, or suddenly standing or sitting down, it is necessary to set the anti-shake range to control the human shape area within a reasonable range and reduce the number of times of human shape re-detection.
In yet another possible embodiment, the human-shaped area of the speaker is locked if the position of the speaker does not exceed the anti-shake area; and if the position of the participant does not exceed the anti-shake area, locking the human-shaped area of the participant.
Illustratively, since the detected position of a person changes constantly, the focused picture may jitter slightly, so the target position must be controlled within a reasonable range, i.e., an anti-shake range is set. If the target position detected in the current frame falls outside this range, the newly detected position is adopted; otherwise the target position of the previous frame is kept. Writing x'_i for the position detected in frame i, x_i for the position actually used, and k for the jitter range:

x_i = x'_i if |x'_i − x_{i−1}| > k, otherwise x_i = x_{i−1}.

Optionally, redundancy processing may further be performed: based on the position of the speaker or participant, the region is widened so that the display looks more reasonable. With detected target size y'_i and redundancy coefficient r: y_i = y'_i · r (r ≥ 1).

Focusing processing is then performed on the target human-shaped area as a sliding-focus process: when the picture is refocused from one position to another, it also moves slowly from the one position to the other, forming a smooth dynamic transition. With sliding coefficient a in the range (0, 1), target width w, and center point (cx, cy), the width and center point slide as:

w_i = (1 − a) · w_{i−1} + a · w'_i; cx_i = (1 − a) · cx_{i−1} + a · cx'_i; cy_i = (1 − a) · cy_{i−1} + a · cy'_i.

The actual picture is the final output display picture of fixed size: its height is calculated from the computed width according to the aspect ratio of the actual picture, and the picture, determined by the width, height, and center point, is scaled proportionally to obtain the actual output picture.
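The anti-shake, redundancy, and sliding-focus formulas above translate directly into code. This sketch assumes a box represented as (cx, cy, w); the default values for k, r, and a are examples, not values specified by the disclosure.

```python
def anti_shake(x_prev, x_det, k):
    """Keep the previous position unless the detected one left the jitter range k."""
    return x_det if abs(x_det - x_prev) > k else x_prev

def widen(y_det, r):
    """Redundancy processing: widen the target size by a coefficient r >= 1."""
    return y_det * r

def slide(prev, target, a):
    """Sliding focus: move a fraction a (0 < a < 1) of the way toward the target."""
    return (1 - a) * prev + a * target

def smooth_focus(prev_box, det_box, k=10.0, r=1.2, a=0.3):
    """prev_box, det_box: (cx, cy, w). Returns the smoothed focus box for this frame;
    the height then follows from the fixed display aspect ratio."""
    cx = slide(prev_box[0], anti_shake(prev_box[0], det_box[0], k), a)
    cy = slide(prev_box[1], anti_shake(prev_box[1], det_box[1], k), a)
    w = slide(prev_box[2], widen(det_box[2], r), a)
    return (cx, cy, w)
```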
In one possible embodiment, the performing real-time focusing on the human-shaped area of the speaker comprises: and performing feature extraction on the humanoid area of the speaker, and predicting the action track of the speaker in the next frame based on the detected first gesture so as to realize real-time focusing processing on the humanoid area of the speaker.
Illustratively, the action track of the speaker in the next frame is predicted by a method combining deep learning with traditional tracking ideas. First, the tracking mode is started through gesture recognition and the speaker is identified by a detection algorithm; the speaker's next-frame track and feature information are then predicted with a Kalman filter and a ReID (pedestrian re-identification) model, and finally the predicted track and feature information are matched, via the Hungarian algorithm, with the speaker detected in the current frame.
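A skeleton of the association step described here, assuming Kalman-predicted positions and L2-normalized ReID feature vectors; only scipy's linear_sum_assignment (the Hungarian algorithm) is a concrete library call, everything else is a placeholder, and the motion/appearance weighting w=0.5 is an arbitrary example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(pred_positions, track_feats, det_positions, det_feats, w=0.5):
    """Match Kalman-predicted tracks to current detections.
    pred_positions/det_positions: arrays of (x, y); *_feats: L2-normalized
    ReID embeddings. Returns (track_index, detection_index) pairs."""
    n, m = len(pred_positions), len(det_positions)
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            motion = np.linalg.norm(pred_positions[i] - det_positions[j])   # motion term
            appearance = 1.0 - float(np.dot(track_feats[i], det_feats[j]))  # cosine distance
            cost[i, j] = w * motion + (1.0 - w) * appearance
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))
```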
The interaction method based on gesture recognition provided by the invention can dispense with the traditional remote-control mode and realize intelligent control by issuing gesture commands from a distance.
The invention provides an interaction system based on gesture recognition, which is configured to execute the interaction method based on gesture recognition according to any one of the above embodiments, as shown in fig. 2, and the interaction system comprises: the system comprises an acquisition module 201, a human shape detection module 202, a gesture detection module 203 and a focusing module 204; the obtaining module 201 is configured to obtain a video data stream of a monitored area, and obtain an image frame from the video data stream; the human shape detection module 202 is configured to perform human shape detection on the image frame to determine a human shape region in the image frame; the gesture detection module 203 is configured to perform gesture detection on the human-shaped region, and the focusing module 204 is configured to determine a focusing region according to a result of the gesture detection.
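As a reading aid, the four modules of fig. 2 could be wired together as below; the class and method names mirror modules 201-204 but are purely illustrative, not an API defined by the disclosure.

```python
class GestureInteractionSystem:
    """Wiring of the modules in fig. 2: acquisition (201), human-shape
    detection (202), gesture detection (203), focusing (204)."""
    def __init__(self, acquisition, person_detection, gesture_detection, focusing):
        self.acquisition = acquisition
        self.person_detection = person_detection
        self.gesture_detection = gesture_detection
        self.focusing = focusing

    def step(self):
        frame = self.acquisition.next_frame()                   # module 201
        boxes = self.person_detection.detect(frame)             # module 202
        gestures = self.gesture_detection.detect(frame, boxes)  # module 203
        return self.focusing.select(boxes, gestures)            # module 204
```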
In one possible embodiment, the gesture detection module comprises: a setting unit and a detection unit; the setting unit is used for setting a first gesture and a second gesture; the detection unit is used for executing gesture detection on the human-shaped area; if the first gesture is detected, determining that a humanoid area containing the first gesture is a humanoid area of a speaker, and a humanoid area not containing the first gesture is a humanoid area of a participant, wherein the focusing module executes real-time focusing processing on the humanoid area of the speaker; and if the second gesture is detected, the focusing module performs focusing processing on the human-shaped area of the participant.
Illustratively, as shown in fig. 3, the full screen of the online conference includes 5 participants including the speaker, a shape area 1 of the speaker, and shape areas 2-4 of the participants. If the first gesture is detected, the focusing module performs real-time focusing processing on the humanoid area 1 of the speaker; if a second gesture is detected, the focusing module performs focusing on the human-shaped regions 2-4 of the participants.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered within the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An interaction method based on gesture recognition is characterized by comprising the following steps:
acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream;
performing human shape detection on the image frame to determine human shape regions in the image frame;
and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
2. The gesture recognition based interaction method according to claim 1, wherein the performing gesture detection on the humanoid region comprises:
executing gesture detection on the human-shaped area, if a first gesture is detected, determining that the human-shaped area containing the first gesture is a human-shaped area of a speaker, and the human-shaped area not containing the first gesture is a human-shaped area of a participant, and executing real-time focusing processing on the human-shaped area of the speaker; and if the second gesture is detected, performing focusing processing on the human-shaped area of the participant.
3. The gesture recognition based interaction method according to claim 2, wherein the performing gesture detection on the humanoid region further comprises: and if neither the first gesture nor the second gesture is detected, performing real-time focusing processing on the human-shaped area of the participant.
4. The interaction method based on gesture recognition according to claim 2, wherein the performing of real-time focusing processing on the humanoid region of the speaker comprises: executing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the face detection result;
performing face recognition on the image frame based on facial features of the speaker when a human-shaped region of the speaker is not detected;
and determining a humanoid area containing the speaker based on the result of the face recognition, and performing real-time focusing processing on the humanoid area containing the speaker.
5. The gesture recognition based interaction method according to claim 4, wherein the performing of the face recognition on the image frame based on the facial features of the speaker comprises:
and if the face of the speaker is not recognized, exiting the real-time focusing processing on the human-shaped area of the speaker, and executing the real-time focusing processing on the human-shaped area of the participant.
6. The gesture recognition based interaction method according to claim 2, further comprising: arranging anti-shake areas around the human-shaped area of the speaker and the human-shaped area of the participant; if the human shape position of the speaker exceeds the anti-shake area, human shape detection is carried out on the image frame again, and the human shape area of the speaker is determined again according to a detection result; and if the position of the human figure of the participant exceeds the anti-shake area, re-performing human figure detection on the image frame, and re-determining the human figure area of the participant according to the detection result.
7. The interaction method based on gesture recognition according to claim 6, wherein if the position of the speaker does not exceed the anti-shake area, the human-shaped area of the speaker is locked; and if the position of the participant does not exceed the anti-shake area, locking the human-shaped area of the participant.
8. The interaction method based on gesture recognition according to claim 2, wherein the performing of real-time focusing on the humanoid region of the speaker comprises:
and performing feature extraction on the humanoid area of the speaker, and predicting the action track of the speaker in the next frame based on the detected first gesture so as to realize real-time focusing processing on the humanoid area of the speaker.
9. A gesture recognition based interaction system configured to perform the gesture recognition based interaction method according to any one of claims 1 to 8, comprising: the system comprises an acquisition module, a human shape detection module, a gesture detection module and a focusing module;
the acquisition module is used for acquiring a video data stream of a monitoring area and acquiring an image frame from the video data stream;
the human shape detection module is used for performing human shape detection on the image frame so as to determine a human shape area in the image frame;
the gesture detection module is used for executing gesture detection on the humanoid area, and the focusing module is used for determining a focusing area according to a gesture detection result.
10. The gesture recognition based interaction system according to claim 9, wherein the gesture detection module comprises: a setting unit and a detection unit; the setting unit is used for setting a first gesture and a second gesture; the detection unit is used for executing gesture detection on the human-shaped area; if the first gesture is detected, determining that a humanoid area containing the first gesture is a humanoid area of a speaker, and a humanoid area not containing the first gesture is a humanoid area of a participant, wherein the focusing module executes real-time focusing processing on the humanoid area of the speaker; and if the second gesture is detected, the focusing module executes focusing processing on the human-shaped area of the participant.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210542825.0A | 2022-05-18 | 2022-05-18 | Interaction method and system based on gesture recognition |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114924645A | 2022-08-19 |
Family

ID=82808675

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210542825.0A (Pending) | Interaction method and system based on gesture recognition | 2022-05-18 | 2022-05-18 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114924645A |
Citations (11)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103281508A * | 2013-05-23 | 2013-09-04 | 深圳锐取信息技术股份有限公司 | Video picture switching method and system, recording and broadcasting server, and video recording and broadcasting system |
| CN105049764A * | 2015-06-17 | 2015-11-11 | 武汉智亿方科技有限公司 | Image tracking method and system for teaching based on multiple positioning cameras |
| CN108664853A * | 2017-03-30 | 2018-10-16 | 北京君正集成电路股份有限公司 | Method and device for detecting human faces |
| CN109257559A * | 2018-09-28 | 2019-01-22 | 苏州科达科技股份有限公司 | Image display method and device for panoramic video conference, and video conferencing system |
| CN111079686A * | 2019-12-25 | 2020-04-28 | 开放智能机器(上海)有限公司 | Single-stage face detection and key point positioning method and system |
| CN112052805A * | 2020-09-10 | 2020-12-08 | 深圳数联天下智能科技有限公司 | Face detection frame display method, image processing device, equipment and storage medium |
| CN112689092A * | 2020-12-23 | 2021-04-20 | 广州市迪士普音响科技有限公司 | Automatic tracking conference recording and broadcasting method, system, device and storage medium |
| CN112954451A * | 2021-02-05 | 2021-06-11 | 广州市奥威亚电子科技有限公司 | Method, device, equipment and storage medium for adding information to video characters |
| CN113705510A * | 2021-09-02 | 2021-11-26 | 广州市奥威亚电子科技有限公司 | Target identification and tracking method, device, equipment and storage medium |
| CN113784046A * | 2021-08-31 | 2021-12-10 | 北京安博盛赢教育科技有限责任公司 | Follow-up shooting method, device, medium and electronic equipment |
| CN113784045A * | 2021-08-31 | 2021-12-10 | 北京安博盛赢教育科技有限责任公司 | Focusing interaction method, device, medium and electronic equipment |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |