
CN113657185A - Intelligent auxiliary method, device and medium for piano practice - Google Patents


Info

Publication number
CN113657185A
CN113657185A
Authority
CN
China
Prior art keywords
key
image
video
keys
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110843452.6A
Other languages
Chinese (zh)
Inventor
唐浩鑫
胡建华
王承聪
张颖
魏嘉俊
许雁嘉
莫异江
吴伟杰
梁梓豪
姚琦彤
陈广意
赵泓皓
陈莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Institute of Science and Technology
Original Assignee
Guangdong Institute of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Institute of Science and Technology filed Critical Guangdong Institute of Science and Technology
Priority to CN202110843452.6A priority Critical patent/CN113657185A/en
Publication of CN113657185A publication Critical patent/CN113657185A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract


An embodiment of the invention discloses an intelligent assistant method, device and medium for piano practice, which can accurately evaluate playing performance and provide intuitive guidance on fingering, notes and other information. The method includes: decomposing a performance video frame by frame into key images and preprocessing the images, where preprocessing includes marking the key area ranges; determining the position information of pressed keys from the time information of the key images; inputting the key images into a preset gesture recognition model, predicting gesture-recognition joint points, and outputting finger coordinates; if a finger coordinate lies within a key's position range, associating the finger with the key to obtain a correspondence; inputting the correspondences obtained from different performance videos into a deep learning model and outputting a playing score; and projecting laser light onto the keys to indicate playing position and fingering information.


Description

Intelligent auxiliary method, device and medium for piano practice
Technical Field
The invention relates to the field of deep learning, in particular to an intelligent auxiliary method, device and medium for piano practice.
Background
With rising living standards and artistic cultivation, more and more users are learning the piano. Because their music foundation is weak, finding the corresponding keys from the score is a major learning obstacle: learners are unfamiliar with the positions of notes on the staff and keys on the piano, so key-press errors and note errors are inevitable while playing. At present, in piano teaching, a teacher typically demonstrates a piece only a few times to several students at once; the students try to remember the teacher's technique and rhythm during class and must then practice at home by themselves. Practicing at home, they have no reference and, in particular, cannot be guided or corrected immediately, which greatly reduces their interest in learning the piano.
During piano practice, in order to guide a user to play a tune accurately, including correct fingering and notes, and to judge whether the user's fingering and notes are accurate, an intelligent auxiliary method for piano practice is required.
Disclosure of Invention
The invention provides an intelligent auxiliary method, device and medium for piano practice, and aims to solve at least one of the technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides an intelligent auxiliary method for piano practice, including: decomposing the first playing video into first key images frame by frame, and preprocessing the first key images, wherein preprocessing comprises marking key region ranges; determining the position information of the pressed key according to the time information of the first key image, inputting the first key image into a preset gesture recognition model, predicting a gesture recognition joint point, outputting a finger coordinate, and associating the finger with the key if the finger coordinate is in the position range of the key to obtain a first corresponding relation; decomposing the second playing video into second key images frame by frame, and preprocessing the images, wherein the preprocessing comprises marking key region ranges; determining the position information of the pressed key according to the time information of the second key image, inputting the second key image into a preset gesture recognition model, predicting a gesture recognition joint point, outputting a finger coordinate, and associating the finger with the key if the finger coordinate is in the position range of the key to obtain a second corresponding relation; inputting the first corresponding relation and the second corresponding relation into a deep learning model, and outputting a playing score; laser light is projected on the keys to prompt playing positions and fingering information.
Optionally, the image preprocessing further includes: and normalizing the size of the key in the image.
Optionally, marking the key region range includes transforming the key-image coordinate system into a pixel coordinate system and determining the region range of each key from the pixel coordinates.
Optionally, determining the position information of a pressed key includes associating, by time, the information generated when a key is triggered with the video moment corresponding to each image, thereby obtaining the coordinate information of the pressed key.
Optionally, the preset gesture recognition model is obtained by fine-tuning a trained gesture recognition model on a real data set.
Optionally, the method includes reading the area range of the pressed key and the finger coordinates, comparing them in real time, and determining that the key is pressed by the finger if the finger's position lies within the pressed key's area range.
Optionally, predicting the key points of the hand includes feeding the recognized gesture prediction-box image into a high-resolution network serving as the backbone neural network, generating heat maps at multiple resolutions, including high resolution, with convolution and deconvolution modules, predicting the gesture-recognition joint points, and outputting finger coordinates.
Optionally, the performance video is a segment, and during video decomposition the time points corresponding to that segment are selected.
In a second aspect, an embodiment of the present invention further provides an intelligent auxiliary device for piano practice, including: the video decomposition module is used for decomposing the video into images; the key marking module is used for selecting the played sound area image and marking the area range of each key; the information acquisition module is used for identifying the coordinates of the pressed keys; a gesture detection module for predicting a confidence map of the hand mask; the gesture recognition module is used for predicting a confidence map of the joint points of the hand; the gesture scoring module is used for evaluating the accuracy level of fingering and musical notes played currently; the playing guide module is used for guiding a player to play; the video recording device is used for recording the video played by the player on site; the laser guiding device is used for guiding fingering and note information of a player, wherein the laser device guides the fingering and the note information in a laser ray emitting mode.
The invention has the following beneficial effects:
1. The features of gesture images are learned at scale by a deep convolutional neural network, so finger coordinates are identified accurately;
2. By judging whether a finger's coordinates fall within the area range of a pressed key, the correspondence between fingers and pressed keys is matched accurately;
3. By comparing the finger-key correspondences of different videos, the player's fingering and note-playing level are evaluated accurately;
4. The player is guided intuitively through a laser device, the guidance including fingering and note information;
5. From the player's historical score data, playing progress is judged and a targeted training plan is made.
Drawings
Fig. 1 is a general flowchart of an intelligent assisting method for piano practice according to the present invention.
Fig. 2 is a detailed flowchart of an intelligent assisting method for piano practice according to the present invention.
Detailed Description
The conception, the specific structure and the technical effects of the present invention will be clearly and completely described in conjunction with the embodiments and the accompanying drawings to fully understand the objects, the schemes and the effects of the present invention.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature or indirectly fixed or connected to the other feature. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("e.g.," such as "or the like") provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
In a first aspect, an embodiment of the invention provides an intelligent auxiliary method for piano practice, which enables more targeted and more effective assisted piano practice.
As shown in fig. 1, an intelligent assisting method for piano practice comprises the following steps:
s1, decomposing the first playing video frame by frame into first key images, and preprocessing the images, wherein preprocessing comprises marking key region ranges;
s2, determining the position information of the pressed key according to the time information of the first key image, inputting the first key image into a preset gesture recognition model, predicting a gesture recognition joint point, outputting a finger coordinate, and associating the finger with the key if the finger coordinate is within the position range of the key to obtain a first corresponding relation;
s3, decomposing the second playing video into second key images frame by frame, and preprocessing the images, wherein preprocessing comprises marking key region ranges;
s4, determining the position information of the pressed key according to the time information of the second key image, inputting the second key image into a preset gesture recognition model, predicting a gesture recognition joint point, outputting a finger coordinate, and associating the finger with the key if the finger coordinate is within the position range of the key to obtain a second corresponding relation;
s5, inputting the first corresponding relation and the second corresponding relation into a deep learning model, and outputting a playing score;
s6, projecting laser light on the keys to prompt playing positions and fingering information.
The first performance video in step S1 and/or the second performance video in step S3 may be videos of keyboard instruments such as a piano or a harmonica being played. The first and second performance videos should feature the same keyboard instrument playing the same original tune, and the time-frame lengths used for video decomposition should be equal, to guarantee that the first and second key images are synchronized at each moment. The first and/or second performance video may be recorded by a video capture device or selected from a video repository. The first and/or second key images may be one or more key images, and should cover the whole keyboard. For example, the first performance video may be a student practice video decomposed into the first key images, and the second performance video a teacher demonstration video decomposed into the second key images.
Details of the above steps are described in various embodiments below in conjunction with the flow chart shown in fig. 2.
In an embodiment, the steps S1 and S3 specifically include:
S11, decomposing the video frames: using OpenCV, convert the video frames into one or more images and output them, where OpenCV is an open-source function library for image processing, analysis and machine vision;
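As a minimal sketch of step S11, the decomposition can be expressed as yielding (timestamp, frame) pairs; the `decompose_video` helper and its video `path` are illustrative assumptions, while the pure `frame_time` mapping is the time information that later ties key events to key images:

```python
def frame_time(frame_idx, fps):
    """Timestamp in seconds of a given frame index; this is the time
    information later used to align pressed-key events with key images."""
    return frame_idx / fps

def decompose_video(path):
    """Yield (timestamp, frame) pairs from a performance video, frame by frame.
    OpenCV is imported lazily; `path` is a hypothetical video file."""
    import cv2  # open-source library for image processing and machine vision
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if metadata is missing
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame_time(idx, fps), frame
        idx += 1
    cap.release()
```

Equal frame rates for the student and teacher videos keep the two image streams synchronized, as the description above requires.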
S12, marking the key region range: detect and select the image of the played sound area with a target detection algorithm, and mark the area range of each key. Because of the shooting angle, shooting distance and so on, key sizes differ across images; optionally, the key sizes in the image are normalized, after which the key-image coordinate system is converted to a pixel coordinate system and the area range of each key is determined from pixel coordinates. A target detection algorithm marks the key ranges; optionally, a YOLOv4-based target detection algorithm detects the key region ranges. Features are extracted through a sliding window, and each window obtains a score after classification and recognition by a classifier; when multiple windows contain or intersect one another, a non-maximum suppression (NMS) algorithm keeps the highest-scoring window in the neighborhood and suppresses the low-scoring ones. Optionally, the region range of a key can be determined by a rectangular box defined by at least two coordinate points.
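The non-maximum suppression step described above can be sketched as follows; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are illustrative assumptions, not values fixed by the method:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box in each neighborhood; suppress boxes
    that overlap a kept box by more than `thresh`. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

Two heavily overlapping detections of the same key collapse to the higher-scoring one, while a detection of a distant key survives.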
In an embodiment, the steps S2 and S4 specifically include:
S21, associating the coordinate information of a pressed key: when a key press is triggered, the information acquisition module can uniquely determine the position information of the pressed key. For example, when several keys are pressed simultaneously, the information generated by each triggered key is different, so each piece of information is associated with a unique key coordinate. Each image in steps S2 and S4 corresponds to a moment in the video, and the coordinate information of the pressed key is associated by time with the information generated after the key is triggered, where the key coordinate specifically refers to the key's area range;
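A hedged sketch of the time association in S21: the key-trigger information is modeled here as hypothetical `(onset, offset, key_id)` events, and the keys held at a frame's timestamp are looked up by interval containment:

```python
def pressed_keys_at(t, events):
    """Return the (sorted) key ids held at time t.

    `events` is a list of (onset_s, offset_s, key_id) tuples, an assumed
    representation of the information generated when keys are triggered;
    simultaneous presses simply yield several containing intervals."""
    return sorted(k for (on, off, k) in events if on <= t <= off)
```

Pairing this lookup with the frame timestamps from the video decomposition associates each key image with the keys pressed at that moment.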
S22, the method for predicting the gesture-recognition joints mainly comprises two steps: obtaining a feature map of the hand, and predicting the key points of the hand. The first step predicts a confidence map of the hand mask and the second predicts a confidence map of the hand's joint points; the two steps adopt an iterative cascade structure, and back-propagation through end-to-end training effectively improves gesture-recognition accuracy:
s22-1, acquiring a characteristic diagram of the hand:
A data set is selected; optionally, the MSCOCO data set is used as the training set. MSCOCO, constructed by Microsoft, covers tasks such as detection, segmentation and key points and provides more than 200,000 images in more than 80 categories. Image material of some twenty piano students playing is collected as a fine-tuning data set, and fine-tuning the trained model further improves target-detection accuracy. 5,000 images from RHD, a commonly used gesture-recognition test data set, are selected as the test set.
An image containing human-hand information is taken as input to obtain a feature map whose target is the hand. For example, the target detection model is based on a YOLOv3 neural network structure: the convolutional (Conv) layers process the input image with several different convolution kernels to obtain different response feature maps, the BN layers normalize each batch, and down-sampling is performed with stride-2 convolutions. Through feature fusion, the detection network can exploit the extracted shallow and deep features simultaneously, outputting the feature map of the hand and an effective gesture-recognition area. The YOLOv3-based target detection model thus fuses high-level and low-level features and predicts results from multi-scale feature maps; it fully exploits the parallelism of multi-core processors and the GPU, obtaining the hand's feature map at high speed so that video frames are detected in real time.
In one embodiment, the input image is first preprocessed, and the spatial layout of the hand in the color image is then encoded. Optionally, the convolution stages of a VGG-19 network up to conv4 generate a 512-channel feature F; the channel count is increased so that more information can be extracted, and F is then convolved to obtain a two-channel hand-mask part. VGG-19 has 19 layers in total, comprising 16 convolutional layers and 3 final fully connected layers, with pooling layers in between.
In one embodiment:
1, input layer: a 64x64x3 three-channel color image is input, with the RGB mean subtracted from each pixel;
2, convolutional layer: input 64x64x3; the preprocessed image is convolved with 64 5x5 kernels + ReLU at stride 1, giving 60x60x64;
3, sampling layer: input 60x60x64; 2x2 max pooling halves the image size to 30x30x64;
4, convolutional layer: input 30x30x64; convolution with 96 5x5 kernels + ReLU at stride 1 gives 26x26x96;
5, sampling layer: input 26x26x96; 3x3 max pooling gives 13x13x96;
6, convolutional layer: input 13x13x96; convolution with 128 5x5 kernels + ReLU at stride 1 gives 9x9x128;
7, sampling layer: input 9x9x128; 3x3 max pooling gives 5x5x128;
8, locally connected layer: input 5x5x128; convolution with 3x3 kernels at stride 1 gives 3x3x160;
9, connecting layer: input 3x3x160; three fully connected layers + ReLU follow. For example, in hand contour-point estimation, 19 contour points are estimated, the connecting-layer structure is set accordingly, and a 1x1x38-dimensional vector is finally obtained.
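The spatial sizes stated for the convolutional layers above follow the standard valid-convolution formula `out = (in - k) / s + 1`. A small checker; only the stride-1 5x5 convolutions are verified, since the pooling layers in the text evidently use padded or ceil-mode variants whose exact settings are not given:

```python
def conv_out(n, k, s=1, p=0):
    """Output spatial size of a convolution over an n x n input with
    kernel k, stride s and padding p (floor division, as in most frameworks)."""
    return (n + 2 * p - k) // s + 1

# Stated sizes for the three 5x5 stride-1 convolutions in the layer list:
# 64 -> 60, 30 -> 26, 13 -> 9.
for n_in, n_out in [(64, 60), (30, 26), (13, 9)]:
    assert conv_out(n_in, 5) == n_out
```

The final 1x1x38 vector is consistent with 19 contour points at two coordinates each.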
In one embodiment, the testing phase replaces the 3 fully connected layers with 3 convolutional layers, so that the resulting fully convolutional network can accept input of any width or height, since it is no longer constrained by full connections.
In one embodiment, the model is trained in two stages: the first stage trains on a synthetic data set, and the second fine-tunes the first-stage model on a real data set, making the model more robust and better performing in real scenes.
S22-2, predicting key points of the hand, and outputting finger coordinates:
A data set is selected; optionally, the InterHand2.6M data set is used as the training set. InterHand2.6M is the largest 3D two-hand interaction estimation data set, consisting of 3.6 million video frames. Image material of some twenty piano students playing is collected as a fine-tuning data set; fine-tuning the trained model further improves pose-estimation accuracy;
Key points of the hand are predicted: the gesture prediction-box image recognized in step S21 is fed into HRNet as the backbone neural network; convolution and deconvolution modules generate heat maps at multiple resolutions, including high resolution; the gesture-recognition joint points are predicted, and finger coordinates are output.
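Reading finger coordinates out of joint-point heat maps reduces to taking each heat map's peak and rescaling it to image pixels; a minimal, framework-free sketch, where the `scale` factor is an assumed heatmap-to-image ratio:

```python
def heatmap_peak(heatmap):
    """Return (x, y) of the maximum activation in a 2-D heat map,
    given as a list of rows."""
    best, bx, by = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best:
                best, bx, by = v, x, y
    return bx, by

def joints_to_coords(heatmaps, scale=1.0):
    """One heat map per joint; `scale` maps heat-map pixels back to
    image pixels (e.g. 4.0 if heat maps are 1/4 the image resolution)."""
    return [(x * scale, y * scale) for (x, y) in map(heatmap_peak, heatmaps)]
```

Each resulting coordinate is then tested against the key regions in step S23.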
In one embodiment, 42 hand key points are estimated from the hand outline box given in step S22-1, with 21 key points estimated for each of the left and right hands.
In one embodiment, the original image and the output of S22-1 are used together as input for hand key-point prediction; the model structure used is the same as in S22-1, and the final fully connected layer outputs an 84-dimensional vector.
S23, determining the correspondence between fingers and keys: the area range of the pressed key and the finger coordinates are read and compared in real time; if the finger's position in the image lies within the pressed key's area range, the key is judged to be pressed by that finger.
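Step S23's matching is a point-in-rectangle test; a sketch with hypothetical finger and key identifiers:

```python
def in_region(pt, box):
    """True if point (x, y) lies within box (x1, y1, x2, y2)."""
    x, y = pt
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def match_fingers_to_keys(finger_coords, pressed_key_regions):
    """finger_coords: {finger_id: (x, y)} from the gesture model;
    pressed_key_regions: {key_id: box} from the information acquisition
    module. Returns {finger_id: key_id} for fingers on pressed keys."""
    pairs = {}
    for f, pt in finger_coords.items():
        for k, box in pressed_key_regions.items():
            if in_region(pt, box):
                pairs[f] = k
    return pairs
```

Fingers hovering outside every pressed key's region are simply left unmatched.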
In one embodiment, the first and second correspondences at the same moment serve as the input of the deep learning model in step S5, and the playing score is its output, where the score includes the error types (fingering errors, note errors and the like), the location of each error, and the total playing score.
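The patent feeds the two correspondences into a deep learning model to produce the score; as a rule-based stand-in that only illustrates the inputs and outputs described here (error types, error locations, total score), one might compare the student and teacher sequences entry by entry:

```python
def score_performance(student_seq, teacher_seq):
    """Each sequence is a list of (time, finger, key) correspondences,
    time-aligned as in steps S1-S4. A wrong key counts as a note error;
    the right key with the wrong finger counts as a fingering error.
    This is an illustrative stand-in, not the patent's learned model."""
    fingering_errors, note_errors, errors_at = 0, 0, []
    for (t, sf, sk), (_, tf, tk) in zip(student_seq, teacher_seq):
        if sk != tk:
            note_errors += 1
            errors_at.append((t, "note"))
        elif sf != tf:
            fingering_errors += 1
            errors_at.append((t, "fingering"))
    total = len(teacher_seq)
    correct = total - fingering_errors - note_errors
    return {
        "score": 100.0 * correct / total if total else 0.0,
        "fingering_errors": fingering_errors,
        "note_errors": note_errors,
        "errors_at": errors_at,  # where each error occurred
    }
```

The learned model in S5 plays the same role but can weigh errors non-uniformly.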
In one embodiment, according to the first or second correspondence, playing guidance information is projected onto the keys by the laser device as laser rays, the guidance information including playing position and fingering information.
In one embodiment, the first performance video may be a segment; during video decomposition, the time points of the corresponding segment are selected, so that the decomposed first and second performance videos correspond in time.
In one embodiment, a training plan is made in a targeted manner by analyzing the player's historical score data to assess progress and playing defects.
In one embodiment, the first performance video may be an entire tune; during video decomposition, the time frames of the decomposed first and second performance videos should be kept consistent.
In one embodiment, according to the finger-key correspondence obtained from the teacher's demonstration video, the laser light projects guidance on fingering and note information.
In a second aspect, the embodiment of the invention further provides an intelligent auxiliary device for piano practice, which can be used for assisting piano practice with stronger pertinence and more effectiveness.
The embodiment of the invention provides an intelligent auxiliary device for piano practice, which comprises:
the video decomposition module is used for decomposing the video into images, selecting the time length required to be decomposed and setting a decomposition time frame; the key marking module is used for selecting the played sound area image and marking the area range of each key; the information acquisition module is used for identifying the coordinates of the pressed keys; a gesture detection module for predicting a confidence map of the hand mask; the gesture recognition module is used for predicting a confidence map of the joint points of the hand; the gesture scoring module is used for evaluating the accuracy level of fingering and musical notes played currently; and the playing guide module is used for guiding the player to play.
In one embodiment, a piano practice intelligent aid may comprise: the video storage module is used for storing performance videos, wherein the videos can be videos recorded during playing and then uploaded to the video storage module, and also can be videos downloaded from a website and stored in the video storage module;
in one embodiment, a piano practice intelligent aid may comprise: the video recording device is used for recording the performance video and then uploading the performance video to the video storage module or uploading the performance video to the video storage module, and the laser device is used for projecting guide fingering and tone information.
It should be recognized that the method steps in embodiments of the present invention may be embodied or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention may also include the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein, transforming the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
The above description is only a preferred embodiment of the present invention, and the invention is not limited to that embodiment. Any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention falls within its scope of protection, provided the technical effects of the invention are achieved by equivalent means. The technical solution and/or its implementation may also be modified and varied in other ways within the scope of protection of the invention.

Claims (10)

1. An intelligent assistance method for piano practice, characterized in that the method comprises the following steps:
S1: decomposing a first performance video frame by frame into first key images, and preprocessing the first key images, wherein the preprocessing includes marking the region of each key;
S2: determining the position of the pressed keys according to the time information of the first key images; inputting the first key images into a preset gesture recognition model to predict the hand joint points and output finger coordinates; and, if a finger coordinate lies within the region of a key, associating that finger with the key to obtain a first correspondence;
S3: decomposing a second performance video frame by frame into second key images, and preprocessing the second key images, wherein the preprocessing includes marking the region of each key;
S4: determining the position of the pressed keys according to the time information of the second key images; inputting the second key images into the preset gesture recognition model to predict the hand joint points and output finger coordinates; and, if a finger coordinate lies within the region of a key, associating that finger with the key to obtain a second correspondence;
S5: inputting the first correspondence and the second correspondence into a deep learning model, and outputting a playing score;
S6: projecting laser light onto the keys to indicate the playing position and fingering information.
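The key-region marking in steps S1 and S3 can be pictured with a minimal sketch: given the keyboard's bounding box in pixel coordinates (obtained when the frame is annotated), the box is partitioned into equal-width regions, one per white key. The function name, the tuple layout, and the 52-white-key assumption are illustrative and not taken from the patent text.

```python
# Sketch of per-key region annotation: split the labeled keyboard bounding
# box into equal-width pixel rectangles, one per white key (hypothetical
# helper, not the patent's actual implementation).

def mark_key_regions(keyboard_box, n_white_keys=52):
    """Return a left-to-right list of (x_min, y_min, x_max, y_max)
    pixel regions, one per white key inside keyboard_box."""
    x0, y0, x1, y1 = keyboard_box
    key_width = (x1 - x0) / n_white_keys
    regions = []
    for i in range(n_white_keys):
        regions.append((x0 + i * key_width, y0, x0 + (i + 1) * key_width, y1))
    return regions

# Example: a keyboard occupying x = 100..1540, y = 400..600 in the frame.
regions = mark_key_regions((100, 400, 1540, 600))
print(len(regions))  # 52
```

A real preprocessing step would also normalize key size across frames (claim 2) and handle the narrower black keys; the uniform split above is only the simplest case.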
2. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S1 and S3 comprises: normalizing the size of the keys in the image.
3. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S1 and S3 comprises: transforming the key image coordinate system into the pixel coordinate system, and determining the region of each key based on pixel coordinates.
4. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S2 and S4 comprises: each image corresponds to a moment in the video, and the information generated when a key is triggered is associated by time, so that the coordinate information of the pressed key is correlated to determine the position of the pressed key.
5. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S2 and S4 further comprises: fine-tuning the trained gesture recognition model on a real data set.
6. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S2 and S4 comprises: reading the region of the pressed key and the finger coordinates and comparing the two in real time; if the finger position lies within the region of the pressed key, determining that the pressed key was pressed by that finger.
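The real-time comparison of claim 6 reduces to a point-in-rectangle test: a finger is associated with a pressed key when its (x, y) coordinate lies inside the key's pixel region. The sketch below assumes simple dictionary shapes for the inputs; the names and data layout are illustrative only.

```python
# Minimal sketch of the finger-to-key association of claim 6 (hypothetical
# data shapes): a finger is matched to a pressed key when its coordinate
# lies inside that key's pixel region.

def associate_fingers(finger_coords, pressed_key_regions):
    """finger_coords: {finger_name: (x, y)};
    pressed_key_regions: {key_name: (x_min, y_min, x_max, y_max)}.
    Returns {key_name: finger_name} for each press covered by a finger."""
    correspondence = {}
    for key, (x0, y0, x1, y1) in pressed_key_regions.items():
        for finger, (fx, fy) in finger_coords.items():
            if x0 <= fx <= x1 and y0 <= fy <= y1:  # point-in-rectangle test
                correspondence[key] = finger
                break
    return correspondence

fingers = {"right_index": (130, 450), "right_thumb": (210, 470)}
pressed = {"C4": (120, 400, 148, 600), "E4": (196, 400, 224, 600)}
print(associate_fingers(fingers, pressed))
# {'C4': 'right_index', 'E4': 'right_thumb'}
```

Running this per frame over the whole video yields the frame-by-frame correspondence that claims 1 and 6 call the first (or second) correspondence.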
7. The intelligent assistance method for piano practice according to claim 1, wherein either of steps S2 and S4 comprises: passing the recognized gesture bounding-box image into a high-resolution network serving as the backbone neural network, using convolution and deconvolution modules to generate multi-resolution and high-resolution heatmaps, predicting the hand joint points, and outputting finger coordinates.
8. The intelligent assistance method for piano practice according to claim 1, wherein the performance video is a segment, and during video decomposition the time points of the corresponding segment in the performance video are selected.
9. An intelligent assistance device for piano practice, comprising:
a video decomposition module, for decomposing a video into images;
a key labeling module, for selecting the image of the played register and marking the region of each key;
an information acquisition module, for identifying the coordinates of the pressed keys;
a gesture detection module, for predicting the confidence map of the hand mask;
a gesture recognition module, for predicting the confidence maps of the hand joint points;
a gesture scoring module, for evaluating the accuracy of the fingering and notes currently played;
a playing guidance module, for guiding the player;
a video recording device, for recording a video of the player's live performance; and
a laser guidance device, for indicating fingering and note information to the player, wherein the laser device guides by emitting laser light.
10. A computer-readable storage medium on which computer instructions are stored, characterized in that, when the instructions are executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.
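Claim 7's heatmap pipeline ends with a standard step that the claim leaves implicit: each joint's pixel coordinate is read off its heatmap by taking the argmax. The sketch below shows only that final step on a tiny hand-written heatmap; a real network would emit one high-resolution heatmap per hand joint, and the function name is an assumption for illustration.

```python
# Sketch of the usual final step of heatmap-based joint prediction
# (claim 7): the joint's (x, y) coordinate is the location of the
# highest-confidence cell in its heatmap.

def heatmap_to_coord(heatmap):
    """Return (x, y) of the maximum cell in a 2D row-major heatmap
    (list of lists): x is the column index, y the row index."""
    best_x = best_y = 0
    best_val = heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_val, best_x, best_y = val, x, y
    return best_x, best_y

heatmap = [
    [0.01, 0.02, 0.01],
    [0.03, 0.90, 0.10],  # peak at column 1, row 1
    [0.02, 0.05, 0.02],
]
print(heatmap_to_coord(heatmap))  # (1, 1)
```

The coordinates recovered this way (after scaling back to the original frame resolution) are the finger coordinates that steps S2 and S4 compare against the key regions.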
CN202110843452.6A 2021-07-26 2021-07-26 Intelligent auxiliary method, device and medium for piano practice Pending CN113657185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110843452.6A CN113657185A (en) 2021-07-26 2021-07-26 Intelligent auxiliary method, device and medium for piano practice

Publications (1)

Publication Number Publication Date
CN113657185A (en) 2021-11-16

Family

ID=78490247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110843452.6A Pending CN113657185A (en) 2021-07-26 2021-07-26 Intelligent auxiliary method, device and medium for piano practice

Country Status (1)

Country Link
CN (1) CN113657185A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215441A (en) * 2018-10-19 2019-01-15 深圳市微蓝智能科技有限公司 A kind of Piano Teaching method, apparatus and computer storage medium
CN109887375A (en) * 2019-04-17 2019-06-14 西安邮电大学 An error correction method for piano practice based on image recognition processing
CN112488047A (en) * 2020-12-16 2021-03-12 上海悠络客电子科技股份有限公司 Piano fingering intelligent identification method
WO2021057810A1 (en) * 2019-09-29 2021-04-01 深圳数字生命研究院 Data processing method, data training method, data identifying method and device, and storage medium
CN112818981A (en) * 2021-01-15 2021-05-18 小叶子(北京)科技有限公司 Musical instrument playing key position prompting method and device, electronic equipment and storage medium
CN112836597A (en) * 2021-01-15 2021-05-25 西北大学 Multi-hand pose keypoint estimation method based on cascaded parallel convolutional neural network
CN112883804A (en) * 2021-01-21 2021-06-01 小叶子(北京)科技有限公司 Error correction method and device for hand motion during musical instrument playing and electronic equipment
CN113158748A (en) * 2021-02-03 2021-07-23 杭州小伴熊科技有限公司 Hand detection tracking and musical instrument detection combined interaction method and system

Non-Patent Citations (1)

Title
李锵; 李晨曦; 关欣: "Automatic piano fingering annotation method based on decision HMM and improved Viterbi", 天津大学学报 (Journal of Tianjin University, Natural Science and Engineering Technology Edition), no. 08, pages 48-58 *

Similar Documents

Publication Publication Date Title
US11783615B2 (en) Systems and methods for language driven gesture understanding
US20180315329A1 (en) Augmented reality learning system and method using motion captured virtual hands
US8793118B2 (en) Adaptive multimodal communication assist system
CN104049754B (en) Real time hand tracking, posture classification and Interface Control
CN113657184B (en) Piano playing fingering evaluation method and device
US12014645B2 (en) Virtual tutorials for musical instruments with finger tracking in augmented reality
CN105068662B (en) A kind of electronic equipment for man-machine interaction
US20240013754A1 (en) Performance analysis method, performance analysis system and non-transitory computer-readable medium
CN111814733A (en) Concentration degree detection method and device based on head posture
CN116386424A (en) Method, device and computer readable storage medium for music teaching
CN113657185A (en) Intelligent auxiliary method, device and medium for piano practice
Fauzi et al. Recognition of Real-Time Angklung Kodály Hand Gesture using Mediapipe and Machine Learning Method
WO2022202266A1 (en) Image processing method, image processing system, and program
Kerdvibulvech et al. Real-time guitar chord estimation by stereo cameras for supporting guitarists
Kerdvibulvech et al. Guitarist fingertip tracking by integrating a Bayesian classifier into particle filters
JP7615818B2 (en) IMAGE PROCESSING METHOD, IMAGE PROCESSING SYSTEM, AND PROGRAM
Enkhbat et al. Using Hybrid Models for Action Correction in Instrument Learning Based on AI
Van Wyk et al. A multimodal gesture-based virtual interactive piano system using computer vision and a motion controller
WO2024212940A1 (en) Method and device for music teaching, and computer-readable storage medium
WO2023032422A1 (en) Processing method, program, and processing device
Kheldoun et al. Algsl89: An algerian sign language dataset
王釗 An Automatic Guitar Fingering Assessing System Based on Convolutional Neural Network and Spatio-temporal Support Vector Regression
CN119007295A (en) Method and system for identifying piano playing errors
Dillhoff Computer Vision Methods for Sign Language and Cognitive Evaluation through Physical Tasks
CN117218719A (en) Human body sitting posture detection and identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination