WO2019146466A1 - Information processing device, moving-image retrieval method, generation method, and program - Google Patents
- Publication number
- WO2019146466A1 (PCT/JP2019/001084)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- character string
- moving image
- image
- displayed
- character
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
Definitions
- the present invention relates to an information processing apparatus, a moving image search method, a generation method, and a program.
- Conventional online learning systems do not provide a function for searching a lecture video for the specific part the user desires to view. The user therefore has to find the desired part by watching the lecture video from beginning to end or by fast-forwarding. This problem can occur not only with lecture videos but with any videos.
- The present disclosure aims to provide a technology that allows the user to quickly search for a specific part of a moving image that he or she desires to view.
- An information processing apparatus includes: a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; a receiving unit that receives a search target character string; a search unit that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and an output unit that outputs a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- According to this aspect, it is possible to provide a technology capable of quickly searching for the specific portion of the moving image that the user desires to view.
- The output unit may output a screen in which the retrieved second character string and the time information are arranged and displayed in chronological order in the horizontal or vertical direction in the second display area. According to this aspect, since multiple pieces of text information and time information are displayed chronologically in the second area of the screen, visibility is improved.
- The output unit may further display, in the second display area, a message indicating that the image of the first character string corresponding to the retrieved second character string is displayed in the moving image. According to this aspect, the user can easily recognize on the screen that the search target is a first character string displayed in the moving image.
- The output unit may display information indicating the position at which the image of the first character string corresponding to the retrieved second character string is displayed, superimposed on the moving image. According to this aspect, the user can easily grasp where the search target character string appears in the moving image.
- The output unit may highlight the portion of the second character string displayed in the second display area that corresponds to the search target character string. According to this aspect, even when the second character string contains many characters, the user can easily see which part of it corresponds to the search target character string.
- the moving image is a moving image obtained by shooting a lecturer giving a lecture using a blackboard
- the first character string is a character string including a plurality of handwritten characters handwritten on the blackboard
- An information processing apparatus includes: an extraction unit that extracts a first image, which is the area in which the image of a first character string is displayed in a moving image, and outputs time information indicating when the display of that image starts; a division unit that divides the first image extracted by the extraction unit into second images, one for each character included in the first character string; a character recognition unit that performs character recognition on each of the second images and outputs a plurality of candidate characters for each; and an output unit that combines the candidate characters output for the second images according to the order of the characters in the first character string to generate a plurality of candidate character strings, and outputs, from among a plurality of character strings that may be used in the moving image, the character string determined to be most similar to any of the candidate character strings.
- According to this aspect, the database can be generated automatically, and the user can quickly search for the specific portion of the moving image that he or she desires to view.
- A moving image search method is performed by an information processing apparatus including a storage unit that stores a database associating, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image. The method includes: a step of receiving a search target character string; a step of searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and a step of outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- A program causes a computer to function as: storage means for storing a database associating, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; receiving means for receiving a search target character string; search means for searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and output means for outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- FIG. 1 is a diagram illustrating an example of a moving image distribution system according to an embodiment.
- the moving image distribution system includes a distribution server 10 and a terminal 20.
- the distribution server 10 and the terminal 20 can communicate with each other via a wireless or wired communication network N.
- a plurality of terminals 20 may be included in the present moving image distribution system.
- the distribution server 10 and the terminal 20 may be collectively referred to as an information processing apparatus, or only the distribution server 10 may be referred to as an information processing apparatus.
- the distribution server 10 is a server that distributes a lecture moving image, and has a function of transmitting data of the lecture moving image requested from the terminal 20 to the terminal 20.
- the distribution server 10 may be one or more physical or virtual servers, or may be a cloud server.
- The terminal 20 is a terminal operated by the user, and may be any terminal provided with a communication function, such as a smartphone, a tablet terminal, a mobile phone, a personal computer (PC), a laptop PC, a personal digital assistant (PDA), or a home gaming device.
- The user inputs a search target character string (search keyword) and can search for lecture videos in which an image of a character string handwritten by the lecturer on the blackboard (hereinafter referred to as a "handwritten character string") includes the search target character string. For example, when the user inputs "organic compound" as the search target character string on the search screen of the terminal 20, the lecture moving images in which the lecturer wrote "organic compound" on the blackboard are listed on the screen of the terminal 20.
- The distribution server 10 stores, in a database, the text information (second character string) generated by character recognition of the image of the handwritten character string (first character string) and the time information indicating the time at which the image of the handwritten character string is displayed, in association with the lecture moving image (or information uniquely identifying the lecture moving image). More specifically, the time information may be information indicating the period from when the handwritten character string appears in the lecture moving image until its display ends (hereinafter referred to as the "appearance time").
- the database is called “lecture data DB (Database)”.
- The distribution server 10 searches for lecture moving images including the search target character string using the lecture data DB, which makes it possible to search for lecture videos in which a sentence or character string written by the lecturer on the blackboard includes the search target character string.
- FIG. 2 is a diagram showing an example of the hardware configuration of the distribution server 10.
- The distribution server 10 includes a central processing unit (CPU) 11, a storage device 12 such as a memory, a communication IF (interface) 13 for performing wired or wireless communication, an input device 14 for receiving input operations, and an output device 15 for outputting information.
- Each functional unit described in the functional block configuration below can be realized by the CPU 11 executing a program stored in the storage device 12.
- The program can be stored, for example, in a non-transitory recording medium.
- FIG. 3 is a diagram showing an example of a functional block configuration of the distribution server 10.
- the distribution server 10 includes a reception unit 101, a search unit 102, an output unit 103, a generation unit 104, and a storage unit 105.
- the storage unit 105 stores lecture data DB.
- the reception unit 101 has a function of receiving a search target character string input by the user on the screen of the terminal 20.
- The search unit 102 searches the lecture data DB for "text information" including the search target character string received by the reception unit 101, the "appearance time" corresponding to that text information, and the "lecture moving image" corresponding to that text information.
- The output unit 103 outputs a screen including an area (first area) for reproducing the lecture moving image retrieved by the search unit 102 and an area (second area) for displaying the retrieved text information and appearance time (time information) in chronological order. The output screen is displayed on the display of the terminal 20.
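A minimal sketch of how this search could work, assuming the lecture data DB is a single SQLite table. The table name and columns (`lecture_video`, `subject`, `appearance_time`, `text_info`) are illustrative assumptions; the patent names the stored fields but specifies no storage engine or schema:

```python
import sqlite3

def search_lectures(db_path, query, subject=None):
    """Return (lecture_video, appearance_time, text_info) rows whose text
    information contains `query`, optionally filtered by subject, ordered
    chronologically within each lecture video."""
    conn = sqlite3.connect(db_path)
    sql = ("SELECT lecture_video, appearance_time, text_info FROM lecture_data "
           "WHERE text_info LIKE ?")
    params = ["%" + query + "%"]
    if subject is not None:
        sql += " AND subject = ?"
        params.append(subject)
    sql += " ORDER BY lecture_video, appearance_time"
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

The matched rows supply both the video to reproduce in the first area and the (appearance time, text) pairs listed in the second area.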
- the output unit 103 may have, for example, a web server function, and may have a function of transmitting a website to which a lecture moving image is distributed to the terminal 20.
- the output unit 103 may have a function of transmitting, to the terminal 20, content for displaying a lecture moving image or the like on the screen of an application installed on the terminal 20.
- The generation unit 104 generates the lecture data DB by performing character recognition on handwritten character strings displayed in the lecture moving image.
- the generation unit 104 further includes an area extraction unit 1041, a division unit 1042, a single character recognition engine 1043, a character string recognition engine 1044, and a DB generation unit 1045.
- the processing performed by the area extraction unit 1041, the division unit 1042, the single character recognition engine 1043, the character string recognition engine 1044, and the DB generation unit 1045 will be described later.
- Although the description here assumes that the generation unit 104 of the distribution server 10 generates the lecture data DB, the distribution server 10 does not necessarily have to create the lecture data DB itself; it may be generated by an external information processing apparatus. In that case, the generation unit 104 is implemented not in the distribution server 10 but in another information processing apparatus, and the lecture data DB generated by that apparatus may be registered in the storage unit 105 of the distribution server 10.
- FIG. 4 is a flowchart showing an example of a processing procedure when generating the lecture data DB.
- In step S101, the area extraction unit 1041 extracts an image (first image) of the character display area in which a handwritten character string is displayed in the lecture moving image. It also determines and outputs the time (appearance time) from when the handwritten character string is displayed until its display ends. If there are multiple handwritten character strings, extraction of the character display area image and determination of the appearance time are performed for each handwritten character string.
- The region extraction unit 1041 performs image processing, every predetermined number of frames (for example, 80 frames), on the moving image in which a lecturer is writing characters on a blackboard (FIG. 5(a)), and extracts areas distinguished from the background (areas other than the background). For example, the region extraction unit 1041 outputs a score (probability) indicating the likelihood that each pixel differs from the background image, per pixel and per the predetermined number of frames. As a result of this processing, a score equal to or greater than a predetermined value is output for the pixels of the area where characters are written on the blackboard and for the pixels of the area where the lecturer appears.
- the region extraction unit 1041 extracts pixels whose output score is equal to or more than a predetermined value.
- An example of the extracted pixel is shown in FIG. 5 (b).
- An extraction part 500 shown in FIG. 5B indicates a part where the extracted pixels are gathered.
- The area extraction unit 1041 preferably performs a process that excludes the area in which the lecturer appears. For example, in the process of extracting areas distinguished from the background, the area extraction unit 1041 extracts only pixels (areas) whose score variation over a predetermined time length (for example, 10 seconds) is equal to or less than a predetermined threshold. Alternatively, when the area of a region in which extracted pixels are clustered is larger than a predetermined value, the region may be regarded as containing the lecturer and treated as an exclusion target. This prevents pixels in which the lecturer appears from being extracted as areas distinguished from the background.
- the region extraction unit 1041 determines the time from the appearance of the part where the pixels are gathered to the disappearance in the lecture moving image as the appearance time when the handwritten character string is displayed in the lecture moving image.
- The area extracting unit 1041 determines the position of a rectangular frame enclosing the part where the extracted pixels are clustered (for example, the pixel position at the lower left of the rectangle, with the lower left of the moving image as the origin) and its size (lengths in the vertical and horizontal directions).
- the frame 510 shown in FIG. 5B is an example of the determined rectangular frame.
- The region extraction unit 1041 extracts the image of the character display area in which the handwritten character string is displayed by cutting out the region surrounded by the rectangular frame from the image of an arbitrary frame among the frames constituting the lecture moving image during the appearance time.
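The extraction of step S101 can be sketched roughly as follows. This is an illustrative reading of the description, not the patent's implementation: pixels are scored by their difference from a background image over a window of frames, and pixels whose difference fluctuates strongly over time (the moving lecturer) are excluded, leaving the stably displayed writing. The threshold values are arbitrary assumptions.

```python
import numpy as np

def extract_character_regions(frames, background,
                              diff_thresh=30.0, var_thresh=25.0):
    """Return a boolean (H, W) mask of pixels that differ from the
    background but stay stable across the frame window.

    frames: (T, H, W) grayscale frames; background: (H, W) image.
    `score` plays the role of the per-pixel "not background" score;
    high temporal variation marks the moving lecturer, who is excluded.
    """
    diff = np.abs(frames.astype(np.float32) - background.astype(np.float32))
    score = diff.mean(axis=0)       # how strongly a pixel differs overall
    variation = diff.std(axis=0)    # low => static (writing), high => motion
    return (score >= diff_thresh) & (variation <= var_thresh)
```

A bounding rectangle around clusters of True pixels then gives the character display area, and the window in which the cluster first appears and later disappears gives the appearance time.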
- In step S102, the dividing unit 1042 divides the image of the character display area extracted by the area extraction unit 1041 into images (second images) of the individual characters constituting the handwritten character string.
- For example, the dividing unit 1042 binarizes the image of the character display area and regards a column in which the luminance of all pixels along the vertical axis is below a predetermined threshold as a break between characters, dividing the image at such columns. A specific example of the break positions is shown in FIG. 5(c).
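The column-projection split of step S102 can be sketched as follows, assuming a binarized image in which foreground (chalk) pixels are 1. Columns containing no foreground pixels are treated as the breaks between characters:

```python
import numpy as np

def split_characters(binary, gap_thresh=0):
    """Split a binarized character-area image (H, W) into per-character
    sub-images at columns whose foreground pixel count is at or below
    `gap_thresh` (the breaks between characters)."""
    col_counts = binary.sum(axis=0)   # foreground pixels per column
    in_char = col_counts > gap_thresh
    chars, start = [], None
    for x, filled in enumerate(in_char):
        if filled and start is None:
            start = x                 # a character run begins
        elif not filled and start is not None:
            chars.append(binary[:, start:x])
            start = None              # the run ended at a break column
    if start is not None:
        chars.append(binary[:, start:])
    return chars
```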
- In step S103, the single character recognition engine 1043 performs character recognition on each single-character image constituting the handwritten character string and outputs a plurality of candidate characters for each image.
- A specific example is shown in FIG. 6.
- Candidate characters 1 to 5 shown in FIG. 6 indicate examples of a plurality of candidate characters output by performing character recognition on each of the “different”, “sex”, and “body” images.
- Note that if the single character recognition engine 1043 has sufficiently high recognition accuracy, the candidate characters it outputs may be stored directly as text information in the lecture data DB without proceeding to step S104. For example, if the single character recognition engine 1043 can correctly recognize the characters "different", "sex", and "body" from their respective images, the string "isomer" obtained by combining the recognized characters may be stored as-is in the lecture data DB as text information.
- In step S104, the character string recognition engine 1044 (output unit) generates a plurality of candidate character strings by combining the candidate characters output for each single-character image according to the order of the characters in the handwritten character string. For example, in FIG. 6, combining the five candidate characters corresponding to "different", the five corresponding to "sex", and the five corresponding to "body" generates 125 (5 × 5 × 5) candidate character strings.
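The combination step is a plain Cartesian product over the per-character candidate lists; a sketch:

```python
from itertools import product

def candidate_strings(per_char_candidates):
    """Combine per-character candidate lists into full candidate strings,
    preserving character order. Three characters with five candidates
    each yield 5 * 5 * 5 = 125 strings, as in the example above."""
    return ["".join(chars) for chars in product(*per_char_candidates)]
```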
- The character string recognition engine 1044 has learned in advance a plurality of keywords (character strings) that may be used in the lecture moving image, and has a function of, given an arbitrary input character string, outputting the keyword determined to be most similar to the input among the learned keywords, together with a score indicating the similarity.
- A keyword that may be used in the lecture moving image is a keyword of the kind listed in the index of a textbook, such as "Yamatai" or "Tokugawa Ieyasu".
- keywords are generally different for each subject.
- The character string recognition engine 1044 outputs, as the text information corresponding to the handwritten character string, the keyword (character string) determined to be most similar to any of the generated candidate character strings, from among the keywords learned in advance as keywords that may be used in the lecture moving image. More specifically, the character string recognition engine 1044 outputs, for each generated candidate character string, the keyword judged most similar to it together with a similarity score, and then outputs the keyword with the highest similarity as the text information corresponding to the handwritten character string.
- In the example of FIG. 6, the character string recognition engine 1044 computes the similarity between each of the 125 candidate character strings and the learned keywords (which include at least "isomer"), and outputs the learned keyword "isomer", which has the highest similarity, as the text information corresponding to the handwritten character string. Even if the single character recognition engine 1043 cannot recognize every character correctly and the 125 candidate strings do not contain "isomer" itself, as long as the candidates include a character string similar to "isomer", "isomer" is still output by the character string recognition engine 1044 as the text information corresponding to the handwritten character string.
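The keyword-matching step can be sketched with a generic string-similarity measure. `difflib` here stands in for whatever similarity function the character string recognition engine actually learns; the point is only that the closest learned keyword is returned even when no candidate matches it exactly:

```python
import difflib

def best_keyword(candidates, keywords):
    """Return the learned keyword most similar to any candidate string,
    together with its similarity score in [0, 1]."""
    best, best_score = None, -1.0
    for cand in candidates:
        for kw in keywords:
            # Ratio of matching characters; a stand-in similarity measure.
            score = difflib.SequenceMatcher(None, cand, kw).ratio()
            if score > best_score:
                best, best_score = kw, score
    return best, best_score
```

Even when every candidate misrecognizes one character of "isomer", the keyword "isomer" still wins as long as it is the closest learned keyword to some candidate.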
- The generation unit 104 repeats the processing of steps S101 to S104 for each handwritten character string displayed in the lecture moving image, thereby obtaining a keyword and an appearance time for each of the plurality of handwritten character strings displayed in the lecture moving image.
- In step S105, the DB generation unit 1045 generates the lecture data DB by associating the text information output from the character string recognition engine 1044 in step S104, the appearance time output from the area extraction unit 1041 in step S101, and the lecture moving image being processed (the file name of the lecture moving image may be used).
- FIG. 7 is a diagram showing an example of the lecture data DB.
- An identifier for uniquely identifying a lecture moving image is stored in the "lecture moving image".
- the identifier includes the subject of the lecture video, the lecture name, and the like.
- the identifier may be, for example, a file name including a subject of a lecture moving image.
- The "appearance time" field stores the time from when the handwritten character string is displayed in the lecture video until it disappears. The "text information" field stores the text data corresponding to the handwritten character string.
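The three fields of FIG. 7 map naturally onto one database table. A sketch with assumed column names (the patent names the fields but prescribes no schema):

```python
import sqlite3

def create_lecture_data_db(path):
    """Create a lecture data DB with the fields shown in FIG. 7:
    lecture video identifier, appearance time, and text information."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lecture_data ("
        "lecture_video TEXT,"    # identifier, e.g. file name with subject
        "appearance_time TEXT,"  # e.g. "0:05-3:10"
        "text_info TEXT)"        # keyword from the string recognition engine
    )
    conn.commit()
    return conn
```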
- FIG. 8A is an example of a screen for searching a lecture moving image.
- The screen provides an input box 1001 for inputting the search target character string and the subject of the lecture moving image to be searched.
- The search unit 102 accesses the lecture data DB and searches for lecture videos whose text information, for the lecture moving images of the input subject, includes the search target character string.
- When there are lecture moving images whose text information includes the search target character string, the output unit 103 outputs a screen displaying a list of the retrieved lecture moving images. Note that the output unit 103 may output the list screen when there are multiple retrieved lecture moving images, and transition directly to the screen for reproducing the lecture moving image described later (FIG. 9(a)) when there is only one.
- FIG. 8B is an example of a screen displaying a list of searched lecture moving images.
- The search results are displayed as a list in the display area 1003. For example, if the user selects "Chemistry" as the subject and enters "ion" as the search target character string, the lecture videos, among the chemistry lecture videos, in which the lecturer wrote "ion" on the blackboard are displayed as a list in the display area 1003 as the search result.
- When the user selects a lecture moving image to view from the list in the display area 1003, the screen transitions to the screen for reproducing that lecture moving image.
- Since the display area 1003 not only displays the list of retrieved lecture moving images but also accepts the user's selection of a lecture moving image to view, the screen including the display area 1003 may also be referred to as a screen for receiving the selection of a lecture moving image to view.
- An example of the screen for reproducing the lecture moving image is shown in FIG. 9(a).
- The screen includes a display area 2001 (first area) for reproducing the lecture moving image, a display area 2002 (second area) in which text information including the search target character string and the start times at which the display of the handwritten character strings begins are arranged horizontally in chronological order, and a display area 2004 (third area) displaying character strings searched in the past for the subject of the lecture moving image reproduced in the display area 2001. A button 2003 for displaying a list of start times and text information is also displayed.
- In FIG. 9(b), instead of the display area 2002, a display area 2005 (second area) is displayed in which text information including the search target character string and time information are arranged vertically in chronological order.
- The word "board" is displayed as a message indicating that the search result is a handwritten character string displayed in the lecture moving image (that is, that a handwritten character string corresponding to the retrieved text information is displayed in the lecture moving image).
- The number of times the text information including the search target character string has been searched is displayed in the display area 2102.
- The portion corresponding to the search target character string may be highlighted. For example, the portion "ion", the search target character string, is highlighted in "complex ion formation reaction" and "hydrogen ion".
- the display area 2002 and the display area 2005 may further display an end time at which the display of the handwritten character string ends.
- the appearance time of the handwritten character string may be displayed as "0:05 to 3:10 complex ion forming reaction".
- information indicating the position where the handwritten character string corresponding to the searched text information is displayed in the lecture moving image may be displayed superimposed on the lecture moving image.
- For example, a frame 2101 indicating the position at which "complex ion formation reaction", the retrieved text information, is displayed in the lecture moving image may be displayed.
- information indicating the position at which the frame 2101 is displayed and the size of the frame 2101 may be further stored in the lecture data DB for each record.
- For example, the same information as the position and size of the rectangular frame determined by the area extraction unit 1041 may be stored as the information indicating the position.
- the frame 2101 may be continuously displayed on the display area 2001 during the appearance time corresponding to the searched text information.
- Reproduction of the lecture moving image may be configured not to start in the display area 2001 until the user selects the reproduction start button displayed in the display area 2001, or until the user selects, from the time information and text information displayed in the display area 2002 or the display area 2005, the time information for the part he or she desires to view.
- The user may swipe the display area 2002 from right to left (or from left to right) to display the next (or previous) start time and text information. For example, the text information with a start time of 0:05 disappears to the left, the text information with a start time of 2:15 moves from the right toward the left, and the next text information appears on the right. In this way, the next (or previous) time information and text information may be displayed.
- The output unit 103 may output, in the display area 2002, only the partial text of the retrieved text information that includes at least the search target character string. As a result, even when the text contains too many characters to display fully in the display area 2002 or the display area 2005, or when the terminal 20 is a smartphone or other device whose small display makes it difficult to show all the information, the text information can be displayed without greatly sacrificing visibility.
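Trimming the displayed text down to a window around the match might look like this; the window size and the ellipsis markers are arbitrary assumptions for illustration:

```python
def snippet(text, query, context=5):
    """Return a short excerpt of `text` around the first occurrence of
    `query`, for display areas too small to show the full text."""
    i = text.find(query)
    if i < 0:
        return text[: 2 * context + len(query)]  # fallback: leading chars
    start = max(0, i - context)
    end = min(len(text), i + len(query) + context)
    prefix = "…" if start > 0 else ""
    suffix = "…" if end < len(text) else ""
    return prefix + text[start:end] + suffix
```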
- The character strings displayed in the display area 2004 may be displayed in descending order of the number of times they were previously input as search target character strings, for the subject of the lecture moving image, by the users of the moving image distribution system.
- When the user selects a character string displayed in the display area 2004, the selected character string may be automatically entered into the input box 1001.
- As described above, the lecture data DB stores text information obtained by converting into text the characters written by the lecturer on the blackboard in the lecture video, and lecture videos are searched by comparing the search target character string with this text information.
- Therefore, the present embodiment has the technical effect that the search speed can be improved compared with a method that searches for a character string while directly analyzing the lecture moving image itself.
- The appearance time described above includes both the time when the display of the handwritten character string starts (the time when the character string is written on the blackboard) and the time when the display ends, but it is also possible to include only the time when the display starts. This reduces the data volume of the lecture data DB.
- The time when the display of the handwritten character string starts and the time when it ends may be collectively referred to as "time information", or only the start time may be referred to as "time information".
- the moving image in which the character string is displayed has been described as a lecture moving image in which the lecturer gives a lecture while writing characters by hand on the blackboard, but the present embodiment is not limited to lecture moving images and handwritten characters.
- the present embodiment can be applied to any character string or moving image as long as the character string is displayed on the moving image.
Abstract
Provided is an information processing device (10) having: a storage unit (105) storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the images of the first character string, time information indicating the time at which the images of the first character string are displayed in the moving image, and the moving image; an acceptance unit (101) that accepts a character string to be retrieved; a retrieval unit (102) that retrieves from the database a second character string that includes the character string to be retrieved, time information that corresponds to the second character string, and a moving image that corresponds to the second character string; and an output unit (103) that outputs a screen that includes a first display region (2001) where the retrieved moving image is played back and a second display region (2002) where the retrieved second character string and time information are displayed in chronological order.
Description
This application is based on Japanese Patent Application No. 2018-010904 filed on January 25, 2018, the contents of which are incorporated herein by reference.
The present invention relates to an information processing apparatus, a moving image search method, a generation method, and a program.
Online learning systems are known in which a user can learn using a web browser or the like. With an online learning system, a user can watch videos of lectures of interest, gauge his or her level of understanding by taking tests, and concentrate on reviewing the problems missed in those tests, thereby learning efficiently. As a remote learning support system using a network, for example, the technology described in Patent Document 1 is known.
When a user reviews a weak subject, for example, there is considered to be a need to view only a specific part of a lecture video rather than watching all of it from beginning to end. For example, a user who wants to review American history within the subject of world history may want to view only the part of a world history lecture video in which the lecturer explains the United States.

However, conventional online learning systems do not provide a function for searching a lecture moving image for the specific part that the user desires to view. The user therefore has to find the desired part by watching the lecture video from beginning to end, or by fast-forwarding. Such a problem can occur not only in lecture videos but in any video.
Therefore, the present disclosure aims to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
An information processing apparatus according to an aspect of the present disclosure includes: a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; a receiving unit that receives a search target character string; a search unit that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and an output unit that outputs a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
In the above aspect, the output unit may output a screen in which the retrieved second character string and time information are arranged and displayed in chronological order, horizontally or vertically, in the second display area. According to this aspect, since the plural pieces of text information and time information are displayed in chronological order in the second display area of the screen, visibility can be improved.
In the above aspect, the output unit may further display, in the second display area, a message indicating that an image of the first character string corresponding to the retrieved second character string is displayed in the moving image. According to this aspect, the user can easily recognize on the screen that the search target is the first character string displayed in the moving image.
In the above aspect, the output unit may display information indicating the position at which the image of the first character string corresponding to the retrieved second character string is displayed in the moving image, superimposed on the moving image. According to this aspect, the user can easily grasp where the search target character string is displayed in the moving image.
In the above aspect, the output unit may highlight the portion of the second character string displayed in the second display area that corresponds to the search target character string. According to this aspect, even when the second character string contains many characters, it is easy to grasp which portion of the second character string corresponds to the search target character string.
In the above aspect, the moving image may be a moving image of a lecturer giving a lecture using a blackboard, and the first character string may be a character string including a plurality of handwritten characters written by hand on the blackboard. According to this aspect, the user can easily search for the part of the lecture moving image in which the handwritten character string to be searched for is written on the blackboard.
An information processing apparatus according to another aspect of the present disclosure includes: an extraction unit that extracts a first image, which is an area in which an image of a first character string is displayed in a moving image, and outputs time information indicating when display of the image of the first character string starts in the moving image; a division unit that divides the first image extracted by the extraction unit into second images, one per character included in the first character string; a character recognition unit that outputs a plurality of candidate characters for each second image by performing character recognition on each of the second images; an output unit that, for the plurality of candidate character strings generated by combining the candidate characters output for each second image according to the order of the characters in the first character string, outputs as a second character string the character string judged most similar to any of those candidate character strings among a plurality of character strings that may be used in the moving image; and a generation unit that generates a database associating the second character string output by the output unit, the time information output by the extraction unit, and the moving image. According to this aspect, the database can be generated automatically, and the user can quickly make use of a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
A moving image search method according to another aspect of the present disclosure is performed by an information processing apparatus having a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image. The method includes the steps of: receiving a search target character string; searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and outputting a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
A program according to another aspect of the present disclosure causes a computer to function as: storage means storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; receiving means that receives a search target character string; search means that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and output means that outputs a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
According to the present disclosure, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
Preferred embodiments of the present invention will be described with reference to the accompanying drawings. In each figure, elements given the same reference numeral have the same or similar configuration.
<System configuration>
FIG. 1 is a diagram illustrating an example of a moving image distribution system according to the embodiment. The moving image distribution system includes a distribution server 10 and a terminal 20, which can communicate with each other via a wireless or wired communication network N. Although FIG. 1 shows one terminal 20, the moving image distribution system may include a plurality of terminals 20. In the present embodiment, the distribution server 10 and the terminal 20 may be collectively referred to as an information processing apparatus, or only the distribution server 10 may be referred to as an information processing apparatus.
The distribution server 10 is a server that distributes lecture moving images, and has a function of transmitting the data of a lecture moving image requested by the terminal 20 to the terminal 20. The distribution server 10 may be one or more physical or virtual servers, or may be a cloud server.
The terminal 20 is a terminal operated by the user; any terminal with a communication function can be used, such as a smartphone, tablet terminal, mobile phone, personal computer (PC), laptop PC, personal digital assistant (PDA), or home game console.
In the present embodiment, by entering a search target character string (search keyword), the user can search for lecture moving images in which the image of a character string handwritten on the blackboard by the lecturer (hereinafter, a "handwritten character string") contains the search target character string. For example, when the user enters "organic compound" as the search target character string on the search screen of the terminal 20, lecture moving images in which the lecturer wrote "organic compound" on the blackboard are listed on the screen of the terminal 20. When the user selects a lecture moving image to view from the list, playback of that lecture moving image starts on the screen of the terminal 20, and the times on the time axis of the lecture moving image at which the lecturer wrote "organic compound" on the blackboard (for example, around 5:30, 15:10, and 23:40 in a 30-minute moving image) are listed. When the user selects one of the listed times, the lecture moving image being played jumps to the selected time.
To realize such an operation, the distribution server 10 stores, in a database, text information (second character string) generated by character recognition of the image of a handwritten character string (first character string), time information indicating the time at which the image of the handwritten character string is displayed in the lecture moving image, and the lecture moving image (or information uniquely identifying it), in association with one another. More specifically, the time information may indicate the span from when the handwritten character string appears in the lecture moving image until its display ends (hereinafter, the "appearance time"). In the present embodiment, this database is called the "lecture data DB (Database)". The distribution server 10 can thus search the lecture data DB for lecture moving images in which the sentences or character strings handwritten on the blackboard by the lecturer contain the search target character string.
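The association held in the lecture data DB might be sketched as records like the following. This is a minimal illustration only; the names `LectureRecord`, `text`, `start_sec`, `end_sec`, and `video_id` are assumptions, not identifiers from this disclosure.

```python
from dataclasses import dataclass

@dataclass
class LectureRecord:
    text: str         # second character string: OCR result of the handwritten string
    start_sec: float  # time the handwritten string appears in the video
    end_sec: float    # time its display ends (appearance time = start..end)
    video_id: str     # identifies the lecture moving image

# One row per handwritten character string detected in a lecture video.
lecture_db = [
    LectureRecord("異性体", 330.0, 610.0, "chem-001"),
    LectureRecord("有機化合物", 910.0, 1420.0, "chem-001"),
]
```

Storing only `start_sec` would halve the time fields, matching the variation described later in which only the display start time is kept.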
<Hardware configuration>
FIG. 2 is a diagram showing an example of the hardware configuration of the distribution server 10. The distribution server 10 has a CPU (Central Processing Unit) 11, a storage device 12 such as a memory, a communication IF (Interface) 13 for wired or wireless communication, an input device 14 that receives input operations, and an output device 15 that outputs information. Each functional unit described in the functional block configuration below can be realized by processing that a program stored in the storage device 12 causes the CPU 11 to execute. The program can be stored, for example, on a non-transitory recording medium.
<Function block configuration>
FIG. 3 is a diagram showing an example of the functional block configuration of the distribution server 10. The distribution server 10 has a reception unit 101, a search unit 102, an output unit 103, a generation unit 104, and a storage unit 105. The storage unit 105 stores the lecture data DB.
The reception unit 101 has a function of receiving the search target character string that the user entered on the screen of the terminal 20.
The search unit 102 searches the lecture data DB for "text information" including the search target character string received by the reception unit 101, the "appearance time" corresponding to that text information, and the "lecture moving image" corresponding to that text information.
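The lookup performed by the search unit 102 can be sketched as a simple substring match over such records, with hits sorted by appearance time for chronological display. The dictionary keys used here are illustrative assumptions, not names from this disclosure.

```python
def search_lectures(db, query):
    """Return every DB row whose recognized text contains `query`,
    sorted by appearance time (chronological order)."""
    hits = [row for row in db if query in row["text"]]
    return sorted(hits, key=lambda row: row["start_sec"])

# Hypothetical lecture data DB contents for illustration.
lecture_db = [
    {"text": "有機化合物の分類", "start_sec": 330, "video_id": "chem-001"},
    {"text": "異性体", "start_sec": 910, "video_id": "chem-001"},
    {"text": "有機化合物と無機化合物", "start_sec": 150, "video_id": "chem-002"},
]
```

A real deployment would presumably use a database index rather than a linear scan, but the contract is the same: text, time, and video are returned together for each hit.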
The output unit 103 outputs a screen including an area (first area) in which the lecture moving image retrieved by the search unit 102 is played back, and an area (second area) in which the retrieved text information and appearance times (time information) are displayed in chronological order. The output screen is displayed on the display of the terminal 20. The output unit 103 may have, for example, a web server function and a function of transmitting to the terminal 20 a website on which lecture moving images are distributed. Alternatively, the output unit 103 may have a function of transmitting to the terminal 20 content for displaying lecture moving images and the like on the screen of an application installed on the terminal 20.
The generation unit 104 generates the lecture data DB by performing character recognition on the handwritten character strings displayed in lecture moving images. The generation unit 104 further includes an area extraction unit 1041, a division unit 1042, a single character recognition engine 1043, a character string recognition engine 1044, and a DB generation unit 1045. The processing performed by these units is described later.
<Generation of the lecture data DB>
Next, a method of generating the lecture data DB will be described concretely with reference to FIG. 4. The following description assumes that the generation unit 104 of the distribution server 10 generates the lecture data DB, but the distribution server 10 does not necessarily have to create the lecture data DB itself; it may be generated by an external information processing apparatus. In that case, the generation unit 104 is implemented not in the distribution server 10 but in another information processing apparatus, and the lecture data DB generated by that apparatus may be registered in the storage unit 105 of the distribution server 10.
FIG. 4 is a flowchart showing an example of the processing procedure for generating the lecture data DB.
In step S101, the area extraction unit 1041 extracts the image (first image) of the character display area in which a handwritten character string is displayed in the lecture moving image. It also determines and outputs the time (appearance time) from when the handwritten character string appears in the lecture moving image until its display ends. If there are a plurality of handwritten character strings, the extraction of the character display area image and the determination of the appearance time are performed for each handwritten character string.
A concrete example of extracting the character display area image and determining the appearance time for one handwritten character string will be described with reference to FIG. 5. The area extraction unit 1041 performs image processing, in units of a predetermined number of frames (for example, 80 frames), on a moving image of the lecturer giving a lecture while writing characters on the blackboard (FIG. 5(a)), and extracts the areas distinguished from the background (areas other than the background). For example, the area extraction unit 1041 outputs, per pixel and per unit of the predetermined number of frames, a score (probability) indicating how likely the pixel is to differ from the background image. Through this processing, a score equal to or greater than a predetermined value is output for the pixels of areas where characters are written on the blackboard and for the pixels of areas where the lecturer appears.
Next, the area extraction unit 1041 extracts the pixels whose output score is equal to or greater than the predetermined value. An example of the extracted pixels is shown in FIG. 5(b); the extraction parts 500 indicate places where extracted pixels cluster. When extracting the areas distinguished from the background, the area extraction unit 1041 preferably performs processing to exclude the area in which the lecturer appears. For example, in the process of extracting areas distinguished from the background, the area extraction unit 1041 may extract only the pixels (areas) whose score variation over a predetermined time length (for example, 10 seconds) is equal to or less than a predetermined threshold. In this way, pixels in which the lecturer moving around in the moving image was recognized are not extracted as areas distinguished from the background. Also, in that process, when the area of a cluster of extracted pixels is larger than a predetermined value, the area extraction unit 1041 may regard the extraction as the lecturer rather than a character string and treat it as outside the extraction target, so that pixels in which the lecturer appears are not extracted as areas distinguished from the background. The area extraction unit 1041 determines the time from when a cluster of pixels appears in the lecture moving image until it disappears as the appearance time during which the handwritten character string is displayed in the lecture moving image.
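The two filters just described (keeping pixels that differ from the background while discarding pixels whose score fluctuates over time, i.e. the moving lecturer) might be sketched with NumPy as follows. The thresholds, the grayscale representation, and the array layout are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

def stable_foreground_mask(frames, background, diff_thresh=0.2, var_thresh=0.01):
    """frames: (T, H, W) grayscale values in [0, 1]; background: (H, W).
    A pixel is kept when it differs from the background (likely chalk)
    AND its score is stable over time (unlikely to be the moving lecturer)."""
    frames = np.asarray(frames, dtype=float)
    score = np.abs(frames - background)          # per-frame difference score
    differs = score.mean(axis=0) > diff_thresh   # region distinct from background
    stable = score.var(axis=0) < var_thresh      # low temporal fluctuation
    return differs & stable
```

Chalk strokes stay bright and still once written, so they pass both tests; the lecturer differs from the background but fails the stability test.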
Next, the area extraction unit 1041 determines the position of a rectangular frame enclosing the cluster of pixels (for example, the pixel position of the lower left corner of the rectangle with the lower left of the moving image as the origin) and its size (vertical and horizontal dimensions). The frame 510 shown in FIG. 5(b) is an example of the determined rectangular frame.
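Determining the enclosing rectangle of a cluster of extracted pixels can be sketched as follows. This is illustrative only; note that it uses the usual image-array convention with the origin at the top left, whereas the description above measures from the lower left of the moving image.

```python
import numpy as np

def bounding_box(mask):
    """Return (left, top, width, height) of the rectangle enclosing all
    True pixels of a 2-D boolean mask (origin at top-left of the array)."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # no extracted pixels: no character display area
    left, top = xs.min(), ys.min()
    return int(left), int(top), int(xs.max() - left + 1), int(ys.max() - top + 1)
```

The resulting frame is what is later cut out of a frame image to obtain the character display area image.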
Subsequently, the region extraction unit 1041 extracts an image of the character display region in which the handwritten character string is displayed in the lecture moving image by cutting out the region enclosed by the rectangular frame from the image of an arbitrary frame among the frames constituting the lecture moving image during the appearance time.
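As a sketch, the rectangle-fitting and cut-out steps might look like the following; the top-to-bottom row storage and the lower-left-origin convention are assumptions based on the example in the text.

```python
def bounding_box(mask, video_height):
    """Return (left, bottom, width, height) of the rectangle enclosing the
    extracted pixels, with the origin at the lower left of the video as in
    the example.  mask rows are stored top-to-bottom (row 0 = top)."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # Convert the bottom row index (top-down) to a lower-left origin.
    return left, video_height - 1 - bottom, right - left + 1, bottom - top + 1

def crop_region(frame, left, bottom, width, height):
    """Cut out the character display region from one frame of the video."""
    top = len(frame) - bottom - height   # back to a top-down row index
    return [row[left:left + width] for row in frame[top:top + height]]

# 4x4 example: extracted pixels clustered in the middle of the frame.
mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, True,  False, False],
    [False, False, False, False],
]
left, bottom, width, height = bounding_box(mask, video_height=4)
frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
region = crop_region(frame, left, bottom, width, height)
```

Any frame within the appearance time can serve as the source of the crop, since the handwritten string is stationary during that interval.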
In step S102, the division unit 1042 divides the image of the character display region extracted by the region extraction unit 1041 into single-character images (second images), one for each character constituting the handwritten character string. The division unit 1042 binarizes the image of the character display region and divides it into single-character images by, for example, regarding a portion in which the illuminance of all pixels along the vertical axis of the image falls below a predetermined threshold as a break between characters. FIG. 5(c) shows a specific example of the break positions.
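A minimal sketch of this projection-based split, assuming a binarized region in which higher values mean ink; the threshold is illustrative.

```python
def split_characters(region, ink_thresh=0.5):
    """Split the character display region into single-character images.
    A column in which every pixel falls below ink_thresh is treated as a
    break between characters, as in step S102."""
    h, w = len(region), len(region[0])
    is_break = [all(region[y][x] < ink_thresh for y in range(h))
                for x in range(w)]
    chars, start = [], None
    for x in range(w):
        if not is_break[x] and start is None:
            start = x                      # a character begins
        elif is_break[x] and start is not None:
            chars.append([row[start:x] for row in region])
            start = None                   # a character ends at a break
    if start is not None:                  # character running to the edge
        chars.append([row[start:] for row in region])
    return chars

# Two "characters" separated by one blank column.
region = [[1, 1, 0, 1],
          [1, 0, 0, 1]]
chars = split_characters(region)
```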
In step S103, the single-character recognition engine 1043 performs character recognition on each single-character image constituting the handwritten character string and outputs a plurality of candidate characters for each image. A specific example is shown in FIG. 6. Candidate characters 1 to 5 shown in FIG. 6 are examples of the plurality of candidate characters output by performing character recognition on each of the images of "異", "性", and "体".
If the single-character recognition engine 1043 has sufficiently accurate character recognition capability, the candidate characters output by the single-character recognition engine 1043 may be stored in the lecture data DB as text information as they are, without proceeding to the processing of step S104. For example, in the example of FIG. 6, if the single-character recognition engine 1043 is capable of correctly recognizing the images of "異", "性", and "体" as "異", "性", and "体", the string "異性体" ("isomer"), obtained by joining the recognized characters, may be stored in the lecture data DB as text information as it is.
In step S104, the character string recognition engine 1044 (output unit) generates a plurality of candidate character strings by combining the plurality of candidate characters output for each single-character image according to the order of the characters in the handwritten character string. For example, in the example of FIG. 6, 125 (5 × 5 × 5) candidate character strings are generated by combining the five candidate characters corresponding to "異", the five candidate characters corresponding to "性", and the five candidate characters corresponding to "体".
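The combination step is a Cartesian product over the per-position candidate lists. A sketch with two candidates per position for brevity (the embodiment uses five, giving 125 combinations); the candidate values are illustrative.

```python
from itertools import product

# Candidate characters per position, as output by the single-character
# recognition engine (values are illustrative).
candidates = [["異", "翼"], ["性", "住"], ["体", "休"]]

# Combine in the original character order: 2 x 2 x 2 = 8 candidate strings.
candidate_strings = ["".join(chars) for chars in product(*candidates)]
```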
Here, the character string recognition engine 1044 has learned in advance a plurality of keywords (character strings) that may be used in lecture moving images, and has a function of, when an arbitrary character string is input, outputting the keyword determined to be most similar to the input character string among the plurality of keywords, together with a score indicating the similarity. Keywords that may be used in a lecture moving image are, for example, in the case of a Japanese history lecture, keywords of the kind found in a textbook index, such as "邪馬台国" (Yamatai) or "徳川家康" (Tokugawa Ieyasu). However, keywords generally differ from subject to subject. Therefore, character string recognition engines 1044 trained on different keywords according to the attributes of the lecture moving image (subject, lecture name, and so on) may be prepared, and the processing of step S104 may be performed using the character string recognition engine 1044 corresponding to the attributes of the lecture moving image.
Subsequently, the character string recognition engine 1044 outputs, as the text information corresponding to the handwritten character string, the keyword (character string) that, among the keywords (character strings) learned in advance as keywords that may be used in the lecture moving image, is determined to be most similar to one of the generated candidate character strings. More specifically, the character string recognition engine 1044 outputs, for each of the generated candidate character strings, the keyword determined to be most similar to it together with the similarity (score), and then outputs the keyword with the highest similarity as the text information corresponding to the handwritten character string.
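The selection step can be sketched as a best-match search over candidates and keywords. The embodiment does not specify the trained engine's similarity score, so `difflib.SequenceMatcher.ratio()` stands in for it here, and the keyword list is illustrative.

```python
from difflib import SequenceMatcher

def best_keyword(candidate_strings, keywords):
    """Return the learned keyword most similar to any candidate string.
    SequenceMatcher.ratio() is a stand-in for the engine's learned
    similarity score, which the embodiment leaves unspecified."""
    best, best_score = None, -1.0
    for cand in candidate_strings:
        for kw in keywords:
            score = SequenceMatcher(None, cand, kw).ratio()
            if score > best_score:
                best, best_score = kw, score
    return best

# Even though no candidate equals "異性体" exactly, the close candidate
# "異住体" still selects it over the other learned keywords.
keywords = ["異性体", "元素分析", "錯イオン形成反応"]
result = best_keyword(["異住体", "翼性休"], keywords)
```

This tolerance to per-character misrecognition is the point of step S104: the keyword survives as long as one candidate string lands near it.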
FIG. 6 shows an example in which the character string recognition engine 1044 outputs the similarity between each of the 125 candidate character strings and the learned keywords (which, in the example of FIG. 6, include at least "異性体"), and outputs the learned keyword "異性体", which has the highest similarity, as the text information corresponding to the handwritten character string. Even if the single-character recognition engine 1043 fails to recognize the characters correctly and the 125 candidate character strings do not include "異性体" itself, as long as the candidate character strings include a string similar to "異性体" (for example, "異住体"), the character string recognition engine 1044 will still output "異性体" as the text information corresponding to the handwritten character string.
The generation unit 104 repeats the processing of steps S101 to S104 described above for each handwritten character string displayed in the lecture moving image, thereby determining a keyword and an appearance time for each of the plurality of handwritten character strings displayed in the lecture moving image.
In step S105, the DB generation unit 1045 generates the lecture data DB by associating the text information output from the character string recognition engine 1044 in step S104, the appearance time output from the region extraction unit 1041 in step S101, and the lecture moving image being processed (which may be the file name of the lecture moving image).
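The resulting table can be sketched with SQLite; the schema and column names are illustrative, mirroring the three fields of the example of FIG. 7, with the appearance time split into start and end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lecture_data (
    video      TEXT,  -- identifier of the lecture video (e.g. its file name)
    start_time TEXT,  -- time the handwritten string appears
    end_time   TEXT,  -- time it disappears
    text_info  TEXT   -- keyword output by the string recognition engine
)""")
conn.executemany("INSERT INTO lecture_data VALUES (?, ?, ?, ?)", [
    ("化学_第1講_有機化合物の構造決定_チャプター1", "0:05", "3:10", "錯イオン形成反応"),
    ("化学_第1講_有機化合物の構造決定_チャプター1", "1:20", "3:10", "元素分析"),
])
conn.commit()
rows = conn.execute("SELECT text_info, start_time FROM lecture_data").fetchall()
```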
FIG. 7 is a diagram showing an example of the lecture data DB. The "lecture video" field stores an identifier that uniquely identifies a lecture moving image. The identifier includes the subject of the lecture moving image, the lecture name, and the like, and may be, for example, a file name including the subject. The "appearance time" field stores the period from when the handwritten character string is displayed in the lecture moving image until it disappears. The "text information" field stores the text data corresponding to the handwritten character string. In the example of FIG. 7, data is stored indicating, for instance, that in the lecture moving image "化学_第1講_有機化合物の構造決定_チャプター1" (Chemistry, Lecture 1, Structure Determination of Organic Compounds, Chapter 1), "錯イオン形成反応" (complex-ion formation reaction) is displayed from 0:05 to 3:10 and "元素分析" (elemental analysis) is displayed from 1:20 to 3:10.
<Searching for lectures>
Next, the processing procedure when a user searches for a lecture moving image will be described in detail. FIGS. 8 and 9 are diagrams showing examples of screens displayed on the terminal 20. FIG. 8(a) is an example of a screen for searching for a lecture moving image. The search screen provides an input box 1001 for entering the character string to be searched for and the subject of the lecture moving images to be searched. When the search button displayed to the right of the input box 1001 is pressed, the search unit 102 accesses the lecture data DB and searches for lecture moving images of the entered subject whose text information contains the search target character string. When such lecture moving images exist, the output unit 103 outputs a screen displaying a list of the retrieved lecture moving images. Alternatively, the output unit 103 may output the list screen only when a plurality of lecture moving images are retrieved, and transition directly to the playback screen described later (FIG. 9(a)) when exactly one lecture moving image is retrieved.
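The lookup itself then reduces to a substring match over the text information, filtered by subject. A self-contained sketch follows; the table layout and the convention that the video identifier begins with the subject name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lecture_data (video TEXT, start_time TEXT, text_info TEXT)")
conn.executemany("INSERT INTO lecture_data VALUES (?, ?, ?)", [
    ("化学_第1講_チャプター1", "0:05", "錯イオン形成反応"),
    ("化学_第1講_チャプター1", "2:15", "水素イオン"),
    ("日本史_第1講", "0:30", "徳川家康"),
])

def search(conn, subject, query):
    """Return (video, start_time, text_info) rows whose identifier matches
    the selected subject and whose text information contains the search
    target string, in chronological order."""
    return conn.execute(
        "SELECT video, start_time, text_info FROM lecture_data "
        "WHERE video LIKE ? AND text_info LIKE ? ORDER BY start_time",
        (subject + "%", "%" + query + "%"),
    ).fetchall()

hits = search(conn, "化学", "イオン")
```

Because only the pre-extracted text column is scanned, no frame of the video is touched at query time, which is the source of the speed advantage claimed at the end of the embodiment.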
FIG. 8(b) is an example of a screen displaying a list of retrieved lecture moving images. The search results are listed in a display area 1003. For example, when the user selects "chemistry" as the subject and enters "イオン" (ion) as the search target character string, the chemistry lecture moving images in which the lecturer has written "イオン" on the blackboard are listed in the display area 1003 as the search results.
Subsequently, when the user selects the lecture moving image he or she wishes to view from the list in the display area 1003, the screen transitions to a screen for playing back that lecture moving image. Since the display area 1003 not only lists the retrieved lecture moving images but also accepts the user's selection of the lecture moving image to view, the screen containing the display area 1003 may also be called a screen for accepting the selection of the lecture moving image the user wishes to view.
FIG. 9(a) shows an example of the screen for playing back a lecture moving image. It includes a display area 2001 (first area) that plays back the lecture moving image; a display area 2002 (second area) that displays, side by side horizontally in chronological order, the text information containing the search target character string and the start times at which the display of the corresponding handwritten character strings begins; and a display area 2004 (third area) that displays character strings searched for in the past with respect to the subject of the lecture moving image being played in the display area 2001. A button 2003 for listing the start times and text information is displayed at the top of the display area 2002. When the user presses the button 2003, as shown in FIG. 9(b), the display area 2002 is replaced by a display area 2005 (second area) that displays the text information containing the search target character string and the time stamp information arranged vertically in chronological order.
In the display areas 2002 and 2005, the word "板書" (board writing) is displayed as a message indicating that the search result is a handwritten character string displayed in the lecture moving image (that is, a message indicating that the handwritten character string corresponding to the retrieved text information appears in the lecture moving image). In addition, above the display areas 2002 and 2005, the number of hits for text information containing the search target character string is displayed in a display area 2102.
Of the text information displayed in the display areas 2002 and 2005, the portion corresponding to the search target character string may be highlighted. For example, in FIGS. 9(a) and 9(b), the portion "イオン", which is the search target character string, is highlighted within "錯イオン形成反応" and "水素イオン" (hydrogen ion).
The display areas 2002 and 2005 may further display the end time at which the display of the handwritten character string ends. For example, the display areas 2002 and 2005 may display the appearance time of the handwritten character string, such as "0:05–3:10 錯イオン形成反応".
In the display area 2001, information indicating the position at which the handwritten character string corresponding to the retrieved text information is displayed in the lecture moving image may be displayed superimposed on the lecture moving image. For example, as shown in FIGS. 9(a) and 9(b), a frame 2101 indicating the position at which "錯イオン形成反応", the retrieved text information, is displayed in the lecture moving image may be shown in the display area 2001. To make the frame 2101 displayable, the lecture data DB may further store, for each record, information indicating the position and size of the frame 2101. The information stored in the lecture data DB as the position and size of the frame 2101 may be the same as the information indicating the position and size of the rectangular frame enclosing the set of extracted pixels described in step S101 of FIG. 4. The frame 2101 may also remain displayed in the display area 2001 throughout the appearance time corresponding to the retrieved text information.
When the user selects a lecture moving image in the display area 1003 (FIG. 8(b)), playback of the lecture moving image starts in the display area 2001. Subsequently, when the user selects the start time and text information he or she wishes to view from those displayed in the display area 2002 or 2005, the lecture moving image displayed in the display area 2001 is played back from the selected start time or from a time a predetermined interval before it (for example, 10 seconds earlier). For example, when the user taps the entry displayed as 2:15 in the display area 2002, the lecture moving image is played back in the display area 2001 from the 2:15 mark or from a predetermined time before it (for example, from 2:06).
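The lead-in playback rule amounts to subtracting a predetermined offset and clamping at zero; a trivial sketch, with offset values following the examples in the text.

```python
def playback_start(selected_seconds, lead_in=10):
    """Start playback at the selected start time or a predetermined time
    before it, clamped so it never falls before the beginning."""
    return max(0, selected_seconds - lead_in)

# Tapping the 2:15 entry (135 s): a 9 s lead-in gives 126 s = 2:06,
# matching the example; near the start of the video the result clamps to 0.
```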
Alternatively, playback of the lecture moving image may not start in the display area 2001 at the moment the user selects a lecture moving image in the display area 1003 (FIG. 8(b)); playback may instead start only when the user presses a playback start button displayed within the display area 2001, or selects the time stamp information he or she wishes to view from the time stamp information and text information displayed in the display area 2002 or 2005.
The user may also swipe the display area 2002 from right to left (or from left to right) to display the next (or previous) start time and text information. For example, in FIG. 9(a), when the user swipes the display area 2002 from right to left, the text information whose start time is 0:05 disappears to the left, the text information whose start time is 2:15 moves from the right side to the left side, and the next text information appears on the right.
Similarly, the user may swipe the display area 2005 from top to bottom (or from bottom to top) to display the next (or previous) time stamp information and text information.
When the text contained in the text information retrieved by the search unit 102 is equal to or longer than a predetermined number of characters, the output unit 103 may output, in the display area 2002, only a portion of the text that at least contains the search target character string. This makes it possible to display the text information without greatly sacrificing visibility even when the text is too long to display in full in the display area 2002 or 2005, or when the terminal 20 is a smartphone or the like with a small display.
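One way to keep the search string visible while truncating, as a sketch: the ellipsis treatment and length limit are assumptions, since the text only requires that the displayed portion contain the search target character string.

```python
def snippet(text, query, max_len=10):
    """Return text unchanged when short enough; otherwise a window of at
    most max_len characters that still contains the query."""
    if len(text) <= max_len:
        return text
    i = max(0, text.find(query))             # -1 (not found) falls back to 0
    start = min(i, len(text) - max_len)      # keep the window inside the text
    part = text[start:start + max_len]
    prefix = "…" if start > 0 else ""
    suffix = "…" if start + max_len < len(text) else ""
    return prefix + part + suffix

short = snippet("水素イオン", "イオン")
clipped = snippet("とても長い錯イオン形成反応の説明", "イオン", max_len=6)
```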
The character strings previously searched for with respect to the subject of the lecture moving image, displayed in the display area 2004, may be shown in descending order of how many times they have been entered as search target character strings by the users of this moving-image distribution system. When the user selects a character string displayed in the display area 2004, the selected character string may be automatically entered into the input box 1001.
The present embodiment has been described above. In this embodiment, the lecture data DB stores text information obtained by converting into text the characters that the lecturer wrote on the blackboard in the lecture moving image, and lecture moving images are searched by comparing the search target character string with this text information. The embodiment therefore has the technical effect of improving search speed compared with a method that searches for a character string while directly analyzing the lecture moving image itself.
In the description above, the appearance time stored in the lecture data DB includes both the time at which the display of the handwritten character string starts (the time the string is written on the blackboard) and the time at which the display ends (for example, the time the lecturer erases the characters with a blackboard eraser); however, it may include only the time at which the display of the handwritten character string starts. This reduces the data volume of the lecture data DB. The start time and end time of the display of the handwritten character string may together be called "time information", or the start time alone may be called "time information".
In the description above, the moving image in which character strings are displayed has been assumed to be a lecture moving image in which a lecturer gives a lecture while writing characters by hand on a blackboard; however, the present embodiment is not limited to lecture moving images or handwritten characters. The embodiment can be applied to any character string and any moving image in which character strings are displayed.
The embodiments described above are intended to facilitate understanding of the present invention and are not to be construed as limiting it. The flowcharts, sequences, and elements of the embodiments, as well as their arrangements, materials, conditions, shapes, sizes, and the like, are not limited to those illustrated and may be changed as appropriate. Configurations shown in different embodiments may also be partially substituted for or combined with one another.
Claims (9)
- 複数の第1文字列の画像が表示される動画について、該第1文字列の画像を文字認識することで生成される第2文字列と、前記動画で該第1文字列の画像が表示される時間を示す時間情報と、前記動画とを対応づけて格納するデータベースを記憶する記憶部と、
検索対象の文字列を受け付ける受付部と、
前記検索対象の文字列を含む第2文字列と、該第2文字列に対応する時間情報と、該第2文字列に対応する動画とを前記データベースから検索する検索部と、
検索された動画を再生する第1表示領域と、検索された第2文字列と時間情報とを時系列順に表示する第2表示領域とを含む画面を出力する出力部と、
を有する情報処理装置。 For a moving image in which a plurality of first character string images are displayed, a second character string generated by character recognition of the first character string image and an image of the first character string in the moving image are displayed A storage unit that stores a database that stores time information indicating time of day and the moving image in association with each other;
A reception unit that receives a character string to be searched;
A search unit configured to search the database for a second character string including the character string to be searched, time information corresponding to the second character string, and a moving image corresponding to the second character string;
An output unit for outputting a screen including a first display area for reproducing a searched moving image, and a second display area for displaying a searched second character string and time information in chronological order;
An information processing apparatus having - 前記出力部は、前記第2表示領域に、検索された第2文字列と時間情報とを、横方向又は縦方向に時系列順に並べて表示する画面を出力する、
請求項1に記載の情報処理装置。 The output unit outputs, on the second display area, a screen on which the retrieved second character string and time information are arranged in chronological order in the horizontal direction or in the vertical direction.
An information processing apparatus according to claim 1. - 前記出力部は、前記第2表示領域に、更に、検索された第2文字列に対応する第1文字列の画像が、前記動画に表示されていることを示すメッセージを表示する、
請求項2に記載の情報処理装置。 The output unit further displays a message indicating that an image of a first character string corresponding to the searched second character string is displayed in the moving image in the second display area.
The information processing apparatus according to claim 2. - 前記出力部は、検索された第2文字列に対応する第1文字列の画像が前記動画の中で表示される位置を示す情報を、前記動画に重ねて表示する、
請求項1乃至3のいずれか一項に記載の情報処理装置。 The output unit superimposes information indicating a position at which an image of a first character string corresponding to the searched second character string is displayed in the moving image, and displays the information.
The information processing apparatus according to any one of claims 1 to 3. - 前記出力部は、前記第2表示領域に表示する第2文字列のうち、前記検索対象の文字列に該当する部分を強調表示する、
請求項1乃至4のいずれか一項に記載の情報処理装置。 The output unit highlights a portion corresponding to the search target character string in the second character string displayed in the second display area.
The information processing apparatus according to any one of claims 1 to 4. - 前記動画は、講師が黒板を用いて授業を行っている様子を撮影した動画であり、
前記第1文字列は、前記黒板に手書きで書かれた複数の手書き文字を含む文字列である、
請求項1乃至5のいずれか一項に記載の情報処理装置。 The above video is a video of the lecturer taking a lesson using a blackboard,
The first character string is a character string including a plurality of handwritten characters handwritten on the blackboard.
The information processing apparatus according to any one of claims 1 to 5. - 動画内で第1文字列の画像が表示される領域である第1画像を抽出し、前記動画内で前記第1文字列の画像の表示が開始される時間情報を出力する抽出部と、
前記抽出部で抽出された前記第1画像を、前記第1文字列に含まれる文字ごとの第2画像に分割する分割部と、
複数の前記第2画像の各々について文字認識を行うことで、前記第2画像ごとに複数の候補文字を出力する文字認識部と、
前記第2画像ごとに出力された前記複数の候補文字を前記第1文字列における文字の並び順に従って組み合わせることで生成される複数の候補文字列について、前記動画で使用される可能性のある複数の文字列のうち、前記複数の候補文字列のいずれかに最も類似すると判定される文字列を、第2文字列として出力する出力部と、
前記出力部で出力された前記第2文字列と、前記抽出部で出力された前記時間情報と、前記動画とを対応づけたデータベースを生成する生成部と、
を有する情報処理装置。 An extraction unit which extracts a first image which is an area where an image of a first character string is displayed in a moving image, and outputs time information when display of the image of the first character string is started in the moving image;
A division unit that divides the first image extracted by the extraction unit into a second image for each character included in the first character string;
A character recognition unit that outputs a plurality of candidate characters for each of the second images by performing character recognition on each of the plurality of second images;
For a plurality of candidate character strings generated by combining the plurality of candidate characters output for each of the second images in accordance with the arrangement order of characters in the first character string, a plurality of possibilities may be used in the moving image An output unit that outputs, as a second character string, a character string determined to be most similar to any of the plurality of candidate character strings among the character strings of
A generation unit that generates a database in which the second character string output by the output unit, the time information output by the extraction unit, and the moving image are associated;
An information processing apparatus having - 複数の第1文字列の画像が表示される動画について、該第1文字列の画像を文字認識することで生成される第2文字列と、前記動画で該第1文字列の画像が表示される時間を示す時間情報と、前記動画とを対応づけて格納するデータベースを記憶する記憶部を有する情報処理装置が行う動画検索方法であって、
検索対象の文字列を受け付けるステップと、
前記検索対象の文字列を含む第2文字列と、該第2文字列に対応する時間情報と、該第2文字列に対応する動画とを前記データベースから検索するステップと、
検索された動画を再生する第1表示領域と、検索された第2文字列と時間情報とを時系列順に表示する第2表示領域とを含む画面を出力するステップと、
を有する動画検索方法。 For a moving image in which a plurality of first character string images are displayed, a second character string generated by character recognition of the first character string image and an image of the first character string in the moving image are displayed A moving image search method performed by an information processing apparatus having a storage unit that stores a database in which time information indicating the time of day and the moving image are stored in association with each other,
Receiving a search target character string;
Searching the database for a second character string including the search target character string, time information corresponding to the second character string, and a moving image corresponding to the second character string;
Outputting a screen including a first display area for reproducing the searched moving image, and a second display area for displaying the searched second character string and time information in chronological order;
- A program for causing a computer to function as:
storage means for storing a database in which, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the images of the first character strings, time information indicating the times at which the images of the first character strings are displayed in the moving image, and the moving image are stored in association with one another;
reception means for receiving a character string to be searched for;
search means for searching the database for a second character string including the search target character string, time information corresponding to the second character string, and a moving image corresponding to the second character string; and
output means for outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
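The apparatus claim's output unit combines the per-character OCR candidates in display order and picks the known string most similar to any resulting combination. A minimal sketch of that selection step, using simple ratio-based similarity; `most_similar_string`, `candidate_chars`, and `known_strings` are illustrative names and not taken from the patent, and the similarity measure is an assumption (the claims do not specify one):

```python
import difflib
import itertools

def most_similar_string(candidate_chars, known_strings):
    """candidate_chars: per-character OCR candidate lists, in the order the
    characters appear in the first character string, e.g. [['O','0'], ['C','G'], ['R']].
    known_strings: character strings that may be used in the moving image.
    Returns the known string most similar to any combined candidate string."""
    best, best_score = None, -1.0
    # Enumerate every candidate character string (one pick per position).
    for combo in itertools.product(*candidate_chars):
        candidate = "".join(combo)
        # Score each known string against this candidate; keep the best match.
        for known in known_strings:
            score = difflib.SequenceMatcher(None, candidate, known).ratio()
            if score > best_score:
                best, best_score = known, score
    return best

print(most_similar_string([['O', '0'], ['C', 'G'], ['R']],
                          ["OCR", "OGRE", "DCR"]))  # → OCR
```

Exhaustive enumeration is fine for short strings with few candidates per position; a production implementation would prune combinations or use a lattice search.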
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980006824.0A CN111542817A (en) | 2018-01-25 | 2019-01-16 | Information processing device, video search method, generation method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018-010904 | 2018-01-25 | ||
JP2018010904A JP6506427B1 (en) | 2018-01-25 | 2018-01-25 | INFORMATION PROCESSING APPARATUS, MOVIE SEARCH METHOD, GENERATION METHOD, AND PROGRAM |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019146466A1 true WO2019146466A1 (en) | 2019-08-01 |
Family
ID=66324237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/001084 WO2019146466A1 (en) | 2018-01-25 | 2019-01-16 | Information processing device, moving-image retrieval method, generation method, and program |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP6506427B1 (en) |
CN (1) | CN111542817A (en) |
WO (1) | WO2019146466A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111355999B (en) * | 2020-03-16 | 2022-07-01 | 北京达佳互联信息技术有限公司 | Video playing method and device, terminal equipment and server |
CN113347478B (en) * | 2021-05-28 | 2022-11-04 | 维沃移动通信(杭州)有限公司 | Display method and display device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007036752A (en) * | 2005-07-27 | 2007-02-08 | Tdk Corp | System, method and program for reproducing contents |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05274314A (en) * | 1992-03-25 | 1993-10-22 | Canon Inc | Document processor |
JP2959925B2 (en) * | 1993-05-25 | 1999-10-06 | 富士ゼロックス株式会社 | Image processing device |
JP2002099558A (en) * | 2000-09-21 | 2002-04-05 | Canon Inc | Information retrieval system and method, and recording medium |
JP4744317B2 (en) * | 2006-02-16 | 2011-08-10 | 富士通株式会社 | Word search device, word search method, and computer program |
EP2541440A4 (en) * | 2010-02-26 | 2014-10-15 | Rakuten Inc | Information processing device, information processing method, and recording medium that has recorded information processing program |
CN102572573A (en) * | 2010-12-30 | 2012-07-11 | 上海无戒空间信息技术有限公司 | Method for pushing information according to played content |
JP5845764B2 (en) * | 2011-09-21 | 2016-01-20 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
JP5831420B2 (en) * | 2012-09-28 | 2015-12-09 | オムロン株式会社 | Image processing apparatus and image processing method |
JP6672645B2 (en) * | 2015-09-07 | 2020-03-25 | カシオ計算機株式会社 | Information terminal device and program |
2018
- 2018-01-25: JP application JP2018010904A filed; patent JP6506427B1 (active)
2019
- 2019-01-16: CN application CN201980006824.0A filed; publication CN111542817A (pending)
- 2019-01-16: WO application PCT/JP2019/001084 filed; publication WO2019146466A1 (application filing)
Non-Patent Citations (1)
Title |
---|
SAKURADA ET AL: "Proposal of a lecture recording system for ubiquitous learning", HUMAN INTERFACE, vol. 6, no. 4-5, 2041105, pages 49 - 52 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2023183769A (en) * | 2022-06-16 | 2023-12-28 | オートペディア カンパニー リミテッド | Tire tread surface abrasion determination system and method using deep artificial neural network |
JP7515189B2 (en) | 2022-06-16 | 2024-07-12 | オートペディア カンパニー リミテッド | System and method for determining tire tread wear using deep artificial neural network |
Also Published As
Publication number | Publication date |
---|---|
CN111542817A (en) | 2020-08-14 |
JP6506427B1 (en) | 2019-04-24 |
JP2019128850A (en) | 2019-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111143610B (en) | Content recommendation method and device, electronic equipment and storage medium | |
US11917344B2 (en) | Interactive information processing method, device and medium | |
CN111970577B (en) | Subtitle editing method and device and electronic equipment | |
US20190130185A1 (en) | Visualization of Tagging Relevance to Video | |
KR101587926B1 (en) | Comment tagging system for streaming video and providing method thereof | |
CN112437353B (en) | Video processing method, video processing device, electronic apparatus, and readable storage medium | |
CN111263186A (en) | Video generation, playing, searching and processing method, device and storage medium | |
CN113010698B (en) | Multimedia interaction method, information interaction method, device, equipment and medium | |
WO2023016349A1 (en) | Text input method and apparatus, and electronic device and storage medium | |
WO2019146466A1 (en) | Information processing device, moving-image retrieval method, generation method, and program | |
CN113395605B (en) | Video note generation method and device | |
CN114329223A (en) | Media content searching method, device, equipment and medium | |
WO2023103597A1 (en) | Multimedia content sharing method and apparatus, and device, medium and program product | |
CN112989112B (en) | Online classroom content acquisition method and device | |
CN113407775B (en) | Video searching method and device and electronic equipment | |
US20100281046A1 (en) | Method and web server of processing a dynamic picture for searching purpose | |
CN114117120A (en) | Video file intelligent index generation system and method based on content analysis | |
CN106936830B (en) | Multimedia data playing method and device | |
CN114780793B (en) | Information labeling method, device, terminal equipment and storage medium | |
CN116800988A (en) | Video generation method, apparatus, device, storage medium, and program product | |
WO2019069997A1 (en) | Information processing device, screen output method, and program | |
JP2019144817A (en) | Motion picture output device, motion picture output method, and motion picture output program | |
CN113626622B (en) | Multimedia data display method in interactive teaching and related equipment | |
CN107609018B (en) | Search result presenting method and device and terminal equipment | |
CN117724648A (en) | Note generation method, device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19744422 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19744422 Country of ref document: EP Kind code of ref document: A1 |