WO2019146466A1 - Information processing device, moving-image retrieval method, generation method, and program - Google Patents
- Publication number
- WO2019146466A1 (PCT/JP2019/001084)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- character string
- moving image
- image
- displayed
- character
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
Definitions
- the present invention relates to an information processing apparatus, a moving image search method, a generation method, and a program.
- Conventional online learning systems do not provide a function for searching a lecture video for the specific part the user desires to view. The user therefore has to find the desired part by watching the lecture video from beginning to end or by fast-forwarding. This problem can occur not only with lecture videos but with any videos.
- The present disclosure aims to provide a technology that allows the user to quickly search for a specific part of a moving image that he or she desires to view.
- An information processing apparatus includes: a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; a receiving unit that receives a search target character string; a search unit that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and an output unit that outputs a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- According to this aspect, it is possible to provide a technology capable of quickly searching for the specific portion of the moving image that the user desires to view.
- The output unit may output a screen in which the retrieved second character string and the time information are arranged and displayed in chronological order in the horizontal or vertical direction in the second display area. According to this aspect, since multiple pieces of text information and time information are displayed chronologically in the second area of the screen, visibility is improved.
- The output unit may further display, in the second display area, a message indicating that the image of the first character string corresponding to the retrieved second character string is displayed in the moving image. According to this aspect, the user can easily recognize on the screen that the search target is a first character string displayed in the moving image.
- The output unit may display information indicating the position at which the image of the first character string corresponding to the retrieved second character string is displayed, superimposed on the moving image. According to this aspect, the user can easily grasp where the search target character string appears in the moving image.
- The output unit may highlight the portion of the second character string displayed in the second display area that corresponds to the search target character string. According to this aspect, even when the second character string contains many characters, the user can easily see which part of it corresponds to the search target character string.
- the moving image is a moving image obtained by shooting a lecturer giving a lecture using a blackboard
- the first character string is a character string including a plurality of handwritten characters handwritten on the blackboard
- An information processing apparatus includes: an extraction unit that extracts a first image, which is the area in which the image of a first character string is displayed in a moving image, and outputs time information indicating when the display of that image starts; a division unit that divides the first image extracted by the extraction unit into second images, one for each character included in the first character string; a character recognition unit that performs character recognition on each of the second images and outputs a plurality of candidate characters for each; and an output unit that combines the candidate characters output for the second images according to the order of the characters in the first character string to generate a plurality of candidate character strings, and outputs, from among a plurality of character strings that may be used in the moving image, the character string determined to be most similar to any of the candidate character strings.
- According to this aspect, the database can be generated automatically, and the user can quickly search for the specific portion of the moving image that he or she desires to view.
- A moving image search method is performed by an information processing apparatus including a storage unit that stores a database associating, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image. The method includes: a step of receiving a search target character string; a step of searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and a step of outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- A program causes a computer to function as: storage means for storing a database associating, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; receiving means for receiving a search target character string; search means for searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and output means for outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
- FIG. 1 is a diagram illustrating an example of a moving image distribution system according to an embodiment.
- the moving image distribution system includes a distribution server 10 and a terminal 20.
- the distribution server 10 and the terminal 20 can communicate with each other via a wireless or wired communication network N.
- a plurality of terminals 20 may be included in the present moving image distribution system.
- the distribution server 10 and the terminal 20 may be collectively referred to as an information processing apparatus, or only the distribution server 10 may be referred to as an information processing apparatus.
- the distribution server 10 is a server that distributes a lecture moving image, and has a function of transmitting data of the lecture moving image requested from the terminal 20 to the terminal 20.
- the distribution server 10 may be one or more physical or virtual servers, or may be a cloud server.
- The terminal 20 is a terminal operated by the user, and may be any terminal provided with a communication function, such as a smartphone, a tablet terminal, a mobile phone, a personal computer (PC), a laptop PC, a personal digital assistant (PDA), or a home gaming device.
- The user inputs a search target character string (search keyword) and can search for lecture videos in which an image of a character string handwritten by the lecturer on the blackboard (hereinafter referred to as a "handwritten character string") includes the search target character string. For example, when the user inputs "organic compound" as the search target character string on the search screen of the terminal 20, the lecture moving images in which the lecturer wrote "organic compound" on the blackboard are listed on the screen of the terminal 20.
- The distribution server 10 stores, in a database, the text information (second character string) generated by character recognition of the image of the handwritten character string (first character string) and the time information indicating the time at which the image of the handwritten character string is displayed, in association with the lecture moving image (or information uniquely identifying the lecture moving image). More specifically, the time information may be information indicating the period from when the handwritten character string appears in the lecture moving image until its display ends (hereinafter referred to as the "appearance time").
- the database is called “lecture data DB (Database)”.
- The distribution server 10 searches for lecture moving images including the search target character string using the lecture data DB, which makes it possible to search for lecture videos in which a sentence or character string written by the lecturer on the blackboard includes the search target character string.
- FIG. 2 is a diagram showing an example of the hardware configuration of the distribution server 10.
- The distribution server 10 includes a central processing unit (CPU) 11, a storage device 12 such as a memory, a communication IF (interface) 13 for performing wired or wireless communication, an input device 14 for receiving input operations, and an output device 15 for outputting information.
- Each functional unit described in the functional block configuration below can be realized by the CPU 11 executing a program stored in the storage device 12.
- The program can be stored, for example, in a non-transitory recording medium.
- FIG. 3 is a diagram showing an example of a functional block configuration of the distribution server 10.
- the distribution server 10 includes a reception unit 101, a search unit 102, an output unit 103, a generation unit 104, and a storage unit 105.
- the storage unit 105 stores lecture data DB.
- the reception unit 101 has a function of receiving a search target character string input by the user on the screen of the terminal 20.
- The search unit 102 searches the lecture data DB for "text information" including the search target character string received by the reception unit 101, the "appearance time" corresponding to that text information, and the "lecture moving image" corresponding to that text information.
- The output unit 103 outputs a screen including an area (first area) for reproducing the lecture moving image retrieved by the search unit 102 and an area (second area) for displaying the retrieved text information and appearance time (time information) in chronological order. The output screen is displayed on the display of the terminal 20.
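A minimal sketch of how this search could work, assuming the lecture data DB is a single SQLite table. The table name and columns (`lecture_video`, `subject`, `appearance_time`, `text_info`) are illustrative assumptions; the patent names the stored fields but specifies no storage engine or schema:

```python
import sqlite3

def search_lectures(db_path, query, subject=None):
    """Return (lecture_video, appearance_time, text_info) rows whose text
    information contains `query`, optionally filtered by subject, ordered
    chronologically within each lecture video."""
    conn = sqlite3.connect(db_path)
    sql = ("SELECT lecture_video, appearance_time, text_info FROM lecture_data "
           "WHERE text_info LIKE ?")
    params = ["%" + query + "%"]
    if subject is not None:
        sql += " AND subject = ?"
        params.append(subject)
    sql += " ORDER BY lecture_video, appearance_time"
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

The matched rows supply both the video to reproduce in the first area and the (appearance time, text) pairs listed in the second area.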
- the output unit 103 may have, for example, a web server function, and may have a function of transmitting a website to which a lecture moving image is distributed to the terminal 20.
- the output unit 103 may have a function of transmitting, to the terminal 20, content for displaying a lecture moving image or the like on the screen of an application installed on the terminal 20.
- The generation unit 104 generates the lecture data DB by performing character recognition on handwritten character strings displayed in the lecture moving image.
- the generation unit 104 further includes an area extraction unit 1041, a division unit 1042, a single character recognition engine 1043, a character string recognition engine 1044, and a DB generation unit 1045.
- the processing performed by the area extraction unit 1041, the division unit 1042, the single character recognition engine 1043, the character string recognition engine 1044, and the DB generation unit 1045 will be described later.
- Although the description here assumes that the generation unit 104 of the distribution server 10 generates the lecture data DB, the distribution server 10 does not necessarily have to create the lecture data DB itself; it may be generated by an external information processing apparatus. In that case, the generation unit 104 is implemented not in the distribution server 10 but in another information processing apparatus, and the lecture data DB generated by that apparatus may be registered in the storage unit 105 of the distribution server 10.
- FIG. 4 is a flowchart showing an example of a processing procedure when generating the lecture data DB.
- In step S101, the area extraction unit 1041 extracts an image (first image) of the character display area in which a handwritten character string is displayed in the lecture moving image. It also determines and outputs the time (appearance time) from when the handwritten character string is displayed until its display ends. If there are multiple handwritten character strings, extraction of the character display area image and determination of the appearance time are performed for each handwritten character string.
- The region extraction unit 1041 performs image processing, every predetermined number of frames (for example, 80 frames), on the moving image in which a lecturer is writing characters on a blackboard (FIG. 5(a)), and extracts areas distinguished from the background (areas other than the background). For example, the region extraction unit 1041 outputs a score (probability) indicating the likelihood that each pixel differs from the background image, per pixel and per the predetermined number of frames. As a result of this processing, a score equal to or greater than a predetermined value is output for the pixels of the area where characters are written on the blackboard and for the pixels of the area where the lecturer appears.
- the region extraction unit 1041 extracts pixels whose output score is equal to or more than a predetermined value.
- An example of the extracted pixel is shown in FIG. 5 (b).
- An extraction part 500 shown in FIG. 5B indicates a part where the extracted pixels are gathered.
- The area extraction unit 1041 preferably performs a process that excludes the area in which the lecturer appears. For example, in the process of extracting areas distinguished from the background, the area extraction unit 1041 extracts only pixels (areas) whose score variation over a predetermined time length (for example, 10 seconds) is equal to or less than a predetermined threshold. Alternatively, when the area of a region in which extracted pixels are clustered is larger than a predetermined value, the region may be regarded as containing the lecturer and treated as an exclusion target. This prevents pixels in which the lecturer appears from being extracted as areas distinguished from the background.
- the region extraction unit 1041 determines the time from the appearance of the part where the pixels are gathered to the disappearance in the lecture moving image as the appearance time when the handwritten character string is displayed in the lecture moving image.
- The area extracting unit 1041 determines the position of a rectangular frame enclosing the part where the extracted pixels are clustered (for example, the pixel position at the lower left of the rectangle, with the lower left of the moving image as the origin) and its size (lengths in the vertical and horizontal directions).
- the frame 510 shown in FIG. 5B is an example of the determined rectangular frame.
- The region extraction unit 1041 extracts the image of the character display area in which the handwritten character string is displayed by cutting out the region surrounded by the rectangular frame from the image of an arbitrary frame among the frames constituting the lecture moving image during the appearance time.
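The extraction of step S101 can be sketched roughly as follows. This is an illustrative reading of the description, not the patent's implementation: pixels are scored by their difference from a background image over a window of frames, and pixels whose difference fluctuates strongly over time (the moving lecturer) are excluded, leaving the stably displayed writing. The threshold values are arbitrary assumptions.

```python
import numpy as np

def extract_character_regions(frames, background,
                              diff_thresh=30.0, var_thresh=25.0):
    """Return a boolean (H, W) mask of pixels that differ from the
    background but stay stable across the frame window.

    frames: (T, H, W) grayscale frames; background: (H, W) image.
    `score` plays the role of the per-pixel "not background" score;
    high temporal variation marks the moving lecturer, who is excluded.
    """
    diff = np.abs(frames.astype(np.float32) - background.astype(np.float32))
    score = diff.mean(axis=0)       # how strongly a pixel differs overall
    variation = diff.std(axis=0)    # low => static (writing), high => motion
    return (score >= diff_thresh) & (variation <= var_thresh)
```

A bounding rectangle around clusters of True pixels then gives the character display area, and the window in which the cluster first appears and later disappears gives the appearance time.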
- In step S102, the dividing unit 1042 divides the image of the character display area extracted by the area extraction unit 1041 into images (second images) of the individual characters constituting the handwritten character string.
- For example, the dividing unit 1042 binarizes the image of the character display area and regards a column in which the luminance of all pixels along the vertical axis is below a predetermined threshold as a break between characters, dividing the image at such columns. A specific example of the break positions is shown in FIG. 5(c).
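The column-projection split of step S102 can be sketched as follows, assuming a binarized image in which foreground (chalk) pixels are 1. Columns containing no foreground pixels are treated as the breaks between characters:

```python
import numpy as np

def split_characters(binary, gap_thresh=0):
    """Split a binarized character-area image (H, W) into per-character
    sub-images at columns whose foreground pixel count is at or below
    `gap_thresh` (the breaks between characters)."""
    col_counts = binary.sum(axis=0)   # foreground pixels per column
    in_char = col_counts > gap_thresh
    chars, start = [], None
    for x, filled in enumerate(in_char):
        if filled and start is None:
            start = x                 # a character run begins
        elif not filled and start is not None:
            chars.append(binary[:, start:x])
            start = None              # the run ended at a break column
    if start is not None:
        chars.append(binary[:, start:])
    return chars
```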
- In step S103, the single character recognition engine 1043 performs character recognition on each single-character image constituting the handwritten character string and outputs a plurality of candidate characters for each image.
- A specific example is shown in FIG. 6.
- Candidate characters 1 to 5 shown in FIG. 6 indicate examples of a plurality of candidate characters output by performing character recognition on each of the “different”, “sex”, and “body” images.
- Note that if the single character recognition engine 1043 has sufficiently high recognition accuracy, the candidate characters it outputs may be stored directly as text information in the lecture data DB without proceeding to step S104. For example, if the single character recognition engine 1043 can correctly recognize the characters "different", "sex", and "body" from their respective images, the string "isomer" obtained by combining the recognized characters may be stored as-is in the lecture data DB as text information.
- In step S104, the character string recognition engine 1044 (output unit) generates a plurality of candidate character strings by combining the candidate characters output for each single-character image according to the order of the characters in the handwritten character string. For example, in FIG. 6, combining the five candidate characters corresponding to "different", the five corresponding to "sex", and the five corresponding to "body" generates 125 (5 × 5 × 5) candidate character strings.
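The combination step is a plain Cartesian product over the per-character candidate lists; a sketch:

```python
from itertools import product

def candidate_strings(per_char_candidates):
    """Combine per-character candidate lists into full candidate strings,
    preserving character order. Three characters with five candidates
    each yield 5 * 5 * 5 = 125 strings, as in the example above."""
    return ["".join(chars) for chars in product(*per_char_candidates)]
```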
- The character string recognition engine 1044 has learned in advance a plurality of keywords (character strings) that may be used in the lecture moving image, and has a function of, given an arbitrary input character string, outputting the keyword determined to be most similar to the input among the learned keywords, together with a score indicating the similarity.
- A keyword that may be used in the lecture moving image is a keyword of the kind listed in the index of a textbook, such as "Yamatai" or "Tokugawa Ieyasu".
- keywords are generally different for each subject.
- The character string recognition engine 1044 outputs, as the text information corresponding to the handwritten character string, the keyword (character string) determined to be most similar to any of the generated candidate character strings, from among the keywords learned in advance as keywords that may be used in the lecture moving image. More specifically, the character string recognition engine 1044 outputs, for each generated candidate character string, the keyword judged most similar to it together with a similarity score, and then outputs the keyword with the highest similarity as the text information corresponding to the handwritten character string.
- In the example of FIG. 6, the character string recognition engine 1044 computes the similarity between each of the 125 candidate character strings and the learned keywords (which include at least "isomer"), and outputs the learned keyword "isomer", which has the highest similarity, as the text information corresponding to the handwritten character string. Even if the single character recognition engine 1043 cannot recognize every character correctly and the 125 candidate strings do not contain "isomer" itself, as long as the candidates include a character string similar to "isomer", "isomer" is still output by the character string recognition engine 1044 as the text information corresponding to the handwritten character string.
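The keyword-matching step can be sketched with a generic string-similarity measure. `difflib` here stands in for whatever similarity function the character string recognition engine actually learns; the point is only that the closest learned keyword is returned even when no candidate matches it exactly:

```python
import difflib

def best_keyword(candidates, keywords):
    """Return the learned keyword most similar to any candidate string,
    together with its similarity score in [0, 1]."""
    best, best_score = None, -1.0
    for cand in candidates:
        for kw in keywords:
            # Ratio of matching characters; a stand-in similarity measure.
            score = difflib.SequenceMatcher(None, cand, kw).ratio()
            if score > best_score:
                best, best_score = kw, score
    return best, best_score
```

Even when every candidate misrecognizes one character of "isomer", the keyword "isomer" still wins as long as it is the closest learned keyword to some candidate.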
- The generation unit 104 repeats the processing of steps S101 to S104 for each handwritten character string displayed in the lecture moving image, thereby obtaining a keyword and an appearance time for each of the plurality of handwritten character strings displayed in the lecture moving image.
- In step S105, the DB generation unit 1045 generates the lecture data DB by associating the text information output from the character string recognition engine 1044 in step S104, the appearance time output from the area extraction unit 1041 in step S101, and the lecture moving image being processed (the file name of the lecture moving image may be used).
- FIG. 7 is a diagram showing an example of the lecture data DB.
- An identifier for uniquely identifying a lecture moving image is stored in the "lecture moving image".
- the identifier includes the subject of the lecture video, the lecture name, and the like.
- the identifier may be, for example, a file name including a subject of a lecture moving image.
- The "appearance time" field stores the time from when the handwritten character string is displayed in the lecture video until it disappears. The "text information" field stores the text data corresponding to the handwritten character string.
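The three fields of FIG. 7 map naturally onto one database table. A sketch with assumed column names (the patent names the fields but prescribes no schema):

```python
import sqlite3

def create_lecture_data_db(path):
    """Create a lecture data DB with the fields shown in FIG. 7:
    lecture video identifier, appearance time, and text information."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lecture_data ("
        "lecture_video TEXT,"    # identifier, e.g. file name with subject
        "appearance_time TEXT,"  # e.g. "0:05-3:10"
        "text_info TEXT)"        # keyword from the string recognition engine
    )
    conn.commit()
    return conn
```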
- FIG. 8A is an example of a screen for searching a lecture moving image.
- The screen provides an input box 1001 for inputting the search target character string and the subject of the lecture moving image to be searched.
- The search unit 102 accesses the lecture data DB and searches for lecture videos whose text information, for the lecture moving images of the input subject, includes the search target character string.
- When there are lecture moving images whose text information includes the search target character string, the output unit 103 outputs a screen displaying a list of the retrieved lecture moving images. Note that the output unit 103 may output the list screen when there are multiple retrieved lecture moving images, and transition directly to the screen for reproducing the lecture moving image described later (FIG. 9(a)) when there is only one.
- FIG. 8B is an example of a screen displaying a list of searched lecture moving images.
- The search results are displayed as a list in the display area 1003. For example, if the user selects "Chemistry" as the subject and enters "ion" as the search target character string, the lecture videos, among the chemistry lecture videos, in which the lecturer wrote "ion" on the blackboard are displayed as a list in the display area 1003 as the search result.
- When the user selects a lecture moving image to view from the list in the display area 1003, the screen transitions to the screen for reproducing that lecture moving image.
- Since the display area 1003 not only displays the list of retrieved lecture moving images but also accepts the user's selection of a lecture moving image to view, the screen including the display area 1003 may also be referred to as a screen for receiving the selection of a lecture moving image to view.
- An example of the screen for reproducing the lecture moving image is shown in FIG. 9(a).
- The screen includes a display area 2001 (first area) for reproducing the lecture moving image, a display area 2002 (second area) in which text information including the search target character string and the start times at which the display of the handwritten character strings begins are arranged horizontally in chronological order, and a display area 2004 (third area) displaying character strings searched in the past for the subject of the lecture moving image reproduced in the display area 2001. A button 2003 for displaying a list of start times and text information is also displayed.
- In FIG. 9(b), instead of the display area 2002, a display area 2005 (second area) is displayed in which text information including the search target character string and time information are arranged vertically in chronological order.
- The word "board" is displayed as a message indicating that the search result is a handwritten character string displayed in the lecture moving image (that is, that a handwritten character string corresponding to the retrieved text information is displayed in the lecture moving image).
- The number of times the text information including the search target character string has been searched is displayed in the display area 2102.
- The portion corresponding to the search target character string may be highlighted. For example, the portion "ion", the search target character string, is highlighted in "complex ion formation reaction" and "hydrogen ion".
- the display area 2002 and the display area 2005 may further display an end time at which the display of the handwritten character string ends.
- the appearance time of the handwritten character string may be displayed as "0:05 to 3:10 complex ion forming reaction".
- information indicating the position where the handwritten character string corresponding to the searched text information is displayed in the lecture moving image may be displayed superimposed on the lecture moving image.
- For example, a frame 2101 indicating the position at which "complex ion formation reaction", the retrieved text information, is displayed in the lecture moving image may be displayed.
- information indicating the position at which the frame 2101 is displayed and the size of the frame 2101 may be further stored in the lecture data DB for each record.
- For example, the same information as the position and size of the rectangular frame determined by the area extraction unit 1041 may be stored as the information indicating the position.
- the frame 2101 may be continuously displayed on the display area 2001 during the appearance time corresponding to the searched text information.
- Reproduction of the lecture moving image may be configured not to start in the display area 2001 until the user selects the reproduction start button displayed in the display area 2001, or until the user selects, from the time information and text information displayed in the display area 2002 or the display area 2005, the time information for the part he or she desires to view.
- The user may swipe the display area 2002 from right to left (or from left to right) to display the next (or previous) start time and text information. For example, the text information with a start time of 0:05 disappears to the left, the text information with a start time of 2:15 moves from the right toward the left, and the next text information appears on the right. In this way, the next (or previous) time information and text information may be displayed.
- The output unit 103 may output, in the display area 2002, only the partial text of the retrieved text information that includes at least the search target character string. As a result, even when the text contains too many characters to display fully in the display area 2002 or the display area 2005, or when the terminal 20 is a smartphone or other device whose small display makes it difficult to show all the information, the text information can be displayed without greatly sacrificing visibility.
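Trimming the displayed text down to a window around the match might look like this; the window size and the ellipsis markers are arbitrary assumptions for illustration:

```python
def snippet(text, query, context=5):
    """Return a short excerpt of `text` around the first occurrence of
    `query`, for display areas too small to show the full text."""
    i = text.find(query)
    if i < 0:
        return text[: 2 * context + len(query)]  # fallback: leading chars
    start = max(0, i - context)
    end = min(len(text), i + len(query) + context)
    prefix = "…" if start > 0 else ""
    suffix = "…" if end < len(text) else ""
    return prefix + text[start:end] + suffix
```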
- The character strings displayed in the display area 2004 may be displayed in descending order of the number of times they were previously input as search target character strings, for the subject of the lecture moving image, by the users of the moving image distribution system.
- When the user selects a character string displayed in the display area 2004, the selected character string may be automatically entered into the input box 1001.
- As described above, the lecture data DB stores text information obtained by converting into text the characters written by the lecturer on the blackboard in the lecture video, and lecture videos are searched by comparing the search target character string with this text information.
- Therefore, the present embodiment has the technical effect that the search speed can be improved compared with a method that searches for a character string while directly analyzing the lecture moving image itself.
- The appearance time described above includes both the time when the display of the handwritten character string starts (the time when the character string is written on the blackboard) and the time when the display ends, but it is also possible to include only the time when the display starts. This reduces the data volume of the lecture data DB.
- The time when the display of the handwritten character string starts and the time when it ends may be collectively referred to as "time information", or only the start time may be referred to as "time information".
- the moving image in which the character string is displayed has been described as a lecture moving image in which the lecturer gives a lecture while writing characters by hand on the blackboard, but the present embodiment is not limited to lecture moving images and handwritten characters.
- the present embodiment can be applied to any character string or moving image as long as the character string is displayed on the moving image.
Abstract
Provided is an information processing device (10) having: a storage unit (105) storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the images of the first character string, time information indicating the time at which the images of the first character string are displayed in the moving image, and the moving image; an acceptance unit (101) that accepts a character string to be retrieved; a retrieval unit (102) that retrieves from the database a second character string that includes the character string to be retrieved, time information that corresponds to the second character string, and a moving image that corresponds to the second character string; and an output unit (103) that outputs a screen that includes a first display region (2001) where the retrieved moving image is played back and a second display region (2002) where the retrieved second character string and time information are displayed in chronological order.
Description
This application is based on Japanese Patent Application No. 2018-010904 filed on January 25, 2018, the contents of which are incorporated herein by reference.
The present invention relates to an information processing apparatus, a moving image search method, a generation method, and a program.
Online learning systems are known in which a user can learn using a web browser or the like. With an online learning system, a user can watch videos of lectures of interest, gauge his or her level of understanding by taking tests, and concentrate on reviewing the problems missed in those tests, thereby learning efficiently. As a remote learning support system using a network, for example, the technology described in Patent Document 1 is known.
When a user reviews a weak subject, for example, there is considered to be a need to view only a specific part of a lecture video rather than watching all of it from beginning to end. For example, a user who wants to review American history within the subject of world history may want to view only the part of a world history lecture video in which the lecturer explains the United States.

However, conventional online learning systems do not provide a function for searching a lecture moving image for the specific part that the user desires to view. The user therefore has to find the desired part by watching the lecture video from beginning to end, or by fast-forwarding. Such a problem can occur not only in lecture videos but in any video.
Therefore, the present disclosure aims to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
An information processing apparatus according to an aspect of the present disclosure includes: a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; a receiving unit that receives a search target character string; a search unit that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and an output unit that outputs a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
In the above aspect, the output unit may output a screen in which the retrieved second character string and time information are arranged and displayed in chronological order, horizontally or vertically, in the second display area. According to this aspect, since the plural pieces of text information and time information are displayed in chronological order in the second display area of the screen, visibility can be improved.
In the above aspect, the output unit may further display, in the second display area, a message indicating that an image of the first character string corresponding to the retrieved second character string is displayed in the moving image. According to this aspect, the user can easily recognize on the screen that the search target is the first character string displayed in the moving image.
In the above aspect, the output unit may display information indicating the position at which the image of the first character string corresponding to the retrieved second character string is displayed in the moving image, superimposed on the moving image. According to this aspect, the user can easily grasp where the search target character string is displayed in the moving image.
In the above aspect, the output unit may highlight the portion of the second character string displayed in the second display area that corresponds to the search target character string. According to this aspect, even when the second character string contains many characters, it is easy to grasp which portion of the second character string corresponds to the search target character string.
In the above aspect, the moving image may be a moving image of a lecturer giving a lecture using a blackboard, and the first character string may be a character string including a plurality of handwritten characters written by hand on the blackboard. According to this aspect, the user can easily search for the part of the lecture moving image in which the handwritten character string to be searched for is written on the blackboard.
An information processing apparatus according to another aspect of the present disclosure includes: an extraction unit that extracts a first image, which is an area in which an image of a first character string is displayed in a moving image, and outputs time information indicating when display of the image of the first character string starts in the moving image; a division unit that divides the first image extracted by the extraction unit into second images, one per character included in the first character string; a character recognition unit that outputs a plurality of candidate characters for each second image by performing character recognition on each of the second images; an output unit that, for the plurality of candidate character strings generated by combining the candidate characters output for each second image according to the order of the characters in the first character string, outputs as a second character string the character string judged most similar to any of those candidate character strings among a plurality of character strings that may be used in the moving image; and a generation unit that generates a database associating the second character string output by the output unit, the time information output by the extraction unit, and the moving image. According to this aspect, the database can be generated automatically, and the user can quickly make use of a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
A moving image search method according to another aspect of the present disclosure is performed by an information processing apparatus having a storage unit storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image. The method includes the steps of: receiving a search target character string; searching the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and outputting a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
A program according to another aspect of the present disclosure causes a computer to function as: storage means storing a database that associates, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the image of each first character string, time information indicating the time at which the image of the first character string is displayed in the moving image, and the moving image; receiving means that receives a search target character string; search means that searches the database for a second character string including the search target character string, the time information corresponding to that second character string, and the moving image corresponding to that second character string; and output means that outputs a screen including a first display area in which the retrieved moving image is played back and a second display area in which the retrieved second character string and time information are displayed in chronological order. According to this aspect, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
According to the present disclosure, it is possible to provide a technology capable of quickly searching for a specific part of a moving image that the user desires to view.
Preferred embodiments of the present invention will be described with reference to the accompanying drawings. In each figure, elements given the same reference numeral have the same or similar configuration.
<System configuration>
FIG. 1 is a diagram illustrating an example of a moving image distribution system according to the embodiment. The moving image distribution system includes a distribution server 10 and a terminal 20, which can communicate with each other via a wireless or wired communication network N. Although FIG. 1 shows one terminal 20, the moving image distribution system may include a plurality of terminals 20. In the present embodiment, the distribution server 10 and the terminal 20 may be collectively referred to as an information processing apparatus, or only the distribution server 10 may be referred to as an information processing apparatus.
The distribution server 10 is a server that distributes lecture moving images, and has a function of transmitting the data of a lecture moving image requested by the terminal 20 to the terminal 20. The distribution server 10 may be one or more physical or virtual servers, or may be a cloud server.
The terminal 20 is a terminal operated by the user; any terminal with a communication function can be used, such as a smartphone, tablet terminal, mobile phone, personal computer (PC), laptop PC, personal digital assistant (PDA), or home game console.
In the present embodiment, by entering a search target character string (search keyword), the user can search for lecture moving images in which the image of a character string handwritten on the blackboard by the lecturer (hereinafter, a "handwritten character string") contains the search target character string. For example, when the user enters "organic compound" as the search target character string on the search screen of the terminal 20, lecture moving images in which the lecturer wrote "organic compound" on the blackboard are listed on the screen of the terminal 20. When the user selects a lecture moving image to view from the list, playback of that lecture moving image starts on the screen of the terminal 20, and the times on the time axis of the lecture moving image at which the lecturer wrote "organic compound" on the blackboard (for example, around 5:30, 15:10, and 23:40 in a 30-minute moving image) are listed. When the user selects one of the listed times, the lecture moving image being played jumps to the selected time.
To realize such an operation, the distribution server 10 stores, in a database, text information (second character string) generated by character recognition of the image of a handwritten character string (first character string), time information indicating the time at which the image of the handwritten character string is displayed in the lecture moving image, and the lecture moving image (or information uniquely identifying it), in association with one another. More specifically, the time information may indicate the span from when the handwritten character string appears in the lecture moving image until its display ends (hereinafter, the "appearance time"). In the present embodiment, this database is called the "lecture data DB (Database)". The distribution server 10 can thus search the lecture data DB for lecture moving images in which the sentences or character strings handwritten on the blackboard by the lecturer contain the search target character string.
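The association held in the lecture data DB might be sketched as records like the following. This is a minimal illustration only; the names `LectureRecord`, `text`, `start_sec`, `end_sec`, and `video_id` are assumptions, not identifiers from this disclosure.

```python
from dataclasses import dataclass

@dataclass
class LectureRecord:
    text: str         # second character string: OCR result of the handwritten string
    start_sec: float  # time the handwritten string appears in the video
    end_sec: float    # time its display ends (appearance time = start..end)
    video_id: str     # identifies the lecture moving image

# One row per handwritten character string detected in a lecture video.
lecture_db = [
    LectureRecord("異性体", 330.0, 610.0, "chem-001"),
    LectureRecord("有機化合物", 910.0, 1420.0, "chem-001"),
]
```

Storing only `start_sec` would halve the time fields, matching the variation described later in which only the display start time is kept.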
<Hardware configuration>
FIG. 2 is a diagram showing an example of the hardware configuration of the distribution server 10. The distribution server 10 has a CPU (Central Processing Unit) 11, a storage device 12 such as a memory, a communication IF (Interface) 13 for wired or wireless communication, an input device 14 that receives input operations, and an output device 15 that outputs information. Each functional unit described in the functional block configuration below can be realized by processing that a program stored in the storage device 12 causes the CPU 11 to execute. The program can be stored, for example, on a non-transitory recording medium.
<Function block configuration>
FIG. 3 is a diagram showing an example of the functional block configuration of the distribution server 10. The distribution server 10 has a reception unit 101, a search unit 102, an output unit 103, a generation unit 104, and a storage unit 105. The storage unit 105 stores the lecture data DB.
The reception unit 101 has a function of receiving the search target character string that the user entered on the screen of the terminal 20.
The search unit 102 searches the lecture data DB for "text information" including the search target character string received by the reception unit 101, the "appearance time" corresponding to that text information, and the "lecture moving image" corresponding to that text information.
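The lookup performed by the search unit 102 can be sketched as a simple substring match over such records, with hits sorted by appearance time for chronological display. The dictionary keys used here are illustrative assumptions, not names from this disclosure.

```python
def search_lectures(db, query):
    """Return every DB row whose recognized text contains `query`,
    sorted by appearance time (chronological order)."""
    hits = [row for row in db if query in row["text"]]
    return sorted(hits, key=lambda row: row["start_sec"])

# Hypothetical lecture data DB contents for illustration.
lecture_db = [
    {"text": "有機化合物の分類", "start_sec": 330, "video_id": "chem-001"},
    {"text": "異性体", "start_sec": 910, "video_id": "chem-001"},
    {"text": "有機化合物と無機化合物", "start_sec": 150, "video_id": "chem-002"},
]
```

A real deployment would presumably use a database index rather than a linear scan, but the contract is the same: text, time, and video are returned together for each hit.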
The output unit 103 outputs a screen including an area (first area) in which the lecture moving image retrieved by the search unit 102 is played back, and an area (second area) in which the retrieved text information and appearance times (time information) are displayed in chronological order. The output screen is displayed on the display of the terminal 20. The output unit 103 may have, for example, a web server function and a function of transmitting to the terminal 20 a website on which lecture moving images are distributed. Alternatively, the output unit 103 may have a function of transmitting to the terminal 20 content for displaying lecture moving images and the like on the screen of an application installed on the terminal 20.
The generation unit 104 generates the lecture data DB by performing character recognition on the handwritten character strings displayed in lecture moving images. The generation unit 104 further includes an area extraction unit 1041, a division unit 1042, a single character recognition engine 1043, a character string recognition engine 1044, and a DB generation unit 1045. The processing performed by these units is described later.
<Generation of the lecture data DB>
Next, a method of generating the lecture data DB will be described concretely with reference to FIG. 4. The following description assumes that the generation unit 104 of the distribution server 10 generates the lecture data DB, but the distribution server 10 does not necessarily have to create the lecture data DB itself; it may be generated by an external information processing apparatus. In that case, the generation unit 104 is implemented not in the distribution server 10 but in another information processing apparatus, and the lecture data DB generated by that apparatus may be registered in the storage unit 105 of the distribution server 10.
FIG. 4 is a flowchart showing an example of the processing procedure for generating the lecture data DB.
In step S101, the area extraction unit 1041 extracts the image (first image) of the character display area in which a handwritten character string is displayed in the lecture moving image. It also determines and outputs the time (appearance time) from when the handwritten character string appears in the lecture moving image until its display ends. If there are a plurality of handwritten character strings, the extraction of the character display area image and the determination of the appearance time are performed for each handwritten character string.
A concrete example of extracting the character display area image and determining the appearance time for one handwritten character string will be described with reference to FIG. 5. The area extraction unit 1041 performs image processing, in units of a predetermined number of frames (for example, 80 frames), on a moving image of the lecturer giving a lecture while writing characters on the blackboard (FIG. 5(a)), and extracts the areas distinguished from the background (areas other than the background). For example, the area extraction unit 1041 outputs, per pixel and per unit of the predetermined number of frames, a score (probability) indicating how likely the pixel is to differ from the background image. Through this processing, a score equal to or greater than a predetermined value is output for the pixels of areas where characters are written on the blackboard and for the pixels of areas where the lecturer appears.
Next, the area extraction unit 1041 extracts the pixels whose output score is equal to or greater than the predetermined value. An example of the extracted pixels is shown in FIG. 5(b); the extraction parts 500 indicate places where extracted pixels cluster. When extracting the areas distinguished from the background, the area extraction unit 1041 preferably performs processing to exclude the area in which the lecturer appears. For example, in the process of extracting areas distinguished from the background, the area extraction unit 1041 may extract only the pixels (areas) whose score variation over a predetermined time length (for example, 10 seconds) is equal to or less than a predetermined threshold. In this way, pixels in which the lecturer moving around in the moving image was recognized are not extracted as areas distinguished from the background. Also, in that process, when the area of a cluster of extracted pixels is larger than a predetermined value, the area extraction unit 1041 may regard the extraction as the lecturer rather than a character string and treat it as outside the extraction target, so that pixels in which the lecturer appears are not extracted as areas distinguished from the background. The area extraction unit 1041 determines the time from when a cluster of pixels appears in the lecture moving image until it disappears as the appearance time during which the handwritten character string is displayed in the lecture moving image.
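The two filters just described (keeping pixels that differ from the background while discarding pixels whose score fluctuates over time, i.e. the moving lecturer) might be sketched with NumPy as follows. The thresholds, the grayscale representation, and the array layout are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

def stable_foreground_mask(frames, background, diff_thresh=0.2, var_thresh=0.01):
    """frames: (T, H, W) grayscale values in [0, 1]; background: (H, W).
    A pixel is kept when it differs from the background (likely chalk)
    AND its score is stable over time (unlikely to be the moving lecturer)."""
    frames = np.asarray(frames, dtype=float)
    score = np.abs(frames - background)          # per-frame difference score
    differs = score.mean(axis=0) > diff_thresh   # region distinct from background
    stable = score.var(axis=0) < var_thresh      # low temporal fluctuation
    return differs & stable
```

Chalk strokes stay bright and still once written, so they pass both tests; the lecturer differs from the background but fails the stability test.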
Next, the area extraction unit 1041 determines the position of a rectangular frame enclosing the cluster of pixels (for example, the pixel position of the lower left corner of the rectangle with the lower left of the moving image as the origin) and its size (vertical and horizontal dimensions). The frame 510 shown in FIG. 5(b) is an example of the determined rectangular frame.
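Determining the enclosing rectangle of a cluster of extracted pixels can be sketched as follows. This is illustrative only; note that it uses the usual image-array convention with the origin at the top left, whereas the description above measures from the lower left of the moving image.

```python
import numpy as np

def bounding_box(mask):
    """Return (left, top, width, height) of the rectangle enclosing all
    True pixels of a 2-D boolean mask (origin at top-left of the array)."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # no extracted pixels: no character display area
    left, top = xs.min(), ys.min()
    return int(left), int(top), int(xs.max() - left + 1), int(ys.max() - top + 1)
```

The resulting frame is what is later cut out of a frame image to obtain the character display area image.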
Subsequently, the region extraction unit 1041 extracts an image of the character display region in which the handwritten character string is displayed in the lecture moving image by cutting out the region enclosed by the rectangular frame from the image of an arbitrary frame among the frames constituting the lecture moving image during the appearance time.
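As a sketch, the rectangle-fitting and cut-out steps might look like the following; the top-to-bottom row storage and the lower-left-origin convention are assumptions based on the example in the text.

```python
def bounding_box(mask, video_height):
    """Return (left, bottom, width, height) of the rectangle enclosing the
    extracted pixels, with the origin at the lower left of the video as in
    the example.  mask rows are stored top-to-bottom (row 0 = top)."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # Convert the bottom row index (top-down) to a lower-left origin.
    return left, video_height - 1 - bottom, right - left + 1, bottom - top + 1

def crop_region(frame, left, bottom, width, height):
    """Cut out the character display region from one frame of the video."""
    top = len(frame) - bottom - height   # back to a top-down row index
    return [row[left:left + width] for row in frame[top:top + height]]

# 4x4 example: extracted pixels clustered in the middle of the frame.
mask = [
    [False, False, False, False],
    [False, True,  True,  False],
    [False, True,  False, False],
    [False, False, False, False],
]
left, bottom, width, height = bounding_box(mask, video_height=4)
frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
region = crop_region(frame, left, bottom, width, height)
```

Any frame within the appearance time can serve as the source of the crop, since the handwritten string is stationary during that interval.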
In step S102, the division unit 1042 divides the image of the character display region extracted by the region extraction unit 1041 into single-character images (second images), one for each character constituting the handwritten character string. The division unit 1042 binarizes the image of the character display region and divides it into single-character images by, for example, regarding a portion in which the illuminance of all pixels along the vertical axis of the image falls below a predetermined threshold as a break between characters. FIG. 5(c) shows a specific example of the break positions.
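A minimal sketch of this projection-based split, assuming a binarized region in which higher values mean ink; the threshold is illustrative.

```python
def split_characters(region, ink_thresh=0.5):
    """Split the character display region into single-character images.
    A column in which every pixel falls below ink_thresh is treated as a
    break between characters, as in step S102."""
    h, w = len(region), len(region[0])
    is_break = [all(region[y][x] < ink_thresh for y in range(h))
                for x in range(w)]
    chars, start = [], None
    for x in range(w):
        if not is_break[x] and start is None:
            start = x                      # a character begins
        elif is_break[x] and start is not None:
            chars.append([row[start:x] for row in region])
            start = None                   # a character ends at a break
    if start is not None:                  # character running to the edge
        chars.append([row[start:] for row in region])
    return chars

# Two "characters" separated by one blank column.
region = [[1, 1, 0, 1],
          [1, 0, 0, 1]]
chars = split_characters(region)
```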
In step S103, the single-character recognition engine 1043 performs character recognition on each single-character image constituting the handwritten character string and outputs a plurality of candidate characters for each image. A specific example is shown in FIG. 6. Candidate characters 1 to 5 shown in FIG. 6 are examples of the plurality of candidate characters output by performing character recognition on each of the images of "異", "性", and "体".
If the single-character recognition engine 1043 has sufficiently accurate character recognition capability, the candidate characters output by the single-character recognition engine 1043 may be stored in the lecture data DB as text information as they are, without proceeding to the processing of step S104. For example, in the example of FIG. 6, if the single-character recognition engine 1043 is capable of correctly recognizing the images of "異", "性", and "体" as "異", "性", and "体", the string "異性体" ("isomer"), obtained by joining the recognized characters, may be stored in the lecture data DB as text information as it is.
In step S104, the character string recognition engine 1044 (output unit) generates a plurality of candidate character strings by combining the plurality of candidate characters output for each single-character image according to the order of the characters in the handwritten character string. For example, in the example of FIG. 6, 125 (5 × 5 × 5) candidate character strings are generated by combining the five candidate characters corresponding to "異", the five candidate characters corresponding to "性", and the five candidate characters corresponding to "体".
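The combination step is a Cartesian product over the per-position candidate lists. A sketch with two candidates per position for brevity (the embodiment uses five, giving 125 combinations); the candidate values are illustrative.

```python
from itertools import product

# Candidate characters per position, as output by the single-character
# recognition engine (values are illustrative).
candidates = [["異", "翼"], ["性", "住"], ["体", "休"]]

# Combine in the original character order: 2 x 2 x 2 = 8 candidate strings.
candidate_strings = ["".join(chars) for chars in product(*candidates)]
```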
Here, the character string recognition engine 1044 has learned in advance a plurality of keywords (character strings) that may be used in lecture moving images, and has a function of, when an arbitrary character string is input, outputting the keyword determined to be most similar to the input character string among the plurality of keywords, together with a score indicating the similarity. Keywords that may be used in a lecture moving image are, for example, in the case of a Japanese history lecture, keywords of the kind found in a textbook index, such as "邪馬台国" (Yamatai) or "徳川家康" (Tokugawa Ieyasu). However, keywords generally differ from subject to subject. Therefore, character string recognition engines 1044 trained on different keywords according to the attributes of the lecture moving image (subject, lecture name, and so on) may be prepared, and the processing of step S104 may be performed using the character string recognition engine 1044 corresponding to the attributes of the lecture moving image.
Subsequently, the character string recognition engine 1044 outputs, as the text information corresponding to the handwritten character string, the keyword (character string) that, among the keywords (character strings) learned in advance as keywords that may be used in the lecture moving image, is determined to be most similar to one of the generated candidate character strings. More specifically, the character string recognition engine 1044 outputs, for each of the generated candidate character strings, the keyword determined to be most similar to it together with the similarity (score), and then outputs the keyword with the highest similarity as the text information corresponding to the handwritten character string.
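The selection step can be sketched as a best-match search over candidates and keywords. The embodiment does not specify the trained engine's similarity score, so `difflib.SequenceMatcher.ratio()` stands in for it here, and the keyword list is illustrative.

```python
from difflib import SequenceMatcher

def best_keyword(candidate_strings, keywords):
    """Return the learned keyword most similar to any candidate string.
    SequenceMatcher.ratio() is a stand-in for the engine's learned
    similarity score, which the embodiment leaves unspecified."""
    best, best_score = None, -1.0
    for cand in candidate_strings:
        for kw in keywords:
            score = SequenceMatcher(None, cand, kw).ratio()
            if score > best_score:
                best, best_score = kw, score
    return best

# Even though no candidate equals "異性体" exactly, the close candidate
# "異住体" still selects it over the other learned keywords.
keywords = ["異性体", "元素分析", "錯イオン形成反応"]
result = best_keyword(["異住体", "翼性休"], keywords)
```

This tolerance to per-character misrecognition is the point of step S104: the keyword survives as long as one candidate string lands near it.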
FIG. 6 shows an example in which the character string recognition engine 1044 outputs the similarity between each of the 125 candidate character strings and the learned keywords (which, in the example of FIG. 6, include at least "異性体"), and outputs the learned keyword "異性体", which has the highest similarity, as the text information corresponding to the handwritten character string. Even if the single-character recognition engine 1043 fails to recognize the characters correctly and the 125 candidate character strings do not include "異性体" itself, as long as the candidate character strings include a string similar to "異性体" (for example, "異住体"), the character string recognition engine 1044 will still output "異性体" as the text information corresponding to the handwritten character string.
The generation unit 104 repeats the processing of steps S101 to S104 described above for each handwritten character string displayed in the lecture moving image, thereby determining a keyword and an appearance time for each of the plurality of handwritten character strings displayed in the lecture moving image.
In step S105, the DB generation unit 1045 generates the lecture data DB by associating the text information output from the character string recognition engine 1044 in step S104, the appearance time output from the region extraction unit 1041 in step S101, and the lecture moving image being processed (which may be the file name of the lecture moving image).
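The resulting table can be sketched with SQLite; the schema and column names are illustrative, mirroring the three fields of the example of FIG. 7, with the appearance time split into start and end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lecture_data (
    video      TEXT,  -- identifier of the lecture video (e.g. its file name)
    start_time TEXT,  -- time the handwritten string appears
    end_time   TEXT,  -- time it disappears
    text_info  TEXT   -- keyword output by the string recognition engine
)""")
conn.executemany("INSERT INTO lecture_data VALUES (?, ?, ?, ?)", [
    ("化学_第1講_有機化合物の構造決定_チャプター1", "0:05", "3:10", "錯イオン形成反応"),
    ("化学_第1講_有機化合物の構造決定_チャプター1", "1:20", "3:10", "元素分析"),
])
conn.commit()
rows = conn.execute("SELECT text_info, start_time FROM lecture_data").fetchall()
```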
FIG. 7 is a diagram showing an example of the lecture data DB. The "lecture video" field stores an identifier that uniquely identifies a lecture moving image. The identifier includes the subject of the lecture moving image, the lecture name, and the like, and may be, for example, a file name including the subject. The "appearance time" field stores the period from when the handwritten character string is displayed in the lecture moving image until it disappears. The "text information" field stores the text data corresponding to the handwritten character string. In the example of FIG. 7, data is stored indicating, for instance, that in the lecture moving image "化学_第1講_有機化合物の構造決定_チャプター1" (Chemistry, Lecture 1, Structure Determination of Organic Compounds, Chapter 1), "錯イオン形成反応" (complex-ion formation reaction) is displayed from 0:05 to 3:10 and "元素分析" (elemental analysis) is displayed from 1:20 to 3:10.
<Searching for lectures>
Next, the processing procedure when a user searches for a lecture moving image will be described in detail. FIGS. 8 and 9 are diagrams showing examples of screens displayed on the terminal 20. FIG. 8(a) is an example of a screen for searching for a lecture moving image. The search screen provides an input box 1001 for entering the character string to be searched for and the subject of the lecture moving images to be searched. When the search button displayed to the right of the input box 1001 is pressed, the search unit 102 accesses the lecture data DB and searches for lecture moving images of the entered subject whose text information contains the search target character string. When such lecture moving images exist, the output unit 103 outputs a screen displaying a list of the retrieved lecture moving images. Alternatively, the output unit 103 may output the list screen only when a plurality of lecture moving images are retrieved, and transition directly to the playback screen described later (FIG. 9(a)) when exactly one lecture moving image is retrieved.
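The lookup itself then reduces to a substring match over the text information, filtered by subject. A self-contained sketch follows; the table layout and the convention that the video identifier begins with the subject name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lecture_data (video TEXT, start_time TEXT, text_info TEXT)")
conn.executemany("INSERT INTO lecture_data VALUES (?, ?, ?)", [
    ("化学_第1講_チャプター1", "0:05", "錯イオン形成反応"),
    ("化学_第1講_チャプター1", "2:15", "水素イオン"),
    ("日本史_第1講", "0:30", "徳川家康"),
])

def search(conn, subject, query):
    """Return (video, start_time, text_info) rows whose identifier matches
    the selected subject and whose text information contains the search
    target string, in chronological order."""
    return conn.execute(
        "SELECT video, start_time, text_info FROM lecture_data "
        "WHERE video LIKE ? AND text_info LIKE ? ORDER BY start_time",
        (subject + "%", "%" + query + "%"),
    ).fetchall()

hits = search(conn, "化学", "イオン")
```

Because only the pre-extracted text column is scanned, no frame of the video is touched at query time, which is the source of the speed advantage claimed at the end of the embodiment.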
FIG. 8(b) is an example of a screen displaying a list of retrieved lecture moving images. The search results are listed in a display area 1003. For example, when the user selects "chemistry" as the subject and enters "イオン" (ion) as the search target character string, the chemistry lecture moving images in which the lecturer has written "イオン" on the blackboard are listed in the display area 1003 as the search results.
Subsequently, when the user selects the lecture moving image he or she wishes to view from the list in the display area 1003, the screen transitions to a screen for playing back that lecture moving image. Since the display area 1003 not only lists the retrieved lecture moving images but also accepts the user's selection of the lecture moving image to view, the screen containing the display area 1003 may also be called a screen for accepting the selection of the lecture moving image the user wishes to view.
FIG. 9(a) shows an example of the screen for playing back a lecture moving image. It includes a display area 2001 (first area) that plays back the lecture moving image; a display area 2002 (second area) that displays, side by side horizontally in chronological order, the text information containing the search target character string and the start times at which the display of the corresponding handwritten character strings begins; and a display area 2004 (third area) that displays character strings searched for in the past with respect to the subject of the lecture moving image being played in the display area 2001. A button 2003 for listing the start times and text information is displayed at the top of the display area 2002. When the user presses the button 2003, as shown in FIG. 9(b), the display area 2002 is replaced by a display area 2005 (second area) that displays the text information containing the search target character string and the time stamp information arranged vertically in chronological order.
In the display areas 2002 and 2005, the word "板書" (board writing) is displayed as a message indicating that the search result is a handwritten character string displayed in the lecture moving image (that is, a message indicating that the handwritten character string corresponding to the retrieved text information appears in the lecture moving image). In addition, above the display areas 2002 and 2005, the number of hits for text information containing the search target character string is displayed in a display area 2102.
Of the text information displayed in the display areas 2002 and 2005, the portion corresponding to the search target character string may be highlighted. For example, in FIGS. 9(a) and 9(b), the portion "イオン", which is the search target character string, is highlighted within "錯イオン形成反応" and "水素イオン" (hydrogen ion).
The display areas 2002 and 2005 may further display the end time at which the display of the handwritten character string ends. For example, the display areas 2002 and 2005 may display the appearance time of the handwritten character string, such as "0:05–3:10 錯イオン形成反応".
In the display area 2001, information indicating the position at which the handwritten character string corresponding to the retrieved text information is displayed in the lecture moving image may be displayed superimposed on the lecture moving image. For example, as shown in FIGS. 9(a) and 9(b), a frame 2101 indicating the position at which "錯イオン形成反応", the retrieved text information, is displayed in the lecture moving image may be shown in the display area 2001. To make the frame 2101 displayable, the lecture data DB may further store, for each record, information indicating the position and size of the frame 2101. The information stored in the lecture data DB as the position and size of the frame 2101 may be the same as the information indicating the position and size of the rectangular frame enclosing the set of extracted pixels described in step S101 of FIG. 4. The frame 2101 may also remain displayed in the display area 2001 throughout the appearance time corresponding to the retrieved text information.
When the user selects a lecture moving image in the display area 1003 (FIG. 8(b)), playback of the lecture moving image starts in the display area 2001. Subsequently, when the user selects the start time and text information he or she wishes to view from those displayed in the display area 2002 or 2005, the lecture moving image displayed in the display area 2001 is played back from the selected start time or from a time a predetermined interval before it (for example, 10 seconds earlier). For example, when the user taps the entry displayed as 2:15 in the display area 2002, the lecture moving image is played back in the display area 2001 from the 2:15 mark or from a predetermined time before it (for example, from 2:06).
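The lead-in playback rule amounts to subtracting a predetermined offset and clamping at zero; a trivial sketch, with offset values following the examples in the text.

```python
def playback_start(selected_seconds, lead_in=10):
    """Start playback at the selected start time or a predetermined time
    before it, clamped so it never falls before the beginning."""
    return max(0, selected_seconds - lead_in)

# Tapping the 2:15 entry (135 s): a 9 s lead-in gives 126 s = 2:06,
# matching the example; near the start of the video the result clamps to 0.
```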
Alternatively, playback of the lecture moving image may not start in the display area 2001 at the moment the user selects a lecture moving image in the display area 1003 (FIG. 8(b)); playback may instead start only when the user presses a playback start button displayed within the display area 2001, or selects the time stamp information he or she wishes to view from the time stamp information and text information displayed in the display area 2002 or 2005.
The user may also swipe the display area 2002 from right to left (or from left to right) to display the next (or previous) start time and text information. For example, in FIG. 9(a), when the user swipes the display area 2002 from right to left, the text information whose start time is 0:05 disappears to the left, the text information whose start time is 2:15 moves from the right side to the left side, and the next text information appears on the right.
Similarly, the user may swipe the display area 2005 from top to bottom (or from bottom to top) to display the next (or previous) time stamp information and text information.
When the text contained in the text information retrieved by the search unit 102 is equal to or longer than a predetermined number of characters, the output unit 103 may output, in the display area 2002, only a portion of the text that at least contains the search target character string. This makes it possible to display the text information without greatly sacrificing visibility even when the text is too long to display in full in the display area 2002 or 2005, or when the terminal 20 is a smartphone or the like with a small display.
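One way to keep the search string visible while truncating, as a sketch: the ellipsis treatment and length limit are assumptions, since the text only requires that the displayed portion contain the search target character string.

```python
def snippet(text, query, max_len=10):
    """Return text unchanged when short enough; otherwise a window of at
    most max_len characters that still contains the query."""
    if len(text) <= max_len:
        return text
    i = max(0, text.find(query))             # -1 (not found) falls back to 0
    start = min(i, len(text) - max_len)      # keep the window inside the text
    part = text[start:start + max_len]
    prefix = "…" if start > 0 else ""
    suffix = "…" if start + max_len < len(text) else ""
    return prefix + part + suffix

short = snippet("水素イオン", "イオン")
clipped = snippet("とても長い錯イオン形成反応の説明", "イオン", max_len=6)
```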
The character strings previously searched for with respect to the subject of the lecture moving image, displayed in the display area 2004, may be shown in descending order of how many times they have been entered as search target character strings by the users of this moving-image distribution system. When the user selects a character string displayed in the display area 2004, the selected character string may be automatically entered into the input box 1001.
The present embodiment has been described above. In this embodiment, the lecture data DB stores text information obtained by converting into text the characters that the lecturer wrote on the blackboard in the lecture moving image, and lecture moving images are searched by comparing the search target character string with this text information. The embodiment therefore has the technical effect of improving search speed compared with a method that searches for a character string while directly analyzing the lecture moving image itself.
In the description above, the appearance time stored in the lecture data DB includes both the time at which the display of the handwritten character string starts (the time the string is written on the blackboard) and the time at which the display ends (for example, the time the lecturer erases the characters with a blackboard eraser); however, it may include only the time at which the display of the handwritten character string starts. This reduces the data volume of the lecture data DB. The start time and end time of the display of the handwritten character string may together be called "time information", or the start time alone may be called "time information".
In the description above, the moving image in which character strings are displayed has been assumed to be a lecture moving image in which a lecturer gives a lecture while writing characters by hand on a blackboard; however, the present embodiment is not limited to lecture moving images or handwritten characters. The embodiment can be applied to any character string and any moving image in which character strings are displayed.
The embodiments described above are intended to facilitate understanding of the present invention and are not to be construed as limiting it. The flowcharts, sequences, and elements of the embodiments, as well as their arrangements, materials, conditions, shapes, sizes, and the like, are not limited to those illustrated and may be changed as appropriate. Configurations shown in different embodiments may also be partially substituted for or combined with one another.
Claims (9)
- 複数の第1文字列の画像が表示される動画について、該第1文字列の画像を文字認識することで生成される第2文字列と、前記動画で該第1文字列の画像が表示される時間を示す時間情報と、前記動画とを対応づけて格納するデータベースを記憶する記憶部と、
検索対象の文字列を受け付ける受付部と、
前記検索対象の文字列を含む第2文字列と、該第2文字列に対応する時間情報と、該第2文字列に対応する動画とを前記データベースから検索する検索部と、
検索された動画を再生する第1表示領域と、検索された第2文字列と時間情報とを時系列順に表示する第2表示領域とを含む画面を出力する出力部と、
を有する情報処理装置。 For a moving image in which a plurality of first character string images are displayed, a second character string generated by character recognition of the first character string image and an image of the first character string in the moving image are displayed A storage unit that stores a database that stores time information indicating time of day and the moving image in association with each other;
A reception unit that receives a character string to be searched;
A search unit configured to search the database for a second character string including the character string to be searched, time information corresponding to the second character string, and a moving image corresponding to the second character string;
An output unit for outputting a screen including a first display area for reproducing a searched moving image, and a second display area for displaying a searched second character string and time information in chronological order;
An information processing apparatus having - 前記出力部は、前記第2表示領域に、検索された第2文字列と時間情報とを、横方向又は縦方向に時系列順に並べて表示する画面を出力する、
請求項1に記載の情報処理装置。 The output unit outputs, on the second display area, a screen on which the retrieved second character string and time information are arranged in chronological order in the horizontal direction or in the vertical direction.
An information processing apparatus according to claim 1. - 前記出力部は、前記第2表示領域に、更に、検索された第2文字列に対応する第1文字列の画像が、前記動画に表示されていることを示すメッセージを表示する、
請求項2に記載の情報処理装置。 The output unit further displays a message indicating that an image of a first character string corresponding to the searched second character string is displayed in the moving image in the second display area.
The information processing apparatus according to claim 2. - 前記出力部は、検索された第2文字列に対応する第1文字列の画像が前記動画の中で表示される位置を示す情報を、前記動画に重ねて表示する、
請求項1乃至3のいずれか一項に記載の情報処理装置。 The output unit superimposes information indicating a position at which an image of a first character string corresponding to the searched second character string is displayed in the moving image, and displays the information.
The information processing apparatus according to any one of claims 1 to 3. - 前記出力部は、前記第2表示領域に表示する第2文字列のうち、前記検索対象の文字列に該当する部分を強調表示する、
請求項1乃至4のいずれか一項に記載の情報処理装置。 The output unit highlights a portion corresponding to the search target character string in the second character string displayed in the second display area.
The information processing apparatus according to any one of claims 1 to 4. - 前記動画は、講師が黒板を用いて授業を行っている様子を撮影した動画であり、
前記第1文字列は、前記黒板に手書きで書かれた複数の手書き文字を含む文字列である、
請求項1乃至5のいずれか一項に記載の情報処理装置。 The above video is a video of the lecturer taking a lesson using a blackboard,
The first character string is a character string including a plurality of handwritten characters handwritten on the blackboard.
The information processing apparatus according to any one of claims 1 to 5. - 動画内で第1文字列の画像が表示される領域である第1画像を抽出し、前記動画内で前記第1文字列の画像の表示が開始される時間情報を出力する抽出部と、
前記抽出部で抽出された前記第1画像を、前記第1文字列に含まれる文字ごとの第2画像に分割する分割部と、
複数の前記第2画像の各々について文字認識を行うことで、前記第2画像ごとに複数の候補文字を出力する文字認識部と、
前記第2画像ごとに出力された前記複数の候補文字を前記第1文字列における文字の並び順に従って組み合わせることで生成される複数の候補文字列について、前記動画で使用される可能性のある複数の文字列のうち、前記複数の候補文字列のいずれかに最も類似すると判定される文字列を、第2文字列として出力する出力部と、
前記出力部で出力された前記第2文字列と、前記抽出部で出力された前記時間情報と、前記動画とを対応づけたデータベースを生成する生成部と、
を有する情報処理装置。 An extraction unit which extracts a first image which is an area where an image of a first character string is displayed in a moving image, and outputs time information when display of the image of the first character string is started in the moving image;
A division unit that divides the first image extracted by the extraction unit into a second image for each character included in the first character string;
A character recognition unit that outputs a plurality of candidate characters for each of the second images by performing character recognition on each of the plurality of second images;
For a plurality of candidate character strings generated by combining the plurality of candidate characters output for each of the second images in accordance with the arrangement order of characters in the first character string, a plurality of possibilities may be used in the moving image An output unit that outputs, as a second character string, a character string determined to be most similar to any of the plurality of candidate character strings among the character strings of
A generation unit that generates a database in which the second character string output by the output unit, the time information output by the extraction unit, and the moving image are associated;
An information processing apparatus having - 複数の第1文字列の画像が表示される動画について、該第1文字列の画像を文字認識することで生成される第2文字列と、前記動画で該第1文字列の画像が表示される時間を示す時間情報と、前記動画とを対応づけて格納するデータベースを記憶する記憶部を有する情報処理装置が行う動画検索方法であって、
検索対象の文字列を受け付けるステップと、
前記検索対象の文字列を含む第2文字列と、該第2文字列に対応する時間情報と、該第2文字列に対応する動画とを前記データベースから検索するステップと、
検索された動画を再生する第1表示領域と、検索された第2文字列と時間情報とを時系列順に表示する第2表示領域とを含む画面を出力するステップと、
を有する動画検索方法。 For a moving image in which a plurality of first character string images are displayed, a second character string generated by character recognition of the first character string image and an image of the first character string in the moving image are displayed A moving image search method performed by an information processing apparatus having a storage unit that stores a database in which time information indicating the time of day and the moving image are stored in association with each other,
Receiving a search target character string;
Searching the database for a second character string including the search target character string, time information corresponding to the second character string, and a moving image corresponding to the second character string;
Outputting a screen including a first display area for reproducing the searched moving image, and a second display area for displaying the searched second character string and time information in chronological order;
- A program for causing a computer to function as:
storage means for storing a database in which, for a moving image in which images of a plurality of first character strings are displayed, a second character string generated by character recognition of the images of the first character strings, time information indicating the times at which the images of the first character strings are displayed in the moving image, and the moving image are stored in association with one another;
reception means for receiving a character string to be searched for;
search means for searching the database for a second character string including the search target character string, time information corresponding to the second character string, and a moving image corresponding to the second character string; and
output means for outputting a screen including a first display area for reproducing the retrieved moving image and a second display area for displaying the retrieved second character string and time information in chronological order.
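The apparatus claim's output unit combines the per-character OCR candidates in display order and picks the known string most similar to any resulting combination. A minimal sketch of that selection step, using simple ratio-based similarity; `most_similar_string`, `candidate_chars`, and `known_strings` are illustrative names and not taken from the patent, and the similarity measure is an assumption (the claims do not specify one):

```python
import difflib
import itertools

def most_similar_string(candidate_chars, known_strings):
    """candidate_chars: per-character OCR candidate lists, in the order the
    characters appear in the first character string, e.g. [['O','0'], ['C','G'], ['R']].
    known_strings: character strings that may be used in the moving image.
    Returns the known string most similar to any combined candidate string."""
    best, best_score = None, -1.0
    # Enumerate every candidate character string (one pick per position).
    for combo in itertools.product(*candidate_chars):
        candidate = "".join(combo)
        # Score each known string against this candidate; keep the best match.
        for known in known_strings:
            score = difflib.SequenceMatcher(None, candidate, known).ratio()
            if score > best_score:
                best, best_score = known, score
    return best

print(most_similar_string([['O', '0'], ['C', 'G'], ['R']],
                          ["OCR", "OGRE", "DCR"]))  # → OCR
```

Exhaustive enumeration is fine for short strings with few candidates per position; a production implementation would prune combinations or use a lattice search.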
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980006824.0A CN111542817A (en) | 2018-01-25 | 2019-01-16 | Information processing device, video search method, generation method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018-010904 | 2018-01-25 | ||
JP2018010904A JP6506427B1 (en) | 2018-01-25 | 2018-01-25 | INFORMATION PROCESSING APPARATUS, MOVIE SEARCH METHOD, GENERATION METHOD, AND PROGRAM |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019146466A1 true WO2019146466A1 (en) | 2019-08-01 |
Family
ID=66324237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/001084 WO2019146466A1 (en) | 2018-01-25 | 2019-01-16 | Information processing device, moving-image retrieval method, generation method, and program |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP6506427B1 (en) |
CN (1) | CN111542817A (en) |
WO (1) | WO2019146466A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111355999B (en) * | 2020-03-16 | 2022-07-01 | 北京达佳互联信息技术有限公司 | Video playing method and device, terminal equipment and server |
CN113347478B (en) * | 2021-05-28 | 2022-11-04 | 维沃移动通信(杭州)有限公司 | Display method and display device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007036752A (en) * | 2005-07-27 | 2007-02-08 | Tdk Corp | System, method and program for reproducing contents |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05274314A (en) * | 1992-03-25 | 1993-10-22 | Canon Inc | Document processor |
JP2959925B2 (en) * | 1993-05-25 | 1999-10-06 | 富士ゼロックス株式会社 | Image processing device |
JP2002099558A (en) * | 2000-09-21 | 2002-04-05 | Canon Inc | Information retrieval system and method, and recording medium |
JP4744317B2 (en) * | 2006-02-16 | 2011-08-10 | 富士通株式会社 | Word search device, word search method, and computer program |
EP2541440A4 (en) * | 2010-02-26 | 2014-10-15 | Rakuten Inc | Information processing device, information processing method, and recording medium that has recorded information processing program |
CN102572573A (en) * | 2010-12-30 | 2012-07-11 | 上海无戒空间信息技术有限公司 | Method for pushing information according to played content |
JP5845764B2 (en) * | 2011-09-21 | 2016-01-20 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
JP5831420B2 (en) * | 2012-09-28 | 2015-12-09 | オムロン株式会社 | Image processing apparatus and image processing method |
JP6672645B2 (en) * | 2015-09-07 | 2020-03-25 | カシオ計算機株式会社 | Information terminal device and program |
2018
- 2018-01-25: JP application JP2018010904A filed; patent JP6506427B1 (active)
2019
- 2019-01-16: CN application CN201980006824.0A filed; publication CN111542817A (pending)
- 2019-01-16: WO application PCT/JP2019/001084 filed; publication WO2019146466A1 (application filing)
Non-Patent Citations (1)
Title |
---|
SAKURADA ET AL: "Proposal of a lecture recording system for ubiquitous learning", HUMAN INTERFACE, vol. 6, no. 4-5, 2041105, pages 49 - 52 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2023183769A (en) * | 2022-06-16 | 2023-12-28 | オートペディア カンパニー リミテッド | Tire tread surface abrasion determination system and method using deep artificial neural network |
JP7515189B2 (en) | 2022-06-16 | 2024-07-12 | オートペディア カンパニー リミテッド | System and method for determining tire tread wear using deep artificial neural network |
Also Published As
Publication number | Publication date |
---|---|
CN111542817A (en) | 2020-08-14 |
JP6506427B1 (en) | 2019-04-24 |
JP2019128850A (en) | 2019-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111143610B (en) | Content recommendation method and device, electronic equipment and storage medium | |
US11917344B2 (en) | Interactive information processing method, device and medium | |
CN111970577B (en) | Subtitle editing method and device and electronic equipment | |
US20190130185A1 (en) | Visualization of Tagging Relevance to Video | |
KR101587926B1 (en) | Comment tagging system for streaming video and providing method thereof | |
CN112437353B (en) | Video processing method, video processing device, electronic apparatus, and readable storage medium | |
CN111263186A (en) | Video generation, playing, searching and processing method, device and storage medium | |
CN113010698B (en) | Multimedia interaction method, information interaction method, device, equipment and medium | |
WO2023016349A1 (en) | Text input method and apparatus, and electronic device and storage medium | |
WO2019146466A1 (en) | Information processing device, moving-image retrieval method, generation method, and program | |
CN113395605B (en) | Video note generation method and device | |
CN114329223A (en) | Media content searching method, device, equipment and medium | |
WO2023103597A1 (en) | Multimedia content sharing method and apparatus, and device, medium and program product | |
CN112989112B (en) | Online classroom content acquisition method and device | |
CN113407775B (en) | Video searching method and device and electronic equipment | |
US20100281046A1 (en) | Method and web server of processing a dynamic picture for searching purpose | |
CN114117120A (en) | Video file intelligent index generation system and method based on content analysis | |
CN106936830B (en) | Multimedia data playing method and device | |
CN114780793B (en) | Information labeling method, device, terminal equipment and storage medium | |
CN116800988A (en) | Video generation method, apparatus, device, storage medium, and program product | |
WO2019069997A1 (en) | Information processing device, screen output method, and program | |
JP2019144817A (en) | Motion picture output device, motion picture output method, and motion picture output program | |
CN113626622B (en) | Multimedia data display method in interactive teaching and related equipment | |
CN107609018B (en) | Search result presenting method and device and terminal equipment | |
CN117724648A (en) | Note generation method, device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19744422 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19744422 Country of ref document: EP Kind code of ref document: A1 |