WO2018224839A2

WO2018224839A2 - Methods and systems for generating a reaction video

Info

Publication number: WO2018224839A2
Application number: PCT/GB2018/051564
Authority: WO
Inventors: Dušan BREJKA; Marc Williams
Original assignee: Reactoo Ltd
Priority date: 2017-06-08
Filing date: 2018-06-08
Publication date: 2018-12-13
Also published as: WO2018224839A3; GB2563267A; GB201709162D0

Abstract

There is described a computer-implemented method of generating a reaction video stream showing a reaction to a stream of video content. The method comprises a server receiving a content video stream, converting the content video stream to a real-time protocol to generate a converted video stream, and multicasting the converted video stream to one or more viewer devices. The server receives, from each of the one or more viewer devices, a viewer video stream encoded with a real-time protocol, and processes the converted video stream and each viewer video stream to generate a reaction video stream having a plurality of panes shown in parallel, one of the plurality of panes showing a sequence of images corresponding to the converted video stream and each other pane showing a sequence of images corresponding to a respective viewer video stream.

Description

METHODS AND SYSTEMS FOR GENERATING A REACTION VIDEO

Technical Field

The present invention relates to methods and systems for generating a reaction video showing the reaction of at least one viewer of a video presentation to an event occurring in that video presentation. The invention has particular relevance to generating a reaction video showing the reaction of multiple viewers of the same video presentation on remote display devices.

Background

During television coverage of a live sporting event, video footage is sometimes shown of the reactions of different groups of fans watching the television coverage at particular moments (for example, when a goal is scored in a football match). The popularity of television programs such as the British reality television program Gogglebox has also demonstrated that there is a public appetite for capturing the reactions of viewers to moments within television programs of other types. Furthermore, video footage of the reactions of viewers to online videos, such as online videos viewed on YouTube or Facebook, is commonly shared on social media. In some cases, composite reaction videos are formed in which, for example, footage of a viewer reacting to an online video is embedded within the online video itself. Patent specifications US2016/0366203A1, US2015/120413A1 and US2014/0096167A1 discuss methods of capturing and processing video footage of reactions of viewers to online videos.

Television broadcasts, and in particular television coverage of sporting events, are increasingly being made available over the Internet through live streaming services. Devices that are used for viewing live streams of video data are necessarily connected to the Internet, and attempts have been made to use this connectivity to introduce a social aspect to the user experience of watching live stream broadcasts. For example, patent specification WO2012/150331A1 discusses a method of synchronising a live stream of video data between multiple viewers that are in communication whilst viewing the live stream, for example in a video conference, such that all of the viewers react to an event within the live stream at the same time.

Summary

According to a first aspect of the present invention, there is provided a computer-implemented method of generating a reaction video in response to an event in a live stream of video content, the method comprising apparatus having an internal clock: receiving live stream data, the live stream data comprising one or more segments of live stream video data and associated metadata, wherein the metadata comprises at least one timestamp; buffering the live stream video data; recording reaction video data captured by a camera; rendering the live stream video data; calculating a temporal offset using the at least one timestamp and a timing provided by the internal clock; receiving event data indicating a time related to the event; determining, using the event data and the calculated temporal offset, the time at which the event is rendered; and saving a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the time at which the event is rendered.

Calculating a temporal offset using the at least one timestamp allows a reaction video to be generated in which the reaction of a viewer to an event in the live stream occurs at a prescribed time within the reaction video. In this way, if reaction videos of different viewers are generated in response to the same event, and the reaction videos are rendered simultaneously, the reactions of the different viewers occur simultaneously.

According to a second aspect of the present invention, there is provided a data processing system for generating a reaction video in response to an event in a live stream of video content, the system comprising one or more user devices and a server, wherein each of the one or more user devices comprises a respective internal clock and is communicatively coupled to the server and is operable to: receive live stream data, the live stream data comprising one or more segments of live stream video data and associated metadata, wherein the metadata comprises at least one timestamp; buffer the live stream video data; record reaction video data captured by a camera; render the live stream video data; calculate a temporal offset using the at least one timestamp and a timing from the internal clock for that user device; receive event data indicating a time related to the event; determine, using the event data and the calculated temporal offset, the time at which the event is rendered; and save a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the time at which the event is rendered, and wherein the server is adapted to receive reaction video data from the one or more user devices.

A data processing system having a configuration as described allows for the server to generate composite reaction videos using reaction video data from multiple user devices without the need to synchronise the live streams of video data between the multiple user devices.

Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.

Brief Description of the Drawings

Figure 1 schematically shows a first example of a data processing system used to generate reaction videos in response to an event in a live stream of video data.

Figure 2 schematically shows an event server in the system of Figure 1.

Figure 3 schematically shows a reaction video server in the system of Figure 1.

Figure 4 schematically shows a user device in the system of Figure 1.

Figure 5 is a flow chart representing the routine executed by the data processing system of Figure 1 to render a live stream of video data.

Figure 6 is a flow chart representing the routine executed by the data processing system of Figure 1 in response to an event in a live stream of video data.

Figure 7 is a flow chart representing an alternative routine executed by the data processing system of Figure 1 in response to an event in a live stream of video data.

Figure 8 is a flow chart representing the routine executed by a user device in response to a drop in the rate of data transfer between the streaming server and a user device.

Figure 9 shows a second example of a data processing system used to generate reaction videos in response to an event in a live stream of video data.

Figure 10 shows a third example of a data processing system used to generate reaction videos in response to an event in a live stream of video data.

Figure 11 shows a fourth example of a data processing system used to generate reaction videos in response to an event in a live stream of video data.

Figure 12 shows an example of a data processing system used to generate a reaction video in response to a social media response.

Figure 13 shows an example of an alternative data processing system used to generate reaction videos in response to an event in a live stream of video data.

Detailed Description

In the following description, "video content" refers to a media presentation having visual content and possibly also having audio content. Similarly, "video data" refers to data that, when decoded using a suitable decoder, is rendered to generate visual output and possibly also processed to generate audio signals. The word "broadcast" is to be interpreted in a broad sense, and is intended to include distribution of data over a network via a subscription-free service, as well as distribution of data to a limited number of subscribing client devices (sometimes referred to as multicasting).

As shown in Figure 1, a first example of a data processing system in accordance with the present invention includes streaming server 110, event server 120, reaction video server 130, and one or more user devices of which only user device 160, operated by user 190, is shown. Each of the one or more user devices may be, for example, a personal computer, a laptop, a tablet, a smartphone, a games console, or a smart television, and the one or more user devices are not necessarily of the same type as one another. In this example, user device 160 is a personal computer. Streaming server 110, event server 120, reaction video server 130, and each of the one or more user devices, are interconnected via network 100. In this example, network 100 is the Internet. Event server 120 and user device 160 are further able to communicate with each other via Message Queuing Telemetry Transport (MQTT) broker 150.

As shown in Figure 2, event server 120 includes power supply 131 and system bus 133. System bus 133 is connected to: CPU 135; memory 137; network interface 139; MQTT publisher module 141; and internal clock 143. Memory 137 contains storage 145, working memory 147, and cache 149. Working memory 147 includes Random Access Memory (RAM) and Read Only Memory (ROM). Storage 145 includes one or more of: hard disk drives; optical disk storage; flash memory; and solid state drives.

As shown in Figure 3, reaction video server 130 includes power supply 331 and system bus 333. System bus 333 is connected to: CPU 335; memory 337; network interface 339; and internal clock 343. Memory 337 contains storage 345, working memory 347, and cache 349. Working memory 347 includes Random Access Memory (RAM) and Read Only Memory (ROM). Storage 345 includes one or more of: hard disk drives; optical disk storage; flash memory; and solid state drives.

As shown in Figure 4, user device 160 includes power supply 161 and system bus 163. System bus 163 is connected to: CPU 165; memory 167; internal clock 169; network interface 171; MQTT subscriber module 173; camera 175; speaker 177; display 179; and input/output (I/O) devices 181. I/O devices 181 may include, for example, a keyboard, a mouse, a touchscreen, a microphone, headphones, additional displays and additional speakers. Memory 167 contains storage 183, working memory 185, and cache 187. Working memory 185 includes RAM and ROM. Storage 183 includes one or more of: hard disk drives; optical disk storage; flash memory; and solid state drives. Internal clock 169 refers to hardware and/or software elements that function together as a clock. Internal clock 169 may be a different combination of hardware and software elements to the main internal clock of user device 160. In this example, internal clock 169 is synchronised to Coordinated Universal Time (UTC), though the method described hereafter does not require internal clock 169 to be synchronised to UTC, or to be synchronised with the internal clocks of any of the servers in Figure 1.

The functions of the components of the system of Figure 1 will now be outlined in broad terms in the context of a typical data processing operation in which a reaction video is generated in response to an event in a live stream of video content, the live stream of video content being viewed by user 190 using user device 160.

Streaming server 110 is operated by a live stream service provider, and broadcasts live stream data to user devices via network 100.

Event server 120 and reaction video server 130 are operated by a reaction video service provider. Event server 120 provides event data, via MQTT broker 150, to the user devices receiving live stream data from streaming server 110, including user device 160, in response to an event occurring in the live stream data provided by streaming server 110. Reaction video server 130 stores reaction video data, and distributes reaction videos to user devices subscribed to the reaction video service.

User 190 has an account with the operator of the reaction video service, and watches the live stream of video content using user device 160. User device 160 records reaction video data from camera 177, where camera 177 is arranged to face user 190.

In response to receiving event data from event server 120, user device 160 generates a video containing the reaction of user 190 to the event, and sends the reaction video to reaction video server 130.

As shown in Figure 5, a data processing method according to the present invention begins with user device 160 requesting, at S501, live stream data from streaming server 110. In this example, the user device 160 requests live stream data in response to user 190 selecting a live stream of video content from a list provided by an application stored on user device 160, the application being hosted by event server 120.

In response to user device 160 requesting live stream data, event server 120 connects, at S503, user device 160 to streaming server 110. In this example, connecting user device 160 to streaming server 110 involves providing user device 160 with a uniform resource locator (URL) corresponding to the network location from which the live stream data is broadcast.

Next, event server 120 connects, at S505, user device 160 to a messaging service. In this example, the messaging service is a simple messaging service implemented using the MQTT protocol, and connecting user device 160 to the messaging service includes connecting MQTT subscriber module 173 of user device 160 to MQTT broker 150. The MQTT protocol allows the transfer of simple messages between event server 120 and user device 160 with a low bandwidth requirement.

Streaming server 110 broadcasts, at S507, live stream data over network 100.

In the context of live streaming, broadcasting refers to the process of a server making data available to be downloaded from a network location by one or more client devices, and the time at which data is broadcasted refers to the time at which the data is first made available, by the server, to be downloaded by the one or more client devices. The live stream data includes segments of live stream video data and associated metadata. In this example, each segment of live stream video data corresponds to the same duration of video content, and each segment of live stream video data corresponds to a duration of video content of between one and five seconds. The metadata associated with the live stream video data includes a UTC timestamp indicating the time, to a precision of one millisecond, at which broadcasting of live stream video data from streaming server 110 begins. The metadata further includes data indicating the duration of video content that each segment of live stream video data corresponds to, and data indicating the order of the segments in the live stream. In response to receiving metadata, user devices download available segments from streaming server 110 in the order specified by the metadata.

User device 160 receives, at S509, live stream data from streaming server 110. User device 160 stores the live stream metadata in working memory 185 and buffers, at S511, segments of live stream video data in the order specified by the metadata. User device 160 buffers the segments of live stream video data by storing the segments in working memory 185, such that they can be decoded and rendered in the correct order at a later time.

As user device 160 buffers segments of live stream video data, user device 160 activates, at S513, camera 177. Camera 177 is arranged to face user 190 and is used to capture reaction video data.

After a prescribed number of the segments of live stream video data have been buffered by user device 160, user device 160 starts rendering, at S515, the live stream video data. The prescribed number of segments may depend on one or more of: the data size of each segment; the duration of video content that each segment corresponds to; a data transfer rate measured by user device 160; and data received by user device 160 indicating a data transfer rate. Preferably, the prescribed number of segments is sufficiently high to allow rendering of video content to continue in the event of typical fluctuations in data transfer rate. In addition to buffering segments of live stream video data, user device 160 stores live stream video data in working memory 185 for a prescribed period of time after it is rendered, for example ten minutes, and live stream video data that has been stored for this prescribed period of time is automatically deleted. Storing live stream video data is necessary for later stages in the data processing routine in which live stream video data is used to generate composite reaction videos. Compared with storing the live stream video data indefinitely, storing live stream video data for a prescribed period of time is advantageous if either or both of the working memory 185 and storage 183 of user device 160 have a low data capacity, and particularly in cases where the duration of the live stream of video content is long or unbounded.

At the same time that user device 160 starts rendering video content, user device 160 starts recording, at S517, reaction video data captured by camera 177. By recording reaction video data, user device 160 stores reaction video data in working memory 185. In this example, user device 160 stores reaction video data in working memory 185 for a prescribed period of time, for example ten minutes, and reaction video data that has been stored for this prescribed period of time is automatically deleted.

Finally, user device 160 calculates, at S519, a temporal offset between the broadcasting and the rendering of the live stream of video data. The temporal offset is the difference between the time at which streaming server 110 broadcasts the live stream video data and the time measured by internal clock 169 when user device 160 renders the live stream video data. In this example, the temporal offset is calculated as the difference between the UTC timestamp received at S509 and the time provided by internal clock 169 when user device 160 starts rendering the live stream video data.
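By way of illustration only, the calculation at S519 may be sketched as follows. The function name, the use of integer milliseconds, and the sign convention (internal clock reading minus broadcast timestamp) are assumptions made for this sketch; the description above specifies only that the offset is the difference between the two times.

```python
def temporal_offset_ms(broadcast_start_ms: int, render_start_clock_ms: int) -> int:
    """Return the temporal offset in milliseconds.

    Assumed sign convention: the internal clock reading when rendering
    starts, minus the broadcast timestamp taken from the metadata.
    """
    return render_start_clock_ms - broadcast_start_ms


# Hypothetical values: broadcasting begins at 1,000 ms and the internal
# clock reads 13,345 ms when rendering starts.
offset_ms = temporal_offset_ms(1_000, 13_345)
print(offset_ms)
```

With this convention, a positive offset indicates that the internal clock reading at rendering is later than the broadcast timestamp.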

The routine of Figure 6 begins when event server 120 sends, at S601, event data to user device 160 as a push message through MQTT broker 150 using the MQTT protocol. In this example, event server 120 receives information indicating that an event has occurred from a third party application programming interface (API), though in other examples, other methods of identifying an event may be used. The event data sent at S601 includes a UTC timestamp indicating a time, to a precision of one millisecond, at which an event is broadcast by streaming server 110. An event is assumed to be broadcast at a single point in time. The event data further includes information indicating the type of the event. For example, if the live stream of video content corresponds to live coverage of a football match, types of event include: a goal being scored; a free kick being awarded; a free kick being taken; and the end of the football match.

In response to receiving, at S603, event data from event server 120, user device 160 determines, at S605, the time, in this example to a precision of one millisecond, that internal clock 169 measures when the event is rendered by user device 160. The time measured by internal clock 169 when user device 160 renders the event is determined using the UTC timestamp included within the event data received from event server 120 and the temporal offset calculated at S519.
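As an illustrative sketch only, the determination at S605 may be expressed as adding the temporal offset to the event timestamp. The names, the sign convention for the offset (internal clock reading minus broadcast timestamp), and all numerical values below are assumptions. The example uses an internal clock that is deliberately not synchronised to UTC, to show that the offset absorbs the clock error.

```python
def event_render_time_ms(event_broadcast_ms: int, offset_ms: int) -> int:
    """Internal clock reading at which the event is rendered.

    Assumes offset_ms = (clock reading at render start) - (broadcast timestamp).
    """
    return event_broadcast_ms + offset_ms


# Hypothetical scenario.
CLOCK_SKEW_MS = 500_000    # internal clock runs 500 s ahead of UTC
RENDER_DELAY_MS = 12_000   # rendering lags broadcasting by 12 s

broadcast_start_ms = 1_000                 # UTC timestamp from the metadata
render_start_clock_ms = broadcast_start_ms + RENDER_DELAY_MS + CLOCK_SKEW_MS
offset_ms = render_start_clock_ms - broadcast_start_ms

event_broadcast_ms = 61_000                # UTC timestamp from the event data
render_time_ms = event_render_time_ms(event_broadcast_ms, offset_ms)

# The event is rendered 12 s after it is broadcast, as read on the skewed clock.
expected_ms = event_broadcast_ms + RENDER_DELAY_MS + CLOCK_SKEW_MS
print(render_time_ms == expected_ms)
```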

Next, user device 160 looks up, at S607, data from a table stored in storage 183. For each type of event associated with the live stream, the table stores a relative starting time and a relative ending time. Adding the relative starting time to the time determined at S605 defines the start of a temporal window, and adding the relative ending time to the time determined at S605 defines the end of the temporal window. For example, if the live stream of video content corresponds to a live broadcast of a football match, a goal being scored may correspond to a relative starting time of minus five seconds and a relative ending time of ten seconds, thereby defining a temporal window with a duration of fifteen seconds, in which the goal being scored is rendered by user device 160 five seconds after the start of the temporal window.
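The table look-up at S607 may be sketched, purely by way of example, as a dictionary keyed by event type. The dictionary name and key string are hypothetical; the relative times for a goal are those given in the football example above.

```python
from typing import Dict, Tuple

# Relative starting and ending times in milliseconds, keyed by event type.
# Only the "goal" entry is taken from the description; the key name is assumed.
EVENT_WINDOWS_MS: Dict[str, Tuple[int, int]] = {
    "goal": (-5_000, 10_000),
}


def temporal_window_ms(event_type: str, render_time_ms: int) -> Tuple[int, int]:
    """Return (start, end) of the temporal window on the internal clock."""
    rel_start_ms, rel_end_ms = EVENT_WINDOWS_MS[event_type]
    return render_time_ms + rel_start_ms, render_time_ms + rel_end_ms


# Hypothetical render time of 100,000 ms on the internal clock.
start_ms, end_ms = temporal_window_ms("goal", 100_000)
print(start_ms, end_ms, end_ms - start_ms)
```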

After user device 160 measures, using internal clock 169, that the temporal window defined at S607 has passed, user device 160 copies, at S609, a portion of reaction video data captured by camera 177 during the temporal window defined at S607. User device 160 saves the copy in working memory 185. Unlike the original portion of reaction video data, the copy will not be deleted after a prescribed period of time elapses from the time at which the portion of reaction video data is recorded.
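One possible way to combine time-limited recording with the copying step at S609 is sketched below. The class name, the frame representation (a timestamp paired with opaque bytes), and the use of a double-ended queue are assumptions made for illustration and are not prescribed by the method.

```python
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[int, bytes]  # (internal clock time in ms, encoded frame data)


class RollingRecorder:
    """Keeps recorded frames for a prescribed retention period only."""

    def __init__(self, retention_ms: int = 10 * 60 * 1000) -> None:
        self.retention_ms = retention_ms
        self._frames: Deque[Frame] = deque()

    def record(self, time_ms: int, data: bytes) -> None:
        self._frames.append((time_ms, data))
        # Automatically delete frames older than the retention period.
        while self._frames and self._frames[0][0] < time_ms - self.retention_ms:
            self._frames.popleft()

    def copy_window(self, start_ms: int, end_ms: int) -> List[Frame]:
        """Return an independent copy of the frames inside the window."""
        return [f for f in self._frames if start_ms <= f[0] <= end_ms]

    def __len__(self) -> int:
        return len(self._frames)


# Hypothetical usage with a short retention period of one second.
recorder = RollingRecorder(retention_ms=1_000)
for t in (0, 500, 1_000, 1_500, 2_000):
    recorder.record(t, b"frame")
saved = recorder.copy_window(1_200, 2_000)
recorder.record(5_000, b"frame")  # older frames are dropped from the recorder
print(len(saved), len(recorder))
```

Because the saved copy is a separate list, it survives the automatic deletion applied to the rolling recording.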

User device 160 further copies, at S611, a portion of live stream video data that was rendered during the temporal window defined at S607. User device 160 saves the copy in working memory 185. Unlike the original portion of live stream video data, the copy will not be deleted after a prescribed period of time elapses from the time at which the portion of live stream video data is rendered. The saved copy of reaction video data and the saved copy of live stream video data are such that if the two copies are rendered simultaneously, the reaction of user 190 will be synchronised with the event, with the maximum temporal separation between rendering of the reaction and rendering of the event being approximately one millisecond.

After user device 160 saves the copy of live stream data at S611, user device 160 provides user 190 with a list of options including "delete reaction", "save reaction", and "share reaction". In response to user 190 selecting "delete reaction", user device 160 discards, at S615, the copies of reaction video data and live stream video data stored at S609 and S611. In response to user 190 selecting "save reaction" or "share reaction", user device 160 generates, at S617, a composite reaction video using the saved copy of reaction video data and the saved copy of live stream video data. A composite reaction video is a single video file in which live stream video content and reaction video content are presented together, for example the reaction video content being presented adjacent to the live stream video content, or a "video in video" format in which the reaction video content is presented in a small frame embedded within a larger frame, the live stream video content being presented within the larger frame.
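The two layouts mentioned above may be illustrated by computing the pixel rectangles in which each video is drawn. This is a sketch of the geometry only, not of video encoding; the function names, the default inset scale of one quarter, the margin of 16 pixels, and the bottom-right placement are all hypothetical choices.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def video_in_video_rect(frame_w: int, frame_h: int,
                        scale: float = 0.25, margin: int = 16) -> Rect:
    """Rectangle for the small reaction frame, placed bottom-right (assumed)."""
    inset_w = int(frame_w * scale)
    inset_h = int(frame_h * scale)
    return (frame_w - inset_w - margin, frame_h - inset_h - margin,
            inset_w, inset_h)


def side_by_side_rects(frame_w: int, frame_h: int) -> Tuple[Rect, Rect]:
    """Rectangles for live stream content (left) and reaction content (right)."""
    half = frame_w // 2
    return (0, 0, half, frame_h), (half, 0, frame_w - half, frame_h)


print(video_in_video_rect(1920, 1080))
print(side_by_side_rects(1920, 1080))
```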

User device 160 sends, at S619, the composite reaction video to reaction video server 130. Reaction video server 130 stores, at S621, the composite reaction video, along with user data corresponding to user 190, in storage 345. User 190 is able to view the composite reaction video at a later time using a user device, for example user device 160. In the case of user 190 selecting "share reaction", reaction video server 130 sends messages to user devices approved by user 190 indicating that a composite reaction video has been generated, and containing further information, for example the title of the live stream of video content and data indicating the network location of the composite reaction video (for example a URL).

In the example routine of Figure 6, user device 160 generates a composite reaction video using reaction video data captured by camera 177, and sends the composite reaction video to reaction video server 130. This method is advantageous in cases where a large number of users having accounts with the operator of the reaction video service simultaneously view the same live stream of video content, because according to this method the majority of the data processing operations are performed by user devices, thereby minimising the processing demand experienced by reaction video server 130. The composite reaction video generated by the routine of Figure 6 contains reaction video data recorded exclusively by user device 160.

Figure 7 shows an alternative routine executed by the data processing system of Figure 1 in response to an event in a live stream of video data. In contrast with the routine of Figure 6, in this example, it is necessary that reaction video server 130 receives the live stream data broadcast by streaming server 110. Steps S701 to S715 of Figure 7 are equivalent to steps S601 to S615 of Figure 6, but in the routine of Figure 7, the step of copying live stream video data is omitted by user device 160. In contrast with the routine of Figure 6, in response to user 190 selecting "save reaction" or "share reaction", user device 160 sends reaction video data, at S717, to reaction video server 130. Reaction video server 130 stores, at S719, the reaction video data, along with user data corresponding to user 190, in storage 345.

Reaction video server 130 generates, at S721, a composite reaction video using the reaction video data received from user device 160 and the live stream data received from streaming server 110 at S509. Optionally, user device 160 may be associated with one or more additional user devices, and each of the one or more additional user devices may also send reaction video data to reaction video server 130. In this case, reaction video server 130 may generate a composite reaction video using the live stream data received from streaming server 110 at S509 and reaction video data received from one or more user devices. Reaction video server 130 stores, at S723, the composite reaction video, along with user data corresponding to user 190, in storage 345.

During the live streaming process, it is possible that the connection between streaming server 110 and user device 160 will be interrupted, or that the rate at which data is transferred between streaming server 110 and user device 160 will otherwise be temporarily reduced. In some cases, the buffered live stream video data will not be sufficient for rendering of live stream video data to continue, and the rendering will pause at the beginning of a segment. The routine executed by user device 160 in such a case depends on whether user device 160 is configured in "catch-up" mode or not. If user device 160 is configured in catch-up mode and the rendering of live stream data is paused due to a reduction in data transfer rate, user device 160 will skip segments of live stream video data before resuming rendering, so that the rendering of live stream video data is not delayed. If user device 160 is not configured in catch-up mode and the rendering of live stream data is paused due to a reduction in data transfer rate, user device 160 will continue to buffer segments of live stream video data, before resuming rendering the live stream video data with a delay.

As shown in Figure 8, in response to the connection between streaming server 110 and user device 160 being interrupted at S801, user device 160 pauses, at S803, rendering of live stream video data. If user device 160 is not configured in catch-up mode, user device 160 continues to buffer, at S805, live stream video data. When a prescribed number of segments have been buffered, user device 160 resumes rendering, at S807, live stream video data. The prescribed number of segments may again depend on one or more of: the data size of each segment; the duration of video content that each segment corresponds to; a data transfer rate measured by user device 160; and data received by user device 160 indicating a data transfer rate. User device 160 calculates, at S809, a new temporal offset between the broadcasting and the rendering of the live stream of video data. The new temporal offset is calculated as the difference between the UTC time at which streaming server 110 broadcasts the paused segment of live stream video data, and the time recorded by internal clock 169 when user device 160 starts rendering the paused segment of live stream video data. The time at which streaming server 110 broadcasts the paused segment of live stream video data is calculated using the metadata received from streaming server 110, including the UTC timestamp received at S509, the data indicating the duration of video content that each segment of live stream video data corresponds to, and the metadata indicating the order of the segments in the live stream.
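The recalculation at S809 may be sketched as follows, by way of example only. The sketch assumes that every segment has the same duration (as in the present example), that segments are numbered from zero in broadcast order, and that a segment is broadcast at the stream start time plus the total duration of the preceding segments. The function names and numerical values are hypothetical.

```python
def segment_broadcast_ms(stream_start_ms: int, segment_duration_ms: int,
                         segment_index: int) -> int:
    """Broadcast time of a segment, derived from the stream metadata.

    Assumes equal-length segments indexed from zero.
    """
    return stream_start_ms + segment_index * segment_duration_ms


def new_temporal_offset_ms(stream_start_ms: int, segment_duration_ms: int,
                           paused_index: int, resume_clock_ms: int) -> int:
    """Offset between broadcasting and resumed rendering of the paused segment."""
    broadcast_ms = segment_broadcast_ms(stream_start_ms, segment_duration_ms,
                                        paused_index)
    return resume_clock_ms - broadcast_ms


# Hypothetical values: four-second segments, rendering paused at segment 30
# and resumed when the internal clock reads 140,000 ms.
print(segment_broadcast_ms(0, 4_000, 30))
print(new_temporal_offset_ms(0, 4_000, 30, 140_000))
```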

If user device 160 is configured in catch-up mode, user device 160 skips, at S811, segments of live stream video data. By skipping segments of live stream video data, user device 160 refrains from downloading one or more consecutive segments of live stream video data. When the data transfer rate becomes sufficiently high to allow continuous rendering of live stream video data to resume, user device 160 resumes buffering segments of live stream video data, starting with the first segment that is broadcast after the data transfer rate becomes sufficiently high. When a prescribed number of segments have been buffered, user device 160 resumes rendering, at S813, live stream video data. The prescribed number of segments may again depend on one or more of: the data size of each segment; the duration of video content that each segment corresponds to; a data transfer rate measured by user device 160; and data received by user device 160 indicating a data transfer rate. User device 160 calculates, at S815, a new temporal offset between the broadcasting and the rendering of the live stream of video data. The new temporal offset is calculated as the difference between the UTC time at which streaming server 110 broadcasts the first segment that is broadcast after the data transfer rate becomes sufficiently high, and the time recorded by internal clock 169 when user device 160 starts rendering the first segment that is broadcast after the data transfer rate becomes sufficiently high. Catch-up mode is advantageous for cases in which a significant delay in rendering of live stream video data is unacceptable. One example in which a significant delay in rendering of live stream video data is unacceptable is discussed with reference to Figure 11.
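The selection of segments in catch-up mode may be sketched as below. The sketch assumes equal-length segments indexed from zero, assumes that the recovery time is expressed on the same time base as the broadcast timestamps, and treats a segment broadcast exactly at the recovery time as the first segment to be buffered. These conventions, and the function names, are illustrative assumptions only.

```python
from typing import List


def first_segment_at_or_after(stream_start_ms: int, segment_duration_ms: int,
                              recovery_ms: int) -> int:
    """Index of the first segment broadcast at or after the recovery time."""
    elapsed_ms = max(0, recovery_ms - stream_start_ms)
    # Ceiling division using integer arithmetic.
    return -(-elapsed_ms // segment_duration_ms)


def skipped_segments(paused_index: int, resume_index: int) -> List[int]:
    """Indices that are not downloaded in catch-up mode (assumed to include
    the segment at which rendering paused)."""
    return list(range(paused_index, resume_index))


# Hypothetical values: four-second segments, rendering paused at segment 30,
# data transfer rate recovers 130,000 ms after the stream start.
resume_index = first_segment_at_or_after(0, 4_000, 130_000)
print(resume_index, skipped_segments(30, resume_index))
```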

In the system of Figure 1, camera 177 is a component of user device 160. In other embodiments of the data processing system, the user device being used to render the live stream video data is a first user device and the camera used to capture reaction video data is a component of a second user device. In some cases, the second user device is connected to the first user device by a wireless connection method. For example, the system of Figure 9 has similar components to those of Figure 1, and is similarly adapted to execute the routines of Figures 5, 6, and 7. However, in the system of Figure 9, camera 977 is a peripheral webcam connected to user device 960 via a Bluetooth connection. In this case, there is a delay between the time at which reaction video data is captured by camera 977 and the time at which the reaction video data is recorded by user device 960. User device 960 receives reaction metadata associated with the reaction video data received from camera 977. In order to copy, at S609 or S709, a portion of reaction video data captured by camera 977 during the temporal window defined at S607 or S707, user device 960 uses the reaction metadata to determine the portion of reaction video data that is captured by camera 977 during the temporal window defined at S607 or S707. In this example, reaction metadata includes data indicating the temporal delay between the capturing and recording of reaction video data. In another example, the second user device has an internal clock that is synchronised with the internal clock of the first user device (for example by both internal clocks being synchronised to UTC), and the reaction metadata includes one or more UTC timestamps indicating times at which individual frames of reaction video data are captured by the camera, as measured by the internal clock of the second user device.
In yet another example the second user device has an internal clock that is not synchronised with the internal clock of the first user device, and the reaction metadata includes one or more timestamps indicating times at which individual frames of reaction video data are captured by the camera, as measured by the internal clock of the second user device. In this case, the second user device further sends data to the first user device from which the timing offset between the internal clock of the second user device and the internal clock of the first user device can be determined. In some examples, the second user device records the reaction video data before sending the reaction video data to the first user device.
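The use of reaction metadata described above may be sketched as follows, purely as an illustration. The names `Frame`, `shift_window_by_delay` and `frames_in_window` are hypothetical, times are expressed as seconds for simplicity, and the sign convention for the clock offset (first device clock minus second device clock) is an assumption.

```python
from typing import List, Tuple

# (capture time in seconds on the second device's clock, frame identifier)
Frame = Tuple[float, str]


def shift_window_by_delay(window_start: float, window_end: float,
                          capture_to_record_delay: float) -> Tuple[float, float]:
    """Delay variant: a frame captured at time t is recorded at t + delay,
    so the window expressed in recording time is shifted later by the delay."""
    return window_start + capture_to_record_delay, window_end + capture_to_record_delay


def frames_in_window(frames: List[Frame], window_start: float, window_end: float,
                     clock_offset: float = 0.0) -> List[Frame]:
    """Timestamp variants: select frames captured inside the window.

    clock_offset is (first device clock - second device clock); it is zero
    when both clocks are synchronised, for example to UTC.
    """
    return [f for f in frames
            if window_start <= f[0] + clock_offset <= window_end]


# Hypothetical frames stamped by a second device whose clock runs 2 s behind.
frames = [(7.0, "f0"), (8.5, "f1"), (10.0, "f2"), (13.5, "f3")]
selected = frames_in_window(frames, 10.0, 15.0, clock_offset=2.0)
```

With synchronised clocks the same selection is made with the default `clock_offset` of zero.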

The system of Figure 10 also has similar components to those of Figure 1, and is similarly adapted to execute the routines of Figures 5, 6, and 7. However, in Figure 10, user 1090 operates control device 1040, which in this example is a mobile phone storing an application hosted by reaction video server 1030. Prior to the streaming server 1010 broadcasting live stream data, user 1090 associates control device 1040 with the user device 1060. In this example, associating control device 1040 with user device 1060 is achieved by user device 1060 presenting a Quick Response (QR) code, and control device 1040 scanning the QR code. The QR code contains data that allows control device 1040 to connect to MQTT broker 1050 and to reaction video server 1030. In response to an event in the live stream of video content, event server 1020 sends event data to user device 1060 through MQTT broker 1050, and also sends event data to control device 1040 through MQTT broker 1050. In response to control device 1040 receiving event data from event server 1020, control device 1040 provides user 1090 with a list of options including "delete reaction", "save reaction", and "share reaction". In response to user 1090 selecting "save reaction" or "share reaction", user device 1060 and reaction video server 1030 generate and save a composite reaction video in accordance with the routine of Figure 6 or Figure 7 (depending on the configuration of user device 1060 and reaction video server 1030). This embodiment is advantageous in situations where multiple users watch a live stream of video content using the same user device, for example in cases where the user device is in a public location such as a cafe or a public house. In the case that the user device is located in a cafe or a public house, control device 1040 may be operated by a member of staff of the cafe or public house.

The system of Figure 11 also has similar components to those of Figure 1, and is similarly adapted to execute the routines of Figures 5, 6, and 7. However, in Figure 11, user devices are grouped into virtual conference rooms. In this example, virtual conference room 1170 contains three user devices. Generally, a virtual conference room may contain any number of user devices. Each user device in Figure 11 is connected to network 1100, and is further connected to MQTT broker 1150. The user devices in a virtual conference room may be connected via an additional network such as a Local Area Network (LAN), but the user devices in a virtual conference room are not necessarily located in the vicinity of each other. The users of the user devices in virtual conference room 1170 are in communication with one another, for example by video conference. Each user device in virtual conference room 1170 is configured in catch-up mode, as discussed above with reference to Figure 8, and each user device has an internal clock synchronised with UTC time. Each user device in virtual conference room 1170 is used to view the same live stream of video data. In response to live stream data being broadcast by streaming server 1110, each user device in virtual conference room 1170 buffers segments of live stream video data. Each of the user devices in virtual conference room 1170 starts rendering live stream video data at the same time. In this example, each user device in virtual conference room 1170 starts rendering live stream data five seconds after the live stream data is first broadcast, where the time at which the live stream data is first broadcast is indicated by a UTC timestamp within the live stream metadata. Because each user device is configured in catch-up mode, the rendering of live stream video data remains synchronised between the user devices in virtual conference room 1170 throughout the duration of the live stream of video content. However, rendering of live stream video data is not necessarily synchronised between user devices that are not in the same virtual conference room as each other.
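As a minimal sketch of the synchronised start described above, each user device could derive a common rendering start time from the UTC timestamp in the live stream metadata. The function names are hypothetical; the five-second delay is the value given in this example.

```python
from datetime import datetime, timedelta, timezone

# Delay between first broadcast and start of rendering, as in this example.
START_DELAY = timedelta(seconds=5)


def render_start_time(first_broadcast_utc: datetime) -> datetime:
    """Common start time shared by every device in the virtual conference room."""
    return first_broadcast_utc + START_DELAY


def seconds_until_start(first_broadcast_utc: datetime, now_utc: datetime) -> float:
    """How long a device should keep buffering before it starts rendering."""
    remaining = render_start_time(first_broadcast_utc) - now_utc
    return max(0.0, remaining.total_seconds())


# Hypothetical values: the stream is first broadcast at 12:00:00 UTC and a
# device has buffered its first segments by 12:00:02.
first_broadcast = datetime(2018, 6, 8, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2018, 6, 8, 12, 0, 2, tzinfo=timezone.utc)
wait = seconds_until_start(first_broadcast, now)
```

Since every device computes the start time from the same timestamp and from a UTC-synchronised clock, devices that finish buffering at different moments still start rendering together.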

Each user device in Figure 11 is adapted to send reaction video data to reaction video server 1130, in accordance with the method outlined by Figure 7. Reaction video server 1130 is adapted to generate composite reaction videos using live stream video data received from streaming server 1110 and reaction video data received from one or more user devices. The method of the present invention allows for composite reaction videos to be generated using reaction video data from one or more user devices that are not in the same virtual conference room as each other.

The system of Figure 12 is used to generate reaction videos in cases for which an event service is not necessarily available to provide event data corresponding to key moments in a live stream of video content. Accordingly, the system of Figure 12 does not include an event server, and events in the live stream of video content are instead identified by responses of subscribers to a live stream of reaction video data generated by a single "producer" who views the original live stream of video content.

The system of Figure 12 includes streaming server 1210, reaction video server 1230, reaction streaming server 1240, producer device 1260 (operated by producer 1290) and viewer devices 1280, of which three viewer devices are shown. Viewer devices 1280 are associated with producer device 1260, where associating a viewer device with a producer device is initiated by the user of the viewer device subscribing to a social media channel associated with producer 1290. In this example, the social media channel associated with producer 1290 is hosted by reaction video server 1230. Producer 1290 views, using producer device 1260, a live stream of video content broadcasted by streaming server 1210. Producer device 1260 includes a camera, and while producer 1290 views the live stream of video content, producer device 1260 captures and records reaction video data. Producer device 1260 automatically sends the reaction video data to reaction streaming server 1240, and reaction streaming server 1240 broadcasts the reaction video data over network 1200 as a live stream of reaction video content. Along with the reaction video data, reaction streaming server 1240 broadcasts metadata associated with the reaction video data, the metadata including at least one timestamp indicating the time at which the reaction video data is first broadcasted. Each of the viewer devices 1280 receives the reaction video data from the reaction streaming server, buffers the reaction video data, and then renders the reaction video data for viewing. Each of the viewer devices 1280 calculates a temporal offset between broadcasting and rendering of the reaction video data, using an equivalent method to that described above with reference to Figure 5.

Whilst viewing the reaction video content, the user of each of the viewer devices 1280 may provide viewer responses to the live stream of reaction video content. Examples of viewer responses to a live stream of reaction video content include selecting "like" or "dislike" options, or submitting a comment. In response to one of the viewer devices 1280 receiving a viewer response whilst rendering a specific moment of reaction video data, the viewer device calculates, using the calculated temporal offset, the time at which the specific moment of reaction video data was broadcasted, and sends a message to reaction video server 1230 indicating the type of viewer response, and further including a timestamp indicating the time at which the corresponding specific moment of reaction video content was first broadcasted. Reaction video server 1230 sends a signal to producer device 1260 and to each of the viewer devices 1280, causing producer device 1260 and each of the viewer devices 1280 to render a visual representation of the viewer response. For example, if the viewer response is a comment, then producer device 1260 and each of the viewer devices 1280 may render the comment in a comments box. If the viewer response is a "like", then producer device 1260 and each viewer device may render a corresponding icon, such as a "thumbs up" icon. In this way, producer 1290 and users of viewer devices 1280 are presented with a rolling feed of viewer responses.
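The timestamp calculation performed by a viewer device may be sketched as follows. This is an illustration under stated assumptions: the message field names `type` and `broadcast_time` are hypothetical, and no particular message format is prescribed by the embodiment.

```python
from datetime import datetime, timedelta, timezone


def broadcast_time_of_moment(render_time: datetime, offset: timedelta) -> datetime:
    """Time at which the moment currently being rendered was first broadcast."""
    return render_time - offset


def build_response_message(response_type: str, render_time: datetime,
                           offset: timedelta) -> dict:
    """Message sent to the reaction video server (field names are hypothetical)."""
    return {
        "type": response_type,
        "broadcast_time": broadcast_time_of_moment(render_time, offset).isoformat(),
    }


# Hypothetical values: a "like" is received while rendering at 12:01:00 on a
# device whose calculated temporal offset is four seconds.
render_time = datetime(2018, 6, 8, 12, 1, 0, tzinfo=timezone.utc)
message = build_response_message("like", render_time, timedelta(seconds=4))
```

Referring each response to the broadcast time, rather than to the time of receipt, allows responses from viewer devices with different temporal offsets to be compared on a common timeline.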

Reaction video server 1230 may receive a large number of viewer responses associated with times in a particular interval. For example, if producer 1290 reacts in an amusing fashion to an event in the live stream of video content received from streaming server 1210, the reaction video server may receive a large number of viewer responses with timestamps corresponding to times shortly after the event. If the number of viewer responses exceeds a threshold number of viewer responses within a prescribed interval (for example, if a threshold number of viewer responses are received with timestamps falling within any ten second interval), reaction video server 1230 sends a signal to producer device 1260, causing producer device 1260 to save a copy of a portion of the reaction video data, the portion corresponding to a temporal window encompassing the interval during which the threshold number of viewer responses were received. The threshold number of viewer responses may depend on the total number of viewer devices. Similarly, if the number of viewer responses of a certain type (for example, "likes") exceeds a threshold number of viewer responses of that type within a prescribed interval, reaction video server 1230 sends a signal to producer device 1260, causing producer device 1260 to save a copy of a portion of the reaction video data corresponding to a temporal window encompassing the interval during which the threshold number of viewer responses of that type were received. In each of these cases, the temporal window is defined by a relative starting time and a relative ending time with respect to the start of the interval with which the threshold number of viewer responses is associated. The relative starting time and relative ending time may be different for different types of viewer response. 
At the end of the live streaming of reaction video content, producer device 1260 sends the copied portions of reaction video data to reaction video server 1230, and the copied portions of reaction video data are then available to be viewed via the social media channel associated with producer 1290. Optionally, producer device 1260 may generate composite reaction videos, as described above with reference to Figure 6, using the live stream data received from streaming server 1210 and the reaction video data copied and saved by producer device 1260 during the live streaming process.
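One possible way of detecting that a threshold number of viewer responses falls within a prescribed interval is a sliding window over the response timestamps, sketched below. This is not necessarily how reaction video server 1230 is implemented. The function names are hypothetical, timestamps are expressed in seconds, and the threshold is treated as "at least `threshold` responses", which is an assumption.

```python
from typing import List, Optional, Tuple


def find_trigger_interval(timestamps: List[float], threshold: int,
                          interval: float = 10.0) -> Optional[float]:
    """Return the start of the first span of length `interval` in which at
    least `threshold` responses fall, or None if there is no such span."""
    ts = sorted(timestamps)
    left = 0
    for right, t in enumerate(ts):
        # Shrink the window until it spans no more than `interval` seconds.
        while t - ts[left] > interval:
            left += 1
        if right - left + 1 >= threshold:
            return ts[left]
    return None


def temporal_window(interval_start: float, relative_start: float,
                    relative_end: float) -> Tuple[float, float]:
    """Window defined relative to the start of the triggering interval."""
    return interval_start + relative_start, interval_start + relative_end


# Hypothetical broadcast-referred timestamps of "like" responses.
likes = [1.0, 50.0, 52.0, 55.0, 58.0, 90.0]
start = find_trigger_interval(likes, threshold=4)
window = temporal_window(start, relative_start=-5.0, relative_end=15.0)
```

A negative relative starting time, as used here, makes the saved portion begin before the first response, so that the reaction which prompted the responses is included.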

In the example of Figure 12, producer 1290 views a live stream of video content that is broadcast by streaming server 1210. In other examples, producer 1290 may view other types of video presentation, which may not necessarily be broadcast as live streams. In such examples, no streaming server is necessary. In some examples, the producer device deletes redundant reaction video data that has not been copied and saved after a prescribed period of time. In some examples, the reaction video data is copied and saved by the reaction streaming server instead of the producer device. In a simplified embodiment, no temporal offset is calculated, and the threshold number of viewer responses is measured with reference to the times at which the viewer responses are received by the reaction server. This simplified embodiment is only suitable for cases in which the delay between broadcasting and rendering of reaction video data is guaranteed to be small.

The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged.

In some embodiments, the data processing system has different server configurations to those described in Figures 1, 9, 10, 11 or 12. For example, in some embodiments the reaction video server and the event server are part of the same server. In other embodiments, the functionality of one or more of the servers may be performed by multiple servers. In particular, multiple streaming servers or multiple reaction video servers may be used. Other server configurations are envisaged.

In some embodiments, the MQTT protocol is not used to send messages between the event server, the reaction video server, and user devices, and accordingly no MQTT broker is included in the data processing system. For example, in some embodiments the WebSocket protocol is used to send messages between the event server, the reaction video server, and user devices. In some embodiments, the routines executed by the data processing system are different from the routines described in Figures 5, 6, and 7. For example, in some examples, user devices used to generate reaction videos do not delete live stream video data or reaction video data after a prescribed period of time. In these examples, the steps of copying reaction video data and copying live stream video data can be executed after the user has finished viewing the live stream. This method is advantageous in cases where the processing power of a user device is limited, or where the bandwidth available to a user device is limited.

In some embodiments, the user device being used to render live stream video data does not request live stream data from the streaming server in response to a user selecting a live stream of video content from a list provided by an application. For example, in one embodiment a user device requests live stream data from the streaming server by accessing a website hosted by the streaming server. In this embodiment, the user device connects to the reaction video server via a plugin on the website hosted by the streaming server.

In some embodiments, the streaming server uses adaptive streaming in which the segments of live stream video data do not correspond to the same duration of video content, and instead the duration is automatically adjusted during the streaming process to optimise the streaming process in response to fluctuating rates of data transfer. In this case, the metadata associated with each segment of video data may contain a timestamp indicating the broadcasting time of the segment, and the user device uses these timestamps to calculate the temporal offset between the broadcasting and rendering of live stream video data.
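As a sketch of how per-segment timestamps might be used when segment durations vary, the user device could look up the segment containing an event and apply the offset observed for that segment. The function name is hypothetical, times are in seconds, and the assumption that the device records the time at which it starts rendering each segment is not stated in the embodiment.

```python
from bisect import bisect_right
from typing import List


def event_render_time(segment_broadcast_times: List[float],
                      segment_render_times: List[float],
                      event_broadcast_time: float) -> float:
    """Map an event's broadcast time to the time at which it is rendered.

    segment_broadcast_times: per-segment timestamps from the metadata, ascending.
    segment_render_times: device clock readings when each segment starts rendering.
    """
    # Index of the last segment broadcast at or before the event.
    i = bisect_right(segment_broadcast_times, event_broadcast_time) - 1
    if i < 0:
        raise ValueError("event precedes the first received segment")
    offset = segment_render_times[i] - segment_broadcast_times[i]
    return event_broadcast_time + offset


# Hypothetical segments of 4 s, 2 s and 6 s; the device stalls for 0.5 s
# before rendering the third segment, so its offset grows from 5.0 to 5.5.
broadcast_times = [0.0, 4.0, 6.0, 12.0]
render_times = [5.0, 9.0, 11.5, 17.5]
rendered_at = event_render_time(broadcast_times, render_times, 7.0)
```

Using the offset of the containing segment means that a stall affecting only later segments does not shift the calculated rendering time of earlier events.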

Embodiments in which multiple user devices are grouped in a virtual conference room may use methods other than the method described with reference to Figure 11 to ensure that live stream video data is rendered synchronously between the user devices in the virtual conference room.

Figure 13 shows an example of an alternative data processing system used to generate reaction videos in response to an event in a live stream of video data. In the system of Figure 13, a broadcaster system 1310 transmits a video stream to a select group of users 1340 whose reactions to the video stream are to be monitored via a reaction video system 1320. The reaction video system 1320 converts the video stream received from the broadcaster system 1310 to a real time protocol and multicasts the converted video stream to the select group of users, so that the viewing of the converted video stream by the select group of users 1340 is substantially synchronised. Reaction videos are supplied by each of the select group of users 1340 to the reaction video system 1320 using a real time protocol, and the reaction video system merges the reaction videos into a composite video and outputs the composite video to the broadcaster system 1310. This arrangement provides the composite reaction video to the broadcaster system 1310 with low latency.

As shown in Figure 13, the broadcaster system 1310 includes a media stream generator 1312, which may generate a live stream, which in this embodiment is transmitted to the reaction video system 1320 using the HLS (HTTP Live Streaming) protocol. The broadcaster system 1310 also includes a reaction stream receiver 1314 which receives the composite reaction video stream from the reaction video system. It will be appreciated that this composite reaction video stream could then be broadcast by the broadcaster.

The reaction video system 1320 implements a WebRTC (Web Real-Time Communication) node to allow real-time communication with the select group of users 1340, who are registered subscribers to the reaction video system 1320. In this example, the HLS-encoded video stream from the broadcaster system is processed by a video muxer/recoder 1322 to multiplex the HLS video stream and recode the HLS video stream in accordance with the RTP protocol. The RTP-encoded video data is then input to a Selective Forwarding Unit (SFU) 1324 in accordance with the WebRTC software, which uses a media RTP streamer 1324 to multicast the RTP-encoded video data to the select group of users 1340.

Each of the select group of users 1340 has an associated user device 1342 which forwards the data to a monitor 1344 for display. Each user also has a camera 1346 which images the viewers of the monitor, and sends a reaction video stream, via the user device 1342, to a reaction RTP receiver 1330 of the SFU 1324 of the reaction video system 1320. The reaction RTP receiver 1330 forwards the RTP-encoded data for processing by an RTP demultiplexer and RTMP listener 1332. The demultiplexed reaction video streams are then processed by a video merger 1338 that merges the reaction video streams, together with the RTP-encoded data output by the video muxer/recoder 1322, to generate a composite video stream. In this embodiment, the composite video stream has a plurality of panes, with the number of panes corresponding to one more than the number of reaction video streams. Each reaction video stream is shown in a corresponding pane, with the final pane showing the original RTP-encoded video stream from the video muxer/recoder 1322. In this embodiment, a delay compensator 1326 can introduce a small delay to the RTP-encoded data from the video muxer/recoder 1322 before input to the video merger 1338 to improve synchronism with the reaction video streams.
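The pane arrangement of the composite video stream may be illustrated by the following sketch. The near-square grid, the function name `pane_layout` and the source labels are assumptions made for illustration; the embodiment specifies only that there is one pane per reaction video stream plus a final pane for the content stream.

```python
import math
from typing import List, Tuple

# (source label, x, y, width, height) in pixels
Pane = Tuple[str, int, int, int, int]


def pane_layout(num_reaction_streams: int, width: int, height: int) -> List[Pane]:
    """Arrange one pane per reaction stream plus a final content pane on a grid."""
    panes = num_reaction_streams + 1
    cols = math.ceil(math.sqrt(panes))
    rows = math.ceil(panes / cols)
    pane_w, pane_h = width // cols, height // rows
    layout = []
    for i in range(panes):
        # Reaction streams come first; the final pane shows the content stream.
        source = f"reaction-{i}" if i < num_reaction_streams else "content"
        layout.append((source, (i % cols) * pane_w, (i // cols) * pane_h,
                       pane_w, pane_h))
    return layout


# Three viewers on a 1280x720 composite frame give a 2x2 grid of panes.
layout = pane_layout(3, 1280, 720)
```

With three reaction video streams, the sketch yields four 640x360 panes, the last of which carries the content stream.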

In this embodiment, the RTP demultiplexer and RTMP listener 1332 converts incoming RTP streams from SFU 1324 to RTMP to which it actively listens. In this way, the RTP demultiplexer and RTMP listener 1332 waits for incoming data rather than polling for data. The RTP demultiplexer and RTMP listener 1332 also outputs data from the reaction video streams to a UDP traffic monitor 1336, which detects any interruptions in the reaction video stream. If the UDP traffic monitor 1336 detects that the reaction video stream from a viewer device has been interrupted, or degraded below an acceptable level, then the UDP traffic monitor 1336 sends a control signal to the video merger 1338 that may either remove the pane for that reaction video stream or replace that reaction video stream with a default video stream (e.g. a static image indicating that there is a temporary interruption).
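The control exercised over the video merger may be sketched as follows. The detection criterion used here (no packet received within a timeout) is an assumption, as are the function names, the two-second timeout and the `"default"` label; the embodiment does not specify how the UDP traffic monitor 1336 detects an interruption or degradation.

```python
from typing import Dict, List


def interrupted_streams(last_packet_time: Dict[str, float], now: float,
                        timeout: float = 2.0) -> List[str]:
    """Streams for which no packet has been seen within `timeout` seconds."""
    return sorted(s for s, t in last_packet_time.items() if now - t > timeout)


def apply_control(panes: List[str], interrupted: List[str],
                  mode: str = "placeholder") -> List[str]:
    """Either remove interrupted panes or replace them with a default stream."""
    if mode == "remove":
        return [p for p in panes if p not in interrupted]
    if mode == "placeholder":
        return ["default" if p in interrupted else p for p in panes]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical state: viewer-b last sent a packet 5 s ago.
last_seen = {"viewer-a": 99.5, "viewer-b": 95.0, "viewer-c": 99.9}
down = interrupted_streams(last_seen, now=100.0)
panes = ["viewer-a", "viewer-b", "viewer-c", "content"]
with_placeholder = apply_control(panes, down)
without_pane = apply_control(panes, down, mode="remove")
```

Replacing the pane with a default stream keeps the layout stable during a short interruption, whereas removing the pane changes the number of panes in the composite video stream.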

The composite video stream output by the video merger 1338 is then processed by encoder/multiplexer/segmenter 1339 to generate a HLS video stream that is output to the reaction stream receiver 1314 of the broadcaster system.

The CPUs in Figures 2, 3, and 4 are typical examples of processor circuitry which, when programmed with suitable instructions, are operable to perform routines in accordance with the present invention. However, it will be appreciated that other examples of processor circuitry could be used for some or all of the steps described. For example, one or more graphics processing units (GPUs) may be used for the rendering and recording operations performed by the user device in Figure 6. Furthermore, the processor circuitry used to perform the routines may include multiple CPUs located within multiple devices, and each CPU may have multiple processor cores. Methods described herein may be implemented by way of computer program code that is storable in a memory. Examples of memory suitable for storing computer program code are the memory components described with reference to Figures 2, 3, and 4, but it will be appreciated that the memory may be any non-transitory computer-readable media able to contain, store, or maintain programs and data for use by or in connection with an instruction execution system. Such media may be any physical media such as, for example, electronic, magnetic, optical, electromagnetic, or semiconductor media. More specific examples of suitable media include, but are not limited to, a hard drive, RAM, ROM, erasable programmable read-only memory, or a portable disc. Elements of the memory used to store program code may be volatile or non-volatile and may include additional functionality, for example to minimise latency.

It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

Claims

1. A computer-implemented method of generating a reaction video stream showing reaction to a stream of video content, the method comprising:

receiving a content video stream;

converting the content video stream to a real-time protocol to generate a converted video stream;

multicasting the converted video stream to one or more viewer devices;

receiving, from each of the one or more viewer devices, a viewer video stream encoded with a real-time protocol;

processing the converted video stream and each viewer video stream to generate a reaction video stream having a plurality of panes shown in parallel, one of the plurality of panes showing a sequence of images corresponding to the converted video stream and each other pane showing a sequence of images corresponding to a respective viewer video stream; and

outputting the reaction video stream.

2. A computer-implemented method according to claim 1, wherein the content video stream is received from a broadcaster and the reaction video stream is output to the broadcaster.

3. A computer-implemented method according to claim 1 or claim 2, wherein the method is performed by a server apparatus.

4. A computer-implemented method according to claim 3, wherein the server implements a WebRTC node and the viewer devices are registered as WebRTC clients.

5. A computer-implemented method according to any preceding claim, further comprising:

monitoring UDP traffic statistics for each viewer video stream;

identifying from the UDP traffic statistics an interruption to one viewer video stream;

controlling the processing of the converted video stream and each viewer video stream in response to the identified interruption.

6. A computer-implemented method according to claim 5, wherein the controlling comprises one of:

removing the pane in the reaction video stream corresponding to the interrupted viewer video stream; and

showing a default sequence of images in the pane corresponding to the interrupted viewer video stream.

7. A data processing comprising:

at least one processor; and

memory storing computer-implementable instructions that, when executed by the at least one processor, perform a method as claimed in any preceding claim.

8. A computer-implemented method of generating a reaction video in response to an event in a live stream of video content, the method comprising apparatus having an internal clock:

receiving live stream data, the live stream data comprising one or more segments of live stream video data and associated metadata, wherein the metadata comprises at least one timestamp;

buffering the live stream video data;

recording reaction video data captured by a camera;

rendering the live stream video data;

calculating a temporal offset using the at least one timestamp and a timing provided by the internal clock;

receiving event data indicating a time related to the event;

determining, using the event data and the calculated temporal offset, the time at which the event is rendered; and

saving a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the time at which the event is rendered.

9. The method of claim 8, further comprising:

receiving reaction metadata in conjunction with the reaction video data, wherein the reaction metadata comprises timing information; and

determining a temporal delay between capturing and recording of the reaction video data using the timing information.

10. The method of claim 8 or 9, further comprising discarding unsaved reaction video data after a prescribed period of time has elapsed from the time at which the unsaved reaction video data was recorded.

11. The method of any of claims 8 to 10, wherein:

the event data further comprises information corresponding to the type of event; and

the temporal window depends on the information corresponding to the type of event.

12. The method of any of claims 8 to 11, wherein each segment of live stream video data has the same duration.

13. The method of any of claims 8 to 12, wherein each segment of live stream video data has an associated timestamp.

14. The method of any of claims 8 to 13, further comprising:

saving a portion of live stream video data corresponding to the saved portion of reaction video data; and

generating a composite reaction video using data comprising the saved portion of live stream video data and the corresponding saved portion of reaction video data.

15. A data processing system for generating a reaction video in response to an event in a live stream of video content, the system comprising one or more user devices and a server,

wherein each of the one or more user devices comprises a respective internal clock and is communicatively coupled to the server and is operable to:

receive live stream data, the live stream data comprising one or more segments of live stream video data and associated metadata, wherein the metadata comprises at least one timestamp;

buffer the live stream video data;

record reaction video data captured by a camera;

render the live stream video data;

calculate a temporal offset using the at least one timestamp and a timing from the internal clock for that user device;

receive event data indicating a time related to the event;

determine, using the event data and the calculated temporal offset, the time at which the event is rendered; and

save a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the time at which the event is rendered,

and wherein the server is adapted to receive reaction video data from the one or more user devices.

16. The data processing system of claim 15, wherein at least one of the one or more user devices is further operable to:

receive reaction metadata in conjunction with the reaction video data, wherein the reaction metadata comprises timing information; and

determine a temporal delay between capturing and recording of the reaction video data using the timing information.

17. The data processing system of claim 15 or 16, wherein at least one of the one or more user devices is further operable to discard unsaved reaction video data after a prescribed period of time has elapsed from the time at which the unsaved reaction video data was recorded.

18. The data processing system of any of claims 15 to 17, wherein:

19. The data processing system of any of claims 15 to 18, wherein at least one of the one or more user devices is operable to:

save a portion of live stream video data corresponding to the saved portion of reaction video data; and

generate a composite reaction video using data comprising the saved portion of live stream video data and the corresponding saved portion of reaction video data.

20. The data processing system of any of claims 15 to 19, wherein at least one of the one or more user devices is further operable to send a portion of reaction video data to the server, and the server is further operable to:

save a portion of live stream video data corresponding to a portion of reaction video data received from a user device; and

generate a composite reaction video using data comprising the saved portion of live stream video data and at least one corresponding portion of reaction video data.

21. The data processing system of any of claims 15 to 20, further comprising one or more control devices, each control device being communicatively coupled to the server and associated with one of the one or more user devices, and wherein each control device is operable to:

receive the event data; and

in response to receiving the event data, prompt a user to select an action from a list of actions, wherein one of the actions in the list of actions causes the user device and server to generate and save a composite reaction video.

22. The data processing system of any of claims 15 to 21, wherein the data processing system comprises a plurality of user devices, and at least two of the plurality of user devices are grouped such that the grouped user devices render the live stream video data synchronously.

23. Apparatus for generating a reaction video, the apparatus comprising:

an internal clock;

processor circuitry; and

memory storing instructions which, when executed by the processor circuitry, enable the apparatus to:

buffer the live stream video data;

record reaction video data captured by a camera;

render the live stream video data;

calculate a temporal offset using the at least one timestamp and a timing provided by the internal clock;

receive event data indicating a time related to the event;

save a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the time at which the event is rendered.

24. A server for use in the system of any of claims 15 to 22, the server adapted to receive reaction video data from the one or more user devices.

25. A control device for use in the system of claim 21.

26. A computer program comprising instructions which, when executed by processor circuitry, cause a user device to perform the method of any of claims 8 to 14.

27. A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of any of claims 8 to 14.

28. A computer-implemented method of generating a reaction video in response to an event in a video presentation, the method comprising:

rendering video data;

recording reaction video data captured by a camera;

broadcasting the reaction video data as a live stream;

receiving responses from viewers of the live stream of reaction video data, wherein each response is associated with a specific time relative to the live stream of reaction video data; and

in response to a threshold number of responses being received that are associated with times within a prescribed interval, saving a portion of reaction video data corresponding to an interval spanning a temporal window encompassing the interval within which the threshold number of responses is received.