
WO2016080873A1 - Method and apparatus for video processing based on physiological data - Google Patents

Method and apparatus for video processing based on physiological data

Info

Publication number
WO2016080873A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
video data
interest
video
segment
Prior art date
Application number
PCT/SE2014/051368
Other languages
French (fr)
Inventor
Matthew John LAWRENSON
Julian Charles Nolan
Till BURKERT
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Priority to PCT/SE2014/051368 priority Critical patent/WO2016080873A1/en
Publication of WO2016080873A1 publication Critical patent/WO2016080873A1/en

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G11B27/32 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on separate auxiliary tracks of the same or an auxiliary record carrier
    • G11B27/327 Table of contents

Definitions

  • the present invention generally relates to video processing, and particularly relates to video processing based on physiological data acquired for a human participant in an activity represented in video data.
  • the events captured in a typical video file often include portions of less interest than others. For example, a thirty-minute recording of a mountain biking excursion may capture a number of relatively brief periods of exciting downhill runs separated by longer interludes of relatively less interesting or less intense activity.
  • a video operation bearing on video data is controlled in response to determining interest levels of a human participant engaged in an activity represented in the video data.
  • Video operation control in this manner relies on the processing of physiological data captured for the human participant while engaged in the activity represented in the video and infers the participant's interest levels from data characteristics determined via the processing.
  • Example operational controls include, but are not limited to, dynamically enabling and disabling video capture, dynamically controlling video capture parameters, and dynamically controlling one or more editing functions applied to captured video data.
  • a method of controlling a video operation bearing on video data responsive to determining interest levels of a human participant engaged in an activity represented in the video data includes obtaining physiological data acquired for a human participant while engaged in an activity represented in video data. The method further includes quantifying interest levels of the human participant with respect to successive segments of the video data, based on characterizing the physiological data associated with each segment and thereby obtaining a quantified interest level for the segment. Still further, the method includes controlling the video operation, in dependence on the quantified interest levels associated with the segments.
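  • The three-step flow just described (obtain physiological data, quantify a per-segment interest level, control the video operation) can be summarized in a short sketch. The following Python fragment is illustrative only: the names (Segment, quantify_interest, control_video_operation) are hypothetical, and the characterization step is reduced to a simple mean-threshold rule rather than the matching against stored characteristics described later.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Segment:
    frames: list             # video frames belonging to this segment
    physio: Sequence[float]  # physiological samples temporally associated with the segment

def quantify_interest(physio: Sequence[float], threshold: float = 0.5) -> int:
    """Characterize the physiological data for one segment and return a quantified interest level.
    Here a trivial rule is used: level 1 if the mean sample exceeds the threshold, else 0."""
    if not physio:
        return 0
    mean = sum(physio) / len(physio)
    return 1 if mean > threshold else 0

def control_video_operation(segments: List[Segment], threshold: float = 0.5) -> List[Segment]:
    """Keep only segments whose quantified interest level meets the threshold
    (retaining/discarding segments being one possible 'video operation')."""
    kept = []
    for seg in segments:
        level = quantify_interest(seg.physio, threshold)
        if level >= 1:
            kept.append(seg)
    return kept

if __name__ == "__main__":
    segments = [
        Segment(frames=["f0", "f1"], physio=[0.2, 0.3]),  # low interest
        Segment(frames=["f2", "f3"], physio=[0.7, 0.9]),  # high interest
    ]
    print(len(control_video_operation(segments)))  # -> 1
```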
  • an apparatus configured to control a video operation bearing on video data, in response to determining interest levels of a human participant engaged in an activity represented in the video data.
  • the example apparatus includes communication interface circuitry configured to obtain the video data and further configured to obtain physiological data acquired for the human participant while engaged in the activity.
  • the apparatus further includes a processing circuit operatively associated with the communication interface circuitry and configured to quantify interest levels of the human participant with respect to successive segments of the video data. Quantification is based on the processing circuit characterizing the physiological data associated with each segment and thereby obtaining a quantified interest level for the segment.
  • the processing circuit is configured to control the video operation, in dependence on the quantified interest levels associated with the segments.
  • Fig. 1 is a block diagram of one embodiment of an apparatus that is configured to control a video operation on video data, in dependence on quantifying the interest levels of a human participant engaged in an activity represented in the video data.
  • Fig. 2 is a diagram of one embodiment of a temporal association scheme, whereby quantified interest levels are determined for successive segments of video data according to corresponding physiological data acquired for a human participant engaged in an activity represented in the video data.
  • Fig. 3 is a diagram of one embodiment of a spatial association scheme, whereby particularized regions of interest are determined within given video frames in video data, according to corresponding gaze data acquired for a human participant engaged in an activity represented in the video data.
  • Fig. 4 is a block diagram of example details for the apparatus introduced in Fig. 1, according to some embodiments.
  • Fig. 5 is a block diagram of example physiological sensor devices, any one or more of which are used in some embodiments herein, to obtain physiological data from a human participant engaged in an activity represented in corresponding video data.
  • Fig. 6 is a diagram of example contents of stored physiological data, e.g., data files, in one or more embodiments, for use in quantifying the interest levels of a human participant in corresponding segments of video data.
  • Fig. 7 is a block diagram of example details for the apparatus introduced in Fig. 1, according to some embodiments, wherein the apparatus is configured to receive one or more real-time physiological data sensor streams and/or video data streams.
  • Fig. 8 is a logic flow diagram of one embodiment of a method of processing, whereby a video operation is controlled in dependence on quantified interest levels of a human participant with respect to segments of video data capturing an activity involving the human participant.
  • Fig. 9 is a block diagram of one embodiment of functional modules, such as may be implemented in a processing circuit, for carrying out the video operation automation taught herein.
  • Fig. 10 is a logic flow diagram of another embodiment of a method of processing, whereby a video operation is controlled in dependence on quantified interest levels of a human participant with respect to segments of video data capturing an activity involving the human participant.
  • Fig. 1 illustrates an example apparatus 10 that is configured to control a video operation bearing on video data 12, responsive to determining interest levels of a human participant engaged in an activity represented in the video data 12. More particularly, the apparatus 10 obtains physiological data 14, from which it determines quantified interest levels 18, e.g., a quantified interest level 18-1 corresponding to a first segment of the video data 12, a quantified interest level 18-2 corresponding to a second segment of the video data 12, and so on. As seen in Fig. 2, successive portions 14-1, 14-2, and so on of the physiological data 14, are associated with respective segments of the video data 12. Thus, the apparatus 10 evaluates the physiological data 14-1 to obtain the quantified interest level 18-1 for the first segment of video data 12, evaluates the physiological data 14-2 to obtain the quantified interest level 18-2 for the second segment of video data 12, and so on.
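  • A minimal sketch of the temporal association of Fig. 2, assuming uniform segments and a constant, known physiological sampling rate; the function name split_physio_by_segment and the index-based assignment are illustrative assumptions, not the patent's stated implementation.

```python
from typing import List, Sequence

def split_physio_by_segment(physio: Sequence[float],
                            sample_rate_hz: float,
                            segment_duration_s: float,
                            num_segments: int) -> List[List[float]]:
    """Return one list of physiological samples per video segment.

    Segment k covers the time window [k*T, (k+1)*T); samples are assigned by index,
    assuming the first physiological sample coincides with the first video frame."""
    samples_per_segment = int(round(sample_rate_hz * segment_duration_s))
    portions = []
    for k in range(num_segments):
        start = k * samples_per_segment
        end = start + samples_per_segment
        portions.append(list(physio[start:end]))  # may be shorter for the last segment
    return portions

if __name__ == "__main__":
    # 4 Hz sensor, 5-second segments, 3 segments -> 20 samples per segment
    physio = [float(i) for i in range(60)]
    parts = split_physio_by_segment(physio, sample_rate_hz=4.0, segment_duration_s=5.0, num_segments=3)
    print([len(p) for p in parts])  # -> [20, 20, 20]
```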
  • the apparatus 10 in at least one embodiment also obtains gaze data 20, which provides indications of the spatial focus of the human participant while engaged in the activity represented in the video data 12.
  • the gaze data 20 comprises, for example, one or more types of data values that indicate the spatial focus of the human participant in terms of the field of view associated with capture of the video data 12.
  • the gaze data 20 comprises gaze angle or direction data that indicates regions of interest 22 within the spatial field of the video frames comprising the video data 12.
  • the apparatus 10 outputs a modified video signal 24 and/or acquisition control signaling 26.
  • a modified video signal 24 reflects the results of dynamically controlling one or more video operations on the video data 12, in dependence on the quantified interest levels 18.
  • acquisition control signaling 26 controls one or more image capture parameters— e.g., enabling or disabling capture, changing camera focus settings, changing capture quality settings, etc.
  • the apparatus 10 includes communication interface circuitry 30 configured to obtain the video data 12 and further configured to obtain physiological data 14 acquired for the human participant while engaged in the activity.
  • the apparatus 10 further includes a processing circuit 32 operatively associated with the communication interface circuitry 30.
  • the processing circuit 32 is configured to quantify interest levels of the human participant with respect to successive segments of the video data 12, based on characterizing the physiological data 14 associated with each segment and thereby obtaining a quantified interest level 18 for the segment, and to control the video operation, in dependence on the quantified interest levels 18 associated with the segments.
  • the "video operation” comprises one or more operations associated with capturing the video data 12 and/or with post-capture processing of the video data 12.
  • the processing circuit 32 implements a quantifier circuit 34 that generates the quantified interest levels 18, and a video operation control circuit 36 that controls the video operation in dependence on the quantified interest levels 18.
  • the apparatus 10 further comprises storage 40.
  • the storage 40 comprises any one or more of magnetic disk storage, solid-state disk storage, FLASH memory, EEPROM, Static RAM or SRAM, and Dynamic RAM or DRAM.
  • the storage 40 may comprise non-volatile memory, volatile memory, or any mix thereof.
  • the storage 40 shall be understood as comprising one or more types of computer-readable medium that provide non-transitory storage for any one or more of a computer program 42, configuration data 44, and stored characteristics 46.
  • “non-transitory” storage does not necessarily mean permanent or unchanging storage, and includes copies of data and/or program instructions held in working memory for dynamic processing, but the term “non-transitory” does exclude mere propagating signals.
  • the processing circuit 32 comprises digital processing circuitry that is specially adapted or otherwise configured to carry out the processing and associated algorithms taught herein, based on its execution of computer programs comprising the computer program 42.
  • digital processing circuitry comprises, for example, one or more microprocessors, microcontrollers, Digital Signal Processors or DSPs, Field Programmable Gate Arrays or FPGAs, Application Specific Integrated Circuits or ASICs, or Complex Programmable Logic Devices or CPLDs.
  • the processing circuit 32 may comprise fixed circuitry, programmed circuitry, or a mix thereof.
  • the apparatus 10 comprises a Personal Computer or PC
  • the processing circuit 32 comprises a CPU or other processing circuit of the PC
  • the storage 40 comprises the memory and/or disk storage of the PC.
  • the teachings herein are not limited to such implementations, and in other embodiments the apparatus 10 comprises a dedicated device, such as a computer peripheral or other "connected" device, or is integrated with the image acquisition system used to capture the video data 12.
  • the communication interface circuitry 30 is configured to communicatively couple the processing circuit 32 to one or more data stores 50.
  • the data store 50 may be local to the apparatus 10, e.g., part of the overall storage of the PC or other computer system implementing the apparatus 10, but this arrangement is non-limiting.
  • the data store 50 in an example embodiment contains one or more data files 52 and 56 containing the video data 12 and the physiological data 14.
  • the processing circuit 32 is configured to control the communication interface circuitry 30 to obtain the physiological data 14 from the one or more data files 52, 56.
  • the physiological data 14 is captured in a file or files separate from the video data 12.
  • sensor equipment generates the data file or files 56 based on capturing the physiological data 14 from one or more physiological sensors associated with the human participant, while the human participant is engaged in the activity in question.
  • a separate image acquisition system captures the video data 12.
  • there may be more than one file 56 with each file corresponding to a different type of physiological data, in cases where more than one type of physiological sensor is used to capture the physiological data— where such multi-type sensing offers a potentially more robust mechanism for assessing the interest levels of the participant, at the expense of more sensing equipment, larger data sets and greater processing complexity.
  • multi-sensor data can be consolidated into a single file 56, for processing by the apparatus 10.
  • the physiological data 14, regardless of how many files are involved, may already be synchronized with the video data 12 in the data file 52.
  • a single data file may include both the video data 12 and the corresponding, synchronized physiological data 14.
  • the apparatus 10 is configured to synchronize the physiological data 14 with the video data 12, if the physiological data is not already synchronized, based on time stamps or other embedded temporal markers in the video data 12 and in the physiological data 14.
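  • Where the physiological data is not pre-synchronized, the timestamp-based alignment mentioned above can be sketched as follows. The data layout (a list of (timestamp, value) pairs and a known video start time) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

def align_by_timestamp(samples: List[Tuple[float, float]],
                       video_start: float,
                       segment_duration_s: float) -> Dict[int, List[float]]:
    """Bucket timestamped physiological samples into video-segment indices.

    samples: (unix_timestamp, value) pairs; video_start: unix timestamp of the first video frame."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in samples:
        offset = ts - video_start
        if offset < 0:
            continue  # sample predates the video; ignore it
        idx = int(offset // segment_duration_s)
        buckets.setdefault(idx, []).append(value)
    return buckets

if __name__ == "__main__":
    video_start = 1000.0
    samples = [(1000.5, 0.2), (1003.0, 0.4), (1011.0, 0.9)]
    print(align_by_timestamp(samples, video_start, segment_duration_s=5.0))
    # -> {0: [0.2, 0.4], 2: [0.9]}
```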
  • the camera 70 has a field of view 72 and captures video data 12 corresponding to that field of view.
  • the image acquisition apparatus 70 is, for example, a wearable digital video camera, such as a chest- or helmet-worn camera.
  • a supplemental video camera 80 for obtaining supplemental video data 82 that is concurrent with video capture by the image acquisition apparatus 70.
  • the supplemental video data 82 comprises, for example, facial expression data captured for the human participant while engaged in the activity being videoed.
  • the supplemental video data comprises pupillometry data captured for the human participant while engaged in the activity being videoed and/or the aforementioned gaze data 20.
  • the supplemental video camera 80 thus may be a specialized camera mounted in such a way as to focus on the face or eyes of the human participant, to obtain further types of physiological data indicating the participant's interest levels—e.g., video data from which the participant's pupil responses and/or facial expressions can be determined. Additionally, or alternatively, the supplemental video camera 80 may be used to obtain supplemental video data that is preprocessed for the apparatus 10, or is processed by the apparatus 10, to derive gaze angle or direction values, indicating the direction or angle of the human participant's gaze within a measurement framework or coordinate system that maps to the field of view of the image acquisition apparatus 70 or, more particularly, to the spatial area represented by the video frames comprising the video data 12, so that regions of interest 22 are identifiable therein.
  • the human participant is further associated with an electro-encephalograph, EEG, device 90, which includes a number of EEG sensors 92 that are configured to couple to or otherwise contact the body of the human participant, to thereby provide EEG measurement signals to a processing circuit 94 during the time that the human participant is engaged in the activity in question.
  • the EEG device 90 further includes storage 96, for capturing and storing the EEG data 98 produced by the processing circuit 94.
  • the EEG data 98 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50.
  • the storage 96 may be removable from the EEG device 90, or it may have a USB or other communication interface available for reading out such data.
  • the human participant is associated with an audio recording device 100, which includes a microphone that is configured to provide an audio signal to a processing circuit 104 during the time that the human participant is engaged in the activity in question.
  • the audio recording device 100 further includes storage 106, for capturing and storing the audio data 108 produced by the processing circuit 104.
  • the audio data 108 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50.
  • the storage 106 may be removable from the audio device 100, or it may have a USB or other communication interface available for reading out such data.
  • the audio data 108 provides the apparatus 10 with yet another type of physiological data 14, in the sense that vocalizations by the human participant, and other types or levels of sound recorded in conjunction with capturing the video data 12, provide the apparatus 10 with further cues regarding the interest levels of the human participant.
  • the human participant is associated with a galvanic sensing device 110, which includes one or more skin conductivity sensors 112 that provide skin conductivity signals to a processing circuit 114 during the time that the human participant is engaged in the activity in question.
  • the galvanic sensing device 110 further includes storage 116, for capturing and storing the skin conduction data 118 produced by the processing circuit 114.
  • the skin conduction data 118 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50.
  • the storage 116 may be removable from the galvanic sensing device 110, or it may have a USB or other communication interface available for reading out such data.
  • the skin conduction data 118 provides the apparatus 10 with yet another type of physiological data 14, in the sense that skin conductivity of the human participant provides the apparatus 10 with further cues regarding the interest level of the human participant.
  • EEG sensing integrated into the helmet provides a convenient and complementary arrangement for obtaining EEG data 98.
  • the wearing of a helmet also provides a convenient mechanism for wearing or mounting the supplemental video camera 80 shown in Fig. 5.
  • data may be obtained using sensors built into the glasses or other protective eyewear worn by the human participant.
  • Fig. 6 depicts an arrangement wherein the physiological data 14 is stored as one or more data files 56, and comprises any one or more of EEG data 98, audio data 108, and skin conduction data 118. It will be appreciated that the capture of such data may have been synchronized with capturing the video data 12, meaning that the data values in question are already synchronized in time with the video data 12. In other cases, the data values in question are not already synchronized, but they include one or more timestamps or other temporal information, and the apparatus 10 uses this temporal information to logically align the physiological data 14 with the video data 12. Additionally, or alternatively, the apparatus 10 may know— e.g., via the configuration data 44 shown in Fig. 1— the sampling rate or period for the values comprising any given type of physiological data 14, and thus knows the overall timing of the data values therein from such information and, e.g., knowledge of the start timing of the data acquisition relative to the video capture.
  • the video data 12 is stored video data 52 and the physiological data 14 is stored physiological data 56.
  • the processing circuit 32 is configured to obtain the physiological data 14 by reading the stored physiological data 56 from a data store 50.
  • the communication interface circuitry 30 comprises the file I/O system or subsystem of a computer system, and thus obtains the physiological data 14 and the video data 12, via the file storage system of the computer.
  • the apparatus 10 may additionally or alternatively be configured to work with live data streams.
  • the communication interface circuitry 30 is therefore also configured, or is configured in the alternative, to receive a video data stream as the video data 12, and to receive one or more physiological sensor data streams incoming in real time in association with live capture of the video data, as the physiological data 14.
  • Such capabilities complement, for example, live use of the apparatus 10 during the activity.
  • the apparatus 10 may be integrated in or communicatively coupled with the image acquisition apparatus 70 introduced in Fig. 5, or at least coupled with one or more physiological data sensing devices.
  • the communication interface circuitry 30 is configured to communicatively couple to the image acquisition apparatus 70, and any one or more of the supplemental video camera 80, the EEG device 90, the audio recording device 100, and the galvanic sensing device 110.
  • these devices may be simplified in the sense that they include the requisite sensors and any needed analog acquisition, conditioning or filtering circuitry, but the digital processing associated with forming or otherwise obtaining usable measurement values representing the physiological data 14 is consolidated in processing circuit 32 of the apparatus 10.
  • the apparatus 10 may be configured to interface with and read a multiplicity of physiological sensor device types, allowing the human participant to mix and match the particular sensor types based on the nature of the activity and/or based on which particular sensor devices the human participant has access to.
  • the connectivity depicted in Fig. 7 also complements dynamic, live control of the image acquisition apparatus 70. That is, in one or more embodiments contemplated herein, the video operation controlled by the apparatus 10 is an image acquisition operation bearing on acquisition of the video data 12 by an image acquisition apparatus 70, and the processing circuit 32 is configured to control the image acquisition operation via signaling 26 output from the communication interface circuitry 30— see Fig. 1 for its illustration of acquisition control signaling 26.
  • the apparatus 10 enables and disables video capture in dependence on the interest level of the human participant, as determined for successive segments of the video data 12.
  • the apparatus 10 may not wait until the end of a given segment of video data 12 to make an interest-level-based decision for the segment. For example, it may evaluate only an initial fraction of the physiological data 14 corresponding to the segment, characterize that data to obtain a quantified interest level for the segment, and then toggle video capture on or off in dependence on the quantified interest level, or make some other acquisition parameter adjustment, such as changing the camera focus, the capture quality, etc.
  • the apparatus 10 buffers the incoming segments of video data 12 and the physiological data 14 associated with each segment. The segment is at least provisionally held in the buffer for saving or other processing, and such saving or other processing is then performed in dependence on evaluating the correspondingly buffered physiological data 14, to determine the quantified interest level associated with the buffered segment.
  • the apparatus 10 is configured to process or otherwise handle the video data 12 as successive segments, which generally are of uniform size— as measured in terms of the number of video frames included in each segment or in the time window duration defining each segment.
  • Each segment is temporally associated with a corresponding portion of the data values comprising the physiological sensor data 14.
  • the interest level of the human participant is determined for each segment of the video data 12 according to the physiological data 14 associated with the segment.
  • processing may consider all or a fraction of those physiological data values that are temporally associated with the segment of video data 12 under consideration, for determining the quantified interest level of the human participant with respect to the segment.
  • the processing circuit 32 in at least some embodiments is configured to characterize the physiological data 14 associated with each segment by evaluating the associated physiological data 14, or parameters derived therefrom, with respect to stored characteristics 46 corresponding to different interest levels— see Fig. 1.
  • the processing circuit 32 is configured to identify which stored characteristics 46 best correspond with the associated physiological data 14, or the parameters derived therefrom, and to take the corresponding interest level value as the quantified interest level 18 for the segment.
  • the quantified interest levels 18 correspond to different emotional states of the human participant, as explicitly or implicitly represented in the physiological data 14.
  • the stored characteristics 46 for at least one interest level value are characteristic of any one or more of the following emotional states: a high arousal state, a high cognition state, an anxious state, and a flow state.
  • These emotional states may have, for example, characteristic maximum or minimum measured values— in whatever units are involved for the given sensor types— or may have characteristic average values or patterns of values.
  • the stored characteristics 46 may comprise distinct sets of reference physiological data sensor values, or parameters derived therefrom, that are characteristic of different emotional states.
  • the physiological sensor data 14 for a given segment of the video data 12 is evaluated, either directly or after filtering or other parameterization, with respect to the different sets of reference physiological data values or parameters, for a determination of the best matching set.
  • the emotional state associated with that best matching set in turn maps to a numerical value representing the quantified interest level 18 associated with segment of video data 12 under consideration. Emotional states that are associated with relatively high levels of user emotion are assigned one or more numerical values indicating relatively higher levels of interest, while emotional states that are associated with relatively lower levels of user emotion are assigned one or more numerical values indicating relatively lower levels of interest.
  • a binary indication system is used, wherein the quantified interest level 18 determined for any given segment of the video data 12 is set to a first value if the associated physiological data 14 indicates an interest level below a defined threshold, and is set to a second value if the associated physiological data indicates an interest level that at least meets the defined threshold.
  • the emotional state of the user may be considered only implicitly—i.e., the physiological data 14 associated with a given segment of video data 12 is evaluated with respect to multiple sets of reference physiological sensor data 14, where each such set is characteristic for a given level of interest and has associated therewith a different numerical value.
  • the quantified interest level 18 for the segment of video data 12 in question would then be assigned as the numeric value of the reference data set that best matches— e.g., best correlates— with the associated physiological data 14.
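  • One plausible reading of the "best correlates" matching described above is to correlate a segment's samples against each stored reference set and take the interest level of the highest-correlating reference. The reference dictionary below is a hypothetical stand-in for the stored characteristics 46.

```python
from typing import Dict, Sequence
import math

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two sequences, truncated to equal length (0.0 if either is constant)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def best_matching_level(segment_physio: Sequence[float],
                        references: Dict[int, Sequence[float]]) -> int:
    """Return the interest level whose reference trace best correlates with the segment's data."""
    return max(references, key=lambda level: pearson(segment_physio, references[level]))

if __name__ == "__main__":
    references = {
        1: [0.1, 0.1, 0.2, 0.1],  # calm / low interest
        4: [0.2, 0.6, 0.9, 0.8],  # heightened arousal / high interest
    }
    segment = [0.3, 0.5, 0.8, 0.9]
    print(best_matching_level(segment, references))  # -> 4
```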
  • metrics, e.g., parameters or patterns, are derived from the associated physiological data 14, and then evaluated with respect to different reference metrics or sets thereof, with the different reference metrics being characteristic for different levels of interest.
  • the apparatus 10 may generate a histogram or histograms from the associated physiological data 14, e.g., based on binning the range of numeric values seen in the data, and compare the histogram(s) with reference histograms.
  • average or mean values, e.g., as obtained by filtering, are compared.
  • patterns are detected from the physiological sensor data values and compared to reference patterns.
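  • Of these alternatives, the histogram comparison can be sketched as follows: bin the segment's sensor values and score each reference histogram with a chi-squared-style distance. The binning range and the reference histograms are illustrative assumptions.

```python
from typing import Dict, List, Sequence

def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of values over [lo, hi) with a fixed number of bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        if 0 <= idx < bins:
            counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def chi2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-squared distance between two normalized histograms (smaller = more similar)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def closest_interest_level(values: Sequence[float],
                           reference_histograms: Dict[int, Sequence[float]],
                           lo: float = 0.0, hi: float = 1.0, bins: int = 4) -> int:
    h = histogram(values, lo, hi, bins)
    return min(reference_histograms, key=lambda level: chi2_distance(h, reference_histograms[level]))

if __name__ == "__main__":
    refs = {
        1: [0.7, 0.2, 0.1, 0.0],  # mostly low sensor values -> low interest
        3: [0.1, 0.1, 0.3, 0.5],  # mostly high sensor values -> high interest
    }
    print(closest_interest_level([0.8, 0.9, 0.7, 0.6, 0.95], refs))  # -> 3
```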
  • Gerber, A. J., et al., "An affective circumplex model of neural systems subserving valence, arousal, and cognitive overlay during the appraisal of emotional faces," Neuropsychologia, Vol. 46, Issue 8, July 2008, Pages 2129-2139.
  • Using this bipolar model enables the apparatus 10 to evaluate EEG data 98 and determine therefrom times where the user is considered to be in a heightened emotional state, which in turn implies that the video data 12 corresponding to those times is of relatively higher interest than the video data 12 corresponding to other times within the overall window of time spanned by the video data 12.
  • the most basic approach is to detect when the user's arousal is higher than a threshold, with the threshold potentially being set during a learning or calibration phase performed with the user.
  • interest levels are quantified based on detecting whether the arousal measure is above a threshold and/or whether the valence measure is above another defined threshold.
  • EEG data processing additionally or alternatively may be based on evaluating left/right lobe asymmetries in activity. For example, it has been observed that greater relative left-lobe EEG activity is seen for positively valenced emotions and that, conversely, greater right-lobe activity is seen for negatively valenced emotions.
  • the apparatus 10 determines arousal and valence metrics for EEG data 98 acquired whilst the user was engaged in the activity represented in the video data 12 in question.
  • the apparatus 10 may have a preconfigured target for reducing the length or size of the video data 12 or the user may input a target reduction value. In either case, the apparatus 10 uses the target value to define arousal and valence metric thresholds and then compares the EEG data 98 associated with each segment of the video data 12 to control the application of one or more video operations bearing on the video data 12, in dependence on whether the associated EEG data 98 satisfies the metric thresholds.
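  • The arousal/valence evaluation described above can be sketched as follows, using frontal alpha-band asymmetry as a valence proxy and broadband power as a crude arousal proxy. The band-power inputs, threshold rule, and names are assumptions for illustration, not the patent's specified algorithm.

```python
from dataclasses import dataclass

@dataclass
class EegSegment:
    left_alpha_power: float    # alpha-band power over left frontal sensors
    right_alpha_power: float   # alpha-band power over right frontal sensors
    total_power: float         # broadband power, used here as a crude arousal proxy

def valence_metric(seg: EegSegment) -> float:
    """Frontal asymmetry as a valence proxy: alpha power is commonly taken as inversely
    related to cortical activity, so lower left alpha (greater left activity) scores as
    positive valence. This is an assumed, simplified form."""
    total = (seg.left_alpha_power + seg.right_alpha_power) or 1.0
    return (seg.right_alpha_power - seg.left_alpha_power) / total

def arousal_metric(seg: EegSegment) -> float:
    return seg.total_power

def is_interesting(seg: EegSegment, arousal_thr: float, valence_thr: float) -> bool:
    """Segment counts as 'interesting' if both the arousal and valence metrics exceed their thresholds."""
    return arousal_metric(seg) > arousal_thr and valence_metric(seg) > valence_thr

if __name__ == "__main__":
    calm = EegSegment(left_alpha_power=5.0, right_alpha_power=5.0, total_power=10.0)
    excited = EegSegment(left_alpha_power=3.0, right_alpha_power=6.0, total_power=25.0)
    print(is_interesting(calm, arousal_thr=15.0, valence_thr=0.1))     # -> False
    print(is_interesting(excited, arousal_thr=15.0, valence_thr=0.1))  # -> True
```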
  • the user wears a helmet containing a number of EEG sensors, with the location of the individual sensors covering portions of the scalp associated with both the left and right sides of the frontal lobe.
  • the apparatus 10, or another configured computer system, begins monitoring EEG data for the user during a measurement period in which the user remains inactive. These readings may be stored in the stored characteristics 46 as baseline EEG readings.
  • the apparatus 10, or the other configured computer system, then collects EEG data while the user participates in activity intended to produce one or more heightened emotional states in the user.
  • the apparatus 10 or other computer system is configured to receive a user input, which the user is instructed to actuate at a point in time during the activity in which the user experiences a desired emotional state.
  • the apparatus 10 or other computer system captures this point in time.
  • the corresponding EEG data or parameters derived therefrom are saved as reference EEG data in the stored characteristics 46.
  • the stored characteristics 46 at least in part comprise physiological sensor data, e.g., any one or more of sensor data 82, 98, 108, 118, or parameters derived therefrom, as previously obtained for the human participant in a prior calibration session.
  • the same approach also may be used for the aforementioned gaze data 20.
  • the field of view 72 for the image acquisition apparatus 70 is logically split into a number of tiles, e.g., into a geometric grid of i rows and j columns.
  • a particular area within the field of view 72 is identifiable with reference to the involved tile(s) Ti,j.
  • the rotation of one or both of the user's eyeballs is determined with respect to each tile, and the parameters learnt during this calibration are then used to gauge which tile the user is looking at, at any given time during capture of the video data 12.
  • the apparatus 10 in one or more embodiments is configured to carry out a calibration process in which the user is instructed to look at a given location and the horizontal eye rotation of the user is then determined in conjunction with the user shifting his or her gaze between two points lying on a plane that is perpendicular to the imaging sensor and a known distance from the image sensor. The points are mapped to respective tiles, and the known distance is then used to calculate the horizontal rotations associated with other tiles in the field of view 72. A similar calibration is used for vertical rotation.
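  • The calibration described above amounts to mapping eye-rotation angles onto tiles of the field of view 72. A minimal geometric sketch, assuming the eye sits a known distance from a plane split into equal-width tile columns; the single-axis simplification and helper names are illustrative.

```python
import math
from typing import List

def tile_boundary_angles(num_tiles: int, plane_width: float, distance: float) -> List[float]:
    """Horizontal rotation angles (degrees) at the boundaries between tile columns on a plane
    perpendicular to the viewing axis, centered in front of the eye, at the given distance."""
    angles = []
    tile_width = plane_width / num_tiles
    for j in range(num_tiles + 1):
        x = -plane_width / 2 + j * tile_width  # horizontal offset of boundary j
        angles.append(math.degrees(math.atan2(x, distance)))
    return angles

def tile_for_rotation(rotation_deg: float, boundaries: List[float]) -> int:
    """Return the (0-based) tile column whose angular span contains the measured eye rotation."""
    for j in range(len(boundaries) - 1):
        if boundaries[j] <= rotation_deg < boundaries[j + 1]:
            return j
    return len(boundaries) - 2  # clamp to the last tile

if __name__ == "__main__":
    # 4 tile columns on a 1 m wide plane, viewed from 0.5 m away
    bounds = tile_boundary_angles(num_tiles=4, plane_width=1.0, distance=0.5)
    print([round(b, 1) for b in bounds])    # roughly [-45.0, -26.6, 0.0, 26.6, 45.0]
    print(tile_for_rotation(10.0, bounds))  # -> 2 (just right of center)
```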
  • the user mounts both the image acquisition system 70 and the supplemental video camera 80, as they would be worn while engaged in a physical activity.
  • the supplemental video camera 80 is used for acquisition of eye movement data and calibration software—such as may be executed by the apparatus 10 or another computer system— displays a reference object on a display screen.
  • the dimensions of the reference object as displayed are known to the software, as are the resolution and dimensions of the display screen, the user's viewing distance, etc.
  • the display screen image as seen by the image acquisition system 70 is captured and the captured image is overlaid on the reference image.
  • the user moves his head such that the captured object exactly overlays the reference object.
  • the software then displays a rectangle on the display and the user is prompted to look at the respective corners of the rectangle whilst eye movement data is captured for the user.
  • the detected eye rotations are then used to determine the rotations associated with each of the defined tiles.
  • the apparatus 10 evaluates gaze data 20 that was acquired whilst the user participated in the activity represented in the video data 12 and uses the eye rotation information in, or derived from, the gaze data to identify changing regions of interest 22 in the video frames comprising the video data 12— as illustrated in Fig. 3.
  • the processing circuit 32 is configured to identify a dominant interest level for the segment.
  • the "dominant" interest level is the interest level of longest detected duration within the segment.
  • the processing circuit 32 takes the dominant interest level as the quantified interest level 18 for the segment.
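  • Identifying the dominant interest level reduces to summing the time spent at each detected level within the segment and taking the level with the greatest total duration. A small sketch, assuming the detected levels arrive as (level, duration) intervals:

```python
from collections import defaultdict
from typing import Iterable, Tuple

def dominant_interest_level(intervals: Iterable[Tuple[int, float]]) -> int:
    """intervals: (interest_level, duration_seconds) pieces detected within one video segment.
    Returns the level of longest total duration, i.e., the dominant interest level."""
    totals = defaultdict(float)
    for level, duration in intervals:
        totals[level] += duration
    return max(totals, key=totals.get)

if __name__ == "__main__":
    # within an 11 s segment: 2 s at level 1, then 6 s at level 3, then 3 s at level 1
    print(dominant_interest_level([(1, 2.0), (3, 6.0), (1, 3.0)]))  # -> 3 (6 s vs 5 s at level 1)
```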
  • the processing circuit 32 is configured to control a video operation— i.e., one or more video operations— in dependence on the quantified interest levels.
  • the processing circuit 32 is configured to control acquisition and/or processing of the video data 12 in dependence on the quantified interest levels.
  • the processing circuit 32 is configured to modify the video data 12 in dependence on the quantified interest levels 18, such as by retaining or emphasizing segments associated with quantified interest levels 18 meeting a defined interest threshold, and discarding or de-emphasizing segments associated with quantified interest levels 18 not meeting the defined interest threshold.
  • the processing circuit 32 may be configured to implement any one or more of the following operations: modifying the video data 12 by retaining or emphasizing segments associated with quantified interest levels 18 meeting a defined interest threshold; discarding or de-emphasizing segments associated with quantified interest levels 18 not meeting the defined interest threshold; and determining an emphasis or de-emphasis modification to apply to the video data 12 for a given segment of the video data 12 in dependence on determining which one or more defined interest thresholds are met among a plurality of defined interest thresholds.
  • three interest level thresholds are defined: low, medium and high.
  • the threshold definitions will vary in dependence on how the quantified interest levels 18 are defined, but in an example case assume that the universe of quantified interest level values spans from 1 to 4, and assume that the "low" interest threshold is set to 1, the medium interest threshold is set to 3, and the high interest threshold is set to 4. Segments having a quantified interest level 18 of 1 would be processed as low-interest video segments, segments having a quantified interest level 18 of 2 or 3 would be processed as medium-interest segments, and segments having a quantified interest level 18 of 4 would be processed as high-interest segments. Such processing may involve modifying the video data 12 to have one of three image quality or compression settings corresponding to the three interest level ranges in play.
  • the processing circuit 32 may evaluate the quality, quantity, and/or variety of the data comprising the physiological data 14 associated with given video data 12.
  • the processing circuit 32 may use a larger set of quantified interest levels when it has physiological data 14 corresponding to two or more types— such as when it has EEG data 98 and pupil data 82 or skin response data 118. Additionally, or alternatively, it may evaluate the physiological data 14 for quality, e.g., such as by assessing the degree of correlation when comparing the physiological data 14, or parameters derived therefrom, with the stored characteristics 46.
  • the quantization resolution used for determining interest levels, and the number of recognized interest level ranges used for controlling the video operation in question may also be varied in dependence on the richness or quality of the calibration performed by the user.
  • the apparatus 10 may be configured such that a basic or quick calibration process is offered as an initial mechanism for generating the stored characteristics 46, but also may offer a longer, more complex calibration routine that allows the apparatus 10 to discern interest levels with greater resolution and/or accuracy.
  • de-emphasizing comprises, for example, any one or more of reducing the video quality, applying a higher level of video compression, reducing the image size or image resolution, applying more aggressive time-lapse processing, and reducing or eliminating video layers, at least in video data 12 that uses layers.
  • “emphasizing” comprises, for example, applying highlighting or other overlay data, decreasing the level of video compression, increasing an image size, saving or retaining a greater number of video layers, etc.
  • Emphasizing and de-emphasizing also may comprise or include changing the image focus, image centering, etc.
  • the processing circuit 32 in at least one embodiment is configured to reduce a file size or run time of the video data 12, as the controlled video operation in question, based on discarding or de-emphasizing segments of the video data 12 that are associated with quantified interest levels 18 not meeting a defined interest threshold.
  • the processing circuit 32 is configured to dynamically control an extent of file size or run time reduction in dependence on a configurable value.
  • the configurable value may be input by the user— e.g., via a keyboard, pointer device, or touchscreen interface, that is included in the apparatus 10, or one that the apparatus 10 is otherwise configured to communicatively couple with. Additionally, the configurable value may be stored in the configuration data 44, although it still may be updated by the user.
  • the processing circuit 32 is configured to reduce a file size or run time of the video data 12 as the video operation in question, based on being configured to determine a reduction target indicating a desired reduction in the file size or the run time of the video data, set an interest level threshold in dependence on the reduction target, and remove segments associated with quantified interest levels 18 below the interest level threshold.
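  • Setting the interest level threshold in dependence on the reduction target can be done, for example, by raising the threshold until the retained run time falls at or below the target. The function name and per-segment data layout below are assumptions made for illustration.

```python
from typing import List, Tuple

def threshold_for_target(segments: List[Tuple[int, float]], target_runtime_s: float) -> int:
    """segments: (quantified_interest_level, duration_seconds) per segment.
    Return the smallest interest threshold at which the retained run time meets the target."""
    levels = sorted({lvl for lvl, _ in segments})
    for thr in levels:
        retained = sum(dur for lvl, dur in segments if lvl >= thr)
        if retained <= target_runtime_s:
            return thr
    return levels[-1] + 1 if levels else 0  # target unreachable: cull everything

if __name__ == "__main__":
    segs = [(1, 60.0), (2, 120.0), (4, 30.0), (3, 90.0)]
    thr = threshold_for_target(segs, target_runtime_s=150.0)
    print(thr)                                   # -> 3
    print(sum(d for l, d in segs if l >= thr))   # -> 120.0 seconds retained
```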
  • the apparatus 10 may operate on the video data 12, such that the video data 12 is altered, or the apparatus 10 may generate modified video data 24, such as shown in Fig. 1.
  • the modified video data 24 represents a processed version of the video data 12 and is obtained by processing the video data 12 in dependence on the quantified interest levels 18 determined for the respective segments of the video data 12.
  • after removing the segments associated with quantified interest levels 18 that are below the defined interest level threshold, the processing circuit 32 shall be understood as having modified video data comprising the segments not removed— i.e., the remaining segments. In at least one embodiment, for one or more of these remaining segments of the video data 12, the processing circuit 32 is configured to set the value of a time lapse parameter for the remaining segment in dependence on the quantified interest level 18 associated with the remaining segment, and control a time lapse effect applied to the remaining segment according to the value of the time lapse parameter.
  • a given interest level threshold is used to cull segments of the video data 12 that are deemed to be uninteresting.
  • the culling threshold may be set at the second interest level value, such that only segments having a quantified interest level equal or higher than the second interest level remain after culling.
  • a finer gradation is applied to these remaining segments by more or less aggressively applying time-lapse to them, as a function of their associated quantified interest levels 18.
  • the most aggressive time-lapse effect can be applied to remaining segments of the second interest level, a less aggressive time-lapse effect applied to remaining segments of the third interest level, and no time-lapse applied to remaining segments of the highest, fourth interest level.
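  • A minimal sketch of this gradation, assuming a hypothetical mapping from quantified interest level to a "keep one frame in N" time lapse parameter:

```python
from typing import Dict, List

# Assumed mapping: higher interest -> gentler time lapse (keep more frames).
# Level 4 (highest) keeps every frame; level 2 keeps 1 frame in 10.
TIME_LAPSE_PARAM: Dict[int, int] = {2: 10, 3: 4, 4: 1}

def apply_time_lapse(frames: List[str], interest_level: int) -> List[str]:
    """Keep every Nth frame of a remaining segment, where N is the time-lapse parameter
    looked up from the segment's quantified interest level."""
    n = TIME_LAPSE_PARAM.get(interest_level, 1)
    return frames[::n]

if __name__ == "__main__":
    frames = [f"frame{i}" for i in range(20)]
    print(len(apply_time_lapse(frames, interest_level=2)))  # -> 2  (aggressive time lapse)
    print(len(apply_time_lapse(frames, interest_level=4)))  # -> 20 (no time lapse)
```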
  • the processing circuit 32 in one or more embodiments is configured, for at least one of the segments remaining after culling, to evaluate gaze data 20 to identify a region of interest 22 within video frames of the video data 12 comprised within the segment, and modify the video data 12 comprised within the segment to emphasize the region of interest 22. Emphasizing the region of interest 22 may be based on selectively focusing on the region of interest 22, defocusing the remaining area of the video frame, centering the image on the region of interest 22, adding overlay or other highlighting video data, etc.
  • the gaze data 20 is synchronous with the video data 12 and acquired for the human participant while engaged in the activity represented in the video data 12.
  • the processing circuit 32 is configured to cull the least interesting segments and further to apply region-of-interest processing to one or more of the remaining segments.
  • the region-of-interest processing may be reserved only for the remaining segments having the highest quantified interest levels 18 and/or may be varied as a function of the quantified interest levels 18 of the remaining segments.
  • one or more embodiments of the apparatus 10 use region-of-interest processing with or without first culling the least interesting segments.
  • the processing circuit 32 is configured to evaluate gaze data 20 to identify a region of interest 22 within video frames of the video data 12 comprised within the segment, and to modify the video data 12 comprised within the segment to emphasize the region of interest 22.
  • the gaze data 20 is synchronized with the video data 12 and acquired for the human participant while engaged in the activity represented in the video data 12.
  • Fig. 8 depicts a method 800 according to one embodiment and it shall be understood that the method 800 may be performed by the apparatus 10 through the fixed or programmatic configuration of the processing circuit 32, or by another circuit arrangement.
  • the method 800 controls a video operation bearing on video data 12, responsive to determining the interest levels of a human participant engaged in an activity represented in the video data 12, and includes obtaining (Block 802) physiological data 14 acquired for the human participant while engaged in the activity.
  • this obtaining operation may comprise reading or otherwise obtaining stored data, or may comprise receiving one or more real time or near real time data streams.
  • the method 800 further includes quantifying (Block 804) interest levels of the human participant with respect to successive segments of the video data 12, based on characterizing the physiological data 14 associated with each segment and thereby obtaining a quantified interest level 18 for the segment. Still further, the method 800 includes controlling (Block 806) the video operation, in dependence on the quantified interest levels 18 associated with the segments.
  • Fig. 9 depicts another embodiment, wherein the processing operations taught herein are implemented via a set of functional modules.
  • the modules in question are, for example, implemented via the processing circuit 32 of the apparatus 10.
  • a first obtaining module configured to obtain the physiological data 14 in question—e.g., a file reader module and/or an interface module operative to receive incoming streaming data comprising one or more types of physiological sensor data.
  • a second obtaining module 102 configured to obtain the video data 12, which module again may be a file reader module configured to read stored video files and/or an interface module configured to receive streaming video data.
  • a third obtaining module 104 configured to obtain the gaze data 20, which module again may be a file reader module configured to read stored gaze data and/or an interface module configured to receive streaming gaze data.
  • Fig. 9 further depicts a characterizing module 110, which evaluates the physiological data 14 and the gaze data 20 in synchronicity with the video data 12— i.e., for each given segment of video data 12, the characterizing module 110 evaluates the corresponding portions or segments of the time-aligned physiological data 14 and gaze data 20.
  • the characterizing module 110 also uses the stored characteristics 46 to evaluate the physiological data 14 and/or the gaze data 20 associated with each segment of the video data 12, to thereby determine the quantified interest level 18 for the segment and any region(s) of interest 22 within the video data 12 comprised within the segment.
  • the stored characteristics 46 comprise, for example, baseline EEG data and one or more further sets of EEG data corresponding to one or more heightened emotional states.
  • the characterizing module 110 provides its output—i.e., the quantified interest levels 18 determined by it— to a processing and control module 112.
  • the processing and control module 112 controls a video operation bearing on the video data 12, in dependence on the quantified interest levels 18.
  • a video operation in the singular shall be understood as encompassing one or more video operations; any number of acquisition and/or post-capture video operations may be controlled in dependence on the quantified interest levels.
  • Fig. 10 illustrates a method 1000, which may be understood as a more detailed implementation of the method 800.
  • the method 1000 includes the operations of capturing the video data 12 (Block 1002), capturing the physiological data 14 (Block 1004), and capturing the gaze data 20 (Block 1006).
  • Processing further includes collating the various data— i.e., synchronizing or otherwise aligning, as needed, the physiological data 14 and the gaze data 20 with the video data 12 (Block 1008). Processing continues with partitioning the video data 12 into sections or segments, e.g., dividing the overall video data 12 uniformly into a series of segments of the same width— although a first or last segment may be smaller or larger than the others (Block 1010).
  • dynamic segmenting may be used, such as by using smaller or larger segments in areas where the physiological data 14 is more dynamic, such as where in an initial run of processing it is observed that a number of the uniformly sized segments involve more than one interest level transition, or otherwise where it becomes difficult to identify the dominant interest level within a given segment.
  • Processing continues with associating segments of the physiological data 14 and gaze data 20 with corresponding segments of the video data 12 (Block 1012). These associations are used to determine the sections with important gaze locations, IGLs, or otherwise identify higher- interest segments of the video data 12 (Block 1014). Still further, processing continues with removing the low-interest segments (Block 1016), and setting a time lapse parameter, TLP, in dependence on the determined quantified interest levels (Block 1018). Further, where appropriate, the method 1000 includes adjusting the video data 12 to emphasize the IGLs (Block 1020), and processing continues with creating a final video— i.e., the aforementioned modified video data 24— which reflects the interest-level dependent processing performed in the method 1000.
  • a camera is used to capture video of a user participating in an activity, with the footage collected being defined as the video data 12 at issue herein.
  • the user wears the camera and captures first-person footage.
  • the camera is separated from the user.
  • multiple cameras may be used, e.g., with each one capturing the same scene from a different angle.
  • the camera may be replaced by a system capable of storing streamed video, such as a video game system capturing video gameplay as it takes place.
  • an apparatus is configured to track the gaze of the user and provide data indicating the gaze location in terms of the image field of view captured by the camera used to capture the video data 12.
  • the location of the user's gaze within the spatial area defined by the video data 12 is referred to as the "gaze location”.
  • one or more sensors capable of monitoring a physiological or mental signal of the user provides the physiological data 14. These sensors include any one or more of an EEG headset, a camera capturing either facial expressions or pupil response, a microphone configured for monitoring the user's voice, and one or more sensors configured to measure the skin conductance of the user.
  • the apparatus 10 employs various algorithms, such as an algorithm to associate sections of the video data 12 with user states— as represented by quantified interest levels 18— and important gaze locations— as represented by determined regions of interest 22.
  • This algorithm may be referred to as the "VA" algorithm.
  • apparatus 10 implements a further algorithm, denoted as the "VP" algorithm, to prepare modified video data 24, as a shortened version of the video data 12.
  • the modified video data 24 gives a higher level of prominence— or only includes— those sections of the video data 12 that were determined to be of higher interest to the user.
  • the apparatus 10 may use various look-up tables, such as a lookup table that stores the association between sensor readings and user state—e.g., a look- up table or LUT that maps characterized physiological data 14 to predefined numeric values representing different levels of user interest.
  • the apparatus 10 also may use an "additional material" or "AM" LUT, which associates different additional materials with different user states or interest levels.
  • the additional materials comprise, for example, different video effects, which may be selectively overlaid into the video data 12, e.g., to emphasize IGLs and/or to highlight segments of higher interest.
  • the physiological data 14 and gaze data 20 are captured simultaneously with capturing the video data 12, and the apparatus 10 is configured to control the acquisition and collection so that all such data is synchronized, or at least so that all such data includes time-stamp information for use in post-capture synchronization.
  • the apparatus 10 may be split into a first part that controls the aforementioned acquisition, and a second part that performs the below described post processing.
  • the VA algorithm associates segments of the video data 12 with user states.
  • An example of how this is achieved in one or more embodiments is as follows: the video data 12 is divided into a number of sections or segments, each with equal length; for each section, the VA algorithm determines the user state that the user was in for the longest time, defined as the "SUS" or section user state. The VA algorithm further determines a section gaze location or "SGL" for the time period defined by each section.
  • SGL is identified, for example, as being the tile(s) within the field-of-view corresponding to the longest-duration of detected user gaze direction, for the time period in question.
  • the VA algorithm also may operate with a minimum gaze duration threshold, such that no SGL is detected if the gaze data for the video section in question does not indicate that the user looked in any particular direction for more than the minimum threshold of time.
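  • Determining the SGL with a minimum dwell threshold can be sketched as follows: accumulate gaze dwell time per tile over the section and return the longest-dwelled tile only if it reaches the threshold. The tile identifiers and sample layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

def section_gaze_location(gaze_samples: Iterable[Tuple[Tuple[int, int], float]],
                          min_dwell_s: float) -> Optional[Tuple[int, int]]:
    """gaze_samples: ((tile_row, tile_col), dwell_seconds) pairs for one video section.
    Returns the tile with the longest total dwell, or None if no tile reaches min_dwell_s."""
    dwell = defaultdict(float)
    for tile, seconds in gaze_samples:
        dwell[tile] += seconds
    if not dwell:
        return None
    tile, longest = max(dwell.items(), key=lambda kv: kv[1])
    return tile if longest >= min_dwell_s else None

if __name__ == "__main__":
    samples = [((0, 1), 0.4), ((2, 3), 1.8), ((0, 1), 0.5), ((2, 3), 0.9)]
    print(section_gaze_location(samples, min_dwell_s=1.0))  # -> (2, 3), dwelled 2.7 s in total
    print(section_gaze_location(samples, min_dwell_s=5.0))  # -> None (no SGL detected)
```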
  • a section of the video data 12, or subsequent sections of the video data 12, is determined to have an important gaze location or "IGL".
  • the climber in question may have looked intently at one or more rocks or other critical handholds during the climb. Similar instances occur during skiing, such as where the skier looks intently at an impending jump or other downhill obstacle.
  • These gaze locations are tagged as IGLs by the VA Algorithm.
  • the VA Algorithm detects cases where the SGL determined for a given succession of video sections is substantially the same over those sections, and the involved sections are also detected as having a heightened level of interest or focus.
  • the SGL(s) would be tagged as being an IGL.
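  • One plausible reading of the IGL rule: scan consecutive sections and, whenever the same SGL persists across a run of sections whose user states indicate heightened interest, tag that gaze location as an IGL. The minimum run length and the interest test below are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

def tag_igls(sections: List[Tuple[Optional[Tuple[int, int]], int]],
             min_run: int = 2,
             interest_threshold: int = 3) -> Set[Tuple[int, int]]:
    """sections: (SGL tile or None, quantified interest level) per video section, in order.
    Tags an SGL as an IGL when it repeats over at least min_run consecutive sections
    that all meet the interest threshold."""
    igls: Set[Tuple[int, int]] = set()
    run_tile, run_len = None, 0
    for sgl, level in sections:
        if sgl is not None and sgl == run_tile and level >= interest_threshold:
            run_len += 1                      # same gaze tile continues at heightened interest
        elif sgl is not None and level >= interest_threshold:
            run_tile, run_len = sgl, 1        # new candidate run starts
        else:
            run_tile, run_len = None, 0       # run broken (no SGL or low interest)
        if run_len >= min_run and run_tile is not None:
            igls.add(run_tile)
    return igls

if __name__ == "__main__":
    sections = [((1, 1), 2), ((2, 3), 4), ((2, 3), 4), ((0, 0), 1)]
    print(tag_igls(sections))  # -> {(2, 3)}: same gaze tile over two high-interest sections
```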
  • the VA algorithm at least temporarily stores the start and end times of the involved sections, together with the SUS and SGL, and any IGL tags.
  • the VP algorithm takes the output from the VA algorithm and uses it to create a final video.
  • the starting length of the video data 12 is denoted as "SL”.
  • the user indicates the desired final length of the modified video data 24 as "FL”.
  • the user state LUT associates with each defined user state a defined level of interest or "LOI" that ranges from “0" to denote “least interesting” up to "10" to denote most interesting.
  • the VP algorithm removes segments from the video data 12. For example, if SL/FL is larger than a defined threshold, all segments of the video data 12 having an LOI of 0 are removed. If SL/FL is smaller than that threshold, or smaller than another defined threshold, segments of the video data 12 having an LOI of 0, 1, 2 or 3 are removed. Thus, the "bar" for retaining segments becomes increasingly aggressive in dependence on the desired amount of length reduction; a simple sketch of this culling appears after this list. Of course, multiple thresholds, each with a different associated number of LOIs for culling, can be used by the apparatus 10.
  • each LOI is associated with one or more time lapse parameters or TLPs.
  • An example of a TLP is the regularity with which frames are removed from the video. For example, a more interesting segment of the video data 12 can have nine of each ten frames removed, whereas a less interesting segment can have nineteen of each twenty frames removed. As the remaining frames are displayed for the same duration, events in the more interesting video will appear to pass more slowly, whilst events in the less interesting video will appear to pass more quickly; this frame decimation is also illustrated in the sketch following this list.
  • the VP algorithm adjusts the video such that the IGL has more prominent visibility. Examples of how this is achieved include: where the frame size of the modified video data 24 is less than that of the captured video data 12, the center of the image can be moved such that the IGL has a more central location; the image properties can be adjusted such that the IGL has a differentiated appearance, e.g., being brighter; and additional materials can be used to highlight the IGL, such as by overlaying arrows or other graphics. Once any such highlighting is performed, the VP algorithm performs any further actions required to create a viewable file containing the modified video data 24.
  • gaze data processing also may include object recognition techniques, such that the regions of interest 22 are based on user gaze data 20 and/or based on recognizing key objects within the video data 12. Further, the use of highlighting overlays or other such additional material can be used to differentiate not only regions of interest 22, but also to differentiate between video segments that are associated with different quantified interest levels 18.
  • final video data obtained according to the teachings herein will better reflect the user's actual levels of interest during participation in the activity represented in the final video data.
  • Such processing thus results in more enjoyable videos, or videos that better reflect the experience of the recorded event.
  • the processing generally reduces video transmission bandwidth requirements, and the potentially significant reductions in video file size and/or video length can, in the aggregate, provide significant storage benefits, with the user effectively archiving what amounts to "highlight reels" that are automatically generated according to the teachings herein.
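The culling and time-lapse steps just described can be prototyped in a few lines of code. The following Python sketch is illustrative only; the segment structure, the LOI scale of 0 to 10, the particular SL/FL threshold, and the frame-keeping ratios are assumed values rather than anything prescribed by the embodiments.

```python
# Illustrative sketch only: cull low-interest segments and decimate frames in
# the survivors, loosely following the VP-algorithm description above.
# Segment boundaries, the 0-10 LOI scale, and all thresholds are assumed.

def shorten_video(segments, starting_length, final_length):
    """segments: list of dicts with 'frames' (list) and 'loi' (0..10)."""
    ratio = starting_length / final_length
    # The more aggressive the requested shortening, the higher the culling bar.
    cull_below = 1 if ratio < 2.0 else 4
    kept = [s for s in segments if s["loi"] >= cull_below]

    # Per-LOI time-lapse parameter: keep every Nth frame (assumed mapping).
    keep_every = {loi: 20 if loi < 6 else 10 if loi < 9 else 1 for loi in range(11)}

    out_frames = []
    for seg in kept:
        step = keep_every[seg["loi"]]
        out_frames.extend(seg["frames"][::step])
    return out_frames

if __name__ == "__main__":
    demo = [{"frames": list(range(i * 100, (i + 1) * 100)), "loi": loi}
            for i, loi in enumerate([0, 3, 7, 10])]
    print(len(shorten_video(demo, starting_length=400, final_length=100)))  # 110
```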

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

In one aspect of the teachings presented herein, a video operation bearing on video data is controlled in response to determining interest levels of a human participant engaged in an activity represented in the video data. Video operation control in this manner relies on the processing of physiological data captured for the human participant while engaged in the activity represented in the video and infers the participant's interest levels from physiological data characteristics determined via the processing. Example operational controls include but are not limited to dynamically enabling and disabling video capture, dynamically controlling video capture parameters, and dynamically controlling one or more editing functions applied to captured video data.

Description

METHOD AND APPARATUS FOR VIDEO
PROCESSING BASED ON PHYSIOLOGICAL DATA
TECHNICAL FIELD
The present invention generally relates to video processing, and particularly relates to video processing based on physiological data acquired for a human participant in an activity represented in video data.
BACKGROUND
Participants in sporting activities increasingly use wearable video cameras to capture their experiences. Manufacturers, responding to this trend and capitalizing on advances in digital video capture and editing, offer increasingly sophisticated camera systems capable of capturing potentially lengthy high-definition videos. Similarly, people playing computer games often capture the gameplay in a video, as can be seen by the large amount of video game footage uploaded to various internet video sites.
With increasing video quality comes increasing video file sizes, and these larger file sizes impose significant short-term storage requirements on the video capture systems, and even greater aggregate storage requirements on whatever computer systems are used to accumulate, edit and archive the video files. Further, because it is common for people to upload video files to content-viewing sites, cloud storage, etc., file size impacts transmission bandwidth requirements, storage costs, etc.
Moreover, it is recognized herein that the events captured in a typical video file often include portions of less interest than others. For example, a thirty-minute recording of a mountain biking excursion may capture a number of relatively brief periods of exciting downhill runs separated by longer interludes of relatively less interesting or less intense activity.
While automatic "shortening" of videos is known in some sense, these approaches are not intelligent. Consider, for example, known approaches to time lapse, which shorten videos by uniformly reducing the number of video frames. Conventional automated time-lapse processing consequently does not differentiate between more and less "interesting" portions of a given video and, indeed, has no understanding of which portions of a video file are more interesting than others.
SUMMARY
In one aspect of the teachings presented herein, a video operation bearing on video data is controlled in response to determining interest levels of a human participant engaged in an activity represented in the video data. Video operation control in this manner relies on the processing of physiological data captured for the human participant while engaged in the activity represented in the video and infers the participant's interest levels from data characteristics determined via the processing. Example operational controls include, but are not limited to, dynamically enabling and disabling video capture, dynamically controlling video capture parameters, and dynamically controlling one or more editing functions applied to captured video data.
In an example embodiment, a method of controlling a video operation bearing on video data responsive to determining interest levels of a human participant engaged in an activity represented in the video data includes obtaining physiological data acquired for a human participant while engaged in an activity represented in video data. The method further includes quantifying interest levels of the human participant with respect to successive segments of the video data, based on characterizing the physiological data associated with each segment and thereby obtaining a quantified interest level for the segment. Still further, the method includes controlling the video operation, in dependence on the quantified interest levels associated with the segments.
In another example embodiment, an apparatus is configured to control a video operation bearing on video data, in response to determining interest levels of a human participant engaged in an activity represented in the video data. The example apparatus includes communication interface circuitry configured to obtain the video data and further configured to obtain physiological data acquired for the human participant while engaged in the activity. The apparatus further includes a processing circuit operatively associated with the communication interface circuitry and configured to quantify interest levels of the human participant with respect to successive segments of the video data. Quantification is based on the processing circuit characterizing the physiological data associated with each segment and thereby obtaining a quantified interest level for the segment. Correspondingly, the processing circuit is configured to control the video operation, in dependence on the quantified interest levels associated with the segments.
Of course, the present invention is not limited to the above features and advantages. Indeed, those skilled in the art will recognize additional features and advantages upon reading the following detailed description, and upon viewing the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of one embodiment of an apparatus that is configured to control a video operation on video data, in dependence on quantifying the interest levels of a human participant engaged in an activity represented in the video data.
Fig. 2 is a diagram of one embodiment of a temporal association scheme, whereby quantified interest levels are determined for successive segments of video data according to corresponding physiological data acquired for a human participant engaged in an activity represented in the video data.
Fig. 3 is a diagram of one embodiment of a spatial association scheme, whereby particularized regions of interest are determined within given video frames in video data, according to corresponding gaze data acquired for a human participant engaged in an activity represented in the video data.
Fig. 4 is a block diagram of example details for the apparatus introduced in Fig. 1, according to some embodiments.
Fig. 5 is a block diagram of example physiological sensor devices, any one or more of which are used in some embodiments herein, to obtain physiological data from a human participant engaged in an activity represented in corresponding video data.
Fig. 6 is a diagram of example contents of stored physiological data, e.g., data files, in one or more embodiments, for use in quantifying the interest levels of a human participant in corresponding segments of video data.
Fig. 7 is a block diagram of example details for the apparatus introduced in Fig. 1, according to some embodiments, wherein the apparatus is configured to receive one or more real-time physiological data sensor streams and/or video data streams.
Fig. 8 is a logic flow diagram of one embodiment of a method of processing, whereby a video operation is controlled in dependence on quantified interest levels of a human participant with respect to segments of video data capturing an activity involving the human participant.
Fig. 9 is a block diagram of one embodiment of functional modules, such as may be implemented in a processing circuit, for carrying out the video operation automation taught herein.
Fig. 10 is a logic flow diagram of another embodiment of a method of processing, whereby a video operation is controlled in dependence on quantified interest levels of a human participant with respect to segments of video data capturing an activity involving the human participant.
DETAILED DESCRIPTION
Fig. 1 illustrates an example apparatus 10 that is configured to control a video operation bearing on video data 12, responsive to determining interest levels of a human participant engaged in an activity represented in the video data 12. More particularly, the apparatus 10 obtains physiological data 14, from which it determines quantified interest levels 18, e.g., a quantified interest level 18-1 corresponding to a first segment of the video data 12, a quantified interest level 18-2 corresponding to a second segment of the video data 12, and so on. As seen in Fig. 2, successive portions 14-1, 14-2, and so on of the physiological data 14, are associated with respective segments of the video data 12. Thus, the apparatus 10 evaluates the physiological data 14-1 to obtain the quantified interest level 18-1 for the first segment of video data 12, evaluates the physiological data 14-2 to obtain the quantified interest level 18-2 for the second segment of video data 12, and so on.
The apparatus 10 in at least one embodiment also obtains gaze data 20, which provides indications of the spatial focus of the human participant while engaged in the activity represented in the video data 12. As seen in Fig. 3, the gaze data 20 comprises, for example, one or more types of data values that indicate the spatial focus of the human participant in terms of the field of view associated with capture of the video data 12. For example, the gaze data 20 comprises gaze angle or direction data that indicates regions of interest 22 within the spatial field of the video frames comprising the video data 12.
The apparatus 10 outputs a modified video signal 24 and/or acquisition control signaling 26. For example, in embodiments that involve outputting a modified video signal 24, such a signal reflects the results of dynamically controlling one or more video operations on the video data 12, in dependence on the quantified interest levels 18. In another example, in embodiments that involve outputting acquisition control signaling 26, such signaling controls one or more image capture parameters— e.g., enabling or disabling capture, changing camera focus settings, changing capture quality settings, etc.
For a better understanding of the apparatus 10 and its operational features, consider the example implementation given in Fig. 1, in which the apparatus 10 includes communication interface circuitry 30 configured to obtain the video data 12 and further configured to obtain physiological data 14 acquired for the human participant while engaged in the activity. The apparatus 10 further includes a processing circuit 32 operatively associated with the communication interface circuitry 30.
The processing circuit 32 is configured to quantify interest levels of the human participant with respect to successive segments of the video data 12, based on characterizing the physiological data 14 associated with each segment and thereby obtaining a quantified interest level 18 for the segment, and to control the video operation, in dependence on the quantified interest levels 18 associated with the segments. Here, the "video operation" comprises one or more operations associated with capturing the video data 12 and/or with post-capture processing of the video data 12. At least functionally, the processing circuit 32 implements a quantifier circuit 34 that generates the quantified interest levels 18, and a video operation control circuit 36 that controls the video operation in dependence on the quantified interest levels 18.
The apparatus 10 further comprises storage 40. In one or more examples, the storage 40 comprises any one or more of magnetic disk storage, solid-state disk storage, FLASH memory, EEPROM, Static RAM or SRAM, and Dynamic RAM or DRAM. Thus, the storage 40 may comprise non- volatile memory, volatile memory, or any mix thereof. More broadly, the storage 40 shall be understood as comprising one or more types of computer-readable medium that provide non-transitory storage for any one or more of a computer program 42, configuration data 44, and stored characteristics 46. Here, "non-transitory" storage does not necessarily mean permanent or unchanging storage, and includes copies of data and/or program instructions held in working memory for dynamic processing, but the term "non-transitory" does exclude mere propagating signals.
For example, in at least one embodiment, the processing circuit 32 comprises digital processing circuitry that is specially adapted or otherwise configured to carry out the processing and associated algorithms taught herein, based on its execution of computer programs comprising the computer program 42. Such digital processing circuitry comprises, for example, one or more microprocessors, microcontrollers, Digital Signal Processors or DSPs, Field Programmable Gate Arrays or FPGAs, Application Specific Integrated Circuits or ASICs, or Complex Programmable Logic Devices or CPLDs. Of course, the processing circuit 32 may comprise fixed circuitry, programmed circuitry, or a mix thereof. In some embodiments, the apparatus 10 comprises a Personal Computer or PC, the processing circuit 32 comprises a CPU or other processing circuit of the PC, and the storage 40 comprises the memory and/or disk storage of the PC. However, the teachings herein are not limited to such implementations, and in other embodiments the apparatus 10 comprises a dedicated device, such as a computer peripheral or other "connected" device, or is integrated with the image acquisition system used to capture the video data 12.
With reference to Fig. 4, in some embodiments, the communication interface circuitry 30 is configured to communicatively couple the processing circuit 32 to one or more data stores 50. The data store 50 may be local to the apparatus 10, e.g., part of the overall storage of the PC or other computer system implementing the apparatus 10, but this arrangement is non-limiting. The data store 50 in an example embodiment contains one or more data files 52 and 56 containing the video data 12 and the physiological data 14. In turn, the processing circuit 32 is configured to control the communication interface circuitry 30 to obtain the physiological data 14 from the one or more data files 52, 56.
In some embodiments, the physiological data 14 is captured in a file or files separate from the video data 12. For example, sensor equipment generates the data file or files 56 based on capturing the physiological data 14 from one or more physiological sensors associated with the human participant, while the human participant is engaged in the activity in question. At the same time, a separate image acquisition system captures the video data 12. Notably, there may be more than one file 56, with each file corresponding to a different type of physiological data, in cases where more than one type of physiological sensor is used to capture the physiological data— where such multi-type sensing offers a potentially more robust mechanism for assessing the interest levels of the participant, at the expense of more sensing equipment, larger data sets and greater processing complexity.
Of course, it is also contemplated that multi-sensor data can be consolidated into a single file 56, for processing by the apparatus 10. It is also contemplated that the physiological data 14, regardless of how many files are involved, may already be synchronized with the video data 12 in the data file 52. Indeed, a single data file may include both the video data 12 and the corresponding, synchronized physiological data 14. However, in at least some embodiments, the apparatus 10 is configured to synchronize the physiological data 14 with the video data 12, if the physiological data is not already synchronized, based on time stamps or other embedded temporal markers in the video data 12 and in the physiological data 14.
Consider the example arrangement of Fig. 5, where one sees an image acquisition apparatus 70 associated with a human participant. The camera 70 has a field of view 72 and captures video data 12 corresponding to that field of view. The image acquisition apparatus 70 is, for example, a wearable digital video camera, such as a chest- or helmet-worn camera. In further association with the participant, one sees a supplemental video camera 80, for obtaining supplemental video data 82 that is concurrent with video capture by the image acquisition apparatus 70. The supplemental video data 82 comprises, for example, facial expression data captured for the human participant while engaged in the activity being videoed. Additionally, or alternatively, the supplemental video data comprises pupillometry data captured for the human participant while engaged in the activity being videoed and/or the aforementioned gaze data 20.
The supplemental video camera 80 thus may be a specialized camera mounted in such a way as to focus on the face or eyes of the human participant, to obtain further types of physiological data indicating the participant's interest levels— e.g., video data from which the participant's pupil responses and/or facial expressions can be determined. Additionally, or alternatively, the supplemental video camera 80 may be used to obtain supplemental video data that is preprocessed for the apparatus 10, or is processed by the apparatus 10, to derive gaze angle or direction values, indicating the direction or angle of the human participant's gaze within a measurement framework or coordinate system that maps to the field of view of the image acquisition apparatus 70 or, more particularly, to the spatial area represented by the video frames comprising the video data 12, so that regions of interest 22 are identifiable therein.
By way of example and not limitation, the human participant is further associated with an electro-encephalograph, EEG, device 90, which includes a number of EEG sensors 92 that are configured to couple to or otherwise contact the body of the human participant, to thereby provide EEG measurement signals to a processing circuit 94 during the time that the human participant is engaged in the activity in question. The EEG device 90 further includes storage 96, for capturing and storing the EEG data 98 produced by the processing circuit 94. It will be appreciated that the EEG data 98 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50. For example, the storage 96 may be removable from the EEG device 90, or it may have a USB or other communication interface available for reading out such data.
Additionally, or alternatively, the human participant is associated with an audio recording device 100, which includes a microphone that is configured to provide an audio signal to a processing circuit 104 during the time that the human participant is engaged in the activity in question. The audio recording device 100 further includes storage 106, for capturing and storing the audio data 108 produced by the processing circuit 104. It will be appreciated that the audio data 108 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50. For example, the storage 106 may be removable from the audio device 100, or it may have a USB or other communication interface available for reading out such data. The audio data 108 provides the apparatus 10 with yet another type of physiological data 14, in the sense that vocalizations by the human participant, and other types or levels of sound recorded in conjunction with capturing the video data 12, provide the apparatus 10 with further cues regarding the interest levels of the human participant.
Additionally, or alternatively, the human participant is associated with a galvanic sensing device 110, which includes one or more skin conductivity sensors 112 that provide skin conductivity signals to a processing circuit 114 during the time that the human participant is engaged in the activity in question. The galvanic sensing device 110 further includes storage 116, for capturing and storing the skin conduction data 118 produced by the processing circuit 114. It will be appreciated that the skin conduction data 118 is accessible to the apparatus 10, such as the data being transferrable to the storage 40 of the apparatus 10 or to the data store 50. For example, the storage 116 may be removable from the galvanic sensing device 110, or it may have a USB or other communication interface available for reading out such data. The skin conduction data 118 provides the apparatus 10 with yet another type of physiological data 14, in the sense that the skin conductivity of the human participant provides the apparatus 10 with further cues regarding the interest level of the human participant.
It will be appreciated that the particular type or types of physiological data sensed will depend to some extent on the nature of the involved activity. For example, skin conductivity sensing would, as a general rule, not be used in cases where the activity involves immersion of the human participant in water. As another example, such as where the human participant wears a protective helmet, EEG sensing integrated into the helmet provides a convenient and complementary arrangement for obtaining EEG data 98. The wearing of a helmet also provides a convenient mechanism for wearing or mounting the supplemental video camera 80 shown in Fig. 5. Of course, with increasing miniaturization, such data may be obtained using sensors built into the glasses or other protective eyewear worn by the human participant.
In any case, Fig. 6 depicts an arrangement wherein the physiological data 14 is stored as one or more data files 56, and comprises any one or more of EEG data 98, audio data 108, and skin conduction data 118. It will be appreciated that the capture of such data may have been synchronized with capturing the video data 12, meaning that the data values in question are already synchronized in time with the video data 12. In other cases, the data values in question are not already synchronized, but they include one or more timestamps or other temporal information, and the apparatus 10 uses this temporal information to logically align the physiological data 14 with the video data 12. Additionally, or alternatively, the apparatus 10 may know— e.g., via the configuration data 44 shown in Fig. 1— the sampling rate or period for the values comprising any given type of physiological data 14, and thus knows the overall timing of the data values therein from such information and, e.g., knowledge of the start timing of the data acquisition relative to the video capture.
Thus, in one or more embodiments, the video data 12 is stored video data 52 and the physiological data 14 is stored physiological data 56. Correspondingly, the processing circuit 32 is configured to obtain the physiological data 14 by reading the stored physiological data 56 from a data store 50. Here, in one or more example implementations, the communication interface circuitry 30 comprises the file I/O system or subsystem of a computer system, and thus obtains the physiological data 14 and the video data 12 via the file storage system of the computer. However, the apparatus 10 may additionally or alternatively be configured to work with live data streams. In some embodiments, the communication interface circuitry 30 is therefore also configured, or is configured in the alternative, to receive a video data stream as the video data 12, and to receive one or more physiological sensor data streams incoming in real time in association with live capture of the video data, as the physiological data 14. Such capabilities complement, for example, live use of the apparatus 10 during the activity represented in the video data 12.
For example, the apparatus 10 may be integrated in or communicatively coupled with the image acquisition apparatus 70 introduced in Fig. 5, or at least coupled with one or more physiological data sensing devices. Such an arrangement is shown in Fig. 7, wherein the communication interface circuitry 30 is configured to communicatively couple to the image acquisition apparatus 70, and any one or more of the supplemental video camera 80, the EEG device 90, the audio recording device 100, and the galvanic sensing device 110. Indeed, in one or more embodiments contemplated herein, these devices may be simplified in the sense that they include the requisite sensors and any needed analog acquisition, conditioning or filtering circuitry, but the digital processing associated with forming or otherwise obtaining usable measurement values representing the physiological data 14 is consolidated in processing circuit 32 of the apparatus 10. The apparatus 10 may be configured to interface with and read a multiplicity of physiological sensor device types, allowing the human participant to mix and match the particular sensor types based on the nature of the activity and/or based on which particular sensor devices the human participant has access to.
The connectivity depicted in Fig. 7 also complements dynamic, live control of the image acquisition apparatus 70. That is, in one or more embodiments contemplated herein, the video operation controlled by the apparatus 10 is an image acquisition operation bearing on acquisition of the video data 12 by an image acquisition apparatus 70, and the processing circuit 32 is configured to control the image acquisition operation via signaling 26 output from the communication interface circuitry 30— see Fig. 1 for its illustration of acquisition control signaling 26. By way of example, the apparatus 10 enables and disables video capture in dependence on the interest level of the human participant, as determined for successive segments of the video data 12.
Of course, when operating on live video and physiological sensor data streams, the apparatus 10 may not wait until the end of a given segment of video data 12 to make an interest-level-based decision for the segment. For example, it may evaluate only an initial fraction of the physiological data 14 corresponding to the segment, characterize that data to obtain a quantified interest level for the segment, and then toggle video capture on or off in dependence on the quantified interest level, or make some other acquisition parameter adjustment, such as changing the camera focus, the capture quality, etc. Alternatively, the apparatus 10 buffers the incoming segments of video data 12 and the physiological data 14 associated with each segment. The segment is at least provisionally held in the buffer for saving or other processing, and such saving or other processing is then performed in dependence on evaluating the correspondingly buffered physiological data 14, to determine the quantified interest level associated with the buffered segment.
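As an illustration of the live-stream handling described above, the following Python sketch gates capture on a quickly computed interest score for each buffered segment. The camera interface with enable() and disable() methods, the quick_interest() scorer, the threshold, and the buffer length are hypothetical names and values introduced here for illustration only.

```python
# Minimal sketch of interest-gated live capture. The camera API and the
# quick_interest() scorer are assumptions, not part of the embodiments.

from collections import deque

class InterestGatedCapture:
    def __init__(self, camera, quick_interest, threshold=0.5, buffer_len=4):
        self.camera = camera
        self.quick_interest = quick_interest  # maps sensor samples -> score
        self.threshold = threshold
        self.buffer = deque(maxlen=buffer_len)  # provisionally held segments

    def on_segment(self, video_segment, sensor_samples):
        # Evaluate only the sensor data available for this (possibly partial) segment.
        score = self.quick_interest(sensor_samples)
        if score >= self.threshold:
            self.camera.enable()
            self.buffer.append(video_segment)   # retained for saving or further processing
        else:
            self.camera.disable()               # skip low-interest capture
        return score

if __name__ == "__main__":
    class DummyCamera:
        def enable(self):  print("capture on")
        def disable(self): print("capture off")

    gate = InterestGatedCapture(DummyCamera(), quick_interest=lambda s: sum(s) / len(s))
    gate.on_segment("segment-1", [0.8, 0.9])   # high score -> capture on
    gate.on_segment("segment-2", [0.1, 0.2])   # low score  -> capture off
```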
Turning back to Fig. 2 momentarily, one sees that the apparatus 10 is configured to process or otherwise handle the video data 12 as successive segments, which generally are of uniform size— as measured in terms of the number of video frames included in each segment or in the time window duration defining each segment. Each segment is temporally associated with a corresponding portion of the data values comprising the physiological sensor data 14. Thus, the interest level of the human participant is determined for each segment of the video data 12 according to the physiological data 14 associated with the segment. As noted, such processing may consider all or a fraction of those physiological data values that are temporally associated with the segment of video data 12 under consideration, for determining the quantified interest level of the human participant with respect to the segment.
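One simple way to realize the temporal association of Fig. 2 is to bucket time-stamped physiological samples into the uniform video segments. The sketch below is a minimal Python illustration; the (timestamp, value) sample format and the one-second segment duration are assumptions.

```python
# Illustrative only: bucket time-stamped sensor samples into uniform video
# segments so each segment can later be scored for interest.

def associate_samples(samples, segment_seconds, video_start):
    """samples: iterable of (timestamp_seconds, value); returns a dict mapping
    segment_index -> list of values falling within that segment."""
    buckets = {}
    for t, value in samples:
        idx = int((t - video_start) // segment_seconds)
        if idx >= 0:
            buckets.setdefault(idx, []).append(value)
    return buckets

if __name__ == "__main__":
    samples = [(0.2, 1.0), (0.9, 1.2), (1.4, 3.3), (2.1, 0.4)]
    print(associate_samples(samples, segment_seconds=1.0, video_start=0.0))
    # {0: [1.0, 1.2], 1: [3.3], 2: [0.4]}
```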
In determining these quantified interest levels, the processing circuit 32 in at least some embodiments is configured to characterize the physiological data 14 associated with each segment by evaluating the associated physiological data 14, or parameters derived therefrom, with respect to stored characteristics 46 corresponding to different interest levels— see Fig. 1. In particular, the processing circuit 32 is configured to identify which stored characteristics 46 best correspond with the associated physiological data 14, or the parameters derived therefrom, and to take the corresponding interest level value as the quantified interest level 18 for the segment.
In at least one implementation, the quantified interest levels 18 correspond to different emotional states of the human participant, as explicitly or implicitly represented in the physiological data 14. Correspondingly, the stored characteristics 46 for at least one interest level value are characteristic of any one or more of the following emotional states: a high arousal state, a high cognition state, an anxious state, and a flow state. These emotional states may have, for example, characteristic maximum or minimum measured values— in whatever units are involved for the given sensor types— or may have characteristic average values or patterns of values.
Thus, the stored characteristics 46 may comprise distinct sets of reference physiological data sensor values, or parameters derived therefrom, that are characteristic of different emotional states. The physiological sensor data 14 for a given segment of the video data 12 is evaluated, either directly or after filtering or other parameterization, with respect to the different sets of reference physiological data values or parameters, for a determination of the best matching set. The emotional state associated with that best matching set in turn maps to a numerical value representing the quantified interest level 18 associated with segment of video data 12 under consideration. Emotional states that are associated with relatively high levels of user emotion are assigned one or more numerical values indicating relatively higher levels of interest, while emotional states that are associated with relatively lower levels of user emotion are assigned one or more numerical values indicating relatively lower levels of interest. In one example, a binary indication system is used, wherein the quantified interest level 18 determined for any given segment of the video data 12 is set to a first value if the associated physiological data 14 indicates an interest level below a defined threshold, and is set to a second value if the associated physiological data indicates an interest level that at least meets the defined threshold.
Of course, finer gradations are contemplated and it should also be appreciated that the emotional state of the user may be considered only implicitly— i.e., the physiological data 14 associated with a given segment of video data 12 is evaluated with respect to multiple sets of reference physiological sensor data, where each such set is characteristic for a given level of interest and has associated therewith a different numerical value. The quantified interest level 18 for the segment of video data 12 in question would then be assigned as the numeric value of the reference data set that best matches— e.g., best correlates— with the associated physiological data 14. In one or more other embodiments, metrics— e.g., parameters or patterns— are derived from the associated physiological data 14, and then evaluated with respect to different reference metrics or sets thereof, with the different reference metrics being characteristic for different levels of interest.
Thus, the apparatus 10 may generate a histogram or histograms from the associated physiological data 14, e.g., based on binning the range of numeric values seen in the data, and compare the histogram(s) with reference histograms. In other approaches, average or mean values, e.g., as obtained by filtering, are compared. In still other approaches, patterns are detected from the physiological sensor data values and compared to reference patterns.
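As a hedged illustration of the histogram-matching idea, the following Python sketch bins a segment's physiological sample values and selects the interest level whose reference histogram correlates best. The bin edges, the synthetic reference data, and the two interest level values used in the demo are invented for the example.

```python
# Sketch: bin a segment's sensor values and pick the interest level whose
# reference histogram correlates best with the segment's histogram.

import numpy as np

def quantify_interest(segment_values, reference_histograms, bins):
    hist, _ = np.histogram(segment_values, bins=bins, density=True)
    scores = {level: float(np.corrcoef(hist, ref)[0, 1])
              for level, ref in reference_histograms.items()}
    return max(scores, key=scores.get)  # level whose reference matches best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bins = np.linspace(0.0, 1.0, 11)
    refs = {0: np.histogram(rng.uniform(0.0, 0.4, 500), bins=bins, density=True)[0],
            3: np.histogram(rng.uniform(0.5, 1.0, 500), bins=bins, density=True)[0]}
    calm_segment = rng.uniform(0.0, 0.4, 100)
    print(quantify_interest(calm_segment, refs, bins))  # expected: 0
```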
Still further, it will be appreciated that the nature of the evaluation will depend on the nature of the underlying physiological data 14. For example, with respect to discerning emotional states from EEG data, see X. Wang, N. Dan, and L. Bao-Lian, "Emotional state classification from EEG data using machine learning approach," Neurocomputing, Vol. 129, 10 April 2014, Pages 94-106. For further example information, see M. Murugappan, N. Ramachandran, and Y. Sazali, "Classification of human emotion from EEG using discrete wavelet transform," Biomedical Science and Engineering, 2010, 3, 390-396, as published online on 10 April 2010. For yet further details regarding the relationships between emotional states and EEG data, skin conduction data, and more, the interested reader may refer to Carlson, Neil R. (2012), Physiology of Behavior (11th ed.), Pearson Publishing. In general, various techniques can be used to associate EEG measurements with human emotion. One approach contemplated herein uses a bipolar model, with the axes being (i) arousal and (ii) valence. See, e.g., A. Gerber, et al., "An affective circumplex model of neural systems subserving valence, arousal, and cognitive overlay during the appraisal of emotional faces," Neuropsychologia, Vol. 46, Issue 8, July 2008, Pages 2129-2139. Using this bipolar model enables the apparatus 10 to evaluate EEG data 98 and determine therefrom times where the user is considered to be in a heightened emotional state, which in turn implies that the video data 12 corresponding to those times is of relatively higher interest than the video data 12 corresponding to other times within the overall window of time spanned by the video data 12. The most basic approach is to detect when the user's arousal is higher than a threshold, with the threshold potentially being set during a learning or calibration phase performed with the user. In a more complex case, interest levels are quantified based on detecting whether the arousal measure is above a threshold and/or whether the valence measure is above another defined threshold.
Arousal is associated with higher values of some neurological signals, for example the "beta wave" components of the EEG data 98. These components fall within the frequency range of human brain activity between 12.5 and 30 Hz. EEG data processing additionally or alternatively may be based on evaluating left/right lobe asymmetries in activity. For example, it has been observed that greater relative left-lobe EEG activity is seen for positively valenced emotions and that, conversely, greater right-lobe activity is seen for negatively valenced emotions.
Thus, in at least one embodiment, the apparatus 10 determines arousal and valence metrics for EEG data 98 acquired whilst the user was engaged in the activity represented in the video data 12 in question. The apparatus 10 may have a preconfigured target for reducing the length or size of the video data 12 or the user may input a target reduction value. In either case, the apparatus 10 uses the target value to define arousal and valence metric thresholds and then compares the EEG data 98 associated with each segment of the video data 12 to control the application of one or more video operations bearing on the video data 12, in dependence on whether the associated EEG data 98 satisfies the metric thresholds.
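A rough Python illustration of the beta-wave arousal detection discussed above follows. It estimates power in the 12.5 to 30 Hz band of a single EEG channel via a simple periodogram and compares it against a calibrated threshold; the 256 Hz sampling rate, the synthetic test signals, and the crude threshold choice are assumptions made purely for illustration.

```python
# Illustration only: estimate beta-band (12.5-30 Hz) power in an EEG channel
# and flag arousal when it exceeds a calibrated threshold.

import numpy as np

def beta_band_power(eeg_samples, fs=256.0, band=(12.5, 30.0)):
    freqs = np.fft.rfftfreq(len(eeg_samples), d=1.0 / fs)
    power = np.abs(np.fft.rfft(eeg_samples)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(power[mask].mean())

def is_aroused(eeg_samples, threshold, fs=256.0):
    return beta_band_power(eeg_samples, fs) > threshold

if __name__ == "__main__":
    t = np.arange(0, 2.0, 1.0 / 256.0)
    quiet = 0.1 * np.sin(2 * np.pi * 10 * t)             # alpha-range activity only
    excited = quiet + 1.0 * np.sin(2 * np.pi * 20 * t)   # added beta-range activity
    threshold = 2 * beta_band_power(quiet)               # stand-in for calibration
    print(is_aroused(quiet, threshold), is_aroused(excited, threshold))  # False True
```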
In one such approach, the user wears a helmet containing a number of EEG sensors, with the location of the individual sensors covering portions of the scalp associated with both the left and right sides of the frontal lobe. The apparatus 10, or another configured computer system, begins monitoring EEG data for the user during a measurement period in which the user remains inactive. These readings may be stored in the stored characteristics 46 as baseline EEG readings. The apparatus 10, or the other configured computer system, then collects EEG data while the user participates in activity intended to produce one or more heightened emotional states in the user. The apparatus 10 or other computer system is configured to receive a user input, which the user is instructed to actuate at a point in time during the activity in which the user experiences a desired emotional state. The apparatus 10 or other computer system captures this point in time. The corresponding EEG data or parameters derived therefrom are saved as reference EEG data in the stored characteristics 46.
Broadly, then, in some embodiments, the stored characteristics 46 at least in part comprise physiological sensor data, e.g., any one or more of sensor data 82, 98, 108, 118, or parameters derived therefrom, as previously obtained for the human participant in a prior calibration session. The same approach also may be used for the aforementioned gaze data 20.
For example, the field of view 72 for the image acquisition apparatus 70 is logically split into a number of tiles, e.g., into a geometric grid of i rows and j columns. Thus, a particular area within the field of view 72 is identifiable with reference to the involved tile(s) T(i,j). In a calibration session, the rotation of one or both of the user's eyeballs is determined with respect to each tile, and the parameters learnt during this calibration are then used to gauge the tile at which the user is looking at any given time during capture of the video data 12.
The apparatus 10 in one or more embodiments is configured to carry out a calibration process in which the user is instructed to look at a given location and the horizontal eye rotation of the user is then determined in conjunction with the user shifting his or her gaze between two points lying on a plane that is perpendicular to the imaging sensor and a known distance from the image sensor. The points are mapped to respective tiles, and the known distance is then used to calculate the horizontal rotations associated with other tiles in the field of view 72. A similar calibration is used for vertical rotation.
In an alternative approach, the user mounts both the image acquisition system 70 and the supplemental video camera 80, as they would be worn while engaged in a physical activity. The supplemental video camera 80 is used for acquisition of eye movement data and calibration software— such as may be executed by the apparatus 10 or another computer system— displays a reference object on a display screen. The dimensions of the reference object as displayed are known to the software, as are the resolution and dimensions of the display screen, the user's viewing distance, etc.
Correspondingly, the display screen image as seen by the image acquisition system 70 is captured and the captured image is overlaid on the reference image. The user moves his head such that the captured object exactly overlays the reference object. The software then displays a rectangle on the display and the user is prompted to look at the respective corners of the rectangle whilst eye movement data is captured for the user. The detected eye rotations are then used to determine the rotations associated with each of the defined tiles. Later, in video processing operation, the apparatus 10 evaluates gaze data 20 that was acquired whilst the user participated in the activity represented in the video data 12 and uses the eye rotation information in, or derived from, the gaze data to identify changing regions of interest 22 in the video frames comprising the video data 12— as illustrated in Fig. 3.
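As a sketch of how calibrated eye rotations might be mapped onto the tile grid at processing time, the following Python function bins horizontal and vertical rotation angles into a tile index T(i, j). The 4x6 grid and the field-of-view half-angles of 30 and 20 degrees are illustrative assumptions, not parameters taken from the embodiments.

```python
# Rough sketch: map horizontal/vertical eye rotations (degrees) to a tile
# index T(i, j) within the camera field of view. Grid size and half-angles
# are assumed values.

def gaze_to_tile(h_deg, v_deg, rows=4, cols=6, h_half=30.0, v_half=20.0):
    # Normalise each rotation to [0, 1) across the field of view, then bin it.
    u = min(max((h_deg + h_half) / (2 * h_half), 0.0), 0.999)
    v = min(max((v_deg + v_half) / (2 * v_half), 0.0), 0.999)
    return int(v * rows), int(u * cols)   # (row i, column j)

if __name__ == "__main__":
    print(gaze_to_tile(0.0, 0.0))     # centre of the field of view -> (2, 3)
    print(gaze_to_tile(-29.0, 19.0))  # near one corner of the grid -> (3, 0)
```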
Also recognized herein is the possibility of the user's interest level changing during the time period represented by any given segment of video data 12. In some embodiments, at least where the physiological data 14 associated with a given segment indicates changing interest levels within the given segment, the processing circuit 32 is configured to identify a dominant interest level for the segment. For example, the "dominant" interest level is the interest level of longest detected duration within the segment. The processing circuit 32 takes the dominant interest level as the quantified interest level 18 for the segment.
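A minimal sketch of the dominant-interest-level rule follows; the input format of (interest level, duration) pairs within one segment is an assumption made for the example.

```python
# Small sketch: within one video segment, pick the interest level held for the
# longest total time and use it as the segment's quantified interest level.

from collections import defaultdict

def dominant_interest_level(intervals):
    """intervals: iterable of (interest_level, duration_seconds) within a segment."""
    totals = defaultdict(float)
    for level, duration in intervals:
        totals[level] += duration
    return max(totals, key=totals.get)

if __name__ == "__main__":
    # 2.5 s at level 1 versus 3.5 s in total at level 3 -> dominant level is 3.
    print(dominant_interest_level([(1, 2.5), (3, 2.0), (3, 1.5)]))
```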
However the quantified interest levels 18 are determined, the processing circuit 32 is configured to control a video operation— i.e., one or more video operations— in dependence on the quantified interest levels. For example, the processing circuit 32 is configured to control acquisition and/or processing of the video data 12 in dependence on the quantified interest levels. Additionally, or alternatively, the processing circuit 32 is configured to modify the video data 12 in dependence on the quantified interest levels 18, such as by retaining or emphasizing segments associated with quantified interest levels 18 meeting a defined interest threshold, and discarding or de-emphasizing segments associated with quantified interest levels 18 not meeting the defined interest threshold.
For example, the processing circuit 32 may be configured to implement any one or more of the following operations: modifying the video data 12 by retaining or emphasizing segments associated with quantified interest levels 18 meeting a defined interest threshold; discarding or de-emphasizing segments associated with quantified interest levels 18 not meeting the defined interest threshold; and determining an emphasis or de-emphasis modification to apply to the video data 12 for a given segment of the video data 12 in dependence on determining which one or more defined interest thresholds are met among a plurality of defined interest thresholds. In a non-limiting example of this last possibility, three interest level thresholds are defined: low, medium and high. The threshold definitions will vary in dependence on how the quantified interest levels 18 are defined, but in an example case assume that the universe of quantified interest level values spans from 1-4, and assume that the "low" interest threshold is set to 1, the medium interest threshold is set to 3, and the high interest threshold is set to 4. Segments having a quantified interest level 18 of 1 would be processed as low-interest video segments, segments having a quantified interest level 18 of 2 or 3 would be processed as medium-interest segments, and segments having a quantified interest level 18 of 4 would be processed as high-interest segments. Such processing may involve modifying the video data 12 to have one of three image quality or compression settings corresponding to the three interest level ranges in play.
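The three-tier example above can be expressed directly as a small mapping function. The sketch below is illustrative only; the CRF-style compression values attached to each tier are placeholders, not values specified by the embodiments.

```python
# Sketch of the three-tier mapping described above: quantified interest levels
# 1-4 mapped to low/medium/high processing tiers with placeholder quality values.

def quality_for_level(level):
    if level <= 1:
        tier, crf = "low", 35      # heaviest compression for low interest
    elif level <= 3:
        tier, crf = "medium", 28
    else:
        tier, crf = "high", 20     # best quality for the most interesting segments
    return tier, crf

if __name__ == "__main__":
    print([quality_for_level(level) for level in range(1, 5)])
```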
Of course, more or fewer interest-level ranges may be used, either as a fixed configuration or as a variable configuration that is set according to, e.g., user input and/or operating conditions. For example, the processing circuit 32 may evaluate the quality, quantity, and/or variety of the data comprising the physiological data 14 associated with given video data 12. As a non-limiting example, the processing circuit 32 may use a larger set of quantified interest levels when it has physiological data 14 corresponding to two or more types— such as when it has EEG data 98 and pupil data 82 or skin response data 118. Additionally, or alternatively, it may evaluate the physiological data 14 for quality, e.g., such as by assessing the degree of correlation when comparing the physiological data 14, or parameters derived therefrom, with the stored characteristics 46.
Still further, the quantization resolution used for determining interest levels, and the number of recognized interest level ranges used for controlling the video operation in question may also be varied in dependence on the richness or quality of the calibration performed by the user. For example, the apparatus 10 may be configured such that a basic or quick calibration process is offered as an initial mechanism for generating the stored characteristics 46, but also may offer a longer, more complex calibration routine that allows the apparatus 10 to discern interest levels with greater resolution and/or accuracy.
In any case, as regards de-emphasis as one example of controlling a video operation, "de-emphasizing" comprises, for example, any one or more of reducing the video quality, applying a higher level of video compression, reducing the image size or image resolution, applying a more aggressive time-lapse processing, and reducing or eliminating video layers, at least in video data 12 that uses layers. Conversely, "emphasizing" comprises, for example, applying highlighting or other overlay data, decreasing the level of video compression, increasing an image size, saving or retaining a greater number of video layers, etc. Emphasizing and de-emphasizing also may comprise or include changing the image focus, image centering, etc.
Particularly, the processing circuit 32 in at least one embodiment is configured to reduce a file size or run time of the video data 12, as the controlled video operation in question, based on discarding or de-emphasizing segments of the video data 12 that are associated with quantified interest levels 18 not meeting a defined interest threshold. In at least one such embodiment the processing circuit 32 is configured to dynamically control an extent of file size or run time reduction in dependence on a configurable value. The configurable value may be input by the user— e.g., via a keyboard, pointer device, or touchscreen interface, that is included in the apparatus 10, or one that the apparatus 10 is otherwise configured to communicatively couple with. Additionally, the configurable value may be stored in the configuration data 44, although it still may be updated by the user.
However the reduction target is established, in at least one embodiment, the processing circuit 32 is configured to reduce a file size or run time of the video data 12 as the video operation in question, based on being configured to determine a reduction target indicating a desired reduction in the file size or the run time of the video data, set an interest level threshold in dependence on the reduction target, and remove segments associated with quantified interest levels 18 below the interest level threshold.
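One simple way to derive the interest level threshold from a reduction target is to raise the threshold until the retained run time fits within the target. The Python sketch below illustrates this under assumed segment fields; it is one possible realization, not the prescribed one.

```python
# Illustrative threshold selection: raise the culling threshold until the
# segments that survive fit within the requested run time.

def pick_threshold(segments, target_seconds):
    """segments: list of dicts with 'seconds' and 'level'; returns the lowest
    interest threshold whose surviving segments fit within target_seconds."""
    for threshold in sorted({s["level"] for s in segments}):
        kept = sum(s["seconds"] for s in segments if s["level"] >= threshold)
        if kept <= target_seconds:
            return threshold
    return max(s["level"] for s in segments)

if __name__ == "__main__":
    segs = [{"seconds": 60, "level": 0}, {"seconds": 40, "level": 2},
            {"seconds": 30, "level": 3}, {"seconds": 20, "level": 4}]
    print(pick_threshold(segs, target_seconds=60))   # -> 3 (keeps 30 s + 20 s)
```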
In this embodiment, and in other embodiments involving a modification of the video data 12, it should be understood that the apparatus 10 may operate on the video data 12, such that the video data 12 is altered, or the apparatus 10 may generate modified video data 24, such as shown in Fig. 1. The modified video data 24 represents a processed version of the video data 12 and is obtained in dependence on processing the video data 12 in dependence on the quantified interest levels 18 determined for the respective segments of the video data 12.
In any case, after removing the segments associated with quantified interest levels 18 that are below the defined interest level threshold, the processing circuit 32 shall be understood as having modified video data comprising the segments not removed— i.e., the remaining segments. In at least one embodiment, for one or more of these remaining segments of the video data 12, the processing circuit 32 is configured to set the value of a time lapse parameter for the remaining segment in dependence on the quantified interest level 18 associated with the remaining segment, and control a time lapse effect applied to the remaining segment according to the value of the time lapse parameter.
Thus, in the context of the above processing, a given interest level threshold is used to cull segments of the video data 12 that are deemed to be uninteresting. Say, for example, that four levels of interest are defined using numerical values in ascending order of interest. The culling threshold may be set at the second interest level value, such that only segments having a quantified interest level equal to or higher than the second interest level remain after culling. Then, a finer gradation is applied to these remaining segments by more or less aggressively applying time-lapse to them, as a function of their associated quantified interest levels 18. For example, the most aggressive time-lapse effect can be applied to remaining segments of the second interest level, a less aggressive time-lapse effect applied to remaining segments of the third interest level, and no time-lapse applied to remaining segments of the highest, fourth interest level.
With or without time-lapsing the remaining segments, the processing circuit 32 in one or more embodiments is, for at least one of the segments remaining after culling, is configured to evaluate gaze data 20 to identify a region of interest 22 within video frames of the video data 12 comprised within the segment, and modify the video data 12 comprised within the segment to emphasize the region of interest 22. Emphasizing the region of interest 22 may be based on selectively focusing on the region of interest 22, defocusing the remaining area of the video frame, centering the image on the region of interest 22, adding overlay or other highlighting video data, etc. As noted before, the gaze data 20 is synchronous with the video data 12 and acquired for the human participant while engaged in the activity represented in the video data 12.
Thus, in at least one embodiment of the apparatus 10, the processing circuit 32 is configured to cull the least interesting segments and further to apply region-of-interest processing to one or more of the remaining segments. The region-of-interest processing may be reserved only for the remaining segments having the highest quantified interest levels 18 and/or may be varied as a function of the quantified interest levels 18 of the remaining segments.
Further, one or more embodiments of the apparatus 10 use region-of-interest processing with or without first culling the least interesting segments. Broadly, for one or more segments of the video data 12, the processing circuit 32 is configured to evaluate gaze data 20 to identify a region of interest 22 within video frames of the video data 12 comprised within the segment, and to modify the video data 12 comprised within the segment to emphasize the region of interest 22. As before, the gaze data 20 is synchronized with the video data 12 and acquired for the human participant while engaged in the activity represented in the video data 12.
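As one concrete illustration of emphasizing a region of interest 22, the sketch below re-centers a smaller output crop on the region of interest when the output frame is smaller than the captured frame. The frame dimensions, the region-of-interest coordinates, and the edge-clamping behaviour are illustrative assumptions.

```python
# Minimal sketch: when the output frame is smaller than the captured frame,
# re-centre the crop window on the region of interest (clamped to the frame).

import numpy as np

def crop_centered_on_roi(frame, roi_xy, out_w, out_h):
    h, w = frame.shape[:2]
    cx, cy = roi_xy
    x0 = int(min(max(cx - out_w // 2, 0), w - out_w))
    y0 = int(min(max(cy - out_h // 2, 0), h - out_h))
    return frame[y0:y0 + out_h, x0:x0 + out_w]

if __name__ == "__main__":
    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
    cropped = crop_centered_on_roi(frame, roi_xy=(1500, 300), out_w=1280, out_h=720)
    print(cropped.shape)   # (720, 1280, 3)
```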
Fig. 8 depicts a method 800 according to one embodiment and it shall be understood that the method 800 may be performed by the apparatus 10 through the fixed or programmatic configuration of the processing circuit 32, or by another circuit arrangement. The method 800 controls a video operation bearing on video data 12, responsive to determining the interest levels of a human participant engaged in an activity represented in the video data 12, and includes obtaining (Block 802) physiological data 14 acquired for the human participant while engaged in the activity. As noted, this obtaining operation may comprise reading or otherwise obtaining stored data, or may comprise receiving one or more real time or near real time data streams.
The method 800 further includes quantifying (Block 804) interest levels of the human participant with respect to successive segments of the video data 12, based on characterizing the physiological data 14 associated with each segment and thereby obtaining a quantified interest level 18 for the segment. Still further, the method 800 includes controlling (Block 806) the video operation, in dependence on the quantified interest levels 18 associated with the segments.
Fig. 9 depicts another embodiment, wherein the processing operations taught herein are implemented via a set of functional modules. The modules in question are, for example, implemented via the processing circuit 32 of the apparatus 10. One sees a first obtaining module configured to obtain the physiological data 14 in question— e.g., a file reader module and/or an interface module operative to receive incoming streaming data comprising one or more types of physiological sensor data. Further, one sees a second obtaining module 102 configured to obtain the video data 12, which module again may be a file reader module configured to read stored video files and/or an interface module configured to receive streaming video data. Still further, one sees a third obtaining module 104 configured to obtain the gaze data 20, which module again may be a file reader module configured to read stored gaze data and/or an interface module configured to receive streaming gaze data.
Fig. 9 further depicts a characterizing module 110, which evaluates the physiological data 14 and the gaze data 20 in synchronicity with the video data 12— i.e., for each given segment of video data 12, the characterizing module 110 evaluates the corresponding portions or segments of the time-aligned physiological data 14 and gaze data 20. The characterizing module 110 also uses the stored characteristics 46 to evaluate the physiological data 14 and/or the gaze data 20 associated with each segment of the video data 12, to thereby determine the quantified interest level 18 for the segment and any region(s) of interest 22 within the video data 12 comprised within the segment. The stored characteristics 46 comprise, for example, baseline EEG data and one or more further sets of EEG data corresponding to one or more heightened emotional states.
The characterizing module 110 provides its output— i.e., the quantified interest levels 18 determined by it— to a processing and control module 112. In turn, the processing and control module 112 controls a video operation bearing on the video data 12, in dependence on the quantified interest levels 18. Here, the use of "a video operation" in the singular shall be understood as encompassing one or more video operations; that is, any number of acquisition and/or post-capture video operations may be controlled in dependence on the quantified interest levels.
In a further embodiment, Fig. 10 illustrates a method 1000, which may be understood as a more detailed implementation of the method 800. The method 1000 includes the operations of capturing the video data 12 (Block 1002), capturing the physiological data 14 (Block 1004), and capturing the gaze data 20 (Block 1006).
Processing further includes collating the various data— i.e., synchronizing or otherwise aligning, as needed, the physiological data 14 and the gaze data 20 with the video data 12 (Block 1008). Processing continues with partitioning the video data 12 into sections or segments, e.g., dividing the overall video data 12 uniformly into a series of segments of equal length— although a first or last segment may be smaller or larger than the others (Block 1010).
Additionally, dynamic segmenting may be used, such as by using smaller or larger segments in areas where the physiological data 14 is more dynamic, for example, where an initial run of processing shows that a number of the uniformly sized segments involve more than one interest level transition, or where it otherwise becomes difficult to identify the dominant interest level within a given segment.
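As an illustration of the segmentation just described, the sketch below first partitions the video timeline uniformly and then splits any segment whose associated interest-level samples change more than once. The per-second sample layout and the split-in-half refinement rule are assumptions made only for this example.

```python
# Sketch of uniform partitioning plus dynamic refinement of segments whose
# physiological data is more dynamic (multiple interest-level transitions).

def uniform_segments(duration_s: float, seg_len_s: float):
    """Partition [0, duration) into equal-length segments; the last may be shorter."""
    bounds, t = [], 0.0
    while t < duration_s:
        bounds.append((t, min(t + seg_len_s, duration_s)))
        t += seg_len_s
    return bounds

def refine_segments(bounds, interest_samples, max_transitions=1):
    """Split segments whose per-second interest levels change more than once."""
    refined = []
    for start, end in bounds:
        samples = interest_samples[int(start):int(end)]
        transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
        if transitions > max_transitions and end - start > 2:
            mid = (start + end) / 2
            refined.extend(refine_segments([(start, mid), (mid, end)],
                                           interest_samples, max_transitions))
        else:
            refined.append((start, end))
    return refined
```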
Processing continues with associating segments of the physiological data 14 and gaze data 20 with corresponding segments of the video data 12 (Block 1012). These associations are used to determine the sections with important gaze locations, IGLs, or otherwise identify higher-interest segments of the video data 12 (Block 1014). Still further, processing continues with removing the low-interest segments (Block 1016), and setting a time lapse parameter, TLP, in dependence on the determined quantified interest levels (Block 1018). Further, where appropriate, the method 1000 includes adjusting the video data 12 to emphasize the IGLs (Block 1020), and processing continues with creating a final video— i.e., the aforementioned modified video data 24— which reflects the interest-level dependent processing performed in the method 1000.
By way of further example, consider the following four defined user states: a highly aroused state, a high cognitive load state, an anxious state, and a "flow state". These states may be seen via analysis or parameterization of the physiological data 14. Broadly, a camera is used to capture video of a user participating in an activity, with the footage collected being defined as the video data 12 at issue herein. In some embodiments, the user wears the camera and captures first-person footage. In alternate embodiments, the camera is separated from the user. Further, multiple cameras may be used, e.g., with each one capturing the same scene from a different angle. Further, the camera may be replaced by a system capable of storing streamed video, such as a video game system capturing video gameplay as it takes place.
Still further, an apparatus is configured to track the gaze of the user and provide data indicating where the user is looking in terms of the image field of view captured by the camera used to capture the video data 12. The location of the user's gaze within the spatial area defined by the video data 12 is referred to as the "gaze location". Further, one or more sensors capable of monitoring a physiological or mental signal of the user provide the physiological data 14. These sensors include any one or more of an EEG headset, a camera capturing either facial expressions or pupil responses, a microphone configured for monitoring the user's voice, and one or more sensors configured to measure the skin conductance of the user.
In at least one embodiment, the apparatus 10 employs various algorithms, such as an algorithm to associate sections of the video data 12 with user states— as represented by quantified interest levels 18— and important gaze locations— as represented by determined regions of interest 22. This algorithm may be referred to as the "VA" algorithm. The apparatus 10 implements a further algorithm, denoted as the "VP" algorithm, to prepare the modified video data 24 as a shortened version of the video data 12. The modified video data 24 gives a higher level of prominence to— or includes only— those sections of the video data 12 that were determined to be of higher interest to the user.
In the course of such processing, the apparatus 10 may use various look-up tables, such as a look-up table that stores the association between sensor readings and user state— e.g., a look-up table or LUT that maps characterized physiological data 14 to predefined numeric values representing different levels of user interest. The apparatus 10 also may use an "additional material" or "AM" LUT, which associates different additional materials with different user states or interest levels. Here, the additional materials comprise, for example, different video effects, which may be selectively overlaid into the video data 12, e.g., to emphasize IGLs and/or to highlight segments of higher interest.
In one implementation, the physiological data 14 and gaze data 20 are captured simultaneously with the video data 12, and the apparatus 10 is configured to control the acquisition and collection so that all such data is synchronized, or at least so that all such data includes time-stamp information for use in post-capture synchronization. The apparatus 10 may be split into a first part that controls the aforementioned acquisition, and a second part that performs the post-capture processing described below.
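For the post-capture synchronization mentioned above, a nearest-timestamp lookup is one simple possibility. The sketch below assumes monotonically increasing timestamps and one sample list per sensor stream; it is an illustration, not a prescribed implementation.

```python
# Minimal sketch of timestamp-based alignment: for each video frame time,
# pick the physiological/gaze sample with the nearest timestamp.
import bisect

def align(frame_times, sample_times, samples):
    """Return one sample per frame, chosen by nearest timestamp.
    Assumes sample_times is sorted and aligned index-for-index with samples."""
    aligned = []
    for t in frame_times:
        i = bisect.bisect_left(sample_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(samples)]
        aligned.append(samples[min(candidates, key=lambda j: abs(sample_times[j] - t))])
    return aligned
```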
In any case, the VA algorithm associates segments of the video data 12 with user states. An example of how this is achieved in one or more embodiments is as follows: the video data 12 is divided into a number of sections or segments, each of equal length; for each section, the VA algorithm determines the user state that the user was in for the longest time, defined as the "SUS" or section user state. The VA algorithm further determines a section gaze location or "SGL" for the time period defined by each section. Here, an SGL is identified, for example, as the tile(s) within the field of view corresponding to the longest duration of detected user gaze, for the time period in question. The VA algorithm also may operate with a minimum gaze duration threshold, such that no SGL is detected if the gaze data for the video section in question does not indicate that the user looked in any particular direction for more than the minimum threshold of time.
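A minimal sketch of this per-section bookkeeping follows: the SUS is the user state of longest total duration within the section, and the SGL is the field-of-view tile gazed at longest, subject to a minimum gaze duration threshold. The input formats (lists of value and duration pairs) and the one-second threshold are assumptions for illustration only.

```python
# Sketch of per-section SUS and SGL determination consistent with the
# description of the "VA" algorithm; data layouts are assumed.
from collections import Counter

MIN_GAZE_S = 1.0  # assumed minimum gaze duration threshold

def section_user_state(state_samples):
    """state_samples: list of (user_state, duration_s) within the section."""
    totals = Counter()
    for state, dur in state_samples:
        totals[state] += dur
    return totals.most_common(1)[0][0]

def section_gaze_location(gaze_samples):
    """gaze_samples: list of (tile_id, duration_s); returns SGL or None."""
    if not gaze_samples:
        return None
    totals = Counter()
    for tile, dur in gaze_samples:
        totals[tile] += dur
    tile, dur = totals.most_common(1)[0]
    return tile if dur >= MIN_GAZE_S else None
```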
Also, as earlier mentioned, it may be that a section of the video data 12, or a succession of sections of the video data 12, is determined to have an important gaze location or "IGL". For example, if the activity represented in the video data 12 is rock climbing, the climber in question may have looked intently at one or more rocks or other critical handholds during the climb. Similar instances occur during skiing, such as where the skier looks intently at an impending jump or other downhill obstacle. These gaze locations are tagged as IGLs by the VA algorithm. As an example, the VA algorithm detects cases where the SGL determined for a given succession of video sections is substantially the same over those sections, and the involved sections are also detected as having a heightened level of interest or focus. Here, the SGL(s) would be tagged as being an IGL. The VA algorithm at least temporarily stores the start and end times of the involved sections, together with the SUS and SGL, and any IGL tags.
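The IGL tagging rule can be sketched as a scan for runs of consecutive sections that share the same SGL and whose SUS indicates heightened interest. The minimum run length of two and the particular state labels below are assumptions, not values taken from the disclosure.

```python
# Sketch of IGL tagging: mark runs of sections with a stable SGL and a
# high-interest SUS. Section records are assumed to be dicts with
# 'sgl' and 'sus' keys.

def tag_igls(sections, high_interest_states=frozenset({"aroused", "flow"}), min_run=2):
    """Adds an 'igl' flag to each section record; returns the list."""
    run = []
    for idx, sec in enumerate(sections):
        sec["igl"] = False
        if run and sec["sgl"] is not None and sec["sgl"] == sections[run[-1]]["sgl"]:
            run.append(idx)
        else:
            run = [idx]
        if (len(run) >= min_run
                and all(sections[j]["sus"] in high_interest_states for j in run)):
            for j in run:
                sections[j]["igl"] = True
    return sections
```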
Then, the VP algorithm takes the output from the VA algorithm and uses it to create a final video. For example, the starting length of the video data 12 is denoted as "SL". The user indicates the desired final length of the modified video data 24 as "FL". The user state LUT associates with each defined user state a defined level of interest or "LOI" that ranges from "0" to denote "least interesting" up to "10" to denote most interesting.
Depending on the ratio SL/FL, the VP algorithm removes segments from the video data 12. For example, if SL/FL is smaller than a defined threshold, only those segments of the video data 12 having an LOI of 0 are removed. If SL/FL is larger than that threshold, or larger than another defined threshold, segments of the video data 12 having an LOI of 0, 1, 2 or 3 are removed. Thus, the "bar" for retaining segments becomes increasingly aggressive in dependence on the desired amount of length reduction. Of course, multiple thresholds, each with a different associated number of LOIs for culling, can be used by the apparatus 10.
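The culling rule can be illustrated as follows, with the threshold values and the exact LOI cut-offs chosen purely for the example; only the principle that a larger SL/FL ratio raises the bar for retaining segments reflects the description above.

```python
# Sketch of length-dependent culling: the larger the required reduction
# (SL/FL), the higher the level-of-interest (LOI) bar for keeping a segment.

def cull_segments(segments, sl_s, fl_s):
    """segments: list of dicts with an 'loi' value (0..10); keep those above the cut."""
    ratio = sl_s / fl_s
    if ratio <= 1.0:
        max_culled_loi = -1          # no reduction needed, keep everything
    elif ratio <= 2.0:
        max_culled_loi = 0           # modest reduction: drop only LOI 0
    else:
        max_culled_loi = 3           # aggressive reduction: drop LOI 0..3
    return [s for s in segments if s["loi"] > max_culled_loi]
```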
For the segments of the video data 12 remaining after the above removals, each LOI is associated with one or more time lapse parameters or TLPs. An example of a TLP is the regularity with which frames are to be removed from the video. For example, a more interesting segment of the video data 12 can have nine of every ten frames of video removed, whereas a less interesting segment can have nineteen of every twenty frames removed. As the remaining frames are displayed for the same duration, events in the more interesting video will appear to pass more slowly, whilst events in the less interesting video will appear to pass more quickly.
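As a sketch of the TLP-driven decimation, the mapping below keeps one frame in N, with N growing as the quantified interest level falls; the specific LOI-to-N table is an assumption for illustration.

```python
# Sketch of the time-lapse parameter (TLP): keep every Nth frame, with N
# chosen from the segment's level of interest so that less interesting
# footage plays back faster.

TLP_BY_LOI = {0: 40, 5: 20, 8: 10, 10: 5}   # keep 1 frame in N, per LOI (assumed)

def apply_time_lapse(frames, loi):
    """frames: list of frames for one segment; return every Nth frame."""
    nearest_loi = min(TLP_BY_LOI, key=lambda k: abs(k - loi))
    return frames[::TLP_BY_LOI[nearest_loi]]
```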
Where the remaining sections of the video data 12 contain an IGL, the VP algorithm adjusts the video such that the IGL has more prominent visibility. Examples of how this is achieved include: where the frame size of the modified video data 24 is less than that of the captured video data 12, the center of the image can be moved such that the IGL has a more central location; the image properties can be adjusted such that the IGL has a differentiated appearance, e.g., being brighter; and additional materials can be used to highlight the IGL, such as by overlaying arrows or other graphics. Once any such highlighting is performed, the VP algorithm performs any further actions required to create a viewable file containing the modified video data 24.
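Two of the IGL emphasis options mentioned above (re-centering a smaller output frame on the IGL, and brightening the IGL neighbourhood) can be sketched with NumPy-style frame arrays. The frame layout (H x W x 3, uint8), the pixel coordinates for the IGL, and the gain and radius values are assumptions; overlaying arrow graphics would typically be done with a drawing library and is omitted here.

```python
# Sketch of IGL emphasis on a single frame, assuming frames are
# H x W x 3 uint8 arrays and the IGL is given as (x, y) pixel coordinates.
import numpy as np

def recenter_crop(frame, igl_xy, out_w, out_h):
    """Crop an out_h x out_w window centered, as far as possible, on the IGL.
    Assumes out_w <= frame width and out_h <= frame height."""
    h, w = frame.shape[:2]
    x = min(max(igl_xy[0] - out_w // 2, 0), w - out_w)
    y = min(max(igl_xy[1] - out_h // 2, 0), h - out_h)
    return frame[y:y + out_h, x:x + out_w]

def brighten_region(frame, igl_xy, radius=50, gain=1.3):
    """Return a copy of the frame with the IGL neighbourhood brightened."""
    out = frame.astype(np.float32)
    x, y = igl_xy
    out[max(y - radius, 0):y + radius, max(x - radius, 0):x + radius] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)
```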
Note that gaze data processing also may include object recognition techniques, such that the regions of interest 22 are based on user gaze data 20 and/or based on recognizing key objects within the video data 12. Further, the use of highlighting overlays or other such additional material can be used to differentiate not only regions of interest 22, but also to differentiate between video segments that are associated with different quantified interest levels 18.
Regardless of these variations and the range of embodiments, it will be appreciated that final video data obtained according to the teachings herein will better reflect the user's actual levels of interest during participation in the activity represented in the final video data. Such processing thus results in more enjoyable videos, or videos that better reflect the experience of the recorded event. Further, such processing generally reduces video transmission bandwidth requirements, and the potentially significant reductions in video file size and/or video length can, in the aggregate, provide significant storage benefits, wherein the user effectively archives what amounts to "highlight reels" that are automatically generated according to the teachings herein.
Notably, modifications and other embodiments of the disclosed invention(s) will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the invention(s) is/are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of this disclosure. Although specific terms may be employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

What is claimed is:
1. A method (800) of controlling a video operation bearing on video data (12), responsive to determining interest levels of a human participant engaged in an activity represented in the video data (12), said method implemented by an apparatus (10) and comprising:
obtaining (802) physiological data (14) acquired for the human participant while engaged in the activity;
quantifying (804) interest levels of the human participant with respect to successive segments of the video data (12), based on characterizing the physiological data (14) associated with each segment and thereby obtaining a quantified interest level (18) for the segment; and
controlling (806) the video operation, in dependence on the quantified interest levels (18) associated with the segments.
2. The method (800) of claim 1, wherein characterizing the physiological data (14) associated with each segment comprises evaluating the associated physiological data (14), or parameters derived therefrom, with respect to stored characteristics (46) corresponding to different interest level values.
3. The method (800) of claim 2, wherein characterizing the physiological data (14) associated with each segment further comprises identifying which stored characteristics (46) best correspond with the associated physiological data (14), or the parameters derived therefrom, and taking the corresponding interest level value as the quantified interest level (18) for the segment.
4. The method (800) of claim 2 or 3, wherein the quantified interest levels (18) correspond to different emotional states of the human participant, as explicitly or implicitly represented in the physiological data (14), and wherein the stored characteristics (46) for at least one interest level value are characteristic of any one or more of the following emotional states: a high arousal state, a high cognition state, an anxious state, and a flow state.
5. The method (800) of any of claims 2-4, wherein the stored characteristics (46) at least in part comprise physiological sensor data (82, 98, 108, 118), or parameters derived therefrom, as previously obtained for the human participant in a prior calibration session.
6. The method (800) of any of claims 1-5, further comprising, at least where the physiological data (14) associated with a given segment indicates changing interest levels within the segment, identifying a dominant interest level for the segment, where the dominant interest level is the interest level of longest detected duration within the segment, and taking the dominant interest level as the quantified interest level (18) for the segment.
7. The method (800) of any of claims 1-6, wherein the physiological data (14) comprises any one or more of:
electroencephalography, EEG, data (98) indicating brain activity levels of the human participant;
audio data (108) indicating vocalizations by the human participant;
supplemental video data (82) indicating any one or more of facial expressions and pupil responses of the human participant; and
skin conductance data (118) indicating skin conductance of the human participant.
8. The method (800) of any of claims 1-7, wherein controlling the video operation comprises controlling acquisition and/or processing of the video data (12).
9. The method (800) of any of claims 1-8, wherein controlling the video operation comprises at least one of:
modifying the video data (12) by retaining or emphasizing segments associated with quantified interest levels (18) meeting a defined interest threshold; discarding or de-emphasizing segments associated with quantified interest levels (18) not meeting a defined interest threshold; and
determining an emphasis or de-emphasis modification to apply to the video data (12) for a given segment of the video data (12) in dependence on determining which one or more defined interest thresholds are met among a plurality of defined interest thresholds.
10. The method (800) of any of claims 1-8, wherein controlling the video operation comprises reducing a file size or run time of the video data (12) by discarding or de-emphasizing segments associated with quantified interest levels (18) not meeting a defined interest threshold.
11. The method (800) of claim 10, further comprising dynamically controlling an extent of file size or run time reduction of the video data (12) in dependence on a configurable value.
12. The method (800) of any of claims 1-11, wherein controlling the video operation comprises:
determining a reduction target indicating a desired reduction in file size or run time for the video data (12);
setting an interest level threshold in dependence on the reduction target; and
removing segments associated with quantified interest levels (18) below the interest level threshold.
13. The method (800) of claim 12, further comprising, for one or more remaining segments, setting the value of a time-lapse parameter for the remaining segment in dependence on the quantified interest level (18) associated with the remaining segment, and controlling a time-lapse effect applied to the remaining segment according to the value of the time-lapse parameter.
14. The method (800) of claim 12 or 13, further comprising, for at least one remaining segment:
evaluating gaze data (20) to identify a region of interest (22) within video frames of the video data (12) comprised within the segment; and
modifying the video data (12) comprised within the remaining segment to emphasize the region of interest (22);
wherein the gaze data (20) is synchronized with the video data (12) and acquired for the human participant while engaged in the activity represented in the video data (12).
15. The method (800) of any of claims 1-14, further comprising, for each of one or more segments of the video data (12):
evaluating gaze data (20) to identify a region of interest (22) within video frames of the video data (12) comprised within the segment; and
modifying the video data (12) comprised within the segment to emphasize the region of interest (22);
wherein the gaze data (20) is synchronized with the video data (12) and acquired for the human participant while engaged in the activity represented in the video data (12).
16. The method (800) of any of claims 1-15, wherein the physiological data (14) comprises stored physiological data (56) corresponding to stored video data (52) as said video data (12), and wherein obtaining the physiological data (14) comprises reading the stored physiological data (56) from a data store (50).
17. The method (800) of any of claims 1-15, wherein the video data (12) is a video data stream and the physiological data (14) comprises one or more physiological sensor data streams incoming in real-time in association with live capture of the video data (12) via an image acquisition apparatus (70) outputting the video data stream.
18. An apparatus (10) configured to control a video operation bearing on video data (12), responsive to determining interest levels of a human participant engaged in an activity represented in the video data (12), said apparatus (10) comprising:
communication interface circuitry (30) configured to obtain the video data (12) and further configured to obtain physiological data (14) acquired for the human participant while engaged in the activity; and
a processing circuit (32) operatively associated with the communication interface circuitry (30) and configured to:
quantify interest levels of the human participant with respect to successive segments of the video data (12), based on characterizing the physiological data (14) associated with each segment and thereby obtaining a quantified interest level (18) for the segment; and
control the video operation, in dependence on the quantified interest levels (18) associated with the segments.
19. The apparatus (10) of claim 18, wherein the communication interface circuitry (30) is configured to communicatively couple the processing circuit (32) to one or more data stores (50) containing one or more data files (52, 56) that contain the video data (12) and the physiological data (14), and wherein the processing circuit (32) is configured to control the communication interface circuitry (30) to obtain the physiological data (14) from the one or more data files (52, 56).
20. The apparatus (10) of claim 18, wherein the communication interface circuitry (30) is configured to interface with one or more sensing devices (80, 90, 100, 110) configured to provide the physiological data (14).
21. The apparatus (10) of claim 20, wherein the communication interface circuitry (30) is configured to interface with an image acquisition apparatus (70) configured to provide the video data (12).
22. The apparatus (10) of claim 21, wherein the video operation is an image acquisition operation bearing on acquisition of the video data (12) by the image acquisition apparatus (70), and wherein the processing circuit (32) is configured to control the image acquisition operation via signaling (26) output from the communication interface circuitry (30).
23. The apparatus (10) of any of claims 18-22, wherein the processing circuit (32) is configured to characterize the physiological data (14) associated with each segment by evaluating the associated physiological data (14), or parameters derived therefrom, with respect to stored characteristics (46) corresponding to different interest levels.
24. The apparatus (10) of claim 23, wherein the processing circuit (32) is configured to identify which stored characteristics (46) best correspond with the associated physiological data (14), or the parameters derived therefrom, and to take the corresponding interest level value as the quantified interest level (18) for the segment.
25. The apparatus (10) of claim 23 or 24, wherein the quantified interest levels (18) correspond to different emotional states of the human participant, as explicitly or implicitly represented in the physiological data (14), and wherein the stored characteristics (46) for at least one interest level value are characteristic of any one or more of the following emotional states: a high arousal state, a high cognition state, an anxious state, and a flow state.
26. The apparatus (10) of any of claims 23-25, wherein the stored characteristics (46) at least in part comprise physiological sensor data (82, 98, 108, 118), or parameters derived therefrom, as previously obtained for the human participant in a prior calibration session.
27. The apparatus (10) of any of claims 18-26, wherein, at least where the physiological data (14) associated with a given segment indicates changing interest levels within the segment, the processing circuit (32) is configured to identify a dominant interest level for the segment, where the dominant interest level is the interest level of longest detected duration within the segment, and take the dominant interest level as the quantified interest level (18) for the segment.
28. The apparatus (10) of any of claims 18-27, wherein the physiological data (14) comprises any one or more of:
electroencephalography, EEG, data (98) indicating brain activity levels of the human participant;
audio data (108) indicating vocalizations by the human participant;
supplemental video data (82) indicating any one or more of facial expressions and pupil responses of the human participant; and
skin conductance data (118) indicating skin conductance of the human participant.
29. The apparatus (10) of any of claims 18-28, wherein, as said video operation, the processing circuit (32) is configured to control acquisition and/or processing of the video data (12).
30. The apparatus (10) of any of claims 18-29, wherein the processing circuit (32) is configured to perform at least one of the following operations as said video operation:
modify the video data (12) by retaining or emphasizing segments associated with quantified interest levels (18) meeting a defined interest threshold; discard or de-emphasize segments associated with quantified interest levels (18) not meeting a defined interest threshold; and
determine an emphasis or de-emphasis modification to apply to the video data (12) for a given segment of the video data (12) in dependence on determining which one or more defined interest thresholds are met among a plurality of defined interest thresholds.
31. The apparatus (10) of any of claims 18-30, wherein the processing circuit (32) is configured to reduce a file size or run time of the video data (12), as said video operation, by discarding or de-emphasizing segments associated with quantified interest levels (18) not meeting a defined interest threshold.
32. The apparatus (10) of claim 31, wherein the processing circuit (32) is configured to dynamically control an extent of file size or run time reduction in dependence on a configurable value.
33. The apparatus (10) of any of claims 18-32, wherein the processing circuit (32) is configured to reduce a file size or run time of the video data (12) as said video operation, based on being configured to:
determine a reduction target indicating a desired reduction in the file size or the run time of the video data (12);
set an interest level threshold in dependence on the reduction target; and
remove segments associated with quantified interest levels (18) below the interest level threshold.
34. The apparatus (10) of claim 29, wherein, for one or more remaining segments of the video data (12), the processing circuit (32) is configured to set the value of a time lapse parameter for the remaining segment in dependence on the quantified interest level (18) associated with the remaining segment, and control a time lapse effect applied to the remaining segment according to the value of the time lapse parameter.
35. The apparatus (10) of claim 29 or 30, wherein, for at least one of the remaining segments, the processing circuit (32) is configured to:
evaluate gaze data (20) to identify a region of interest (22) within video frames of the video data (12) comprised within the segment; and
modify the video data (12) comprised within the segment to emphasize the region of interest (22);
wherein the gaze data (20) is synchronous with the video data (12) and acquired for the human participant while engaged in the activity represented in the video data (12).
36. The apparatus (10) of any of claims 18-31, wherein, for one or more segments of the video data (12), the processing circuit (32) is configured to:
evaluate gaze data (20) to identify a region of interest (22) within video frames of the video data (12) comprised within the segment; and
modify the video data (12) comprised within the segment to emphasize the region of interest (22); wherein the gaze data (20) is synchronized with the video data (12) and acquired for the human participant while engaged in the activity represented in the video data (12).
37. The apparatus (10) of any of claims 18-32, wherein the video data (12) is stored video data (52) and wherein the physiological data (14) is stored physiological data (56), and wherein the processing circuit (32) is configured to obtain the physiological data (14) by reading the stored physiological data (56) from a data store (50).
38. The apparatus (10) of any of claims 18-33, wherein the communication interface circuitry (30) is configured to receive a video data stream as the video data (12), and to receive one or more physiological sensor data streams incoming in real-time in association with live capture of the video data, as the physiological data (14).

