WO1997042601A1 - Procédé multimédia interactif intégré (Integrated interactive multimedia process) - Google Patents
- Publication number: WO1997042601A1 (PCT/US1997/007359)
- Authority: WO - WIPO (PCT)
- Prior art keywords: video, frame, camera, clip, environment
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234318—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into objects, e.g. MPEG-4 objects
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/102—Programmed access in sequence to addressed parts of tracks of operating record carriers
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G11B27/32—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on separate auxiliary tracks of the same or an auxiliary record carrier
- G11B27/327—Table of contents
- G11B27/329—Table of contents on a disc [VTOC]
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/78—Television signal recording using magnetic recording
- H04N5/782—Television signal recording using magnetic recording on tape
- H04N5/783—Adaptations for reproducing at a rate different from the recording rate
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B2220/00—Record carriers by type
- G11B2220/90—Tape-like record carriers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/022—Electronic editing of analogue information signals, e.g. audio or video signals
- G11B27/024—Electronic editing of analogue information signals, e.g. audio or video signals on tapes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/804—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
- H04N9/8042—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
Definitions
- This invention relates to an improved process for the creation of interactive multimedia environments, called Productions, which use video annotated with position information in the creation and navigation of a virtual reality environment; and more particularly, it incorporates all stages of development of a Production from concept inception to the final delivered product.
- Interactive multimedia is a vague term which generally refers to a production playable on a computer in which multiple types of media are present, possibly including text, sound, graphics, animation and video. Such productions are contrasted with older-style computer programs, which are limited to text or to text and graphics; hence the "multimedia" component of the name. Further, such productions allow the course of the production to be affected or controlled to some extent by the user; hence the "interactive" component of the name.
- Products claiming to be interactive multimedia products may contain widely varying degrees of interactivity and multimedia. At one extreme, a production may be full of sound, still pictures, animation and other video clips but offer the user very little in the way of interaction or control. At the other extreme, the production may be highly interactive, with its outcome and course greatly affected by user interaction, but have little to offer in the way of video, animation or sound.
- Virtual reality describes a subset of interactive multimedia productions that let the user navigate and interact with a three-dimensional computer-generated environment in real time.
- Such productions have three defining elements: interaction, 3-D graphics, and immersion.
- 3-D graphics lets the user see the environment.
- Real time updates to the graphics being displayed in reaction to the user's changing point of view contribute to the feeling of presence or immersion in the environment.
- The degree of immersion depends on the production.
- Some systems with motion sensing video helmets and glove input devices have a higher degree of immersion than 3-D productions where the 3-D graphics are displayed on a computer monitor and input is through a keyboard or mouse.
- Panoramic video uses computer-digitized and stored photos of real-world scenes which have been mapped into 360-degree panoramas, enabling them to be presented on a 2-D screen without the optical distortion so common with photographic techniques.
- Panoramic video technology does not permit on-the-fly rendering of the environment. Rather, the user is permitted some degree of navigation within the previously stored panorama because the panoramic picture captures much more video information than can be placed on the display at one time.
- The computer can pan around the axis of the panorama, allowing the user a 360-degree turning motion in the horizontal plane. Depending on the viewer characteristics and the vertical size of the panorama, the viewer may be allowed some vertical panning as well.
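The horizontal panning described above can be sketched as a wrap-around window into a 360-degree panorama stored as an image strip. This is a minimal illustration, assuming a panorama whose columns span 0 to 360 degrees uniformly and a viewer that shows a fixed number of columns; the function name `visible_columns` and its parameters are hypothetical and are not taken from any panoramic video product.

```python
from typing import List


def visible_columns(pano_width: int, view_width: int, yaw_deg: float) -> List[int]:
    """Return the panorama column indices shown for a view centered at yaw_deg.

    Assumes column 0 corresponds to 0 degrees and that columns are spread
    evenly over the full 360-degree circle (a hypothetical layout).
    """
    if not 0 < view_width <= pano_width:
        raise ValueError("view width must be positive and no wider than the panorama")
    # Normalize the yaw to [0, 360) and map it to the center column.
    center = round((yaw_deg % 360.0) / 360.0 * pano_width) % pano_width
    left = center - view_width // 2
    # The modulo wraps the window across the 0/360-degree seam, which is what
    # lets the user turn in a complete circle.
    return [(left + i) % pano_width for i in range(view_width)]


print(visible_columns(3600, 4, 0))   # [3598, 3599, 0, 1]: window straddles the seam
print(visible_columns(3600, 4, 90))  # [898, 899, 900, 901]
```

Vertical panning would add a second, non-wrapping offset clamped to the panorama's height; it is omitted here to keep the wrap-around behavior in focus.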
- The developer of the panoramic video production may create some points in each panorama where the user can "jump" or transition to the next panorama.
- Panoramic video techniques create a sense of presence in the environment by allowing users to control, within limits, their point of view, to navigate to varying degrees, and to interact with a scene or environment perceived to be three-dimensional although presented on a two-dimensional computer or video display device.
- Panoramic video technologies also allow for the incorporation of rendered scenes, but do not use the traditional virtual reality technique of computing the environment in real time in response to the user's changing perspective. Instead, they allow the user some degree of navigation by panning the display left or right around the panorama in a complete circle and by zooming in and out on user-selected areas of the displayed panorama. Productions made with panoramic video techniques may even allow the user to shift position to other viewpoints at predefined hotspots where the developer has linked panoramas. As in computer-generated virtual environments, panoramic video allows the user to interact with predefined hotspots, which may generate dramatic video or audio responses and the like.
- Panoramic video technology also includes the ability to define hotspots on the panorama and link such hotspots to views of an object. For example, if a statue is in a defined hotspot and that hotspot is clicked on, the activated hotspot can cause a special window to open on the screen. If the producer has incorporated it into the work, the window may display an enlarged picture of the object, which can be rotated 360 degrees horizontally and almost 360 degrees vertically. However, the object being examined is removed from the environment while it is examined, and its degree of horizontal and vertical rotation is not related to the perspective of the viewer in the panoramic environment.
- Surround Video has the additional capability of allowing the multimedia maker to overlay scenes which have been shot against a "blue screen" on top of the panoramic video such as an overlaid video image of a narrator to a tour of the scene presented by the panorama.
- Surround Video has no capability of changing the scene in response to movement of the viewer in the panoramic environment.
- Despite the advantages of panoramic video, its limitations restrict its use in highly interactive productions. For example, user viewpoints and transitions between viewpoints in panoramic video are limited to those previously defined by the developer, and may only be to or from a position at which the currently viewed panorama or another panorama was produced. Further, due to cost and storage issues, it is impractical to generate a different panorama for every foot of space along the viewer's navigation paths through the environment. This limitation can introduce a perceived navigational discontinuity in the viewed environment and can detract from the feeling of presence in that environment. Panoramas are also very large images and require a significant amount of computer storage space.
- The rotatable objects in QuickTime VR must capture the screen to be viewed and, therefore, are not integrated into the environment in a way that enhances the viewer's immersion in the environment.
- In QuickTime VR, there is no way of adding objects to the environment that were not there when it was filmed, or of removing objects that were there when it was filmed (other than simply filming the environment both with and without the object and replacing one panorama with the other at some point in the production). This limits the viewer's ability to interact with the environment.
- The rotatable objects are also storage-intensive, as each may require 500 views of the object to create.
- Although the blue screen techniques available in Surround Video, which are likely to be available in later versions of QuickTime VR, allow some integration of objects into the environment, there is no simple way for the object to be examined by the viewer from various perspectives.
- Panoramic video has an advantage over rendered virtual reality systems in that storing and retrieving pre-created panoramic images is less computationally demanding than recalculating rendered scenes on the fly. Thus, scenes that are not practical to render can be photographed.
- However, the use of digitized pictures with high resolution and color quality places increased demands on the computer's storage capacity and transfer rates.
- The sense of immersion provided by panoramic video suffers in comparison to true virtual reality productions because of its significantly lower degree of navigational freedom.
- The present invention employs digital video compression technology to reduce storage demands.
- NTSC video has 480 lines per video frame and 640 pixels per line.
- Working with 8-bit color then requires 9.2 MB of raw data to represent one second of non-compressed video; 24-bit color requires 27.6 MB of raw data per second of video.
- At 24-bit color, a one-hour video presentation would therefore require 99.5 GB.
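The figures above follow from simple arithmetic. The sketch below assumes 30 frames per second (NTSC's nominal rate; the exact rate is 29.97) and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); under those assumptions it reproduces the 9.2 MB, 27.6 MB, and 99.5 GB figures, and shows that the one-hour figure corresponds to 24-bit color.

```python
def raw_bytes_per_second(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bits_per_pixel / 8 * fps


FPS = 30  # assumption: NTSC's nominal frame rate (the exact rate is 29.97)

rate_8bit = raw_bytes_per_second(640, 480, 8, FPS)    # 9,216,000 bytes/s
rate_24bit = raw_bytes_per_second(640, 480, 24, FPS)  # 27,648,000 bytes/s
one_hour_24bit = rate_24bit * 3600                     # 99,532,800,000 bytes

print(f"{rate_8bit / 1e6:.1f} MB/s, {rate_24bit / 1e6:.1f} MB/s, "
      f"{one_hour_24bit / 1e9:.1f} GB per hour")
# prints: 9.2 MB/s, 27.6 MB/s, 99.5 GB per hour
```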
- To reduce these demands, the multimedia PC/CD-ROM and related industries have adopted a digital video compression and decompression standard.
- The MPEG-1 standard is found in the publication "International Standard ISO/IEC 11172, Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video", which is incorporated herein by reference.
- The MPEG-1 standard is also discussed in the inventor's co-pending U.S. patent applications filed on February 19, 1997 as serial numbers 08/802,870 and 08/801,254, which are incorporated herein by reference and are referred to herein as the "Special Effects Applications."
- The coded representation defined in Part 2 of MPEG-1 achieves a high compression ratio while preserving good picture quality.
- The algorithm is not lossless, as the exact pel values are not preserved during coding.
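Why the coding is lossy can be illustrated with a one-dimensional simplification. MPEG-1 applies a two-dimensional DCT to 8-by-8 blocks and uses quantizer matrices and scale factors; the sketch below instead assumes a single 8-sample row and one uniform quantizer step (16, chosen arbitrarily) to show that the transform itself is reversible while quantization is not.

```python
import math
from typing import List

N = 8  # MPEG-1 transforms 8x8 blocks; one 8-sample row keeps this sketch short


def _scale(k: int) -> float:
    # Orthonormal scaling, so the inverse transform reconstructs exactly.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II of an N-sample row."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def idct(X: List[float]) -> List[float]:
    """Inverse of dct() (an orthonormal DCT-III)."""
    return [sum(_scale(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def quantize(coeffs: List[float], step: int) -> List[int]:
    # Rounding to the nearest multiple of `step` is the irreversible step:
    # small (less important) coefficients collapse to zero.
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: int) -> List[float]:
    return [q * step for q in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]                # example pel values
exact = idct(dct(row))                                # transform alone round-trips
lossy = idct(dequantize(quantize(dct(row), 16), 16))  # quantization does not
print([round(v, 6) for v in exact])
print([round(v, 1) for v in lossy])
```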
- The choice of techniques is based on the need to balance high picture quality and compression ratio against the requirement to make random access to the coded bit stream possible.
- A number of techniques are used to achieve a high compression ratio. The first is to select an appropriate spatial resolution for the signal. The algorithm then uses block-based motion compensation to reduce temporal redundancy. Motion compensation is used for causal prediction of the current picture from a previous picture, for non-causal prediction of the current picture from a future picture, or for interpolative prediction from past and future pictures. Motion vectors are defined for each 16-pel by 16-pel region of the picture. The difference signal, the prediction error, is further compressed using a discrete cosine transform (DCT) to remove spatial correlation before it is quantized in an irreversible process that discards the less important information. Finally, the motion vectors are combined with the DCT information and coded using variable-length codes.
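Finding a motion vector for one 16-pel by 16-pel region can be sketched as full-search block matching. The MPEG-1 standard defines how motion vectors are coded, not how an encoder finds them; minimizing the sum of absolute differences (SAD) over a small search window is one common approach and is an assumption here, as are the search range of 4 pels and the list-of-lists frame layout.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x] holds a luminance pel value


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def motion_vector(cur: Frame, ref: Frame, bx: int, by: int,
                  size: int = 16, search: int = 4) -> Tuple[int, int]:
    """Full-search block matching: return the (dx, dy) with the lowest SAD.

    The block's top-left corner is (bx, by) in the current frame; candidates
    range over +/- `search` pels in the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block would fall outside the frame.
            if not (0 <= bx + dx and bx + dx + size <= w
                    and 0 <= by + dy and by + dy + size <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

The prediction error that is then DCT-coded is the difference between the current block and the reference block displaced by the returned vector.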
- MPEG-1 uses three main picture types. Intra-coded pictures (I-pictures) are coded without reference to other pictures. They provide access points to the coded sequence where decoding can begin, but are coded with only a moderate compression ratio. Predictive-coded pictures (P-pictures) are coded more efficiently using motion-compensated prediction from a past intra- or predictive-coded picture and are generally used as a reference for further prediction. Bidirectionally-predictive coded pictures (B-pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation; they are never used as a reference for prediction. The organization of the three picture types in a sequence is very flexible: the choice is left to the encoder and depends on the requirements of the application. A fourth picture type, the D-picture, is provided to allow a simple, but limited-quality, fast-forward playback mode.
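- Part 2 (Video) of the MPEG-1 standard specifies a syntax for a coded video bit stream.
- This syntax contains six layers, each of which supports either a signal processing or a system function; the layers are hierarchical, with the highest layer being the broadest in scope: the Sequence layer (random access unit: context), the Group of Pictures layer (random access unit: video), the Picture layer (primary coding unit), the Slice layer (resynchronization unit), the Macroblock layer (motion compensation unit), and the Block layer (DCT unit).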
- The MPEG-1 video bit stream syntax follows the hierarchical structure of the layers discussed above.
- A sequence is the top level of video coding. It begins with a sequence header, which defines important parameters needed by the decoder. The sequence header is followed by one or more groups of pictures. Groups of pictures, as the name suggests, consist of one or more individual pictures. The sequence may contain additional sequence headers, and it is terminated by a sequence end code. MPEG-1 allows considerable flexibility in specifying application parameters such as bit rate, picture rate, picture resolution, and picture aspect ratio; these parameters are specified in the sequence header. Following the Sequence layer is the Group of Pictures layer. Two distinct picture orderings exist: the display order and the bit stream order (the order in which pictures appear in the video bit stream).
- A group of pictures is a set of pictures that are contiguous in display order.
- A group of pictures must contain at least one I-picture. This required picture may be followed by any number of I- and P-pictures. Any number of B-pictures may be interspersed between each pair of I- or P-pictures, and may also precede the first I-picture.
- Property 1. In bitstream order, a group of pictures must start with an I-picture, which may be followed by any number of I-, P-, or B-pictures in any order.
- Property 2. In display order, a group of pictures must begin with an I- or B-picture and must end with an I- or P-picture. The smallest group of pictures consists of a single I-picture, whereas the largest size is unlimited.
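The relationship between display order and bitstream order can be sketched as follows. Each B-picture needs its future reference picture, the next I- or P-picture in display order, to be decoded first, so that reference is placed in the bit stream ahead of the B-pictures that precede it in display order. The picture labels (type letter plus display index, such as "B0" or "I2") are a hypothetical notation, and D-pictures are not handled.

```python
from typing import List


def bitstream_order(display: List[str]) -> List[str]:
    """Reorder a GOP's pictures from display order to bitstream order.

    Pictures are labeled by type letter plus display index, e.g. "I2", "B0".
    """
    if not display:
        raise ValueError("a GOP must contain at least one picture")
    anchors = [p for p in display if p[0] in "IP"]
    # If the first I/P picture in display order is an I-picture, the output
    # starts with it, so Property 1 holds for the bitstream order.
    if not anchors or anchors[0][0] != "I":
        raise ValueError("the first non-B picture in display order must be an I-picture")
    if display[-1][0] == "B":
        raise ValueError("Property 2: a GOP must end with an I- or P-picture in display order")
    out: List[str] = []
    pending_b: List[str] = []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)  # wait until its future reference has been emitted
        elif pic[0] in "IP":
            out.append(pic)        # the reference picture goes first ...
            out.extend(pending_b)  # ... then the B-pictures that depend on it
            pending_b.clear()
        else:
            raise ValueError(f"unsupported picture type: {pic}")
    return out


print(bitstream_order(["B0", "B1", "I2", "B3", "B4", "P5", "B6", "B7", "P8"]))
# ['I2', 'B0', 'B1', 'P5', 'B3', 'B4', 'P8', 'B6', 'B7']
```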
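- The GOP header contains a time code, which encodes the same information as the SMPTE time code discussed below.
- The time code can be broken down into six fields: drop_frame_flag (1 bit), time_code_hours (5 bits, values 0 to 23), time_code_minutes (6 bits, values 0 to 59), marker_bit (1 bit, always 1), time_code_seconds (6 bits, values 0 to 59), and time_code_pictures (6 bits, values 0 to 59).
- The time code refers to the first picture in the GOP in display order (as opposed to bitstream order).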
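Packing and unpacking the GOP header time code is a matter of bit shifting. The sketch below assumes the MPEG-1 field order and widths (drop_frame_flag, hours, minutes, marker bit, seconds, pictures; 25 bits in total, most significant bit first); the function names `pack_time_code` and `unpack_time_code` are hypothetical.

```python
from typing import Tuple


def pack_time_code(hours: int, minutes: int, seconds: int, pictures: int,
                   drop_frame: bool = False) -> int:
    """Pack the 25-bit GOP time_code field, most significant bit first."""
    if not (0 <= hours <= 23 and 0 <= minutes <= 59
            and 0 <= seconds <= 59 and 0 <= pictures <= 59):
        raise ValueError("time code field out of range")
    value = int(drop_frame)  # drop_frame_flag: 1 bit
    # (field value, bit width); the (1, 1) entry is the marker bit, always 1.
    for field, width in ((hours, 5), (minutes, 6), (1, 1), (seconds, 6), (pictures, 6)):
        value = (value << width) | field
    return value


def unpack_time_code(value: int) -> Tuple[int, int, int, int, bool]:
    """Inverse of pack_time_code: return (hours, minutes, seconds, pictures, drop_frame)."""
    pictures = value & 0x3F          # bits 0-5
    seconds = (value >> 6) & 0x3F    # bits 6-11
    marker = (value >> 12) & 0x1     # bit 12
    minutes = (value >> 13) & 0x3F   # bits 13-18
    hours = (value >> 19) & 0x1F     # bits 19-23
    drop_frame = bool((value >> 24) & 0x1)  # bit 24
    if marker != 1:
        raise ValueError("marker bit must be 1")
    return hours, minutes, seconds, pictures, drop_frame
```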
- The SMPTE time code is included in the MPEG-1 standard to provide video time identification to applications.
- The SMPTE time code present in an MPEG video stream may be created by the encoder or copied into the stream from the previously existing time code associated with the corresponding analog video frame before its digitizing and compression.
- The time code in the MPEG-1 data stream is used in the present invention to correlate camera tracking data with the resulting compressed digital video frames.
- Time codes are well known in motion picture and video production. Generally, time codes are electronic signals recorded on film or video tape. In video, the time code is synchronized to the accompanying video signal. The purpose of time code is to uniquely identify each frame of video on a video tape or other video recording medium. This is done by assigning a number to each frame of video in an HOURS:MINUTES:SECONDS:FRAMES format.
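Numbering frames as HOURS:MINUTES:SECONDS:FRAMES lets each frame be addressed by a single integer. That integer is what is needed to correlate camera tracking data with frames. The minimal sketch below makes two assumptions:

- The time code is non-drop-frame at 24 frames per second, the film rate used for capture in the preferred embodiment. Drop-frame NTSC time code is not handled.
- Camera positions are held in a hypothetical `poses` table keyed by absolute frame number.

```python
# Sketch: non-drop-frame SMPTE time code <-> frame number, plus a pose lookup.


def timecode_to_frame(tc: str, fps: int = 24) -> int:
    """Convert 'HH:MM:SS:FF' to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"bad time code {tc!r} for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frame_to_timecode(n: int, fps: int = 24) -> str:
    """Convert an absolute frame count back to 'HH:MM:SS:FF'."""
    seconds_total, frames = divmod(n, fps)
    minutes_total, seconds = divmod(seconds_total, 60)
    hours, minutes = divmod(minutes_total, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def pose_for_picture(gop_tc: str, display_offset: int, poses: dict, fps: int = 24):
    """Camera pose for a picture, given its GOP time code and display-order offset.

    The GOP time code names the first picture in display order, so the offset
    must also be counted in display order, not bitstream order.
    """
    return poses.get(timecode_to_frame(gop_tc, fps) + display_offset)
```

For example, the third picture in display order of a GOP stamped `01:02:03:04` is absolute frame 89358 at 24 frames per second.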
- Time codes are accurately phase-locked to the video signals with which they are to be used. This is necessary to ensure that each time code frame is properly timed with respect to the video frame it identifies.
Production Authoring and Editing Tools
- Map-based authoring and editing tools are commonly found in a genre of computer games having a maze metaphor.
- The general purpose of such map editors is to allow the user (meaning, as used here, the game developer) to "draw" the world of the game from a top-down view and to place "objects" within that world.
- Objects can be walls, doors, monsters, foods, potions, or any other item that may be useful in creating a fantasy venue for the game being authored.
- However, there are no known map-based authoring tools for representing a correlation between a location on the map and a particular frame of video in a particular clip of video, such as will be found in the present invention.
- The prior art teaches that the amount of motion video in a production is inversely proportional to the degree of interactivity of the production. It is against this trend that the present invention is directed; that is, a production made by the process of the preferred embodiment of the present invention will have mostly video as the source media but will still have a high degree of user interaction. While there is an abundance of tools and techniques for rendering of images and editing suites for creating
- Hotspots are but one of numerous ways to permit user participation in an interactive multimedia production. Hotspots allow the user to interact with objects and locations in the environment. As contrasted with receptacles which are discussed below, a hotspot does not require a graphical overlay on the scene.
- A hotspot is a defined location in a video frame, that is, a location already present in the scene with some feature of interest to the player.
- A hotspot is usually defined by a human video editor or programmer who describes the location of the relevant feature as an area of the screen. Acting on that area with a pointing device, such as by clicking with a mouse, initiates a programmed action.
- The programmed action can be anything that can be caused to happen by a computer program.
- Hotspots have x and y dimensions. In technologies preceding the current invention, they also had a location that remained fixed both on the screen and relative to the image on the screen. It is important to note that hotspots in panoramic video productions have a fixed position relative to the panoramic frame rather than a fixed position on the screen. Further, in true virtual reality productions the hotspot is a characteristic of the object rather than of a screen or frame position.
- In the prior art, a means for moving the hotspot location in a video environment was simply not needed. The video used in multimedia productions was either dramatic video, which was not navigable, or a single still frame used as a stationary bitmap.
- With navigable spatial video, a need has arisen for a means to locate the hotspot on multiple frames of video. The hotspot must also move from frame to frame in coordination with the changing location of features in the environment.
Region hotspots
- Old-type stationary hotspots, which are defined by the static region of the display they occupy, are referred to herein as "region hotspots". This distinguishes them from the hotspots of the invention, which track environmental features from frame to frame. Region hotspots have some uses even in spatial video productions created according to the invention, especially in areas of the display outside the portion in which the spatial video is running.
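A small data structure shows the difference between a region hotspot and a tracked hotspot. This Python sketch is an assumption, not the production format the patent describes:

- A tracked hotspot stores its rectangle at a few key frames and linearly interpolates between them.
- A region hotspot occupies the same screen rectangle on every frame.

```python
# Sketch: feature-tracking hotspots vs. fixed region hotspots.
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def lerp_rect(a: Rect, b: Rect, t: float) -> Rect:
    """Linearly interpolate every coordinate of two rectangles."""
    pairs = zip((a.x, a.y, a.w, a.h), (b.x, b.y, b.w, b.h))
    return Rect(*(u + (v - u) * t for u, v in pairs))


@dataclass
class TrackedHotspot:
    name: str
    keys: Dict[int, Rect]  # frame number -> rectangle of the feature on that frame

    def rect_at(self, frame: int) -> Optional[Rect]:
        frames = sorted(self.keys)
        if frame < frames[0] or frame > frames[-1]:
            return None  # the feature is not on screen in this part of the clip
        i = bisect_right(frames, frame)
        if frames[i - 1] == frame:
            return self.keys[frame]
        f0, f1 = frames[i - 1], frames[i]
        return lerp_rect(self.keys[f0], self.keys[f1], (frame - f0) / (f1 - f0))

    def hit(self, frame: int, px: float, py: float) -> bool:
        rect = self.rect_at(frame)
        return rect is not None and rect.contains(px, py)


@dataclass
class RegionHotspot:
    name: str
    rect: Rect  # the same screen region on every frame

    def hit(self, frame: int, px: float, py: float) -> bool:
        return self.rect.contains(px, py)


def hit_test(hotspots, frame, px, py):
    """Return the name of the first hotspot under the pointer, or None."""
    for spot in hotspots:
        if spot.hit(frame, px, py):
            return spot.name
    return None
```

A click at the same screen point can hit a tracked hotspot on one frame and miss it on another. A region hotspot answers the same way on every frame.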
- Receptacles are another way to permit user participation in an interactive multimedia production.
- A receptacle is a location in a video scene into which or upon which graphical overlays or graphical "objects" may be placed.
- Receptacles in the prior art have a size and a position which are defined relative to absolute position and measurement in the scene for which the receptacle is defined.
- For spatial video, however, prior art receptacles are not adequate.
- A spatial video receptacle has a position, an angle of view, and a scale factor, all of which will vary from frame to frame in the spatial video.
- The position of a receptacle is recorded relative to the dimensions of each video frame in which it appears.
- The scale factor is a dimensionless number that determines the ultimate size of an object rendered into the receptacle.
- Use of a scale factor allows objects of different absolute sizes to be placed in the same receptacle and retain their relative sizes while the actual size of the object image will increase or decrease depending on the receptacle scale factor. Further, to allow a receptacle to "hold" objects of different absolute sizes, the actual area of the display taken by the object in the receptacle will vary depending on the absolute size of the object and the receptacle scale factor and will be different for different sized objects.
- The angle of view is measured in degrees and allows different views (faces) of an object to be rendered into a receptacle when it is seen from different points of view in different video frames. For example, suppose a spatial clip recorded the view of moving down a corridor and passing by a table. Suppose the developer chose to render an object so that it appeared to be sitting on the table. In order for the rendered object to appear natural, it would have to grow in size as the camera gets nearer to the table. The rendered object would also have to move in each frame just as the image of the table moves in each frame. Finally, the object would have to be rendered from different angles of view, just as the table is seen from different angles of view as one passes it.
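The minimal sketch below shows how the three per-frame receptacle values might be applied when rendering an object on one frame. It rests on three hypothetical assumptions:

- Position is stored as fractions of the frame width and height.
- The scale factor converts the object's absolute size units into pixels.
- The object has pre-rendered faces at a few discrete angles, and the nearest one is chosen.

```python
# Sketch: placing an object into a spatial video receptacle on one frame.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceptacleFrame:
    x: float      # horizontal position as a fraction of frame width (0..1)
    y: float      # vertical position as a fraction of frame height (0..1)
    angle: float  # angle of view in degrees
    scale: float  # multiplies the object's absolute size to give pixels (assumption)


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def place(rf, frame_w, frame_h, obj_w, obj_h, face_angles):
    """Return (left, top, width, height, face_angle) for one object on one frame.

    Width and height are the object's absolute size times the scale factor, so
    different objects in the same receptacle keep their relative sizes. The
    object is anchored at its bottom center so it appears to rest at (x, y).
    """
    w, h = obj_w * rf.scale, obj_h * rf.scale
    cx, cy = rf.x * frame_w, rf.y * frame_h
    face = min(face_angles, key=lambda a: angular_distance(a, rf.angle))
    return (cx - w / 2, cy - h, w, h, face)
```

In the corridor example, the table's receptacle would carry a growing `scale` and a changing `angle` from frame to frame. The same `place` call then enlarges the object and switches its face as the camera passes.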
- Hotspots allow the user to interact with objects in the environment. Simply by clicking the mouse on the appropriate spot, usually on the object, the user can "pick up" and examine the object, rotate it, and view it from different angles.
- Typically, a special window opens on the screen and the object appears in it, separated from the navigable environment. The user can then rotate the image 360 degrees horizontally and a lesser amount vertically.
- The available points of view all depend on what has been photographed or video taped and then linked into the scene of the environment.
- Both Surround Video and QuickTime VR use panoramic video rather than rendered bitmaps. As discussed above, both of these systems allow for the use of a sequence of still panoramas to simulate movement. However, hotspots and receptacles must be defined manually in each panorama. Thus, like other prior art techniques, the panoramic still video technologies use individually defined hotspots and receptacles in fixed positions on the panoramic bitmap. Objects may come from any source media, not just photographs or video clips. Using the techniques of the prior art, it was impossible to place such objects into receptacles and have them scaled and tracked across a large number of video frames while the viewer's perspective, the scene (object) being viewed, or both were moving.
- For illustrative purposes, assume a scenario where a motorcyclist doing stunts is jumping off the elevated end of an inclined ramp. Further, assume that an interactive multimedia participant viewer should be able to place a miniature guardian angel on the motorcyclist's shoulder just as the cyclist leaves the ramp. Next, assume that the angel is to remain on the cyclist's shoulder as he executes the jump and safely proceeds on his way. The motorcycle, the rider, the ramp, and the surrounding environment are all part of a scene from a video tape recording.
- The guardian angel is a computer-generated rendition, subject to placement and scaling under control of a computer program.
- Consider defining a receptacle on the rider's shoulder and making a guardian angel appear in it as the rider approaches the point of danger. The angel must remain in true spatial perspective, as to size and viewing angle, on every frame of the motion video as the rider approaches the viewer, executes the jump, passes the viewer, and proceeds on his way. With prior techniques and tools, this has not been possible except by manual insertion on a per-frame basis.
- A production has a user interface for the ultimate user, the person who will "use" the production.
- Use of the production could include such activities as playing a game, "touring" a museum, or "shopping" at an eclectic department store such as Harrods in London.
- A superset of this end-user interface is an interface to be used by the people who conceive and make the production.
- The superset user interface includes interfaces for the purposes of authoring, capturing, editing, and binding the production.
- The process of creating spatial video according to the present invention can be enhanced by recording the coordinates of the camera used to shoot the film or video to be included in the spatial video environment and associating these coordinates with the video frames shot.
- Where the environment is rendered rather than shot, virtual camera coordinates for each frame can be generated by the rendering program.
- Camera tracking is the detection and recording of the position and orientation of the camera in the environment where film or video is being shot (captured).
- The same or similar techniques can be used to determine the position of relevant features in the environment being captured with which hotspots, receptacles and the like are to be associated.
- One approach that could be used to provide at least a partial solution to the camera position and angle problem would be computer-controlled camera mounts. These provide exact positional data but tend to be immobile, cumbersome and expensive. However, if the capturing is being done on a sound stage or other production facility equipped with such equipment, it certainly can be used to accomplish this feature of the specification. Use of such a camera mount to generate camera positional data is obvious. However, the camera mount can also be used to generate positional data for features. This can be accomplished by making close-up shots of environmental features as the camera traverses the dimensions of the feature. The resulting position data can then be transferred to a feature position file for use in the production.
- Relative position measurement methods include dead-reckoning and inertial navigation.
- Absolute methods include active beacons, artificial landmark recognition, natural landmark recognition, and model matching or map-based methods.
Dead Reckoning
- It is well known that dead reckoning provides good short-term accuracy, is inexpensive, and allows very high sampling rates.
- The fundamental idea of dead reckoning is the integration of incremental motion information over time, which leads inevitably to the accumulation of errors.
- In particular, the accumulation of orientation errors will cause large position errors which increase proportionally with the distance traveled by the object being tracked.
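The proportional growth is easy to check numerically. This Python sketch integrates straight-line motion with a constant 1 degree heading bias, an illustrative value that does not come from the text. For a constant bias e, the final error after distance d is 2 * d * sin(e / 2), which is linear in d.

```python
# Sketch: a constant heading error makes dead-reckoning error grow with distance.
import math


def dead_reckon(step_lengths, heading_deg):
    """Integrate incremental moves along the heading the sensor believes it has."""
    x = y = 0.0
    h = math.radians(heading_deg)
    for step in step_lengths:
        x += step * math.cos(h)
        y += step * math.sin(h)
    return x, y


def position_error(distance, bias_deg, step=0.1):
    """Error after truly moving `distance` along +x while the estimate is biased."""
    steps = [step] * round(distance / step)
    x, y = dead_reckon(steps, bias_deg)
    return math.hypot(x - step * len(steps), y)


if __name__ == "__main__":
    for d in (5, 10, 20):
        print(d, round(position_error(d, 1.0), 4))
```

Doubling the distance doubles the error. This is why a tracked camera's position drifts further from the truth the longer the path.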
Inertial Navigation
- Inertial navigation systems also rely on integration of rate data to yield position. They similarly suffer from a small constant error that, once integrated, grows without bound over time. Efforts to minimize error sources result in high manufacturing and maintenance costs for these systems. However, in recent years the prices of very accurate laser gyros and optical fiber gyros have come down significantly, and these devices may be suitable for camera tracking applications.
- Active beacon navigation systems can be highly reliable, provide very accurate positioning information with minimal processing, and allow high sampling rates.
- Active beacon systems include radio frequency, laser and ultrasonic systems.
- One radio frequency based system which can be used to measure camera location is the Navstar Global Positioning System (GPS) which uses a constellation of 24 satellites orbiting the earth every 12 hours at a height of about 10,900 nautical miles.
- Error sources in the GPS include satellite position, ionospheric refraction, tropospheric refraction, multipath reflection and selective availability. All of these error sources, with the exception of multipath effects, can be virtually eliminated through the use of differential GPS (DGPS).
- Various types of DGPS achieve various degrees of accuracy. Among them, the codeless phase differential reference, which uses the phase of the two carrier frequencies, achieves accuracy of about 1 cm.
- GPS is excellent for outdoor navigation but is questionable where RF reception difficulties may interfere with its signals, such as indoors, under dense foliage, or in steep or mountainous terrain.
- Local active beacon systems generally require modification of the environment in order to install the required stationary sensors (or transmitters, if the active beacons are fixed). Further, line of sight needs to be maintained between the beacon on the vehicle and two or more of the sensors.
- Triangulation generally uses two or more active beacons at fixed locations a known distance apart.
- The moving object contains a directional receiver with which the bearings to the two beacons can be measured. This gives the values of one side and all angles of a triangle from which the position can be computed.
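The triangle computation can be sketched as follows. The sketch assumes the bearings are absolute, for example referenced to a compass, so that the baseline between the beacons and all angles of the triangle are known. If the object's own heading is unknown, two bearings alone do not fix its position. The function name is hypothetical.

```python
# Sketch: position from absolute bearings to two beacons at known locations.
import math


def triangulate(b1, b2, bearing1_deg, bearing2_deg):
    """Bearings are measured at the moving object, counter-clockwise from +x."""
    t1, t2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    c1, s1, c2, s2 = math.cos(t1), math.sin(t1), math.cos(t2), math.sin(t2)
    det = s1 * c2 - c1 * s2  # sin(t1 - t2); zero if the object is on the beacon line
    if abs(det) < 1e-9:
        raise ValueError("bearings are parallel; position is undetermined")
    dx, dy = b1[0] - b2[0], b1[1] - b2[1]
    # Solve r1*(c1, s1) - r2*(c2, s2) = b1 - b2 for r1, the range to beacon 1.
    r1 = (c2 * dy - s2 * dx) / det
    return b1[0] - r1 * c1, b1[1] - r1 * s1
```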
- Trilateration is the determination of a moving object's position by measuring the time-of-flight of a pulse of energy. The pulse travels from a transmitter mounted on the vehicle to multiple receivers mounted at known locations in the environment. Conversely, there may be multiple transmitters mounted at known locations in the environment and one receiver mounted on the vehicle. Also, a transmitter and a receiver may be co-located on the vehicle. In that case, the time measured is the round-trip time-of-flight of the energy bouncing off multiple reflecting objects mounted at known locations in the environment.
- Time-of-flight systems include both laser-based and ultrasonic systems. Such systems measure the time of flight of the light or sound wave. From it they compute the distance between the moveable transducer on the vehicle and the stationary transducers mounted in the environment. The relevant parameter in range calculation is the speed of sound in air for ultrasonics, and the speed of light for laser and radio frequency systems. The speed of sound is markedly influenced by temperature changes and, to a lesser extent, by humidity. For example, an ambient temperature shift of just 30 degrees F can cause a 0.3 meter error at a measured distance of 10 meters. Where ultrasound-based systems are used, significant temperature changes can be caused by the use of movie lights during filming.
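The 0.3 meter figure can be reproduced with the common linear approximation for the speed of sound in dry air, c ≈ 331.3 + 0.606·T m/s, with T in degrees Celsius. That approximation is chosen here and is not stated in the text. The sketch assumes the system was calibrated at 20 °C and that the air then warms by 30 °F (about 16.7 °C), for example under movie lights.

```python
# Sketch: ultrasonic time-of-flight ranging and its sensitivity to air temperature.


def speed_of_sound(temp_c: float) -> float:
    """Common linear approximation for dry air, in m/s."""
    return 331.3 + 0.606 * temp_c


def range_from_tof(tof_s: float, assumed_temp_c: float) -> float:
    """Distance the system reports, using the temperature it assumes."""
    return speed_of_sound(assumed_temp_c) * tof_s


def temperature_range_error(distance_m, calibrated_c, actual_c):
    """True distance minus reported distance for a given temperature mismatch."""
    tof = distance_m / speed_of_sound(actual_c)  # what the receiver actually measures
    return distance_m - range_from_tof(tof, calibrated_c)


if __name__ == "__main__":
    shift_c = 30 * 5 / 9  # a 30 degree F change expressed in degrees Celsius
    err = temperature_range_error(10.0, 20.0, 20.0 + shift_c)
    print(f"{err:.2f} m")  # prints "0.29 m"
```

Warmer air carries sound faster, so the measured time of flight shrinks. As a result, the computed range comes out short by roughly 0.29 m at 10 m.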
Ultrasonic Trilateration
- As can be seen, there are several methods familiar to those skilled in the art which can be used to track the camera and record its position. As will become clear as the invention is discussed, this data is useful for subsequent correlation with video (or other picture) frames and environmental features in order to accomplish the purpose of the present invention. Where the environment to be captured is relatively small and free from significant obstructions to wave propagation, ultrasonic trilateration is used in the preferred embodiment.
- An ultrasonic transmitter can be mounted on the camera, for camera tracking, or on a positioning wand, for recording positional data on relevant features of the environment. Receivers can be mounted in locations that minimize obstructions between the camera and the sensors. Exact positioning of sensors will vary from environment to environment. However, where the environment being recorded is a room, good results can often be achieved by mounting the sensors on the ceiling.
- Prior art algorithms for use in ultrasonic trilateration employ a least squares estimate and are valid only if the expected value of the measurement errors is zero.
- In practice, the positional data generated using prior art least squares algorithms contained enough false signals and large errors to seriously affect their viability in a camera tracking system.
- One aspect of the current invention is a more robust method of determining location under such circumstances.
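For comparison, here is a conventional approach rather than the invention's own method. It uses linearized least squares for receivers on a flat ceiling. A simple leave-one-out step then discards the receiver whose removal best explains the remaining ranges. The ceiling geometry, the 0.05 m tolerance, and all names are assumptions.

- Because all receivers are at the same height, subtracting one range equation from the others cancels the unknown height. So x and y come from a 2x2 system, and z is recovered afterward.
- The leave-one-out step needs at least five receivers. In this scheme a three-receiver subset generally fits its own ranges exactly, so a single bad range cannot be singled out with only four receivers.

```python
# Sketch: ceiling-mounted ultrasonic trilateration with leave-one-out rejection.
# Conventional least squares; NOT the invention's own robust method.
import math


def solve(receivers, ranges, ceiling):
    """Estimate (x, y, z) from ranges to receivers at (xi, yi, ceiling)."""
    (x0, y0), r0 = receivers[0], ranges[0]
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (xi, yi), ri in zip(receivers[1:], ranges[1:]):
        # Subtracting receiver 0's sphere equation cancels x^2 and y^2 and,
        # because every receiver shares z = ceiling, the unknown height as well.
        ax, ay = 2 * (xi - x0), 2 * (yi - y0)
        rhs = r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2
        a11 += ax * ax
        a12 += ax * ay
        a22 += ay * ay
        b1 += ax * rhs
        b2 += ay * rhs
    det = a11 * a22 - a12 * a12  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("receiver geometry is degenerate")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    # Each range implies a depth below the ceiling; average them (camera is below).
    drops = [math.sqrt(max(r**2 - (x - xi) ** 2 - (y - yi) ** 2, 0.0))
             for (xi, yi), r in zip(receivers, ranges)]
    return x, y, ceiling - sum(drops) / len(drops)


def max_residual(est, receivers, ranges, ceiling):
    """Largest mismatch between a measured range and the estimate's distance."""
    return max(abs(math.dist(est, (xi, yi, ceiling)) - r)
               for (xi, yi), r in zip(receivers, ranges))


def robust_solve(receivers, ranges, ceiling, tol=0.05):
    """Return (x, y, z, index of the discarded receiver or None)."""
    est = solve(receivers, ranges, ceiling)
    if len(receivers) < 5 or max_residual(est, receivers, ranges, ceiling) <= tol:
        return (*est, None)
    best = None
    for skip in range(len(receivers)):
        rx = [p for i, p in enumerate(receivers) if i != skip]
        rr = [r for i, r in enumerate(ranges) if i != skip]
        try:
            cand = solve(rx, rr, ceiling)
        except ValueError:
            continue  # this subset is degenerate; try the next one
        err = max_residual(cand, rx, rr, ceiling)
        if best is None or err < best[0]:
            best = (err, cand, skip)
    return (*best[1], best[2])
```

Leave-one-out only copes with a single bad range per reading. Heavier contamination, such as the false signals the text describes, is what motivates a more robust method.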
- Today's multi-media productions such as interactive computer games, are typically created in three major steps: determination of content, production of content and assembly of production.
- The writer, art director, and producer create all the pieces (film, art, sound), then pass them on to the programming team, which assembles the product for a specific system, such as Microsoft Windows or the Sega Saturn. If the final product does not sell well on the target system, the pieces must then be reassembled for another target system, often by different programmers and at great expense.
- The present invention provides an integrated process for creating such productions.
- It is another object of this invention to provide an interactive motion video environment comprising a plurality of clip groups representing a plurality of paths through the environment wherein the user has the option to transition from a clip in one clip group to a clip in a second clip group at a point of intersection of the paths which said clip groups represent providing the navigating user with the visual impression of a seamless turn from one path in the environment to another.
- It is a further object of this invention to provide a system for creating an interactive multimedia production, said production including an environment, displayed in high-resolution video, which a user can freely explore. It is also an object to provide productions created using such a system.
- Table 1 is a table showing an example of disassembled code from the logic hunk of a production made according to a preferred embodiment of the invention, with instructions loading the user interface of the production;
- Table 2 is a table showing an example of disassembled code from the logic hunk of a production made according to a preferred embodiment of the invention showing instructions running parallel with spatial video clip playing, illustrating certain types of interesting frames including hotspot activation instructions;
- Table 3 is a table showing an example of disassembled code from the logic hunk of a production made according to a preferred embodiment of the invention showing instructions running parallel with spatial video clip playing, illustrating other types of interesting frames including cache frame instructions;
- Table 4a is a table showing the flow language instructions used to make a simple flow routine according to a preferred embodiment of the present invention;
- Table 4b is a table showing the actual code that a binder according to a preferred embodiment of the present invention generated from the flow language instructions of Table 4a;
- Table 5 is a table showing a disassembly of part of the hotspot data table from the Data hunk of a production made with a preferred embodiment of the present invention;
- FIG. 1 is a block diagram of a multimedia computer system for editing and displaying spatial video productions according to the present invention;
- FIG. 2 is a block diagram of the generalized steps of the process of creating and running a spatial video production according to the present invention;
- FIG. 3a is a side-by-side depiction of two pictures of adjacent areas of a room, each rendered using a standard rectangular projection with a 90 degree field of view, together representing a total horizontal field of view of 180 degrees;
- FIG. 3b is a portion of a cylindrical projection according to the invention showing the same 180 degree horizontal field of view as FIG. 3a;
- FIG. 4a shows the uncorrected distortion of vertical lines in the cylindrical projection image of a grid;
- FIG. 4b shows an ideal cylindrical projection image of a grid;
- FIG. 5 is a block diagram of the camera tracking apparatus and camera tracking control computer of a preferred embodiment of the present invention;
- FIG. 6 is a block diagram showing the capture and camera position data correlation apparatus and process of the present invention where a motion picture film camera is used in the initial capture;
- FIG. 7 is a block diagram showing the capture and camera position data correlation apparatus and process of the present invention where a video camera is used in the initial capture;
- FIG. 8 is a process flow diagram showing the iterative push-pull positioning process of the present invention;
- FIG. 9 is a process flow diagram showing the steps in the creation of an M-view according to the invention;
- FIG. 10 is an illustration of the catalog directory structure used to store M-views and visuals according to a preferred embodiment of the present invention;
- FIG. 11 is an illustration of the inputs and outputs of the spatial video edit process according to a preferred embodiment of the invention;
- FIG. 12 is a general process flow diagram of the edit process according to a preferred embodiment of the invention;
- FIG. 13 is an example of a spatial video map of an environment in a spatial video production as it might appear in the map editor or viewer according to a preferred embodiment of the present invention;
- FIG. 14 is a process flow diagram showing the steps for laying out clips on a map according to a preferred embodiment of the process of the present invention;
- FIG. 15 is a process flow diagram showing the steps for associating a video clip with a map according to a preferred embodiment of the present invention;
- FIG. 16 is an illustration depicting possible transitions between various clips within a clip group and between two intersecting clip groups according to the invention;
- FIG. 17 is a process flow diagram showing the steps for detecting clip intersections according to a preferred embodiment of the invention;
- FIG. 18 is a process flow diagram of the runtime process in a production made according to a preferred embodiment of the present invention;
- FIG. 19 is a process flow diagram of the runtime flow interpreter of a preferred embodiment of the present invention;
- FIG. 20 is a process flow diagram of the runtime video interpreter of a preferred embodiment of the present invention;
- FIG. 21 is a process flow diagram of the DOFRAMES routine of the runtime video interpreter of a preferred embodiment of the present invention;
- FIG. 22 is a process flow diagram of the DOEVENTS routine of the runtime video interpreter of a preferred embodiment of the present invention;
- FIG. 23 is a process flow diagram of the runtime asynchronous events handler process of a preferred embodiment of the present invention;
- FIG. 24 is a process flow diagram of the runtime visual load routine of a preferred embodiment of the present invention;
- FIG. 25 is a relationship chart showing the relationships of the routines and processes of FIGS. 18-24;
- FIG. 26 is a block diagram showing the various types of assets to be identified and brought into the spatial video edit process;
- FIG. 27 is an illustration of a display screen showing the 7 navigational regions of a preferred embodiment of the invention;
- FIG. 28 is a process flow diagram of the overall binder process of a preferred embodiment of the present invention;
- FIG. 29 is a process flow diagram of the data hunk creation steps of the binder process of a preferred embodiment of the present invention;
- FIG. 30 is a process flow diagram of the logic hunk creation steps of the binder process of a preferred embodiment of the present invention.
- A navigable motion video production is a motion video-based representation of an environment. In it, the user of the production is given a degree of freedom of movement or of viewing direction within the environment not previously available in motion video.
- The freedom of movement or viewing direction within a navigable motion video environment is characterized by the availability to the user of one or more navigation abilities not present in prior art motion video.
- Such abilities include, without limitation, the following:
  - the ability to travel along a path within the environment in either direction;
  - the ability to pan left or right beyond the initially displayed field of view while traveling along the path;
  - the ability to pan up or down while traveling along the path;
  - the ability to turn onto an intersecting path;
  - the ability to change direction of travel along a path;
  - the ability to stop at a point along the path and rotate the view through up to all 360 degrees of the environment in the plane of rotation;
  - the ability to interact with hotspots in the environment which maintain their proper associations with environmental features, regardless of where on the display screen the feature appears, what size it is at the particular location along the path, and which portion of the field of view is being displayed; and
  - the ability to interact with objects inserted in the video environment which maintain a proper size, angle of view, and location within the environment, regardless of the location along the path and the particular portion of the field of view being displayed.
- Spatial video, a preferred embodiment of navigable motion video, is a way of generating and playing motion video that gives the impression of being immersed in a video world in which one can move around at will.
- The scenes presented in spatial video may show actual places, movie sets, or places that never existed but were modeled in a computer.
- The user's ability to move around at will includes the ability to stop or to move ahead at user-controlled (or user-requested and computer-selected) speeds, the ability to look left and right, the ability to turn around, and the ability to turn left or right and follow a new path.
- When turning onto a new path, the video will show a smooth turn from the view looking along the first path to the view looking along the second, just as a person would see when making such a turn in the real world. Every video frame that the user sees will follow naturally from the preceding frame (no sudden jumps in the video) and will contain no discontinuities: no seam lines introduced in the video and no fractures where outlines of objects take unnatural bends.
- A spatial video clip is captured by moving a camera (or virtual camera) along a path through a world or environment.
- A lattice of intersecting paths will be captured in this way.
- The user will be limited to moving only along the lattice of video paths. At any intersection he will have the option of proceeding on the current path or turning right or left onto the intersecting path. If the lattice is dense enough, the viewer will be able to move to virtually any point in the scene.
- The paths in this lattice are preferably placed on the natural navigational paths of the world or environment.
- The spatial video of the invention differs from normal linear video in several key aspects. Unlike linear video, spatial video is navigable and can include overlaid objects which retain their position and orientation relative to the features of the environment recorded on the video frames and to the player's point of view. Unlike panoramic video, spatial video consists of motion video rather than photographic still shots. In the preferred embodiment, the navigable feature is coupled with the concept of capturing spatial video along the natural paths of the environment, as discussed more fully below.
- The concept of natural paths is important to the preferred embodiment of the invention.
- The principle is to capture spatial video of the environment from the perspective of a person navigating through the environment. Because such a person will not view the environment from every conceivable point within it, spatial video does not need to be captured from the perspective of every single point in the environment.
- Where the environment is one suitable for walking, it is preferably captured on film or video from the perspective of a person walking through the environment.
- For example, in a room containing a table and two doors, the natural paths through the room might include a path from one door to the other around the left side of the table and a path between the same two doors around the right side of the table.
- The environment is preferably captured from the point of view of a person of ordinary height walking along the path. It would also be possible to capture the environment from the perspective of a 10-foot-tall person or a 1-foot-tall person. However, use of such footage in the production would not contribute to the realism of the navigation of the environment and would consume storage resources.
- Field of view is not meant to be limited to the view captured by a single camera at a single time, such as the field of view captured by a wide angle lens, but is meant to encompass the total field of view captured from a single point within an environment regardless of the number of cameras used, the field of view of an individual camera lens or lenses used, or the number of camera headings or positions used to capture the environment from that point.
- Another feature of natural paths is that they often intersect. Thus, in more complex environments there may be numerous places where paths cross. Thus a top down map of the environment on which the natural paths are represented as lines may have numerous intersections and, consequently, may appear as a lattice of paths.
- An important feature of a spatial video environment using natural paths is to allow the navigator to change paths at any such intersection. Another important feature is to allow the navigator to not only turn onto any intersecting path, but also to turn around and retrace his/her steps at any point along a path.
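Finding where natural paths intersect on the top-down map, and which frame of each clip corresponds to the crossing, is a small geometry problem. The minimal sketch below makes two hypothetical assumptions:

- Each clip's path is a polyline in map coordinates.
- The camera moved at constant speed, so frame number is proportional to distance along the path.

It is not the clip-intersection procedure of FIG. 17, whose details are not given in this section.

```python
# Sketch: where two captured paths cross on the map, and the frame of each clip there.
import math


def cross(a, b):
    """2D cross product (z component)."""
    return a[0] * b[1] - a[1] * b[0]


def segment_intersection(p, p2, q, q2):
    """Return (t, u) with p + t*(p2 - p) == q + u*(q2 - q), or None."""
    r = (p2[0] - p[0], p2[1] - p[1])
    s = (q2[0] - q[0], q2[1] - q[1])
    denom = cross(r, s)
    if abs(denom) < 1e-12:
        return None  # parallel or collinear segments are not treated as a crossing
    qp = (q[0] - p[0], q[1] - p[1])
    t, u = cross(qp, s) / denom, cross(qp, r) / denom
    return (t, u) if 0 <= t <= 1 and 0 <= u <= 1 else None


def path_crossings(path_a, frames_a, path_b, frames_b):
    """Yield (point, frame_on_a, frame_on_b) for each crossing of two polylines."""
    seg_a = [math.dist(path_a[i], path_a[i + 1]) for i in range(len(path_a) - 1)]
    seg_b = [math.dist(path_b[j], path_b[j + 1]) for j in range(len(path_b) - 1)]
    total_a, total_b = sum(seg_a), sum(seg_b)
    dist_a = 0.0
    for i, len_a in enumerate(seg_a):
        dist_b = 0.0
        for j, len_b in enumerate(seg_b):
            hit = segment_intersection(path_a[i], path_a[i + 1], path_b[j], path_b[j + 1])
            if hit is not None:
                t, u = hit
                (ax, ay), (bx, by) = path_a[i], path_a[i + 1]
                point = (ax + t * (bx - ax), ay + t * (by - ay))
                # Constant camera speed assumed: frame index proportional to distance.
                fa = round((dist_a + t * len_a) / total_a * (frames_a - 1))
                fb = round((dist_b + u * len_b) / total_b * (frames_b - 1))
                yield point, fa, fb
            dist_b += len_b
        dist_a += len_a
```

A crossing that falls exactly on a shared polyline vertex can be reported twice, once for each adjoining segment. A real tool would merge such duplicates.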
- Spatial video is created by first capturing the environment using the capturing system of the invention.
- While the system has great advantages in efficiency and cost savings when used to capture physically existing environments, it also provides advantages when some or all of the environment being captured is comprised of rendered scenes. Rendered scenes or portions of scenes can be easily integrated into a spatial video clip of the present invention. Physically existing scenes such as those shot on location or those created on sound stages can be captured on film or directly on video. In the preferred embodiment, as discussed more fully below, such scenes are captured on film using a motion picture camera running at 24 frames per second and equipped with a cylindrical projection lens. In the preferred embodiment spatial video clips are captured by moving the actual or virtual camera through the environment to be captured along the natural navigation paths of that environment.
- The benefit of using natural paths is that the number of paths to be captured can be reduced significantly without the player experiencing any appreciable loss of navigational freedom. This allows a rich environment to be captured with significantly lower video resource requirements.
- The natural paths must be captured on video clips representing a sufficient diversity of camera headings to capture all of the visual information which a person might view when negotiating the natural paths in different directions and looking in different directions. Less usual forms of navigation may or may not be allowed in a particular production. For example, capturing a natural path from point A to point B with the camera heading toward point B allows the user to "walk" from A to B looking toward B. Capturing the same path from B to A with the camera heading toward A allows the user to walk from B to A facing A.
- Capturing left- and right-facing "side" views allows the user to face left or right at any point along the path.
- Capturing these four headings along the path provides 360 degrees of visual information at every point along the path.
- This capture does not, however, permit the user to travel from A to B facing toward A, that is, backing up rather than walking forward.
- In some productions such travel should be allowed; for example, where the program simulates driving a tank, driving in reverse would be desirable.
- To adequately capture a particular environment, therefore, not only must the natural paths be determined, but also the nature of navigation to be provided in the particular circumstances.
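- The four-heading rule lends itself to a quick arithmetic check. The Python sketch below is illustrative only: the function name `covers_full_circle` is hypothetical, headings are assumed to be compass degrees, and every pass is assumed to use the same horizontal field of view (90 degrees for the preferred lens described below). It confirms only that no viewing direction is left uncaptured; it says nothing about direction of travel, so backing up still needs its own pass.

```python
from typing import Iterable


def covers_full_circle(headings_deg: Iterable[float], fov_deg: float) -> bool:
    """Return True if camera passes at these headings, each with the given
    horizontal field of view, together capture all 360 degrees around a point."""
    hs = sorted(h % 360.0 for h in headings_deg)
    if not hs:
        return False
    # Angular gap between each heading and the next one, wrapping around at 360.
    gaps = [b - a for a, b in zip(hs, hs[1:])] + [hs[0] + 360.0 - hs[-1]]
    # Neighbouring views leave no hole when their centres are at most one FOV apart.
    return max(gaps) <= fov_deg
```

- With a 90 degree lens, ahead, behind, left and right passes (0, 90, 180 and 270 degrees) just close the circle; dropping either side-facing pass leaves a gap.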
- The first type of transition is the turn transition, in which the player makes a transition from one frame of a video clip shot on one path (the FROM frame) to a frame of another video clip shot by a camera (or virtual camera) traversing an intersecting path (the TO frame).
- The turn transition has the visual effect of a turn from one natural path to another.
- The second is the roundabout, in which the player makes a transition from a FROM frame of a video clip shot on a path by a camera heading, for example, north, to a TO frame of another video clip of the same path shot by a camera heading, for example, east.
- The roundabout has the effect of a rotation in which the player turns in place on a single path. Roundabouts may be repeated, for example, from east to south, then from south to west and back to north, creating a 360 degree rotation. The mechanics of such transitions are discussed in more detail in the following sections of this disclosure.
- Transitions-on-the-fly. When the viewer tells the system to "turn right" onto an intersecting path, spatial video shows a smoothly turning angle of view that brings the viewer's viewpoint into alignment with the target path views. For example, when a viewer going north wishes to turn right (east), the system shows a succession of frames that all represent views from the viewer's current position, but looking in different directions. These directions start with north (0 degrees) and end with east (90 degrees).
- Transitions-on-the-fly refers to a technique for generating this sequence of frames, in a format suitable for the playback engine, using only the beginning frame and end frame as inputs.
- All the intermediate frames in the sequence are combinations of the first and last frames, created according to the invention disclosed in the Special Effects Applications. Further, because TO frames are cached, transitions on the fly can be performed before or while the system seeks and loads the TO video clip. The transition requires no video resources other than the FROM frame, which is stored in a streamer or MPEG player buffer; the TO frame, which is cached and then read into the buffer; and a short, preferably pre-created, generic transition sequence that is the same for all transitions of a particular type (such as a left turn). The transition sequence can therefore be used to mask seek time, loading time, clip setup time and the like with a meaningful transition.
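- One simple way to picture a right turn is a window sliding across the FROM and TO frames laid side by side. The sketch below assumes both frames are cylindrical projections taken from the same nodal point, so that they form adjacent portions of one panorama (as discussed below in connection with the anamorphic lens). It is a panning approximation, not the combination method of the Special Effects Applications, and it ignores MPEG encoding entirely; `turn_right_sequence` and the list-of-rows frame format are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # a frame as rows of pixel values


def turn_right_sequence(from_frame: Frame, to_frame: Frame, steps: int) -> List[Frame]:
    """Return steps + 1 frames that pan from the FROM view to the TO view.

    Assumes both frames are cylindrical projections from the same nodal point,
    with the right edge of FROM abutting the left edge of TO.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    height, width = len(from_frame), len(from_frame[0])
    if len(to_frame) != height or len(to_frame[0]) != width:
        raise ValueError("FROM and TO frames must be the same size")
    # Join each FROM row to the matching TO row: one strip twice the frame width.
    strip = [from_frame[r] + to_frame[r] for r in range(height)]
    sequence = []
    for k in range(steps + 1):
        # Slide a frame-wide window across the strip; k == steps yields the TO frame.
        offset = round(k * width / steps)
        sequence.append([row[offset:offset + width] for row in strip])
    return sequence
```

- A left turn is the mirror image: place TO before FROM in the strip and slide the window from right to left.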
- An additional use of the transition is to overlap the loading into memory (see runtime description below) of the cached TO frames of the destination clip with the transition.
- The frames representing the TO frames for all allowed transitions off of the clip being accessed are placed at the front of the clip. These are designated as cached frames, and the MPEG frames are loaded into cache memory before the starting frame of the clip is decoded and displayed.
- These loading tasks can be interleaved with the transition sequences being sent to the decoder.
- Transition sequences can therefore be used to mask delays caused by several different types of runtime operations.
- The most difficult aspect of the transition technique is ensuring that the generated frames use a format suitable for the video playing technology.
- For MPEG playback, for example, the system must generate a sequence of valid MPEG frames containing the required views.
- This aspect of the technique, for all digital encoding formats using reference frames and dependent frames, is the subject of the Special Effects Applications incorporated herein by reference.
- Another important aspect to this technique is the caching of "TO" frames.
- The possible TO frames for transitions off of the clip currently being viewed are cached in memory as the clip is loaded.
- As a result, transitions can be executed immediately upon receipt of the instruction, without first having to seek and load the TO clip.
- While the transition plays, the system can prepare for playback of the frames in the new video path. This preparation may involve file seeking, buffer loading, video parsing, and the like.
- The present invention also includes enhancing technologies which allow spatial video productions to be created more efficiently and with more realism.
- The first such enhancement is the use of camera position data. If the exact location of the camera is known for every video frame, automation may be used to locate intersection frames and to match frames from different clips in a group of clips taken along a single path, producing smoother transitions and producing them more quickly.
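- The caching and interleaving described above can be sketched as follows. Everything here is hypothetical: frames are string labels, `SpatialRuntime` and `Clip` are illustrative names, and each background task stands in for real file seeking, buffer loading or parsing. The point is the ordering: the TO frame comes from memory, so the first transition frame is produced immediately, and one slow preparation task is performed per transition frame.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Clip:
    name: str
    # TO frames for every transition allowed off this clip, keyed by transition
    # type (e.g. "turn_right"), stored ahead of the clip's own frames.
    cached_to_frames: Dict[str, str]
    frames: List[str]


@dataclass
class SpatialRuntime:
    cache: Dict[str, str] = field(default_factory=dict)
    background: Deque[str] = field(default_factory=deque)
    log: List[str] = field(default_factory=list)  # background work actually done

    def load_clip(self, clip: Clip) -> None:
        # Fill the cache before the clip's first frame is decoded and shown.
        self.cache = dict(clip.cached_to_frames)

    def turn(self, from_frame: str, kind: str, target_clip: str, steps: int = 3) -> List[str]:
        """Start a transition at once and interleave slow preparation with it."""
        to_frame = self.cache[kind]  # already in memory: no seek before starting
        self.background.extend(
            [f"seek {target_clip}", f"fill buffer {target_clip}", f"parse {target_clip}"]
        )
        shown = []
        for k in range(steps + 1):
            shown.append(f"{from_frame}->{to_frame} step {k}/{steps}")
            if self.background:
                self.log.append(self.background.popleft())  # one task per frame
        return shown
```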
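- With per-frame camera positions, finding an intersection frame reduces to a nearest-pair search. The sketch below assumes positions are recorded as planar (x, y) coordinates in a common unit; the function name and the `tolerance` parameter are hypothetical. A brute-force search is shown for clarity; a spatial index would be preferable for long clips. The same search can match frames between clips shot along the same path with different camera headings.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # camera (x, y) position recorded for one frame


def find_intersection_frames(
    path_a: List[Point], path_b: List[Point], tolerance: float
) -> Optional[Tuple[int, int, float]]:
    """Return (frame in clip A, frame in clip B, distance) for the closest pair
    of camera positions, or None if the clips never come within `tolerance`."""
    best: Optional[Tuple[int, int, float]] = None
    for i, pa in enumerate(path_a):
        for j, pb in enumerate(path_b):
            d = math.dist(pa, pb)  # Euclidean distance (Python 3.8+)
            if best is None or d < best[2]:
                best = (i, j, d)
    if best is None or best[2] > tolerance:
        return None
    return best
```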
- A second enhancement is the use of wide angle lens projection. If each frame captures a 90 degree horizontal field of view, the entire scene may be captured in four frames. If two video paths intersect at right angles, and each path is captured in both directions using a 90 degree field of view lens, there is sufficient video for a full panorama (a complete 360 degree turn-around) at the path intersection. Moreover, if each path is captured four times (looking ahead, looking behind, looking right and looking left) with such a lens, panoramas will be possible at any point along any path. These panoramas allow the viewer to turn and face any direction during playback, and to turn around and proceed along the path in the opposite direction.
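- Because a cylindrical projection places columns linearly in angle, the arithmetic for assembling a 90 degree view from four 90 degree frames is simple. This sketch is an illustration under assumptions not stated in the original: the four frames are taken at compass headings 0, 90, 180 and 270 degrees from the same point, each spans exactly 90 degrees, and `frames_for_view` is a hypothetical name.

```python
from typing import Tuple

HEADINGS = ("north", "east", "south", "west")  # frames centred on 0, 90, 180, 270 degrees


def frames_for_view(view_heading_deg: float, frame_width: int) -> Tuple[str, str, int]:
    """For a 90-degree view centred on view_heading_deg, return the frame that
    holds the view's left edge, the frame to its right, and the column in the
    first frame where the view begins."""
    left_edge = (view_heading_deg - 45.0) % 360.0
    # Frame k is centred on 90*k degrees and spans [90k - 45, 90k + 45).
    k = int(((left_edge + 45.0) % 360.0) // 90.0)
    offset_deg = (left_edge - (90.0 * k - 45.0)) % 360.0
    # Cylindrical projection: column position is linear in viewing angle.
    column = int(offset_deg * frame_width / 90.0)
    return HEADINGS[k], HEADINGS[(k + 1) % 4], column
```

- A view centred at 45 degrees, for example, is the right half of the north frame followed by the left half of the east frame.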
- Capturing a wide field of view is useful for turnarounds and intersections, as explained above. If the lens also captures a corresponding angle of height, and traditional video aspect ratios (four by three) are being used, the lens might be expected to capture about 67 degrees of height (three quarters of 90). This is a large angle of height to capture. Such a view would usually include a lot of ceiling at the top and floor at the bottom, with the "interesting" parts of the scene compacted into the middle.
- A better solution is an anamorphic lens that captures 90 degrees of width but a "normal" viewing angle for height (say, 34 degrees). This would capture all the width we need without capturing excess (and uninteresting) height. Of course, the resulting anamorphic distortion would have to be removed at playback time by stretching the video horizontally.
- Another enhancement is the use of interpolation and extrapolation techniques to place hotspots, receptacles and other types of overlays in the proper location, with the proper size, on each frame of a clip.
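- The original does not specify the interpolation method. Linear interpolation between hand-placed keyframes is one plausible choice, sketched below with a hypothetical `hotspot_at` function and an assumed (x, y, width, height) rectangle in frame pixels; frames outside the keyframe range are extrapolated from the nearest pair of keyframes.

```python
from bisect import bisect_left
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height in frame pixels


def hotspot_at(keyframes: Dict[int, Rect], frame: int) -> Rect:
    """Linearly interpolate (or extrapolate) a hotspot rectangle for `frame`."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    marks = sorted(keyframes)
    if len(marks) == 1:
        return keyframes[marks[0]]  # a single keyframe cannot show motion
    # Pick the bracketing pair; clamping to the end pairs gives extrapolation.
    i = min(max(bisect_left(marks, frame), 1), len(marks) - 1)
    f0, f1 = marks[i - 1], marks[i]
    t = (frame - f0) / (f1 - f0)
    r0, r1 = keyframes[f0], keyframes[f1]
    x, y, w, h = (a + t * (b - a) for a, b in zip(r0, r1))
    return (x, y, w, h)
```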
- A further enhancement is the use of angle interpolation and the M-view structure, more fully described below, so that sprites inserted into the video maintain their proper size relative to the background on which they are superimposed and show the appropriate face.
- The profile of the object visible to the player changes with the player's location in the spatial video environment.
- For example, suppose the sprite is a statue.
- From one location, the right profile may be visible.
- From another location, the front profile will be visible.
- From locations in between, faces of the statue between the right profile and the front view will be visible.
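- Selecting the face and scale of a sprite such as the statue can be sketched as below. The names and conventions are assumptions: views are numbered counter-clockwise around the sprite with view 0 facing front, positions are planar map coordinates, and apparent size is taken to fall off inversely with distance (a pinhole-camera approximation) relative to a reference distance at which the sprite's artwork has scale 1.

```python
import math
from typing import Tuple


def select_m_view(
    viewer: Tuple[float, float],
    sprite: Tuple[float, float],
    sprite_facing_deg: float,
    num_views: int,
    ref_distance: float,
) -> Tuple[int, float]:
    """Return (view index, scale) for drawing the sprite as seen by the viewer."""
    dx, dy = viewer[0] - sprite[0], viewer[1] - sprite[1]
    distance = math.hypot(dx, dy)
    # Direction from the sprite to the viewer, counter-clockwise from +x.
    bearing = math.degrees(math.atan2(dy, dx))
    relative = (bearing - sprite_facing_deg) % 360.0
    step = 360.0 / num_views
    view_index = int(round(relative / step)) % num_views  # nearest stored face
    scale = ref_distance / distance  # farther away -> drawn smaller
    return view_index, scale
```

- With eight views, a viewer due east of a statue facing north gets view 6, its right profile; a viewer to the north-east gets view 7, a face between the right profile and the front.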
- The Computer. The present invention is designed to permit the efficient and cost-effective development of spatial video productions, and to permit the user to navigate freely within that environment along the natural paths contained therein, using a low-cost personal computer.
- Although the present invention may also be used, after porting to the appropriate software environment, to create and run spatial video productions on game consoles, high-speed workstations, mainframe computers or supercomputers, it was designed to provide this capability in the personal computer environment.
- The preferred embodiment was designed for operation in the Microsoft Windows 95™ operating environment and for use on computers compatible therewith.
- The invention is preferably used on a personal computer having a CPU clock speed of at least 90 MHz, at least 8 MB of RAM, at least 50 MB of available hard disk space, and a CD-ROM drive running at a minimum of 2X speed.
- The computer also preferably contains a hardware decoder, either on its motherboard or as an expansion board. Referring now to FIG. 1, there is shown a block diagram of a video system comprising a multimedia computer system 10 which could be employed to implement the present invention.
- The computer system 10 has a Pentium 100 MHz CPU-based computer 1 with an 850 MB internal hard drive, a 4X CD-ROM drive 8, a disk drive 9, an SVGA monitor 2, speakers 3, a keyboard 6, and a pointing device such as a mouse 7 with two selection keys 7a and 7b, which may be used in conjunction with the mouse to allow a user to navigate through the spatial video environment and to interact with receptacles, hotspots and objects in that environment.
- The computer could also be connected to a local area network and/or modem for accessing resources not located on the computer's local drives. Implementation of other user interface devices, add-ons, operating systems and peripherals would be obvious to those skilled in the art and will not be discussed further herein.
- The Video Decoder. In the preferred embodiment the computer system also incorporates a software MPEG decoder, such as SoftMotion from SAS Institute Inc. of Cary, North Carolina.
- MPEG-1 was the only compression technology available at the time the choice was made which could compress an hour of high-quality video onto a single CD. Since this development decision was made, other compression technologies have been developed. The processes of the present invention work equally well with most of these other technologies.
- Hardware players excelled at high frame rates, video quality and sound synchronization, but only occasionally supported stretching (with panning), seldom supported flicker-free overlays (M-views), never supported seeking to a specified frame, and never reported exact frame numbers.
- Software players usually support stretching with panning and are usually able to report which video frame is being displayed at the moment. However, software players are not as proficient at playing video at high frame rates with synchronized audio. With a fast Pentium™ processor, a software player such as SoftMotion will play audio-synchronized video with satisfactory results, but not with the proficiency of a hardware player.
- Software players other than SoftMotion can be customized to support frame-accurate seeking and flicker-free M-view overlays, though commercially available ones seldom support these capabilities.
- Productions made with a preferred embodiment of the present invention can achieve optimum results on systems which have a hardware player by using both a software player and the hardware player.
- One player (the software player or spatial video player) is optimized for sequences of spatial video, and the other (the hardware player) for "dramatic video".
- The runtime system switches between the two players to match the characteristics of the video currently being played.
- The spatial video player is chosen to excel in stretching, panning, knowledge of exact frames, and flicker-free M-view overlays (things a software player is good at). It is used to show free-roaming sequences such as moving down a corridor.
- For dramatic video, the runtime system can switch to a hardware player that is better at showing high frame rates with synchronized audio.
- The switch is accomplished with the same flow routine that launches the dramatic video.
- Dramatic clips usually show brief scenes in which user interaction is disabled or restricted. They lack hotspots and M-views, and therefore allow a hardware player to be used.
- As software players get better and PC processors get faster, we expect to achieve, with a software player, audio-synchronized dramatic video playback comparable to that available with a hardware player. This would allow a single software player to be used for all video, and eliminate the need for two players.
- The inability of most MPEG players to perform frame-accurate seeking in MPEG streams is handled through the streamer component of either the system or the SoftMotion decoder available from SAS Institute Inc., Cary, North Carolina.
- The system can thus present a typical player with a single continuous stream of MPEG to play. As far as the MPEG player is concerned, no seeking is ever requested.
- The streamer component of the SoftMotion decoder or of the spatial video runtime engine manufactures this continuous MPEG stream by assembling MPEG data from selected segments of the MPEG clips, and from transition data. In effect, the streamer does the seeking outside of the MPEG player.
- The streamer joins the new MPEG frames to the old ones so that the MPEG player sees what looks like a single MPEG stream with no interruptions.
- The streamer also injects the "synthetic" MPEG frames generated during transitions into its output stream.
- The MPEG streamer function is an important component of the preferred embodiment of the invention. Its relationship to the MPEG player is discussed in the co-pending Special Effects Applications, which are incorporated herein by reference. It essentially constructs a virtual MPEG stream out of the various source MPEG streams, including the video clips of the production, the dramatic video and the transition sequences, to send to the decoder. Because of this virtual stream, the MPEG decoder never "sees" a clip end. Consequently, the MPEG decoder never needs to initialize a new clip, as it otherwise would have to do every time the system changed from one clip to another. Decoder initialization sequences may cause some frames of the new clip to be skipped, and will cause a delay as the player decodes initial frames before beginning to display video from the new clip.
- The MPEG streamer can search or seek clips or other video sources; duplicate, omit, or reorder pictures present in the base video stream; and insert pictures into the video stream from other sources.
- The functionality of the MPEG streamer used in the preferred embodiment of the invention is shown in the annotated source code for the interface definition of such a streamer contained in Appendix C and incorporated herein by reference. Further information regarding the capabilities of the MPEG streamer is contained in the Special Effects Applications and in the section below on transitions.
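- The streamer's essential trick, presenting one uninterrupted stream assembled from pieces of several sources, can be illustrated without any MPEG detail. In this sketch frames are string labels and `virtual_stream`, `ClipSegment` and `SyntheticFrames` are hypothetical; a real streamer must also respect MPEG reference-frame dependencies, which is the subject of the Special Effects Applications.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Union


@dataclass
class ClipSegment:
    clip: str   # source clip name
    start: int  # first frame to send
    end: int    # one past the last frame to send


@dataclass
class SyntheticFrames:
    frames: List[str]  # generated transition frames


Segment = Union[ClipSegment, SyntheticFrames]


def virtual_stream(clips: Dict[str, Sequence[str]], plan: List[Segment]) -> Iterator[str]:
    """Yield one uninterrupted frame sequence for the decoder."""
    for segment in plan:
        if isinstance(segment, ClipSegment):
            # The streamer does the seeking: it jumps straight to `start`.
            yield from clips[segment.clip][segment.start:segment.end]
        else:
            yield from segment.frames
```

- Duplicating or omitting pictures is just a matter of listing a segment twice or leaving frames out of its range.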
- The process of the present invention comprises five major steps. These steps have well-defined components and well-defined interactions between the components. We will present an overview of the five major steps, including a description of the components of each step and how the components interact in the process of generating and playing an interactive multimedia production. Note that by components we mean hardware, software, or both, used to accomplish a function. In all cases it will be obvious to one skilled in the art when the component is necessarily hardware or necessarily software, or can be either hardware or software or a combination thereof.
- The Concept step 12 covers many of the traditional film/video production tasks 17 associated with any traditional movie, video segment or computer production, such as an educational program or game, including writing and set, scenery, dialog, costume and lighting design.
- The script may call for the use of many conventional elements, including musical scores, dramatic video clips, static bitmaps such as pictures or text, and other similar features common in the art.
- The concept stage may also require use of some of these conventional elements in a unique way, in that they may be incorporated into the production through some of the processes of the invention disclosed herein.
- For example, a conventional dramatic video clip may be played because the user has clicked on a spatial video hotspot or picked up an object contained in a spatial video receptacle.
- Similarly, an animated video or dramatic video may be associated with a hotspot or contained in a receptacle.
- The concept stage results in a spatial, map-oriented script 18 that includes many conventional elements related to each other in ways which are often unique to the spatial video technology disclosed herein.
- The concept step also includes the substep of laying out the spatial scenes, determining the spatial, temporal and logical characteristics of various components of and objects in the spatial scenes, and spatially, temporally and logically relating the scenes to each other.
- The spatial video script 18 is drawn in, or otherwise transferred into, the editor represented by the Edit step 14 shown in FIG. 2, where it is refined and augmented.
- Concept also involves the determination of the various video, sound and other production assets 17 which need to be captured.
- Creation of spatial video preferably involves first the layout of the environment which is to be shown in spatial video. This layout is an important part of the spatial video script 18.
- Laying out or designing the environment involves creating one or more environmental scenes or navigational units. These scenes or navigational units are map-based rather than temporal, although elements within the scene may have logical or temporal attributes.
- The scenes may be derived from an existing structure or topography, or created on a soundstage or in computer graphic renderings. However, a principal task to be accomplished in the concept stage is inputting the topography of the scenes into the editor.
- The scenes or navigational units of the spatial video production can be virtually any imaginable environment, such as rooms, the entire interior of a house, a floor in a multi-floor building, a level in a ship or space station, a chamber or set of chambers in a cave or mine, a lake spotted with islands, or a forest glade. All spatial video scenes have certain common characteristics relevant to the invention.
- The environment should be one which lends itself to navigation along a number of natural paths. It should also have environmental features with which the user can interact, entrances, path exits as well as scene exits, path intersections, and map-based connections with plot elements and with other navigable units. The creation and characterization of these additional scene characteristics is a further task to be initially accomplished during the concept stage.
- A map-based spatial video script 18 may call for use of an existing location, such as the rooms and grounds of a castle or mansion, or the script may require the creation of rooms on a soundstage.
- Time-based navigational units can be rendered with computer graphics rather than being physically created. Objects, people and the like can be inserted into a spatial video production using known blue screen techniques and the like.
- The creation of the spatial script is substantially assisted by the authoring support provided by the spatial video editing system 14 (FIG. 2) of the present invention.
- The editing system allows the author to place the script into the map-based editor of the edit system 14, to test and verify the scene and object relationships, and to verify that multiple-path problems have been identified and solved.
- The scenes in an interactive spatial video production might never be played back in the same order twice.
- Virtually every scene is inter-related with every other scene within the same act.
- The author can draw the outlines, proposed paths, hotspots, receptacles and relevant environmental features of the spatial scene in the map editor and use the editor to assist in final conception of the project.
- The author can use text substitutes for missing components and missing logic elements, define relationships, and model all of the "what-ifs" and "if-thens" of the final production. Once this is done, the production can be test played with text as the medium. The text can be changed or replaced at any time, and other media can be added.
- The Edit 14 step looks to the Capture 13 step and the captured assets stored during that step to verify that the necessary assets have been captured. Where such assets are not among the stored assets, Edit 14 generates capture requests for the missing assets. For example, in the initial spatial video script 18 input into the edit system of the Edit 14 step, the interactive multimedia author may have described a scene which would be represented in the edit system as a text statement and flagged as incomplete, with a note that further design must be done before the scene can be filmed. Later, an art director may substitute part of that text with a sketch, and the costume designer may replace the sketch with a different sketch showing a character in costume. These substitutions update the representation of the missing final asset.
- A script writer may then associate a particular spoken line with the character. Subsequently, the sketch may be replaced by a video clip accompanied by an audio clip. Ultimately, all the text and sketches may be replaced by video clips, logic elements, and other media components as they become available. It is one of the advantages of the present system that, during the edit process, anyone with access to the spatial video map-based editor of the edit system can click on the area where the scene is to be and see the scene or its earlier-stage representation.
- The edit system of the present invention is a powerful authoring tool for creating and refining the spatial multimedia script. Just as scenes or dramatic elements can be sketched out and defined in the editor prior to the capture of the related assets, hotspots, receptacles, and objects can be planned, defined, placed in the environment of a particular act and scene, and experimented with at any stage of their development. This allows the authors and editors to test and refine the logic elements of the script before the assets are captured. At any time during this concept step, the production can be test played, perhaps partially with text and partially with video, and modified to make the production more appealing to the end user of the multimedia production.
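- The bookkeeping behind capture requests and stand-in representations can be sketched as below. The data model is hypothetical: each script element names the final asset it needs and keeps its stand-ins (text, then sketches) in order, and the captured-asset store is modelled as a set of asset ids.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ScriptElement:
    scene: str
    final_asset: str  # id of the captured asset the finished scene needs
    stand_ins: List[str] = field(default_factory=list)  # text, sketches, ... oldest first

    def current(self, captured: Set[str]) -> str:
        """What the editor shows when someone clicks on this scene's area."""
        if self.final_asset in captured:
            return self.final_asset
        return self.stand_ins[-1] if self.stand_ins else "(not yet described)"


def capture_requests(script: List[ScriptElement], captured: Set[str]) -> List[str]:
    """One capture request per script element whose final asset is missing."""
    return [
        f"capture {e.final_asset} for scene {e.scene}"
        for e in script
        if e.final_asset not in captured
    ]
```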
- For example, the script of a video game might call for an elevator to "take the player to a safer level" when an elevator call button is clicked with a mouse.
- The "safer level" can be determined in real time when the player clicks on 1, 2, or n after the elevator arrives in answer to the call button. This action creates a multiple-way branch which can be tested for action and for completeness (no missing elements). By test playing the game during the authoring step, possibly before any video has been included in the production, missing elements can be identified and additional shooting can be scheduled as necessary.
- The script becomes the storyboard, which becomes the layout of the environment, which ultimately becomes the production.
- The spatial video script, although primarily map-based, may still have time-sensitive components. For example, a bookshelf may convert to a door when the user discovers a button or solves a puzzle. Further, a lamp could be "clicked" on by the user without activating the lamp hotspot unless the user has lamp oil in his inventory.
- Thus, elements of the spatial script will have conditional characteristics which must be associated with them in the logic of the production.
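- Testing a multiple-way branch for completeness amounts to walking every choice and checking its target. In this hypothetical sketch, `branches` maps a scene to its choices and target scenes, and `scenes` maps each scene to its captured video id, or to `None` while only text or sketches exist.

```python
from typing import Dict, List, Optional, Tuple


def check_branches(
    branches: Dict[str, Dict[str, str]], scenes: Dict[str, Optional[str]]
) -> List[Tuple[str, str, str]]:
    """Return (scene, choice, problem) for every branch that cannot yet be played."""
    problems = []
    for scene, choices in branches.items():
        for choice, target in choices.items():
            if target not in scenes:
                problems.append((scene, choice, f"missing scene {target}"))
            elif scenes[target] is None:
                # Only text or sketch stand-ins exist: schedule shooting.
                problems.append((scene, choice, f"{target} not yet shot"))
    return problems
```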
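- Conditional behaviour such as the lamp example can be modelled by attaching a set of required items or flags to each hotspot. The `Hotspot` class below is a hypothetical illustration, not the production logic format of the edit system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Hotspot:
    name: str
    requires: FrozenSet[str] = frozenset()  # items or flags the player must hold

    def click(self, state: Set[str]) -> bool:
        """Return True if the click activates the hotspot, False if it is ignored."""
        return self.requires <= state  # subset test: every requirement is met
```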
- The production may include conventional dramatic sequences which are subject to conventional scene scripting, shooting (or animation) and editing. It is emphasized that the spatial video environment of the invention expands the developer's options rather than contracting them. Consequently, many pre-existing, conventional multimedia components can be integrated into spatial video productions with little change, albeit with enhanced effectiveness, which is an additional feature and benefit of the system of the present invention.
- The scripting step 12 identifies many components which must be "captured" by the system for use in the spatial video production.
- Conventional assets, such as dramatic video segments, musical performances, voices, environmental sounds and bitmaps, must also be captured in the capture step 13, and the production tasks associated with any traditional movie or video production, such as shooting film footage, taping video elements and recording audio elements, must also be performed.
- Spatial video is created by first creating the environment in the concept 12 and edit 14 stages, then "capturing" 13 the various navigable units of the environment. While the system has great advantages in efficiency when used to capture physically existing environments, it also provides considerable advantages, particularly at runtime, when some or all of the environment is comprised of rendered scenes.
- Optical Enhancement. While this step is not necessary in practicing the invention, the decision whether to use optical means to enhance the scenes being captured is preferably made prior to filming. The decision can be postponed for rendered scenes, where the graphics can be more easily adjusted later.
- In the preferred embodiment an anamorphic lens is used, that is, a lens that produces a different magnification along horizontal lines in the image plane than along vertical lines in the image plane. More particularly, such a lens incorporates a cylindrical element that distorts the image so that the angle of coverage in the direction perpendicular to the cylinder is different for the image than for the object.
- The preferred lens is an anamorphic, cylindrical lens which compresses the horizontal image features 2X relative to the vertical features.
- Other lenses which could be used would be conventional wide angle lenses, fish-eye lenses and the like.
- Yet another possible means for capturing extra picture width and/or height is the use of multiple cameras mounted in tandem horizontally, vertically, or both, with the captured scenes stitched together during the editing phase. Such stitching is not a trivial task, however, and achieving a seamless result would require substantial skill and/or technique.
- Scene stitching techniques are well known in the art and are included in Apple's QuickTime VR program.
- In the preferred embodiment, the environment is captured using a wide angle lens, preferably a cylindrically projecting anamorphic lens with a 90 degree horizontal field of view.
- A comparable non-anamorphic lens would yield about a 45 degree field of view.
- Use of the preferred lens therefore results, after correction, in twice the horizontal field of view of an ordinary lens.
- Horizontal compression is introduced intentionally through the choice of an anamorphic lens, to capture more scene width to support horizontal panning during playback, or to allow playback in a letterbox format. It is usually introduced by the optical elements of the anamorphic lens, but can also be introduced after capture, or by optical simulation algorithms if the scene is rendered.
- Alternatively, a "normal" (spherical) wide-angle lens can be used to capture the video. It must capture 90 degrees of width. Since the lens is not anamorphic, it will also capture twice the needed height.
- The video is subsequently edited to discard half the captured scan lines and to expand the remaining scan lines so that they fill the frame.
- The resulting image is essentially the same as the one captured with the anamorphic lens, except that it contains less vertical detail.
- The reduction in vertical resolution is usually masked by MPEG encoding, which greatly reduces both vertical and horizontal resolution anyway.
- In either case, the horizontal distortion is dealt with at playback time, when the picture is stretched horizontally by the MPEG player.
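- The spherical-lens alternative can be sketched on a frame stored as rows of pixels. Two assumptions are made: the discarded half is taken to be the top and bottom quarters (the excess ceiling and floor), and expansion and stretching use simple pixel repetition, where a real editor or player would filter. The function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows (scan lines) of pixel values


def crop_central_lines(frame: Frame) -> Frame:
    """Discard the top and bottom quarters of the scan lines (excess height)."""
    h = len(frame)
    return frame[h // 4 : h // 4 + h // 2]


def expand_lines(frame: Frame, factor: int = 2) -> Frame:
    """Repeat each remaining scan line so the lines fill the frame again."""
    return [list(line) for line in frame for _ in range(factor)]


def stretch_columns(frame: Frame, factor: int = 2) -> Frame:
    """Playback-side horizontal stretch by pixel repetition."""
    return [[px for px in line for _ in range(factor)] for line in frame]
```

- After `crop_central_lines` and `expand_lines` the frame has its original number of lines but is squeezed horizontally relative to vertically, as anamorphic capture would be; `stretch_columns` models the playback-time correction.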
- Distortion is also introduced by the use of a cylindrical projection lens (or lens simulation for rendered video).
- This distortion gives us scenes which fit together seamlessly to produce smooth panoramas.
- The scenes will appear to be somewhat distorted, however.
- For example, a straight horizontal line near the top of the scene will appear to curve downward toward the left and right edges of the scene.
- This type of distortion is not corrected in the captured video; however, it can be removed at playback time by expanding columns of video pixels in each frame, with greater expansion applied to the columns near the left and right edges of the frame.
- The correction must be done on the fly during playback, at least during transitions, or there will be unrealistic bends at the seams where frames join. See the text accompanying FIGS. 3a and 3b.
- A final type of distortion is spherical distortion, which is due to inherent spherical imperfections in commercially available cylindrical lenses. Rendered video can be free of this type of distortion. In a pure cylindrical projection, all vertical lines in the scene would appear as vertical lines in the image. The spherical distortion present in the lens causes vertical lines near the left and right edges of the scene to bow outward. We correct for this type of distortion using standard video special effects editing equipment, as discussed below. Use of the preferred wide angle anamorphic video also allows limited instantaneous navigation of the environment by panning to the non-displayed portions of the bitmap.
- The anamorphic lens also allows realistic transitions between adjacent frames, provided the frames are shot with the nodal point of the camera lens rotating about the same vertical axis at the same point on the natural path.
- Where the spatial video along a particular path is taken by a plurality of camera trips along the path, as it is in the preferred embodiment, certain precautions must be observed to ensure that the camera nodal point always passes through the same points as it traverses the path.
- Alternatively, the edges of the FROM and TO frames could be intentionally blurred to hide discontinuities, or manually edited to match better using standard computer-based image manipulation techniques.
- The Nodal Point. In the preferred embodiment, which uses a cylindrically projecting anamorphic lens, the nodal point must be at the same position at a given point on the path regardless of the direction the camera is facing, to allow for seamless transitions as discussed below.
- The FROM and TO frames then represent adjacent portions of a cylindrical projection image.
- A comparison of FIG. 3a and FIG. 3b demonstrates the advantages of using the preferred cylindrical (anamorphic) lens for capture.
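- The column expansion can be sketched as follows. In a cylindrical image, a horizontal line at a fixed height on a flat wall facing the camera appears at a vertical offset proportional to cos θ, where θ is the column's horizontal viewing angle, so stretching each column vertically by 1/cos θ about the horizon straightens it. This is a standard projection relationship rather than a formula given in the original; the function name, nearest-neighbour sampling, and the assumption that the horizon lies on the frame's centre line are all illustrative.

```python
import math
from typing import List

Frame = List[List[int]]  # rows of pixel values; row 0 is the top of the frame


def expand_columns(frame: Frame, fov_deg: float) -> Frame:
    """Stretch each pixel column vertically about the frame's centre line by
    1 / cos(theta), where theta is that column's horizontal viewing angle."""
    height, width = len(frame), len(frame[0])
    half_fov = math.radians(fov_deg) / 2.0
    centre_y = (height - 1) / 2.0
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        # Cylindrical projection: column position is linear in viewing angle.
        theta = ((x + 0.5) / width * 2.0 - 1.0) * half_fov
        scale = 1.0 / math.cos(theta)  # 1 at the centre, about 1.41 at 45 degrees
        for y in range(height):
            # Inverse mapping: find the source row that lands on output row y.
            src = centre_y + (y - centre_y) / scale
            out[y][x] = frame[min(height - 1, max(0, round(src)))][x]
    return out
```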
- Paired cylindrical projection images produce a more natural spatial video transition than a pair of rectangular projection images.
- FIG. 3a and FIG. 3b represent images with a total horizontal field-of-view of 180 degrees with the same center of projection.
- FIG. 3a is a pair of 90 degree rectangular projection images placed side by side
- FIG. 3b illustrates a pair of 90 degree cylindrical projection images placed side by side.
- A mental picture may help in understanding the benefit of cylindrical projection technology for transitions. Imagine being at the center of a large transparent cylinder whose axis is vertical.
- FIG. 4b illustrates a cylindrical projection of a grid of lines in a plane perpendicular to the gaze vector.
- straight lines in any direction remain straight.
- if the two images are the FROM and TO images of a transition, such as those disclosed in the co-pending Special Effects Applications incorporated herein by reference, the result will be a seamless transition. If the same is attempted with two rectangular projection photographs taken from the same point, the straight edges that straddle the two photographs bend at the seam between the images, as illustrated in FIG. 3a.
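
The seam argument can be checked numerically. The following is an illustrative sketch, not part of the described system. It places a straight horizontal edge in front of the camera at height Y = 1 and depth Z = 2, running left to right across the seam. It then renders the edge into two side-by-side 90 degree images whose shared edge is at azimuth 0: once with a rectangular (pinhole) projection and once with a cylindrical one. Finally, it measures the edge's image slope just either side of the seam. The unit focal length, the edge position, and the function names are assumptions.

```python
import math
from typing import Callable, Tuple

HALF_FOV = math.radians(45.0)  # each image spans 90 degrees; the seam is at azimuth 0


def rectangular(X: float, Y: float, Z: float, center: float) -> Tuple[float, float]:
    """Pinhole projection (focal length 1) for a view centred at azimuth `center`."""
    x_cam = X * math.cos(center) - Z * math.sin(center)
    z_cam = X * math.sin(center) + Z * math.cos(center)
    # Shift each image by half its width so the two sit side by side, seam at x = 0.
    shift = -1.0 if center < 0 else 1.0
    return x_cam / z_cam + shift, Y / z_cam


def cylindrical(X: float, Y: float, Z: float, center: float) -> Tuple[float, float]:
    """Cylindrical projection (radius 1): x is azimuth, y is height over horizontal range."""
    azimuth = math.atan2(X, Z)
    shift = -HALF_FOV if center < 0 else HALF_FOV
    return (azimuth - center) + shift, Y / math.hypot(X, Z)


Projection = Callable[[float, float, float, float], Tuple[float, float]]


def seam_slopes(project: Projection, Y: float = 1.0, Z: float = 2.0,
                h: float = 1e-4) -> Tuple[float, float]:
    """Image slope dy/dx of the edge just left and just right of the seam."""
    def slope(center: float, X0: float, X1: float) -> float:
        x0, y0 = project(X0, Y, Z, center)
        x1, y1 = project(X1, Y, Z, center)
        return (y1 - y0) / (x1 - x0)

    left = slope(-HALF_FOV, -h, 0.0)   # left image covers X <= 0
    right = slope(HALF_FOV, 0.0, h)    # right image covers X >= 0
    return left, right


if __name__ == "__main__":
    print("rectangular:", seam_slopes(rectangular))
    print("cylindrical:", seam_slopes(cylindrical))
```

The rectangular pair gives slopes of about +0.354 and -0.354. The edge is straight within each image, but it meets its continuation at an angle at the seam, which is the bend of FIG. 3a. The cylindrical pair gives slopes near zero on both sides, so the edge continues smoothly across the seam.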
- anamorphic lens used in the preferred embodiment of the system does not yield a perfect cylindrical projection, but possesses some spherical distortion.
- if the images are scaled horizontally such that the horizontal field-of-view at the horizon is 90 degrees, the field-of-view along the top and bottom of the frame is slightly greater than 90 degrees.
- Vertical lines bow outward as depicted in the grid picture of FIG. 4a. The amount of bowing in FIG. 4a is exaggerated to illustrate the effect.
- FIG. 4b illustrates how the FIG. 4a grid picture would look after digital video processing. Notice that the vertical lines are substantially corrected, resulting in a more accurate cylindrical projection.
- Frames taken with the preferred cylindrical anamorphic lens are distorted horizontally to fit a 90 degree (horizontal) field of view on a frame that would yield a 45 degree field of view with an ordinary lens.
- a corrective lens is used on the projection equipment which reduces this distortion.
- digital effects are employed to make similar optical corrections to each picture after the anamorphic image is placed on video, but prior to MPEG encoding.
- a sine wave distortion is applied using the DPM-1 Kaleidoscope brand digital video effects machine available from the Grass Valley Group. The exact setting of the controls is determined empirically by manipulating a picture containing vertical lines near the left and right edges. The amount of sine wave correction is adjusted until these vertical lines appear to be straight, thus more closely approximating a true cylindrical projection. The images are then scaled horizontally such that the field of view at the horizon is 90 degrees and sent to the MPEG encoder.
- the center of projection should be the same for the FROM and TO frames. This means that the nodal point of the camera lens should be located at the same point for the frames that will become the FROM and TO frames in the transition. Put another way, the camera must be placed in the same position when filming the two transition frames to avoid parallax error.
- roundabout transitions are transitions where the user is able to rotate at a single position in 3-D space.
- the user can rotate 360 degrees by doing a transition, for example, from a forward to a left facing frame, then to a rearward facing frame, then to a right facing frame, finally back to a forward view frame.
- all four heading pictures taken at a roundabout point on the path should be taken with the camera lens nodal point at the same 3-D coordinate.
- the requirement of the preferred embodiment that the center axes of the cylinders coincide usually means that the camera must be level at all intersections and roundabout points. It is possible to satisfy this requirement without being level if, for example, the FROM frame was taken with the camera pitched down by, say, 10 degrees and the TO frame was taken with a camera roll angle of 10 degrees. This would be the case if the center axis was tilted 10 degrees from the vertical. (This example assumes a 90 degree field of view, as in the preferred embodiment.)
- the vertical field-of-view should be the same and the horizons should line up. This means that the vertical framing adjustments during the telecine process should be the same for a given pair of FROM and TO frames. Normally, the vertical framing will be constant for all spatial video in a production. If a different lens was used to film the FROM frame as compared with the TO frame, one must compensate for any optical differences.
- the requirement that there be no overlap or underlap at the common edge means that the camera paths should intersect at an angle corresponding to the horizontal field-of-view of the resulting MPEG video bitstream (90 degrees in the preferred embodiment). Although the intersecting paths should, therefore, be perpendicular at the point of intersection in the preferred embodiment, there is no requirement that the paths be straight. However, when the environment is to be captured by multiple passes of the camera on a single path, as in the preferred embodiment, it is important that the paths be set out in a way to facilitate accurate retracing of the path.
- positional data associated with clips and frames, as well as positional data associated with various significant features of the environment being captured, can be used in conjunction with a top down map of the spatial video environment to plot camera tracks and features.
- the availability of positional data on video clips sought to be imported into the production gives the editor the ability to identify quickly the relevant clips, frames and features needed to create intersection and roundabout transitions, as well as hotspots and receptacles.
- visual verification of the frames located by use of tracking data is always desirable.
- a second advantage to the use of camera position tracking and the positional data generated thereby is in keeping the camera on the proper path.
- the ability to make smooth transitions between adjacent video clips in navigation along and between natural paths is an important aspect of the present invention and crucial to the impression of immersion in the environment which spatial video provides.
- the nodal point of the camera lens must follow precisely the camera path being traversed regardless of the direction the camera is pointing.
- the "camera path" is a widthless line through the 3-D space of the environment.
- a third advantage to the use of positional data is to assist in determining frame accurate intersection locations on the video clips of the intersecting clip groups.
- the location of an intersection of two camera paths can be determined from positional data by overlaying the various camera paths and identifying the coordinates of the intersection. Knowing the coordinates of the intersection, the frames in each clip with spatial coordinates closest to those of the intersection can be quickly identified and retrieved. These frames can then be checked and fine-tuned visually to select the FROM and TO frames with the best match, and the best matched frames are then specified as the FROM and TO frames for the desired transition.
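
The intersection search just described can be sketched in a few lines. The sketch assumes each clip's positional data has already been reduced to a top-down track of (frame number, x, y) records ordered by time. The track layout and the function names are hypothetical, and the 2-D treatment ignores height. The sketch also reports the crossing angle, which in the preferred embodiment should match the 90 degree horizontal field of view.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Frame = Tuple[int, float, float]  # (frame number, x, y); hypothetical record layout


def _cross(a: Point, b: Point) -> float:
    return a[0] * b[1] - a[1] * b[0]


def segment_intersection(p1: Point, p2: Point, q1: Point, q2: Point) -> Optional[Point]:
    """Crossing point of segments p1-p2 and q1-q2, or None if they do not cross."""
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = (q2[0] - q1[0], q2[1] - q1[1])
    denom = _cross(r, s)
    if abs(denom) < 1e-12:  # parallel or degenerate segments
        return None
    qp = (q1[0] - p1[0], q1[1] - p1[1])
    t = _cross(qp, s) / denom
    u = _cross(qp, r) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * r[0], p1[1] + t * r[1])
    return None


def find_intersection(track_a: Sequence[Frame],
                      track_b: Sequence[Frame]) -> Optional[Tuple[Point, float]]:
    """First crossing of two camera tracks and the angle (degrees) between them there."""
    for a0, a1 in zip(track_a, track_a[1:]):
        for b0, b1 in zip(track_b, track_b[1:]):
            hit = segment_intersection(a0[1:], a1[1:], b0[1:], b1[1:])
            if hit is not None:
                r = (a1[1] - a0[1], a1[2] - a0[2])
                s = (b1[1] - b0[1], b1[2] - b0[2])
                cos_angle = abs(r[0] * s[0] + r[1] * s[1]) / (math.hypot(*r) * math.hypot(*s))
                return hit, math.degrees(math.acos(min(1.0, cos_angle)))
    return None


def nearest_frame(track: Sequence[Frame], point: Point) -> int:
    """Frame number whose recorded position is closest to `point`."""
    return min(track, key=lambda f: math.hypot(f[1] - point[0], f[2] - point[1]))[0]


if __name__ == "__main__":
    east = [(i, i * 0.5, 0.0) for i in range(21)]               # clip heading along +x
    north = [(100 + j, 4.0, -5.0 + j * 0.5) for j in range(21)]  # clip heading along +y
    found = find_intersection(east, north)
    if found:
        point, angle = found
        print(point, angle, nearest_frame(east, point), nearest_frame(north, point))
```

The frames returned by `nearest_frame` are only starting candidates for the visual fine-tuning step.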
- the positional data is also helpful in identifying the best frames for roundabout transitions between clips in the same clip group.
- a fourth advantage is in placing "hotspots" or "receptacles" associated with significant features of the environment.
- where the 3-D coordinates of the feature are known, either through manual measurement or through location sensing means, the positional data associated with such features can be compared with the positional data of the various clips and frames. This allows the project editor to efficiently identify camera clips and frames on which the feature is likely to appear, in order to superimpose hotspots or receptacles on such features.
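
One way to run that comparison is to test, for each frame, whether the feature lies within the camera's horizontal field of view and within some working range. The following is a minimal sketch with several assumptions: x points east and y north, headings are compass-style (0 = north, clockwise), and a 30 m cut-off stands in for "likely to appear". The record layout and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (frame number, x, y, heading in degrees); hypothetical record layout.
CameraFrame = Tuple[int, float, float, float]


def relative_bearing(heading_deg: float, dx: float, dy: float) -> float:
    """Bearing of the offset (dx, dy) relative to the camera heading, wrapped to [-180, 180).

    Assumes x points east, y points north and headings are compass-style
    (0 = north, increasing clockwise).
    """
    bearing = math.degrees(math.atan2(dx, dy))
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0


def frames_showing(feature: Tuple[float, float], frames: Sequence[CameraFrame],
                   hfov_deg: float = 90.0, max_range: float = 30.0) -> List[int]:
    """Frames whose horizontal field of view likely contains the feature."""
    fx, fy = feature
    hits = []
    for frame, x, y, heading in frames:
        dx, dy = fx - x, fy - y
        if math.hypot(dx, dy) > max_range:
            continue  # too far away to be worth a hotspot (assumed cut-off)
        if abs(relative_bearing(heading, dx, dy)) <= hfov_deg / 2.0:
            hits.append(frame)
    return hits


if __name__ == "__main__":
    frames = [(1, 0.0, 0.0, 0.0), (2, 0.0, 0.0, 90.0), (3, 0.0, 0.0, 350.0)]
    print(frames_showing((1.0, 5.0), frames))
```

Pitch, roll and the vertical field of view are ignored in this sketch. Any frame it flags should still be checked visually before a hotspot or receptacle is placed.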
- One means is to use ultrasonic transducers which enable one to calculate the camera position by pinging a pulse from a transmitter co-located with the camera and measuring the time of flight between the transmitter and multiple stationary receivers which have been placed strategically within a camera tracking area.
- a camera tracking area is defined as that area which can be covered by one placement of the fixed transducers. Multiple camera tracking areas are combined to make up the total environment of a production.
- Another means to achieve camera position encoding could be a measurement system based on an eye-safe laser strobe and a system of networked sensors that records when the beam from the rotating beacon arrives at each sensor. Data from this approach would then be processed by a computer to determine the location of the camera within a pre-determined camera tracking area.
- Other methods to determine position of the camera could include any mechanical, sonic, radio frequency or optical means for tracking movements of the camera base, mounting head, and stand provided such means produces a required degree of accuracy and repeatability, or by using a computer motion controlled camera system.
- the positioning system as used in one preferred embodiment of the present invention consists of a moving remote sensor co-located with the camera whose position is to be determined, a number of stationary receivers, an ultrasonic controller unit which outputs the data and a camera tracking control computer.
- the positioning system is available from Intelligent Solutions, Inc. of Marblehead, MA.
- the camera sensor uses an electronic compass and a two axis clinometer to measure azimuth, pitch, and roll. This information is ultimately transferred to the camera tracking control computer.
- the camera sensor also contains a transmitting ultrasonic transducer. This transducer emits a burst of ultrasonic energy (a "ping") upon receipt of a start command from the central computer. This ping is detected by the receivers, which then send a signal back to the ultrasonic controller unit.
- the interval between the ultrasonic transmit start time and the time of the received signal from the receivers provides a measure of the distance from the moving transmitter to the fixed receivers.
- a preferred embodiment of the present invention employs the ultrasonic position tracking system from Intelligent Solutions, Inc. and the push-pull positioning process of this invention which is described below to determine the X, Y, and Z coordinates of the camera in the navigable space.
- FIG. 5 illustrates the components of the camera tracking apparatus according to the preferred embodiment of the present invention.
- the apparatus includes a camera sensor means 24 co-located with either the motion picture camera 26 or the video camera 28, and connected to an ultrasonic controller unit 22. Inside the camera sensor means 24 are an electronic compass 30 and an ultrasonic transmitter 32 attached to a sensor base 34. Also, up to eight ultrasonic receivers 36 may be connected to the ultrasonic controller unit 22 by means of cables 38 and channels 40. Alternatively, radio links may be used to make set-up easier and more flexible. Ultrasonic receivers 36 are strategically placed at various locations throughout the environment, with sufficient distance between each receiver and the ultrasonic transmitter 32, and between each of the several ultrasonic receivers 36, to permit a time-of-flight diversity to be measured.
- Also connected to the ultrasonic controller unit 22, by cable 44, is an ultrasonic hand-held wand 42.
- the ultrasonic hand-held wand 42 is also an ultrasonic transmitter. It contains a button 43 which, through communications cable 44 or alternatively through a radio link, is connected to the ultrasonic controller unit 22.
- the ultrasonic wand can be used in a manner analogous to that of the camera transmitter to determine the X, Y, and Z coordinates of any item in the environment when actuated adjacent to that item.
- Ultrasonic controller unit 22 communicates with camera tracking control computer 46 by means of serial communications cables 48 and 50.
- Camera tracking control computer 46 also accepts serial input via a third serial port (not shown) from the time code translator 54 (FIG. 6 and FIG. 7).
- Upon command from camera tracking control computer 46, ultrasonic controller unit 22 causes ultrasonic transmitter 32 to emit a ping. In the case of the hand-held wand 42, the command is triggered initially by the depression of the button 43 on the wand 42.
- the ultrasonic ping from the camera mounted transmitter 32 or the hand-held wand 42 travels to the receivers 36.
- Some filtering of the received sound occurs in the receiver, which reduces the receiver's output to a binary "yes" (receiving 40 kHz sound) or "no" (not receiving 40 kHz sound).
- This binary yes/no goes to the ultrasonic controller unit 22 through the cables 38 and the connecting channels 40.
- the controller 22 uses a digital signal processor (not shown) and other circuitry (not shown) to sample the inputs and measure the duration of the yes signal for each reporting receiver. The duration measure is used as a filter to eliminate false yes reports from the ultrasonic receivers 36.
- the processor of controller 22 also measures the time delay from the start of the transmitted ping to the leading edge of the received signal. This delay measure is proportional to the distance from the ultrasonic transmitter 32 or wand 42 to the reporting receiver 36. Finally, the controller 22 sends the time-of-flight data for each receiver to the camera tracking control computer 46 over an RS-232 serial cable 12. While many acceptable communications protocols are known in the art which are easily adaptable to this task, in the preferred embodiment the ultrasonic controller unit 22 available from Intelligent Solutions, Inc., receives the following commands from and provides the following responses to camera tracking control computer 46 in performing the above summarized tasks:
- T indicates the trigger command which causes an ultrasonic ping to be emitted from the camera sensor 24, and, FFFF is a 4 digit hexadecimal number that defines the length of time, in units of sample time, that the ultrasonic controller unit 22 is to wait for signals to be returned from ultrasonic receivers 36 before sampling is stopped.
- the sample time is 29.053 microseconds in the preferred embodiment.
- <CR> is carriage return.
- FF is a 2 digit hexadecimal number that defines duration, in units of sample time, of ultrasonic pings emitted from the camera sensor 24.
- the ping duration is normally set to 172 sample times, or 5 milliseconds.
- FF is a 2 digit hexadecimal number that defines the minimum width, in units of sample time, for a valid received ping.
- FF is a 2 digit hexadecimal number that sets the sound amplitude to be transmitted by ultrasonic transmitter 32 and ultrasonic hand-held wand 42.
- Set Command SFF<CR>, where S indicates the set command, and FF is a 2 digit hexadecimal number that sets a mask to enable sampling a channel 40. If the corresponding bit is set, the channel is enabled. If the bit is cleared, the channel is ignored even though it may have received a signal.
- R indicates the range data response
- X is a digit from 0 through 7 indicating which channel 40 has detected a ping
- FFFF is a 4-digit hexadecimal number that indicates the time delay, in units of sample time, from the start of the transmitted ping to the leading edge of the received signal.
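
The command and response formats above can be exercised with a small host-side helper. This is a sketch under several assumptions:

- Only the T, S and R formats are modelled, because the command letters for the duration, width and amplitude commands do not survive in this text.
- A range response is assumed to be exactly "R", a channel digit and four hex digits, optionally followed by a carriage return.
- The speed of sound is taken as 343 m/s; in practice it varies with air temperature.

The function names are hypothetical.

```python
from typing import Iterable, Tuple

SAMPLE_TIME_S = 29.053e-6     # one controller sample time, per the text
SPEED_OF_SOUND_M_S = 343.0    # assumption: dry air at about 20 degrees C


def trigger_command(wait_samples: int) -> str:
    """Build the T command: ping, then sample for up to `wait_samples` sample times."""
    if not 0 <= wait_samples <= 0xFFFF:
        raise ValueError("wait time must fit in 4 hex digits")
    return f"T{wait_samples:04X}\r"


def set_command(channels: Iterable[int]) -> str:
    """Build the S command: bit n of the mask enables channel n (0-7)."""
    mask = 0
    for ch in channels:
        if not 0 <= ch <= 7:
            raise ValueError(f"no such channel: {ch}")
        mask |= 1 << ch
    return f"S{mask:02X}\r"


def parse_range(response: str) -> Tuple[int, int]:
    """Split an assumed 'RXFFFF' range response into (channel, delay in sample times)."""
    text = response.strip()
    if len(text) != 6 or text[0] != "R" or text[1] not in "01234567":
        raise ValueError(f"not a range response: {response!r}")
    return int(text[1]), int(text[2:], 16)


def delay_to_metres(samples: int, speed: float = SPEED_OF_SOUND_M_S) -> float:
    """Convert a time-of-flight delay in sample times to a transmitter-receiver distance."""
    return samples * SAMPLE_TIME_S * speed


if __name__ == "__main__":
    print(repr(trigger_command(0x1000)))   # 'T1000\r'
    print(repr(set_command([0, 1, 2])))    # 'S07\r'
    channel, delay = parse_range("R20400")
    print(channel, delay, delay_to_metres(delay))
```

As a consistency check, the default ping duration of 172 sample times is 172 × 29.053 µs ≈ 4.997 ms. That matches the 5 milliseconds quoted above.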
- the ultrasonic handheld wand 42 is a manually triggered ultrasonic transmitter manufactured by Intelligent Solutions, Inc. of Marblehead, MA that can be used to verify system operation, assist in measuring the position of receivers 36, and assist in measuring positions of path intersections, turns, corners, and other features within the environment as input to determining topography of the environment.
- the wand transmitter operates in a fashion similar to that of the camera sensor transmitter. The only differences are that the wand emits a ping in response to depression of the wand button 43, and that depressing the button also signals the controller unit 22 over cable 44 that a wand ping was started.
- Ultrasonic Receivers. Ultrasonic receivers 36 are also available from Intelligent Solutions, Inc.
- Each receiver sends a binary signal over a cable 38 to the ultrasonic controller unit 22 indicating whether or not it is receiving 40 kHz sound at any given instant.
- Camera sensor 24 contains compass 30, ultrasonic transmitter 32, a cone (optional and not shown) and a circuit board which is contained in the sensor base 34.
- the cone (optional and not shown) converts the ultrasonic beam from the ultrasonic transmitter 32 into an omni-directional pattern in the horizontal plane. Such a cone may be helpful to obtain more reliable receipt of the transmitter signal where the receivers are in horizontal orientation with the transmitter. The cone is not needed where the natural dispersion pattern of transmitter 32 provides sufficient signal strength to the receivers.
- the circuit board in the sensor base 34 receives the RS-422 differential ping signal (preferably 5 milliseconds of 40 kHz) from ultrasonic controller unit 22 on cable 66 and converts it to a single-ended signal for driving ultrasonic transmitter 32.
- Electronic compass 30 is preferably the TCM2 Electronic Compass Module from Precision Navigation, Inc. of Mountain View, California.
- the compass is mounted in camera sensor 24 and outputs heading, pitch, and roll measurements via cable 66 to ultrasonic controller unit 22, which, in turn, sends the measurements to camera tracking control computer 46 via an interface conversion function described below.
- Compass data is derived from the manufacturer's proprietary tri-axial magneto-inductive magnetometer system and a biaxial electrolytic inclinometer, and contains no moving parts.
- the inclinometer permits electronic gimbaling for the magnetometers as well as providing information on pitch and roll angles.
- the compass 30 is activated by a start signal sent by the camera tracking control computer 46 along RS-232 serial cable 48, to the ultrasonic controller unit 22 where it is converted to an RS-422 signal and forwarded on cable 66 to the sensor base 34 portion of the camera sensor 24.
- the sensor base 34 reconverts the signal to an RS-232 signal and passes it to the connected compass 30.
- the compass module sends RS-232 data to the connected sensor base 34 where it is stepped up to RS-422 and sent to ultrasonic controller unit 22 by cable 66 over an RS-422 interface using ASCII coded characters at a baud rate determined by the user.
- the signal is converted from RS-422 back to RS-232 and forwarded over cable 48 to camera tracking control computer 46.
- the standard output format may be configured to provide all of the sensor data parameters available, or only those parameters required by the user. Assuming all parameters are to be sent, the standard output format is:
- the ultrasonic TOF data captured by the camera tracking control computer 46 and the compass bearing data must be converted into position and orientation (6-D) coordinate data and associated with the specific video frames as they appear in the MPEG video bitstream to be most helpful in the spatial video production process. This is accomplished by converting the TOF data into positional data and correlating the positional data with the SMPTE time codes.
- the camera tracking control computer 46 is discussed in more detail below in the discussion relating to FIG. 6 and FIG. 7.
- While FIG. 5 shows two serial inputs to the camera tracking computer, the compass data and the ultrasonic data inputs, it does not show the third serial input, which comes from the time code translator.
- the source of the time code data varies depending on whether the digital video assets to be captured originate as motion picture film or are originally shot as video. We will first discuss the process of correlation of time code data with video frame data and with camera position data where a motion picture film camera is used.
- Correlation of Film Frames and Camera Position Data. The method for correlating the camera tracking data to frames of an MPEG video bitstream is slightly different for a film camera vs. a video camera. But in each case, a SMPTE time code signal is integral to the process. The hour, minute, second and frame components of the SMPTE time code have a one to one correspondence with each frame of film or video.
- a time code translator 54 is connected to camera tracking control computer 46, by means of cable 52.
- the time code translator 54 translates SMPTE time codes into an ASCII format which can be communicated to a PC, such as the camera tracking control computer 46 through a serial connection 52.
- the time code translator 54 receives the time code signal through its connection to a master SMPTE clock 60.
- the master SMPTE clock is jam-synched to the time code writer portion 62 of the motion picture camera 26 using the techniques well known in the art.
- a smart slate 64 may also be jam-synched with the master SMPTE clock 60 to be used as a backup to the time code writer 62.
- the output of the camera tracking control computer 46 is compass and TOF data. This data is time-stamped as received by the computer using the computer's internal clock. The time-stamped data is stored in the CTJ (camera tracking journal) file 68.
- the motion picture camera 26 is equipped with a time code writer 62 which optically exposes a time code on each frame of film as it is shot. Prior to filming, the time code writer 62 is jam-synched with the master SMPTE clock 60; thus the time code exposed on the film is the same as that being produced by the master SMPTE clock 60.
- the master SMPTE clock 60 is also connected to the time code translator 54, which in turn is connected to the camera tracking control computer 46.
- the time code received by the camera tracking control computer 46 is also time-stamped using the computer's internal clock before it is entered into the .CTJ file 68.
- the .CTJ file data is converted into camera position data with the CAMTRAK program processing 70 which results in a .CPA (camera position ASCII) file 72.
- This data associates 6-D camera position measurements with SMPTE time codes.
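
This section does not spell out how the time-stamped .CTJ records become per-frame .CPA rows. A minimal sketch follows, under stated assumptions. It assumes the position/orientation samples have already been computed from the TOF and compass data and carry the PC time stamps described above. It then interpolates the pose at the PC time stamp of each time code reading. Linear interpolation, the shortest-arc treatment of heading, and the names `pose_at` and `build_cpa` are assumptions for illustration, not CAMTRAK's documented behaviour.

```python
import bisect
from typing import List, Sequence, Tuple

# x, y, z, heading, pitch, roll: the 6-D record described for the .CPA file
Pose = Tuple[float, float, float, float, float, float]


def _lerp(a: float, b: float, f: float) -> float:
    return a + (b - a) * f


def _lerp_heading(a: float, b: float, f: float) -> float:
    """Interpolate compass headings along the shorter arc (350 -> 10 passes through 0)."""
    diff = (b - a + 180.0) % 360.0 - 180.0
    return (a + diff * f) % 360.0


def pose_at(samples: Sequence[Tuple[float, Pose]], t: float) -> Pose:
    """Pose at PC time t, linearly interpolated between samples sorted by time stamp."""
    times = [s[0] for s in samples]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return samples[0][1]    # before the first sample: hold the first pose
    if i == len(samples):
        return samples[-1][1]   # after the last sample: hold the last pose
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    f = (t - t0) / (t1 - t0)
    return (_lerp(p0[0], p1[0], f), _lerp(p0[1], p1[1], f), _lerp(p0[2], p1[2], f),
            _lerp_heading(p0[3], p1[3], f), _lerp(p0[4], p1[4], f), _lerp(p0[5], p1[5], f))


def build_cpa(timecodes: Sequence[Tuple[float, int]],
              samples: Sequence[Tuple[float, Pose]]) -> List[Tuple[int, Pose]]:
    """Pair each time-stamped time code reading (pc_time, frame count) with a pose."""
    return [(frame, pose_at(samples, t)) for t, frame in timecodes]
```

Holding the end poses outside the sampled interval is a deliberately conservative choice. In practice those frames would more likely be trimmed, as the sync point discussion below describes.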
- the .CPA position data is then entered into a clip position table in the master database file 74 of the spatial video edit system 14 where it awaits synchronization with the film after its conversion to video.
- the developed negative 76 is converted into video by use of a telecine machine 78 equipped with a time code reader 80; this device reads the optical SMPTE code off of the film during the telecine process. It is important to note that time codes associated with the various frames of the developed negative differ from the time codes of the corresponding video frames. However, with the use of the time code reader 80, a table can be constructed which provides, for each frame, the film time code and the corresponding video time code from the resulting video tape 82. This information is recorded in the sync point database 84.
- the time code data can also be created using a smart slate 64 as described below.
- the frames component of the SMPTE time code for film repeatedly increments through the range of values from 0 through 23 (signifying 24 frames in each second).
- video is shot at approximately 30 frames per second, with the frame portion of the time code incrementing from 0 through 29.
- film time codes will not correlate 1:1 with the converted video time codes, due to the use of the 3-2 pull down process to create 30 frame-per-second video from 24 frame-per-second film.
- the 30 frame-per-second video is converted to 24 frame-per-second MPEG during the encoding process by means of a reverse telecine process performed by the MPEG encoder.
- the telecine and reverse telecine processes and the time code correlation process are well known in the art and will not be further discussed here.
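
To see concretely why film and video time codes cannot line up 1:1, the 3-2 pulldown cadence can be written out. In this sketch, film frames alternately contribute 2 and 3 video fields, and each video frame holds 2 fields. Starting the cycle with a 2-field frame is an assumption, since real transfers may begin at any point in the cadence. The sketch also ignores the 0.1% NTSC slowdown to 29.97 frames per second. The function name is hypothetical.

```python
from typing import List

# Fields contributed by alternate film frames in a 3-2 pulldown.  Starting the
# cadence with 2 fields is an assumption; real transfers may start elsewhere.
CADENCE = (2, 3)


def pulldown_video_frames(film_frame: int) -> List[int]:
    """Indices of the 2-field video frames that carry fields from `film_frame`."""
    if film_frame < 0:
        raise ValueError("film frame index must be non-negative")
    pairs, odd = divmod(film_frame, 2)
    first_field = pairs * 5 + (CADENCE[0] if odd else 0)   # 5 fields per film-frame pair
    fields = range(first_field, first_field + CADENCE[odd])
    return sorted({field // 2 for field in fields})


if __name__ == "__main__":
    for n in range(5):
        print(n, pulldown_video_frames(n))
    # 0 [0] / 1 [1, 2] / 2 [2, 3] / 3 [3, 4] / 4 [5]
```

Under this cadence, 24 film frames fill exactly 30 video frames (60 fields). Within each group of five, video frames 2 and 3 mix fields from two different film frames. This is why a reverse telecine must discard the repeated fields to recover 24 frame-per-second material.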
- because the camera tracking control computer 46 is not directly connected to the camera 26, at the beginning of each camera run the camera operator verbally or visually notifies the camera tracking control computer operator that filming is about to begin, and the computer operator signals the computer to begin tracking.
- the camera operator may also run film to test lighting and equipment while the camera is not in motion. Consequently, there will be time codes in the .CPA files 72 for which no film frames correlate, and there will be film frames for which there is no positioning data.
- because the time codes in the .CPA files and those associated with the film will not match up exactly, it is the function of the sync point database to assist in eliminating superfluous camera tracking data and film frames by matching up the common master time codes.
- the sync point database 84 and the camera position/time code table from the edit system database 74 are the inputs to the encoding assistant 86.
- the encoding assistant 86 is a program which presents the user with side-by-side lists of clips from the sync point database 84 and the Edit system database 74. These lists are sorted according to the master SMPTE clock 60 time code and clips are automatically paired up when an overlap of time code ranges is detected. The user then selects from the set of clips with paired sync point and camera position time code ranges to create a list of clips that will be encoded by the MPEG encoder 90. This list is written to the encoding batch file 88 in a format appropriate for the particular MPEG encoder 90 being used.
- each batch entry includes a time code corresponding to the master SMPTE clock 60 time for the first frame of the clip as determined by the encoding assistant.
- the MPEG encoder 90 is instructed to use this time code for the time code field in the GOP headers of the bitstream.
- When the MPEG encoder 90 is operated, it uses the video tape in and out points to automatically cue, start and stop the VTR (video tape recorder) 92.
- the time code field from the first GOP header is used to compute an offset between the first frame of camera position/orientation data and the first frame of MPEG video, which completes the correlation process.
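
The pairing rule used by the encoding assistant, and the final offset step, can be sketched as follows. Several details are hypothetical: the clip records, the representation of master time codes as absolute frame counts, and the assumption that the camera position data holds one record per master-clock frame. The real encoding assistant also writes the encoder-specific batch file, which is not modelled here.

```python
from typing import List, Sequence, Tuple

# (clip name, first master time code, last master time code), time codes as
# absolute frame counts; this tuple layout is hypothetical.
Clip = Tuple[str, int, int]


def pair_clips(sync_clips: Sequence[Clip],
               position_clips: Sequence[Clip]) -> List[Tuple[str, str, int, int]]:
    """Pair each sync point clip with each camera position clip whose range overlaps it."""
    pairs = []
    for s_name, s_in, s_out in sorted(sync_clips, key=lambda c: c[1]):
        for p_name, p_in, p_out in sorted(position_clips, key=lambda c: c[1]):
            start, end = max(s_in, p_in), min(s_out, p_out)
            if start <= end:  # the two time code ranges overlap
                pairs.append((s_name, p_name, start, end))
    return pairs


def position_index_offset(first_position_tc: int, first_gop_tc: int) -> int:
    """Index into the camera position data of the first encoded MPEG frame."""
    if first_gop_tc < first_position_tc:
        raise ValueError("MPEG clip starts before the camera tracking data")
    return first_gop_tc - first_position_tc
```

The nested loop is adequate for the clip counts of a single production. With very long lists, a sort-and-sweep over both sorted lists would avoid comparing every pair.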
- An alternative way of associating the master SMPTE clock 60 with the film is to use an apparatus known as a smart slate 64.
- This apparatus contains a crystal-controlled SMPTE time code clock like the time code writer 62, and is synchronized to the master SMPTE clock with the same jam-synching method.
- the smart slate with its illuminated time code display, is held in front of the camera for a brief period each time the camera is started.
- the video tape is viewed by a human operator after the telecine process, and the in point, out point and smart slate time code for each clip are recorded and entered into an alternative form of the sync point database. Correlation of Video Frames and Camera Position Data. If a video camera is used instead of a motion picture camera, the process is simpler.
- the video tape recorder portion 56 of the video camera 28 contains an internal SMPTE clock. This clock associates the time code with the video tape 82 and also provides a time code signal to the time code translator 54 through cable 96, which, in turn, provides an ASCII version of the time code to the camera tracking control computer 46 over cable 52. Because the VTR portion 56 of the video camera 28 acts as the master time code clock and as the camera tracking time code source, there is no need for a sync point database. The encoding assistant 86 is able to generate the required batch files with just the .MDB input. Use of Rendered Frames.
- a .CPA file with virtual camera position/orientation information is supplied by the rendering process, and correlation is straightforward.
- the starting time code for a rendered clip is arbitrary, as long as the same time code is used in the .CPA file 72 and the encoded MPEG bitstream 94.
- Time Code Writer. The film camera 26 is outfitted with a Time Code Writer apparatus such as the AatonCode available from Aaton Camera, Grenoble, France. This apparatus contains a crystal-controlled SMPTE time code clock, and it optically exposes the time code directly onto the film as the camera is operated.
- Telecine Machine. The film is converted into video using a telecine machine such as the BTS Quadra Telecine.
- the time code reader used with the telecine machine is part of the Aaton Keycode System, also available from Aaton Camera, Grenoble, France.
- Time Code Translator. In order to later associate the master SMPTE or VTR time code with the camera tracking data, an off-the-shelf device called a "time code translator" is used to convert an SMPTE time code signal to an RS-232 format that is read over a serial port.
- An example of such a device is the "EASY READER I" product from Telecom Research, Burlington, Ontario, Canada.
- Time code translator 54 reads the 80-bit time codes of the Society of Motion Picture and Television Engineers (SMPTE) and the European Broadcasting Union (EBU). This code, when recorded on video or audio tape, permits exact addressing of points on the tapes for precise editing, synchronization, dubbing, and splicing. For every frame of video, there is a corresponding Time Code Address. The address is an 8-digit number representing HOURS:MINUTES:SECONDS:FRAMES, where the FRAMES number represents the nth frame recorded during the current SECOND.
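
Matching time codes from the different sources is easiest once each address is reduced to an absolute frame count. The following sketch handles non-drop-frame code only. It assumes an integer frame rate such as 24 or 30 and does not implement the 29.97 drop-frame convention. That convention skips frame numbers 00 and 01 at the start of every minute except each tenth minute. The function names are hypothetical.

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' address to a count from 00:00:00:00."""
    parts = tc.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected HH:MM:SS:FF, got {tc!r}")
    hours, minutes, seconds, frames = (int(p) for p in parts)
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"field out of range in {tc!r}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(count: int, fps: int) -> str:
    """Inverse of timecode_to_frames for non-drop-frame code."""
    seconds, frames = divmod(count, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


if __name__ == "__main__":
    print(timecode_to_frames("01:00:00:23", 24))  # 86423
    print(frames_to_timecode(86423, 24))          # 01:00:00:23
```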
- the master SMPTE clock 60 is preferably the Aaton OriginC manufactured by Aaton Camera of Grenoble, France.
- the time code writer on the camera is synchronized with the master clock using the standard jam-synching procedures. This synchronization is repeated at regular intervals, such as once every eight hours during filming to ensure that the clocks do not drift apart.
- Smart Slate. Any smart slate is usable.
- the DCODE TS-2 Time Code Slate from Denecke, Inc., North Hollywood, California, is such a device.
- the smart slate 64 is used to display the time when a scissor-like bar (not shown) is shut or "clapped" to a display panel (not shown), thus allowing the displayed time to be captured on video tape or film.
- it is prudent to use a smart slate as a backup in case of malfunction of the time code writer or reader. Often a malfunction of the time code writer might not be detected until telecine time. Further, the slate provides a way to verify the calibration of the time code reader.
- the Camera Tracking Control Computer 46 can be a standard PC (personal computer) clone with the addition of a multi-port serial card in order to accommodate the high volume of interrupt-driven serial port traffic.
- the computer 46 runs a custom program called CAMTRAK that, in addition to controlling the Camera Tracking Apparatus, reads the Master SMPTE Clock via the time code translator 54.
- the camera tracking computer records and time stamps the time code information from the Master SMPTE Clock along with the other camera tracking data generated by the ultrasonic positioning system and the electronic compass.
- the CAMTRAK program requires the following serial I/O capabilities: 1. Interrupt-driven receive with buffering
- each byte received on each serial port is time stamped by the serial port interrupt handler.
- the time stamp of the first byte of a message is saved with the message.
- the PC's timer interrupt increments the system clock tick counter at a rate of approximately 18.2 Hz. This is not sufficiently fine-grained to permit accurate correlation of all of the data.
- the serial port interrupt handler reads the most significant byte of the Programmable Interval Timer (either Intel 8253 or 8254) counter and combines it with the three least significant bytes of the system clock tick counter to create a time stamp with a resolution of about 215 microseconds, which is accurate enough for correlation.
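The time stamp arithmetic can be sketched as follows. The sketch assumes the PC's usual timer configuration: a 1,193,182 Hz input clock, with counter 0 counting down through 65,536 counts per BIOS tick (about 54.9 ms). One step of the counter's most significant byte is then 256 counts, about 214.55 microseconds, which matches the resolution quoted above. The bit layout and the inversion of the down-counting MSB are assumptions, not details taken from CAMTRAK. Counting modes in which the counter decrements by two per clock are not accounted for.

```python
PIT_INPUT_HZ = 1_193_182            # nominal input clock of the 8253/8254 timer on PCs
STAMP_UNIT_S = 256 / PIT_INPUT_HZ   # one step of the counter's MSB, about 214.55 microseconds


def make_time_stamp(tick_count: int, pit_counter: int) -> int:
    """Combine the low 24 bits of the BIOS tick counter with the PIT counter's MSB.

    The PIT counts down within each tick, so its MSB is inverted to give an
    elapsed value that increases with time (an assumption about the counter mode).
    """
    elapsed_msb = 0xFF - ((pit_counter >> 8) & 0xFF)
    return ((tick_count & 0xFFFFFF) << 8) | elapsed_msb


def stamp_to_seconds(stamp: int) -> float:
    """Convert a time stamp to seconds since the 24-bit tick counter last wrapped."""
    return stamp * STAMP_UNIT_S


print(round(STAMP_UNIT_S * 1e6, 2))  # 214.55
```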
- the CAMTRAK program's analysis mode is used to convert the .CTJ file into a .CPA (Camera Position ASCII) file 72. This analysis can be performed on the camera tracking control computer as the filming is occurring or by any PC at a later time.
- the .CPA file has X, Y, Z, heading, pitch and roll values that completely define the camera's position and orientation for each frame of film, as identified by the time code from the Master SMPTE Clock.
- the analog video sequences from the video tape 82 are encoded by an MPEG Encoder 90, such as the RTS-3000 MPEG Encoder from Sony Corporation.
- the encoder uses the video tape recorder's (VTR) inputs and outputs to automatically cue, start and stop the VTR.
- the MPEG encoder preferably performs a reverse telecine so that the resulting MPEG stream is 24 frames per second.
- the time code field from the first GOP header is used to compute an offset between the first frame of camera position/orientation data and the first frame of MPEG video, which completes the correlation process.
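The following is a minimal sketch of the final correlation step. It assumes that both the camera tracking data and the MPEG stream run at 24 frames per second, and that their time codes are available as non-drop-frame `HH:MM:SS:FF` strings. The helper names and the list-of-poses layout are hypothetical.

```python
def _tc_frames(tc: str, fps: int) -> int:
    """Non-drop-frame 'HH:MM:SS:FF' -> absolute frame count."""
    h, m, s, f = (int(p) for p in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f


def correlation_offset(first_pose_tc: str, first_gop_tc: str, fps: int = 24) -> int:
    """Frames between the first camera-pose record and the first MPEG frame."""
    return _tc_frames(first_gop_tc, fps) - _tc_frames(first_pose_tc, fps)


def pose_for_mpeg_frame(poses: list, mpeg_index: int, offset: int):
    """Look up the camera pose (e.g. an X, Y, Z, heading, pitch, roll tuple) for an MPEG frame."""
    i = mpeg_index + offset
    if not 0 <= i < len(poses):
        raise IndexError(f"no camera data for MPEG frame {mpeg_index}")
    return poses[i]
```

A negative offset simply means the video starts before the tracking data, and the bounds check then reports MPEG frames that have no pose.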
- Push-Pull Positioning Algorithm. Prior-art algorithms for ultrasonic trilateration employ a least squares estimate and are valid only if the expected value of the error of each measurement is zero.
- the positional data generated using prior art least squares algorithms contained sufficient false signals and large errors to seriously affect its viability in a camera tracking system.
- One aspect of the current invention is a more robust method of determining location under such circumstances.
- the system disclosed herein employs a "push-pull" method which determines the position of the ultrasonic transponder by iteratively calculating and weighting force vectors using the TOF data collected by the ultrasonic receivers in the environment.
- the push pull method of the invention employs fractional force vectors computed from the collected ultrasonic TOF data to provide accurate positional data.
- a single ultrasonic TOF value defines a sphere about a fixed ultrasonic receiver, said sphere being the set of all possible locations for the transmitter.
- TOF values from a pair of receivers define the intersection of two spheres, which is either a circle, or, if the transmitter is collinear with the receivers, a single point.
- the sphere around a third receiver will intersect said circle at two points, in the general case. These two points are a mirror image of each other in the plane defined by the three receivers.
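The geometry above can be made concrete with the textbook closed-form intersection of three spheres. It is shown here in Python purely as an illustration and is not the push-pull method of the invention. The function returns both candidate points, which are mirror images across the plane of the three receivers, and it assumes the receivers are not collinear.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def trilaterate(p1, p2, p3, r1, r2, r3):
    """Return both intersection points of three spheres (centres p1..p3, radii r1..r3)."""
    d = _norm(_sub(p2, p1))
    ex = _scale(_sub(p2, p1), 1.0 / d)                # unit vector p1 -> p2
    i = _dot(ex, _sub(p3, p1))
    ey_raw = _sub(_sub(p3, p1), _scale(ex, i))
    ey = _scale(ey_raw, 1.0 / _norm(ey_raw))          # in-plane unit vector perpendicular to ex
    ez = _cross(ex, ey)                               # normal to the receivers' plane
    j = _dot(ey, _sub(p3, p1))
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2) / (2 * j) - (i / j) * x
    z_sq = r1 ** 2 - x ** 2 - y ** 2
    if z_sq < -1e-9:
        raise ValueError("the three spheres have no common point")
    z = math.sqrt(max(z_sq, 0.0))
    base = _add(p1, _add(_scale(ex, x), _scale(ey, y)))  # foot point in the receivers' plane
    return _add(base, _scale(ez, z)), _sub(base, _scale(ez, z))
```

With receivers on a ceiling at height 3 and a transmitter at (1, 2, 1), the two results are (1, 2, 5) and (1, 2, 1): the second is the true position, the first its reflection above the ceiling plane.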
- the frequency of pings used in the preferred system is under user control. In the environments used by the inventor, the preferred frequency is eight pings per second, with position data for frames shot between pings created by linear interpolation of the position data calculated from the bracketing pings. However, the ideal frequency varies as a function of the acoustic characteristics of the environment and the speed with which the camera traverses the environment. The goal is to ping as often as possible in order to reduce the amount of interpolation used, especially if the camera is traveling quickly, while keeping the ping frequency low enough that ping echoes have faded between pings.
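Filling in frame positions between pings can be sketched as follows. The sketch assumes ping times and frame times share one clock, in seconds. At eight pings per second and 24 frames per second, each ping interval contains three frames, so with aligned clocks a frame falls 0, 1/3 or 2/3 of the way between bracketing pings. The function name is hypothetical.

```python
from bisect import bisect_right


def interpolate_position(ping_times, ping_positions, t):
    """Linearly interpolate an (x, y, z) position at time t from the bracketing pings.

    ping_times must be sorted and strictly increasing (seconds); t must lie within them.
    """
    if not ping_times[0] <= t <= ping_times[-1]:
        raise ValueError("t is outside the recorded pings")
    k = bisect_right(ping_times, t)
    if k == len(ping_times):                  # t coincides with the last ping
        return tuple(ping_positions[-1])
    t0, t1 = ping_times[k - 1], ping_times[k]
    p0, p1 = ping_positions[k - 1], ping_positions[k]
    f = (t - t0) / (t1 - t0)                  # fraction of the way from ping k-1 to ping k
    return tuple(a + f * (b - a) for a, b in zip(p0, p1))
```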
- FIG. 8 illustrates an embodiment of the position calculating process of the invention where the push-pull processing is not being done in real time, but in post-shooting processing.
- the TOF values being accessed immediately after "start" are previously recorded and contained in a .CTJ file 68 (see FIG. 6).
- the process also adjusts TOF values for the inherent system delay (this adjustment step is not illustrated).
- an initial assumed position of the transmitter is chosen and the distance from this point to each receiver is computed. This distance is compared with the TOF value between the transmitter and the receiver.
- for each receiver, a three-dimensional "force vector" is computed from the assumed position of the transmitter and the first ping's TOF data.
- the calculated force vector either "pushes away" from the receiver (if the distance implied by the TOF value is greater than the computed distance) or "pulls toward" the receiver (if it is less than the computed distance).
- the force vectors from all receivers are first weighted, as discussed below, and then combined into a single vector. The combined force vector is normalized as described below, and the assumed position of the transmitter is adjusted by some fraction of this vector. The process is then repeated until the magnitude of the "push" or "pull" of the combined force vector is less than a pre-chosen value, which is chosen based on the required accuracy of the result. Only a fraction of the vector is applied on each iteration in order to reduce the likelihood of falling into a false, local minimum. In the preferred embodiment, the vector is divided by an empirically chosen factor of 2, so that half of it is applied per iteration.
- a local minimum is a point at which, due to anomalous acoustic properties of the environment, the ultrasonic trilateration system computes a combined force vector of zero even though the calculated point is not the transmitter's true position.
- the system continues to iteratively calculate 3-D force vectors based on the first ping's TOF data and to update the assumed position of the camera or wand until the combined force vector is less than epsilon.
- the empirically chosen value of epsilon is 1/100 centimeter.
- the first step of the weighting process is to multiply a receiver's force vector by the reciprocal of the receiver's TOF; consequently, the weight given the force vector is inversely proportional to the TOF and hence to the distance from receiver to transmitter.
- the second step of the weighting function is to decrease the weight of a receiver based on the number of bad TOF values in the vicinity of the pulse in question. Bad TOF values are those which have been identified by various manual or automatic techniques as being invalid.
- a TOF value is often bad because it actually measures a path between the transmitter and receiver that includes one or more reflections off various surfaces in the environment.
- once the quality of the TOF data has been graded and weights assigned, the weights are first normalized and then multiplied by their respective force vectors.
- the method employs iterative fractional application of force vectors to increase the reliability of the resulting positional data. Still, there is a risk that the iterations of the method may settle into a local minimum. This is more likely when the receivers are coplanar or nearly so, such that there is a local minimum on the wrong side of the plane.
- One way the method minimizes the probability of a local minimum is through the choice of the initial position. In the coplanar receivers case, the initial position is chosen to be much farther away from the plane than would be possible due to physical constraints of the environment and/or the range of the ultrasonic equipment.
- the preferable initial assumed position would be a position with the Z (vertical) coordinate well below the floor of the room.
- because the computed position moves toward the plane by only a fraction of the combined force vector on each iteration, it is more likely to settle into the actual position than into some local minimum.
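The push-pull iteration described above can be sketched in Python as follows. Several details are assumptions rather than the patent's specification:

- TOF values are taken as already converted to distances (TOF multiplied by the speed of sound).
- The magnitude of each force vector is the difference between the TOF distance and the computed distance.
- The bad-TOF grading is represented by an optional per-receiver `quality` factor.
- Half of the combined vector is applied per iteration.

Epsilon defaults to 1e-4 m (1/100 cm). The default starting point lies well below a ceiling-mounted receiver plane, as suggested above.

```python
import math


def push_pull(receivers, measured, quality=None, start=(0.0, 0.0, -10.0),
              fraction=0.5, epsilon=1e-4, max_iter=10_000):
    """Estimate a transmitter position from receiver positions and TOF-derived distances.

    receivers: list of (x, y, z) receiver positions, in metres.
    measured:  distance implied by each receiver's TOF (TOF x speed of sound), in metres.
    quality:   optional per-receiver factors in (0, 1] that down-weight receivers with
               nearby bad TOF values (a hypothetical stand-in for the grading step).
    """
    n = len(receivers)
    quality = quality if quality is not None else [1.0] * n
    # Weight inversely proportional to TOF (equivalently, to measured distance),
    # scaled by data quality, then normalized so the weights sum to 1.
    raw = [q / d for q, d in zip(quality, measured)]
    total = sum(raw)
    weights = [w / total for w in raw]

    pos = list(start)
    for _ in range(max_iter):
        combined = [0.0, 0.0, 0.0]
        for rec, dist, w in zip(receivers, measured, weights):
            delta = [p - r for p, r in zip(pos, rec)]        # receiver -> assumed position
            computed = math.sqrt(sum(c * c for c in delta))
            if computed == 0.0:
                continue                                      # direction undefined at a receiver
            # Positive when the TOF distance exceeds the computed distance ("push away"),
            # negative when it is shorter ("pull toward").
            magnitude = dist - computed
            for k in range(3):
                combined[k] += w * magnitude * delta[k] / computed
        if math.sqrt(sum(c * c for c in combined)) < epsilon:
            return tuple(pos)
        # Apply only a fraction of the combined vector to avoid jumping into a false minimum.
        pos = [p + fraction * c for p, c in zip(pos, combined)]
    raise RuntimeError("push-pull iteration did not converge")
```

With coplanar ceiling receivers, starting below the floor makes the iteration approach the true position from below. It therefore does not settle on the mirror-image point above the ceiling plane.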
- once a position has been computed, the algorithm uses the previous camera position as its assumed starting position for the next ping.
- Operation of the Positioning System. Assume the equipment is in place, has been initialized and calibrated, and is ready for use. If film is being shot, an operator holds smart slate 64, which has previously been jam-synched with production time source 60 or 56, in view of camera 28 and activates it. Upon activation, smart slate 64 displays the clock time. The displayed time is thus captured in the picture as recorded on film and can later be associated with a particular frame or picture.
- the Master SMPTE clock 60 or the VTR 56, whichever is used as the production time source, also sends the time over cable 98 or 96, respectively, to time code translator 54, which converts the SMPTE time in the form HOURS:MINUTES:SECONDS:FRAMES to an RS-232 format for forwarding to camera tracking control computer 46 over cable 52.
- This time based data stream becomes the first of three serial inputs which the camera tracking control computer 46 receives.
- camera tracking control computer 46 activates electronic compass 30 in camera sensor 24 by sending a command by way of the ultrasonic controller unit 22. This command initiates a continuous data stream from the electronic compass 30 that gives heading, roll, pitch, three-axis (X, Y, Z) magnetic field information, temperature, an error code (as applicable), and a checksum.
- This data stream then, becomes the second of the three serial inputs received by camera tracking control computer 46.
- Camera tracking control computer 46 also sends a command to ultrasonic controller unit 22 to start an ultrasonic measuring cycle.
- ultrasonic controller unit 22 sends a signal to ultrasonic transmitter 32 which emits a burst of ultrasonic energy.
- the ultrasonic signal is received by ultrasonic receivers 36 which send signals back to ultrasonic controller unit 22 by means of cables 38.
- Ultrasonic controller unit 22 measures the time delay between the start of the triggering output pulse of ultrasonic energy and receipt of the leading edge of the return signal from each of ultrasonic receivers 36.
- Ultrasonic controller unit 22 then sends camera tracking control computer 46 the time delay it measured for each ultrasonic receiver 36.
- This input from ultrasonic controller unit 22 becomes the third of the three serial inputs received by camera tracking control computer 46.
- Each of the three serial inputs is time-stamped as it is received by tracking control computer 46. These time stamps then become the key to allow camera tracking control computer 46 to correlate the above three serial input data streams with the SMPTE code.
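One way to use the time stamps to attach SMPTE frame numbers to the compass and ultrasonic records is sketched below. Each record takes the latest time code received at or before it and advances by the elapsed time. The parameter names, and the choice to extrapolate from the preceding time code rather than interpolate, are assumptions made for illustration.

```python
from bisect import bisect_right


def assign_frames(smpte_stamps, smpte_frames, event_stamps, stamp_hz, fps):
    """Give each compass/ultrasonic record a frame number from the SMPTE stream.

    smpte_stamps: sorted receive time stamps of the time code records
    smpte_frames: absolute frame counts decoded from those time codes
    event_stamps: receive time stamps of the records to be correlated
    stamp_hz:     time stamp units per second
    """
    frames = []
    for stamp in event_stamps:
        k = bisect_right(smpte_stamps, stamp) - 1    # latest time code at or before the event
        if k < 0:
            frames.append(None)                      # event precedes all time code data
            continue
        elapsed_s = (stamp - smpte_stamps[k]) / stamp_hz
        frames.append(smpte_frames[k] + int(elapsed_s * fps))
    return frames
```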
- the positions of relevant features of the topography of the camera tracking area are placed in the map-based editor.
- the paths can be drawn manually such as in a draw program or created from tracking data supplied by the tracking computer.
- a map displaying the drawn or otherwise recorded features is then generated by the system for use with the map-based editor.
- the features recorded could include the location of all corners, turns, intersections, and end-points of all paths in the environment and the location of all features with which a user may interact while playing the production.
- the location of these features can be determined by any means known in the art.
- Such features can be recorded by use of a wand-type transducer according to the invention using, in the preferred ultrasonic embodiment, the following steps: 1. Activating a wand type ultrasonic transducer at the position of the feature whose location must be determined,
- the wand tracking data is generated using the same method as used to calculate the X, Y, and Z coordinates of the camera, as described above. Alternatively, these positions can be measured by any other means known in the art and the topographical measurements recorded in the map-based editor.
- Special Filming/Taping Camera Path Techniques
- the most important additional task in the capturing step is to ensure that the camera adequately covers the environment being filmed or videotaped. Besides recording the position and orientation of the camera for each frame of video, it is equally important that camera shots be taken from enough headings along each path that seemingly seamless transitions can be constructed for a player of the production navigating the environment. Special shooting guidelines should be followed if these transitions are to look realistic.
- any other shooting technique available in the art would also fall within the invention.
- any other workable combination of lens angles and camera headings could be used.
- in order to provide 360-degree seamless transitions, the lens angle and camera heading combinations, taken together, must provide full 360-degree coverage of the environment at any point.
- the capturing step is also where all of the various media elements (also known as media assets) called for in the script, including video, movie, graphic, animation, and audio segments are digitized, compressed, converted to an internal format, and stored in the computer.
- One particular type of asset described below is called a multiple-view object or M-view.
- the capturing step also includes conversion and storage of the topography data.
- An important feature of the invention is the ability to augment a video scene with bitmaps of objects. Footage of a bare tabletop can, at run time, show a table with a vase on it, or a frog or any other bitmap image. Information about what object to draw, where to draw it, and how to draw it must be entered into the system during the edit step so it can be fed to the run time engine of the final production when needed.
- the logic uses object oriented terminology.
- the physical item to be entered into the production is referred to as an object.
- Particular depictions of the object are referred to as instances.
- for example, the concept of the vase is an object.
- Various instances of the "vase" object could be empty vase, vase with red flowers, vase with wilted flowers and broken vase. Each of these instances would have a different M-view associated with it, but would share many properties with the other instances in the object class.
- object classes are not discussed and the word "objects" may refer to an object class or an object instance, depending on the context. The relationship between object classes and instances is discussed in more detail below.
- the invention includes a process for creating objects and inserting them into video that did not contain them when the video was created. Traditionally such objects have been called “sprites.” Sprites are typically small bitmaps that can be overlaid and moved on other bitmaps. A sprite may have a single representation (one face, one size), or it may have multiple bitmaps representing several faces at one size, animation frames at one size, or a few discrete sizes.
- the process of the invention creates a new type of sprite, a multiple view object, called an M-view, which has the ability to be inserted into the spatial video environment with the appropriate scale, angle and view for the place of insertion in the environment and the perspective of the viewer.
- unlike prior-art sprites, M-views can be scaled to the desired size, rotated to the appropriate face, and moved to the right position.
- M-views grow, shrink, rotate and move such that they appear to be part of the scene.
- M-views may be created from a real object, by capturing and digitizing images of the object from various angles, or from computer models that can be rendered from various angles. There is no single preferred method; the choice depends on the size and complexity of the object as well as its importance to the production. Numerous capture and rendering programs are available that can capture or generate the required images. Regardless of the capture technique used, preferably at least one image is obtained for every 6 degrees of rotation about each axis around which the object is to be rotated. The angles need not be uniform or about any particular axis. The needs of a particular production, including the scripted location and behavior of the object for which the M-view is being created, dictate which views are captured.
- Such considerations include, for example, whether the object is above or below the plane of the viewer, and whether the object rotates or moves in any other manner.
- the M-view will change size and orientation in response to the receptacle in which it is placed.
- shadows and perspective are not adjusted.
- the number of angles and the resolution at which the M-view is rendered is chosen by the editor. Choices are made as a function of the importance of the M-view in the scene, and the importance of high resolution or precise angle control. The importance varies depending on the shape, size and color of the M-view.
- M-views are created as assets incorporated into the production at edit time; they are not created on the fly.
- some or all of the faces or resolutions stored in the M-view catalog of the preferred embodiment could be eliminated by storing the computer model of the M-view object and rendering the object on the fly at runtime. Caching of previous frame renderings might reduce the computer rendering required in such a system.
- the developer could control additional elements such as lighting, which could vary for the same object in different receptacles. Further, one could use all 6 camera position values to produce a correct image, rather than an approximation.
- when the bitmaps for the M-view faces are created, it is important to create a transparent area around the object (unless the object is rectangular from all sides and always exactly fills the bitmap). This could be done with an alpha channel that gives the relative opacity or transparency of each pixel, or with a reserved color or color index that means the pixel is to be treated as transparent. In either case, the pixels at the edge of the object should not be anti-aliased with a background color that is not part of the object, or that background color will later be visible, tingeing the edge color of the object. It is difficult to remove the blue-screen color from the edge of a real object.
- Aliasing is an artifact well known to those skilled in the art which occurs due to limitations in the sampling frequency which reduce the clarity of the picture, creating a noticeable edge effect.
- Anti-aliasing is a way to make images smoother by choosing a pixel's color to be a proportional average of the colors that it might have been. Normally, use of anti-aliasing algorithms will reduce edge effects; however, when used with blue-screened overlays, the edges of the M-view remain tinged with the replacement color. Even the best techniques currently available often leave a blue tinge to the object's edge during filming of actual objects on a blue screen, and object edges are often tinged with blue even without use of anti-aliasing techniques. Therefore, where possible, computer-generated objects are the preferred objects for M-views.
- the preferred embodiment uses a reserved color technique where magenta is reserved for transparent parts of the bitmaps.
- the bitmaps that are digitized from the real or modeled object may be kept in any convenient format; the preferred format is that provided in the Targa graphics file format defined by Truevision, Inc. of Indianapolis, IN.
- the Targa format was chosen because of its compatibility with all of the various post-processing steps applied for smoothing, sizing, cropping, and the like, and in particular compatible with the preferred Softimage rendering software and PBM file conversion utilities used.
- the Softimage software is published by Microsoft Corporation of Redmond, Washington. The PBM utilities are published by Jef Poskanzer as shareware, available from Jef Poskanzer at jef@well.sf.ca.us or apple!well!jef, and are generally available to those skilled in the art.
- the M-view compiler reads all of the bitmaps that belong to a single object and binds them into a single file for the benefit of the runtime system. To save storage space, it is also recommended that a suitable compression technique, such as run length encoding, be used to compress the images. Further, the preferred embodiment of the editing system allows its user to choose the resolution at which each M-view is to be retained.
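Run length encoding combines naturally with the reserved transparent color described earlier, because transparent runs can be skipped outright when drawing. The sketch below encodes a row of RGB pixels into `(count, color)` runs, using `None` for magenta runs, and draws them into a destination row. The run format is a hypothetical illustration, not the file layout produced by the M-view compiler.

```python
MAGENTA = (255, 0, 255)  # reserved "transparent" color, as in the preferred embodiment


def rle_encode_row(row):
    """Encode one row of RGB pixels as (count, color) runs; transparent runs use color None."""
    runs = []
    for pixel in row:
        color = None if pixel == MAGENTA else pixel
        if runs and runs[-1][1] == color:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, color])   # start a new run
    return [tuple(run) for run in runs]


def rle_blit_row(runs, dest, x0=0):
    """Draw encoded runs into dest starting at column x0, skipping transparent runs."""
    x = x0
    for count, color in runs:
        if color is not None:
            # Clip the run to the destination row before writing.
            for i in range(max(x, 0), min(x + count, len(dest))):
                dest[i] = color
        x += count
    return dest
```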
- the preferred compression technique is run length encoding, following the principles set out in Foley & van Dam. The steps of the preferred method of creating an M-view are shown in the process flow diagram of FIG. 9.
- the first step is often to build a 3-D model of the object, preferably using a computer workstation, such as those sold by Silicon Graphics, Inc., programmed with a computer-aided design program such as Softimage from Microsoft Corporation.
- for rotation about a single axis, capturing faces around the axis of rotation at 6-degree increments is sufficient.
- the resulting 60 bitmaps, representing the 60 faces and stored in .TGA files, are then moved from the rendering system to the PC.
- the files can be further manipulated using the preferred PBM Utilities or similar graphics utilities.
- an M-view compiler program takes the 60 (or however many) designated faces of each M-view and loads them into a data structure that can be loaded from a disk file into memory and quickly accessed to retrieve any desired face.
- this same structure is used for accessing the faces for placement on frames of video.
- This directory structure is shown in FIG. 10. Virtually the same data structure is used in the edit system as well as at runtime for performance reasons.
- the access mechanism uses map-based data to determine what view to overlay on the display.
- the M-views themselves remain in the catalog data structure, but are represented by references in the edit time database which ties M-views to specific video clips and frames.
- a reduced version of the M-view catalog database structure is copied into the runtime data hunk (see below). Unused M-views are not copied over.
- the editor accesses the M-view directory as being under the catalog directory.
- Other subdirectories include the Album Directory which is also shown.
- the edit system user using the process of FIG. 9 can decide on the resolution and angles which will be a part of the final runtime production.
- the M-views relevant to a navigable area are preferably kept in memory; therefore, there is a trade-off between realism and memory consumption in choosing the number of views and the resolution of the M-view images.
- the goal is to keep the amount of data in memory as small as possible to save memory resources for other parts of the system.
- M-view resolution must be high enough to satisfy the production's visual requirements.
- the resolution and number of faces for an M-view can be chosen on an individual M-view basis so that more important M-views can have better resolution.
- low-resolution M-views are kept in memory to allow rapid navigation of the environment. If a user pauses to explore a particular area, however, the M-views in that area are replaced with high-resolution M-views from the storage medium. This allows rapid navigation, and higher-quality reproduction where the user stops to examine an M-view more closely, without taxing the memory system.
- the editor examines the available M-view angle faces at certain points along the camera path and designates the desired M-view angle. For example, the editor can designate a starting angle for the first frame in which the M-view appears, one intermediate angle near the midpoint of the M-view's passage through the frame sequence, and an ending angle for the last frame, and allow the system to interpolate the angles for the intermediate frames.
- the interpolation used is simple linear interpolation between the user-selected angles.
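The linear interpolation of angles between editor-designated keyframes can be sketched as follows. The keyframe list format is an assumption. The sketch also does not handle wraparound across 0/360 degrees, for example interpolating from 350 to 10 degrees.

```python
from bisect import bisect_right


def angle_at(keyframes, frame):
    """Linearly interpolate an M-view angle (degrees) between editor-chosen keyframes.

    keyframes: list of (frame_number, angle_degrees) sorted by frame number.
    Wraparound across 0/360 degrees is not handled in this sketch.
    """
    frames = [f for f, _ in keyframes]
    if not frames[0] <= frame <= frames[-1]:
        raise ValueError("frame lies outside the keyed range")
    k = bisect_right(frames, frame)
    if k == len(frames):                      # frame is the last keyframe
        return float(keyframes[-1][1])
    (f0, a0), (f1, a1) = keyframes[k - 1], keyframes[k]
    return a0 + (a1 - a0) * (frame - f0) / (f1 - f0)
```

With keyframes of 10 degrees at frame 0 and 50 degrees at frame 999, the angle near the middle of a 1000-frame clip comes out at about 30 degrees.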
- the M-view access program chooses the nearest available face to the requested one.
- the user of the edit system can assign an angle of 10 degrees at the beginning of a path and 50 degrees at the last frame on the clip where the M-view is visible.
- the total number of frames in which the M-view is visible might be 1000 frames. Over these 1000 frames the angle of the M-view will change from 10 degrees to 50 degrees in increments determined by linear interpolation.
- the M-view that was available at edit time may have had 60 faces, so the angle of 10 degrees was represented by the 2nd face (12 degrees) and the angle of 50 by the 8th face (48 degrees).
- the editor may later decide to reduce the storage requirements of the M-view by changing the number of faces from 60 to 12, i.e., one face every 30 degrees.
- in that case, the face at the beginning of the clip will be represented by the 12th face (corresponding to 360 degrees, which is the same as 0 degrees and is as close to 10 degrees as 12 faces allow), and the angle of 50 degrees is represented by the 2nd face (60 degrees).
- the first face will be used in the middle of the clip as well (30 degrees).
- the system chooses the nearest available angle to represent the requested angle and the system adjusts automatically to the developer's reduction of the number of available angles (faces) to save memory or to the developer's increasing the number of faces to improve the angular resolution.
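Face selection can be sketched as follows, using the numbering implied by the example above. With `n` faces, face `k` sits at `k * 360 / n` degrees, so the last face represents 360 (equivalently 0) degrees. The function names are hypothetical. An angle exactly halfway between two faces resolves according to Python's round-half-to-even rule.

```python
def nearest_face(angle, num_faces):
    """Return the 1-based face closest to the requested angle.

    Face k sits at k * 360 / num_faces degrees, so the last face is 360 = 0 degrees.
    """
    step = 360.0 / num_faces
    k = round((angle % 360.0) / step) % num_faces
    return num_faces if k == 0 else k


def face_angle(face, num_faces):
    """Angle in degrees represented by a 1-based face number."""
    return face * 360.0 / num_faces
```

Reducing `num_faces` from 60 to 12 changes which face serves a request without any change to the stored angles. This matches the automatic adjustment described above.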
- M-views have associated with them not only angles, but also a size.
- an M-view is associated with a receptacle which is constructed during the editing process discussed below.
- a receptacle has associated with it a size and a position.
- the receptacle size and the M-view size are combined by multiplying the two sizes together.
- as the receptacle size increases, the displayed M-view size increases proportionately. It may be inaccurate to speak of an M-view being "in" a receptacle.
- the M-view gets its size and origin from the receptacle, but, depending on the size of the M-view, the M-view bitmap could be much larger than the receptacle dimensions.
- the receptacle can be sized to any size using the receptacle edit tool in the spatial video edit system. Its initial size is set by the developer depending on the use for which the receptacle is intended and the environmental feature with which the receptacle is associated. It is often useful to scale all receptacles in a production the same way, that is, to make every receptacle the size it should be to contain a figure or feature of known size. For example, all receptacles could be scaled to contain a one-foot object. In this way, the size of M-views relative to the video environment remains consistent no matter which receptacle they are placed in. The size of the M-view face actually displayed is determined by multiplying the receptacle size by the M-view size.
- the displayed M-view size is therefore a function of the size of the receptacle.
- the M-view bitmap is scaled to the size given by this product. Just as it is important in a production to create receptacles of a constant size within the video environment, it is also important to give M-views size numbers that reflect the relative sizes of the objects they depict. For example, where the receptacle is sized to reflect a one-foot square, the M-view of an object one foot in height should have a size number of 1, and the M-view of an object three feet in height should have a size number of 3.
- an M-view object which is half a foot in height should have a size number of 0.5. Therefore not only will the M-view grow or shrink as the receptacle enlarges or becomes smaller, but also an M-view with a small size will appear smaller in a given receptacle than an M-view with a larger size and both will be properly scaled for the environment in which they are depicted. This allows different sized M-view objects which are placed into the same receptacle to keep their relative size and to appear true to their absolute size in the video environment.
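The size arithmetic can be sketched as follows. The `Receptacle` record is hypothetical. It assumes the edit system stores, per frame, how many screen pixels one size unit (here one foot) spans at the receptacle's location. It also assumes that a stored face bitmap's height corresponds to the object's full height.

```python
from dataclasses import dataclass


@dataclass
class Receptacle:
    """Hypothetical receptacle record for one frame: screen origin and scale."""
    x: int            # screen column of the receptacle origin
    y: int            # screen row of the receptacle origin
    unit_px: float    # on-screen pixels spanned by one size unit (e.g. one foot) here


def mview_height_px(receptacle: Receptacle, mview_size: float) -> float:
    """Displayed height = receptacle size x M-view size number."""
    return receptacle.unit_px * mview_size


def bitmap_scale(receptacle: Receptacle, mview_size: float, face_height_px: int) -> float:
    """Factor by which to scale a stored face bitmap so it is drawn at that height."""
    return mview_height_px(receptacle, mview_size) / face_height_px
```

In a receptacle where one foot spans 40 pixels, a size-3 M-view is drawn 120 pixels tall and a size-0.5 M-view 20 pixels tall, so the two keep their relative sizes.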
- when M-views are overlaid on the video, either at edit time for placement or at runtime, the system must be able to translate (position) and scale (size) the bitmaps and to render only the non-transparent pixels to the screen.
- the M-view software implements all three operations in a single step for efficiency, although they could be done one at a time on fast enough hardware.
- Run length encoding compression and decompression used for the M-view faces lends itself well to rapid decompression, scaling, and transparency; however almost any bitmap compression/decompression format could be used as long as decompression is fast enough and memory requirements are small enough to fit with the resources devoted to M-views in a particular production.
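The single-pass translate, scale and transparency operation can be sketched as follows. The sketch uses nearest-neighbour scaling of an uncompressed bitmap held as lists of RGB tuples, with magenta as the transparent color. It is an illustration only: the patent's M-view software works on run-length-encoded faces, and its exact scaling method is not specified.

```python
MAGENTA = (255, 0, 255)  # reserved transparent color


def blit_scaled(src, dest, x0, y0, scale):
    """Translate, scale (nearest neighbour) and draw only non-transparent pixels, in one pass.

    src and dest are lists of rows of RGB tuples; dest is modified in place.
    """
    src_h, src_w = len(src), len(src[0])
    dst_h, dst_w = len(dest), len(dest[0])
    out_h = max(1, round(src_h * scale))
    out_w = max(1, round(src_w * scale))
    for oy in range(out_h):
        dy = y0 + oy
        if not 0 <= dy < dst_h:
            continue                                   # clip rows outside the frame
        row = src[min(src_h - 1, int(oy / scale))]     # source row for this output row
        for ox in range(out_w):
            dx = x0 + ox
            if not 0 <= dx < dst_w:
                continue                               # clip columns outside the frame
            pixel = row[min(src_w - 1, int(ox / scale))]
            if pixel != MAGENTA:                       # transparency test
                dest[dy][dx] = pixel
    return dest
```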
- a number of techniques are available for updating an M-view or other type of sprite animation on screen; typically a double or triple buffer is used.
- in a double-buffer technique, one buffer is under construction while the other buffer provides a stable image for the display hardware.
- the double buffer technique is preferable to writing directly to the screen because it provides a more stable image for display.
- the scene under construction is completely constructed in memory rather than in the screen buffer, then the completed image is copied in its entirety into the display buffer.
- use of video memory storage and video page swapping obviates the need to construct the scene in memory and then copy to the display buffer.
- every new frame completely replaces the previous frame because of the way the preferred SoftMotion player works.
- after each new frame is written, the M-view system overlays it with one or more M-views. The contents of the display buffer are then sent to the display.
- M-views may use animation as well as static bitmaps. This can be done by choosing bitmaps that reflect the changes in position and posture of the object in the M-view.
- M-views and receptacles may also have a logic flow associated with them. For example, a flow program may allow a receptacle to accept or reject different objects or allow different objects to operate on each other.
- the system provides the ability for the user to "pick up" an object represented by an M-view by creating a special cursor.
- the system stops overlaying the M-view on the video so it appears to be removed.
- the mouse cursor changes to a representation of the object picked up to give the user feedback that the user is now carrying the object.
- the cursor/object can be dragged across the screen off of the video portion and, for example, into a container shown on another portion of the display.
- Visuals is the term used in the preferred embodiment of the system to refer generally to the preferably rectangular areas of the display in which data in the form of bitmaps (still pictures), text blocks, labels, video and the like are displayed. While the preferred shape is rectangular, any other shape could be used, although at the cost of significantly greater processing resources.
- Visuals are the primary building blocks for building the spatial video production display. The visuals in which spatial video or dramatic video is played are called “viewports.” Visuals which frame bitmaps are called “pictures.”
- User written embedded, external programs, preferably OCX programs, are displayed in windows called “custom.” OCX's are further discussed below in the section on embedded external programs. All other visuals are called “windows,” including the window in which the entire production resides.
- for example, a display of a spatial video production made according to the invention might have a window that fills the entire display screen; centrally located within this window might be a large viewport in which the spatial video plays.
- other windows could be placed around the viewport to contain an inventory, a dialog, or similar constructions useful for interacting with the program.
- any such window might overlap one or more other windows at certain times during the production. For example, in a production of a museum, clicking on a hotspot covering a statue might cause a flow which in turn displays a picture showing a large bitmap of the statue. Visuals can be placed one within another in a "parent-child" relationship.
- the entire spatial video production preferably plays within a window.
- This window contains various "child” visuals, such as pictures in which bitmaps are displayed, surrounding the viewport in which the spatial video runs.
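The parent-child arrangement of visuals can be modeled as a simple tree. The sketch below is illustrative only; the `Visual` class, and the assumption that each visual's rectangle is stored relative to its parent, are ours rather than the document's.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The four visual kinds named in the text.
KINDS = {"window", "viewport", "picture", "custom"}

@dataclass
class Visual:
    name: str
    kind: str
    rect: Tuple[int, int, int, int]  # (x, y, width, height) relative to the parent
    children: List["Visual"] = field(default_factory=list)
    parent: Optional["Visual"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown visual kind: {self.kind}")

    def add(self, child: "Visual") -> "Visual":
        child.parent = self
        self.children.append(child)
        return child

    def screen_origin(self) -> Tuple[int, int]:
        # Walk up the parent chain, accumulating each parent's offset.
        x, y = self.rect[0], self.rect[1]
        p = self.parent
        while p is not None:
            x += p.rect[0]
            y += p.rect[1]
            p = p.parent
        return (x, y)
```

For example, a button picture placed inside the viewport, which is itself inside the main window, reports its absolute screen position by summing the offsets of its ancestors.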
- The various bitmaps in the visuals surrounding the video window may react to events within the video window by altering their contents in response to events occurring within the viewport.
- A custom window might display a user-created OCX which, for example, may display a map visual of the video environment on which a red dot represents the position of the player within the environment.
- The OCX could take the camera location data from the runtime video interpreter (see below) and, as the player moves, move the red dot on the map visual to reflect the change in position.
- A picture visual of a button might have a flow associated with it which, when the button is clicked on, causes the bitmap showing a red button to be replaced with a bitmap showing a green button.
- The bitmap pictures which may be placed in visuals are also preferably stored in the catalog structure in the album directory, as shown in FIG. 10.
- The treatment of visuals in a preferred embodiment of the system is further illustrated in the database documentation contained in Appendix B and the Flow Language documentation contained in Appendix A.
- The connection of visuals and other assets to the spatial video production through the use of the database and the spatial video edit system is a unique aspect of the system, as is the fact that the viewport is the visual in which the spatial video of the present invention runs. However, the concept and mechanics used by the system for displaying visuals are not unique.
Spatial Video System Database
- Every production made with a preferred embodiment of the spatial video production system has two major components: media and data.
- The media component includes video, audio, bitmaps, cursors, etc.
- The data component is the complex set of pointers and characteristics, logic flows and other items that combines the media assets into a production. Every hotspot, for example, may have position information (X, Y, width, height) for every video frame in which it appears.
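A hotspot whose position is recorded per frame can be hit-tested by looking up its rectangle for the frame currently displayed. A minimal sketch, assuming a hypothetical `Hotspot` record that maps frame numbers to (X, Y, width, height) tuples in video pixel coordinates:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (X, Y, width, height) in video pixels

@dataclass
class Hotspot:
    name: str
    flow: str  # hypothetical name of the flow run when the hotspot is clicked
    frames: Dict[int, Rect] = field(default_factory=dict)

    def hit(self, frame: int, x: int, y: int) -> bool:
        rect = self.frames.get(frame)
        if rect is None:  # the hotspot does not appear in this frame
            return False
        rx, ry, w, h = rect
        return rx <= x < rx + w and ry <= y < ry + h

def hotspot_at(hotspots: Iterable[Hotspot], frame: int, x: int, y: int) -> Optional[Hotspot]:
    """Return the first hotspot under (x, y) in the given frame, if any."""
    for hs in hotspots:
        if hs.hit(frame, x, y):
            return hs
    return None
```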
- The spatial video edit system is simply a software tool that allows users to enter, review, and tune the data in the production.
- Data fields used in the spatial video database cover all the "standard" types (e.g., boolean values, numbers, strings). More exotic data types like memo and large binary objects are not used in the system database. In this respect, the spatial video edit system is very undemanding, as almost any available database technology supports the data types used in the invention.
- The spatial video edit system permits multiple members of the development team to edit the data simultaneously. Changes made by one editor are instantly visible to all other editors. This multi-user simultaneous access is an important aspect of the preferred embodiment of the spatial video edit system. While the spatial video edit process does not require custom database technology, it preferably requires a database technology which supports simultaneous access by multiple users.
- A video asset may contain frames which have been designated by the edit system as a video clip associated with a clip group associated with a path.
- The frame may have a hotspot associated with a defined area of the frame.
- The hotspot, when clicked on, may run a flow (user-specified game logic) that activates a receptacle located elsewhere on the frame and inserts an M-view into that receptacle. All of these entities (video asset, video clip, hotspot, flow, M-view, and receptacle) are described in records in the spatial video database.
- Each entity description in the spatial video database refers to other entities.
- Because the database contains many references from one entity to another, it is important that a consistent reference form be established in the database structure. For example, the database description of a ROOM will identify the MAP in which the ROOM appears. Similarly, a CLIP will list the ROOMs that appear in the CLIP.
- Names of specific entities are associated with unique id numbers contained in an internal database. Every time an entity is created it can be entered into a master registry and assigned an id. The id can be an arbitrary number with enough bits that its uniqueness can be assured. Preferably the id is a 32-bit number.
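One way to realize the master registry is shown below, assuming randomly drawn 32-bit ids that are retried on collision. The `Registry` class is hypothetical, and reserving 0 as a "null" reference is our assumption; a sequential counter would satisfy the same uniqueness requirement.

```python
import secrets

class Registry:
    """Hypothetical master registry mapping 32-bit ids to entity records."""

    def __init__(self) -> None:
        self._entities = {}

    def register(self, kind: str, name: str) -> int:
        # Draw random 32-bit ids, retrying on the (rare) collision so that
        # every id in the registry is unique. 0 is reserved as "no entity".
        while True:
            entity_id = secrets.randbits(32)
            if entity_id != 0 and entity_id not in self._entities:
                break
        self._entities[entity_id] = (kind, name)
        return entity_id

    def lookup(self, entity_id: int):
        return self._entities[entity_id]
```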
- The preferred database is a relational database, which must deal with references that are lists. Sometimes a reference from one database record to another is unique, in the sense that only one item is referenced per record. For other references, such as OBJECTS in the production, there may be a list of properties which the object uses.
- The preferred approach to this problem is to define a separate table that records which objects have which properties. Each record in this table lists exactly one OBJECT and exactly one PROPERTY. There may be many records with the same value in the OBJECT field and many records with the same value in the PROPERTY field. However, for each combination of OBJECT and PROPERTY, there is at most one record.
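The object/property table can be illustrated with SQLite standing in for the Jet engine. The table and column names here are hypothetical; the composite primary key is what enforces "at most one record" per OBJECT and PROPERTY combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE property (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (OBJECT, PROPERTY) pair; the composite key forbids duplicates.
CREATE TABLE object_property (
    object_id   INTEGER NOT NULL REFERENCES object(id),
    property_id INTEGER NOT NULL REFERENCES property(id),
    PRIMARY KEY (object_id, property_id)
);
""")
conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, "lamp"), (2, "key")])
conn.executemany("INSERT INTO property VALUES (?, ?)",
                 [(10, "portable"), (11, "lights"), (12, "opens_door")])
conn.executemany("INSERT INTO object_property VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (2, 12)])

def properties_of(object_name: str):
    """List the properties an object uses, via the junction table."""
    rows = conn.execute("""
        SELECT p.name FROM object o
        JOIN object_property op ON op.object_id = o.id
        JOIN property p ON p.id = op.property_id
        WHERE o.name = ? ORDER BY p.name""", (object_name,))
    return [r[0] for r in rows]
```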
- Preferably, the system uses a database engine that comes with an interactive data viewing tool and also allows one to generate programs that access the engine directly.
- The preferred database engine is Microsoft's Jet™ engine, which supports data viewing through Microsoft's Access™ program. Jet has sophisticated features to ensure the integrity of the data and to repair damaged databases. It allows efficient access from Visual Basic, and the Access program can be used to examine and repair individual values in the database.
- A typical edit database record format list is shown in Appendix B, which is incorporated herein by reference. As can be seen in Appendix B, the database edit information is collected in logically organized tables which can be worked upon by multiple users simultaneously.
- The production binder is the system which, among other things, creates a finished production from the assets, the edit-time database and other edit-time structures.
- The editing step includes means for combining and managing assets captured during the capture step for incorporation into a spatial video production.
- The edit system is the basic means of creating database entries which relate the various assets to the script and to each other in a spatial video production.
- The incorporation task is performed by the spatial video editor. As shown in FIG. 11, the inputs to the editor 14 are the spatial multimedia script 18 and the captured assets 100.
- The output from the editor is a database 102 describing all of the relationships and behavior of the assets used in the production. This output and the media assets 100 are used by the binder 104 to create a final production.
- The editor 14 and the database 102 it creates give the edit system user the ability to define clip exits, transitions, roundabouts and the like, as well as user interaction mechanisms such as buttons, hotspots, receptacles, menus, and animated cursors, to create logic flows and to assign these flows to various events.
- Events, as used herein, refers to a variety of synchronous and asynchronous aspects of the production.
- Mouse events include the standard asynchronous interactions the user has with the system by means of the mouse, where the production developer has chosen to allow such actions to cause the system to respond. For example, the cursor controlled by the mouse moving over an active hotspot, button or the like is such an event. Further, a mouse click while the cursor is over an active hotspot may be another such "mouse event."
- The invention of spatial video as disclosed herein also introduces a new class of events, sometimes referred to herein as "frame events."
- Such events are various actions programmed into the production which are associated with a particular video frame.
- An example of such an event would be the definition of a hotspot or receptacle in a particular video frame. If a transition has been defined for a particular frame, a transition event would be another frame event associated with that frame. Similarly, a transition event should be paired with a clip exit event, which would be another frame event.
- Creation of frame events is discussed in the sections of this document dealing with the edit system and the binder. Interpretation and execution of frame events are further discussed in the section of this document dealing with the runtime system.
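Frame events can be pictured as actions keyed by frame number and fired as each frame is shown. This is a hypothetical sketch of the idea, not the runtime system described later; `FrameEventTable` and its methods are invented names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Action = Callable[[int], None]

class FrameEventTable:
    """Hypothetical table of frame events keyed by video frame number."""

    def __init__(self) -> None:
        self._events: DefaultDict[int, List[Action]] = defaultdict(list)

    def on_frame(self, frame: int, action: Action) -> None:
        self._events[frame].append(action)

    def play(self, start: int, end: int) -> None:
        # Simulate playback: as each frame is shown, fire its events in the
        # order they were registered.
        for frame in range(start, end + 1):
            for action in self._events.get(frame, []):
                action(frame)
```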
- A primary task of the edit system is to create an intuitive interface whereby the developer can construct the various database entries which associate the assets and logic flows in a meaningful way. While many tasks which the edit system and database perform, and many aspects of such productions, are not unique to spatial video productions, one unique aspect of the system is the idea of map-based editing, where the map is actually a method of visualizing camera paths through the environment. As will be discussed in more detail below, one of the first tasks of the developer is to lay out camera paths on the map (called clips) and to associate video with these clips. This is done by using the database to identify which video stream is associated with which map clip, which frame of the video stream corresponds to the beginning point of the clip, and which frame corresponds to the end point.
- The map actually becomes an association of video frames to two-dimensional space.
- This revolutionary concept allows for many efficiencies in editing. It also gives rise to several interesting associations.
- The beginning and end of a map-associated video clip (the MarkIn and MarkOut points discussed below) are end-of-clip exits.
- Intersections between two paths allow for transitions between a clip associated with one path (e.g., the north-south path) and a clip associated with another path (e.g., the east-west path).
- The point of such an intersection shown on the map represents collectively the frames of each clip in the intersecting paths which were shot from the same physical camera nodal position.
- The editing of an intersection is actually the definition of a clip exit from one video clip to another at a specifically defined frame.
- A roundabout, where the player rotates to a different heading while maintaining the same position in three-dimensional space, is actually a series of clip exits from one clip in a group of clips shot along the same path to another clip in the group, which other clip was shot along the same path with a different camera heading.
- Because roundabouts are frame specific as well as clip specific, they are defined in the database as designated "FROM" and "TO" frames on the specific clips. Once defined, the associated clip exit can also be used as a clip entrance if the player chooses to make the reciprocal turn or roundabout from the associated clip.
- A clip exit is a record in the database used to define the frames in the FROM and TO clips where the user can execute a transition from one clip to the other, as well as the type of transition associated therewith.
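A clip exit record, and the reciprocal entrance it provides, might be sketched as follows. The field names, the "left"/"right" transition labels and the `exits_at` helper (which returns only the exits matching the turn the player asked for) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Reversing a turn swaps left and right; other labels are kept as-is.
OPPOSITE = {"left": "right", "right": "left"}

@dataclass(frozen=True)
class ClipExit:
    from_clip: str
    from_frame: int
    to_clip: str
    to_frame: int
    transition: str  # e.g. "left" or "right"

    def reciprocal(self) -> "ClipExit":
        # The same frame pair, traversed the other way, serves as an entrance.
        return ClipExit(self.to_clip, self.to_frame, self.from_clip,
                        self.from_frame, OPPOSITE.get(self.transition, self.transition))

def exits_at(exits: List[ClipExit], clip: str, frame: int, wanted: str) -> List[ClipExit]:
    """Exits usable at this frame that match the turn the player asked for."""
    return [e for e in exits
            if e.from_clip == clip and e.from_frame == frame and e.transition == wanted]
```

For a roundabout from the forward clip 1F to the left-facing clip 1L, the reciprocal is a right turn from 1L back to 1F at the same frame pair.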
- Common transitions include left turns and right turns, which in the context of transitions from a clip in one clip group to a clip in an intersecting clip group are called intersection transitions, and in the context of a transition from one clip in a group to another clip in that same group are called roundabout transitions.
- Clip exits can also be associated with other types of transitions.
- A clip exit can be defined on a frame where the next scene to be viewed is from the same clip, or from a different clip, but at a point not spatially associated with the FROM clip. These are called “teleport” transitions and could occur when the player solves a puzzle or is otherwise allowed by the program logic to enter a different environment. Other exits could be to a dramatic video sequence or some similar event which could be called by the system's flow logic. These exits are called flow exits.
Functionality of Edit Process
- Assets are any data (files) in a production which are generated outside of the edit system. These files remain outside of the Edit database because many of them can be quite large.
- The types of assets which the edit process typically utilizes are shown in FIG. 26:
  a. Video Assets - Files which contain digital video, with and without audio data, such as spatial and dramatic MPEG video streams, MPEG audio streams and MPEG system streams.
  b. Image Assets - Files which contain single images (bitmaps or pictures) for use in constructing the visible user interface.
  c. Objects M-view Catalog Assets - Files which contain multiple bitmaps representing multiple views of an object.
  d. Audio Assets - Files which contain MIDI data, or files which contain digitized WAV or other digitized audio data.
  e. Position Data - Files which contain camera tracking coordinates for a video clip.
  f. Cursors - Files which contain single cursor images or animated cursor images.
- In order for the edit system to construct the spatial video environment, the edit system must know information about the assets. This is accomplished through the Asset Identification Process, by which the location and characteristics of the various assets are entered into the edit system. Such associations can be carried out by any method known in the art.
- The method of a preferred embodiment uses a Microsoft Windows compatible graphical user interface.
- The interface characteristics conform to the Windows conventions set out in the Microsoft Developer Network Development Platform: SDKs and Operating Systems, January 1996, published by and available from Microsoft Corporation.
- The various menus used in the preferred embodiment of the edit system are attached hereto as Attachment 1 and incorporated herein by reference.
- The concepts and functions set out below are independent of the Microsoft paradigm and could be adapted to any operating system.
- Edit system asset management is accomplished using the library window.
- The library window contains a "tab" control with one tab for each of the asset types which are utilized in the editor.
- The screen representing this window is shown on the fifth sheet of Attachment 1.
- When a tab is selected, the Library window displays a listbox of all the assets of that type.
- The listbox contains a representative thumbnail for each asset, the name of the asset and some of the significant properties of the asset.
- The representative thumbnail may be symbolic (e.g., a representation of a sine wave for digital audio) or it may be a reduced thumbnail image extracted from the asset (e.g., the first frame of a video file).
- The asset identification process consists of three steps: the user's identification of an asset file through the system's file dialog window; the system's examination of the file to determine its attributes; and the system's update of the database with records identifying the asset selected by the user.
- The user begins the process by identifying the file sought through the system file dialog window. The user activates the editor's main window menu item for files, which in the preferred Windows environment is titled "File," and clicks on the "Import Media" command which is then displayed. See Attachment 1, sheet 1. Clicking on Import Media displays a system File Open dialog which allows the user to select a file. In the next step the edit system examines the selected file to obtain its attributes. Based on the type of the file, different relevant information is gathered.
- For MIDI data, the relevant attributes are:
- For cursors, the relevant attributes include the number of frames (if the cursor is animated).
- For camera position data, the relevant attributes include the SMPTE time code of the source video to which this data corresponds.
- The edit system then updates the database (described below) with new records to identify the assets. These new records are created to record the data selected by the user. An ID is allocated for the asset and the file location is stored in the Library table. Other relevant information is stored in the respective asset table. All references to these assets within the software are made through the ID.
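The record-keeping part of asset identification can be sketched with an in-memory SQLite table standing in for the Library table. The extension-to-type mapping (including the `.trk` extension for position data) is purely hypothetical, and the type-specific attributes that would go into each asset table are omitted.

```python
import os
import sqlite3

# Hypothetical mapping from file extension to asset type.
ASSET_TYPES = {".mpg": "video", ".mpeg": "video", ".bmp": "image",
               ".mid": "audio", ".wav": "audio", ".cur": "cursor",
               ".ani": "cursor", ".trk": "position"}

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE library (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    path TEXT NOT NULL UNIQUE,
    asset_type TEXT NOT NULL)""")

def import_media(path: str) -> int:
    """Record an asset in the Library table and return its allocated ID."""
    ext = os.path.splitext(path)[1].lower()
    asset_type = ASSET_TYPES.get(ext)
    if asset_type is None:
        raise ValueError(f"unsupported asset file: {path}")
    cur = conn.execute("INSERT INTO library (path, asset_type) VALUES (?, ?)",
                       (path, asset_type))
    conn.commit()
    return cur.lastrowid
```

Everything else in the edit system would then refer to the asset only by the returned ID, never by its file path.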
- For camera tracking data, it is necessary to read the camera position records and store them as records in the Frame Position table. This involves creating a record in the Frame Position table for each ASCII record in the camera track file and copying the values into the table. See text accompanying FIG. 7.
Organize Video Clips on a Map
- A central task in creating a spatial video production is to organize those portions of the motion video captured by the camera which reflect the movement of the camera along a natural path in the environment. These path-associated video segments are called clips. In every spatial video clip the camera moves along a path. It is a useful aspect of the edit system for the system user to be able to see these paths plotted on a map.
- FIG. 13 shows a portion of such a map showing walls 126 as solid lines and a camera path 124 as a dotted line. Pages 6 and 7 of Attachment 1 show the preferred interface for displaying and drawing a map.
- The display also makes it easy to find desired frames in a clip. For example, to see the frame where the camera passed the suit of armor, click on the camera path where it comes nearest to the icon representing the suit of armor receptacle.
- The map display also makes it easy to identify path intersections. How to identify path intersections will be discussed in more detail subsequently.
- The user may zoom in or out, pan (scroll) horizontally and vertically, and rotate the map as desired. Zooming allows one to see more or less detail in the map. Panning shifts the view to a different portion of the map. Rotation could be used to make the top of the map correspond to a direction other than north. Straight lines may be drawn on the map to represent walls, furniture, or other objects for purposes of giving context to the camera paths.
- A clip group typically contains all the clips that share the same path.
- The group is represented in the map by a path of double thickness, as opposed to the thin line representing a single camera path.
- The user of the map editor may click on a double-thick path representing a clip group and then select any of the individual clips in the group.
- The edit system uses a map tool which is a draw-like tool, driven both by user input and by the camera positional data in the system database.
- The functions of zooming, panning and rotation are standard, as is the ability to draw and display the wall segments (straight lines).
- The map tool creates a two-dimensional representation of the environment wherein lines represent the path of the camera through space as well as other significant features of the environment.
- The edit system map tool allows the user to draw camera paths on the map. This is useful for planning shots before any actual camera position data is available. It also supports the functionality required for these planning sketches to be used through to the completion of the production, even when camera position data is not available.
- All geographic coordinates in the map are stored in units of millimeters measured from an arbitrary origin.
- The drawing of camera paths is straightforward in those cases where the database has been populated with frame position data for every video frame. In such circumstances the primary task is to normalize coordinate systems so that the frame position data place the clips on the map at the proper map coordinates.
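Reading the camera track and normalizing it to map coordinates might look like the sketch below. It assumes a hypothetical ASCII record format of `frame x y z` per line and a normalization consisting of a rotation, a scale to millimeters and a translation to the map origin; the actual track-file format is not given here.

```python
import math
from typing import List, Tuple

Record = Tuple[int, float, float, float]  # (frame, x, y, z)

def parse_track(text: str) -> List[Record]:
    """Parse hypothetical 'frame x y z' ASCII camera-track records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        frame, x, y, z = line.split()
        records.append((int(frame), float(x), float(y), float(z)))
    return records

def to_map(records: List[Record], scale: float, theta: float,
           dx: float, dy: float) -> List[Record]:
    """Rotate by theta, scale to millimeters and shift to the map origin."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for frame, x, y, z in records:
        mx = scale * (c * x - s * y) + dx
        my = scale * (s * x + c * y) + dy
        out.append((frame, mx, my, scale * z))
    return out
```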
- The map tool allows path drawing in two formats: a series of smooth curves or a series of straight lines. In the case of straight lines, the line segments are joined end to end, and the corners where they meet may be hard points or rounded into smooth curves, with the degree of rounding controlled by the user.
- When the user clicks on the map, the edit system can determine which camera path the click is on; when the user clicks on a path, the system can determine which frame number of the clip (or frame numbers of the frames in each clip of the group of clips represented by the path) was captured at the location of the click; and, given a clip and frame number, the system can find the pixel in the map display corresponding to that clip and frame number.
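The three map lookups above can be sketched as a world-to-screen transform (zoom, pan, rotation), its inverse, and a nearest-frame search. The function names, the 5-pixel click tolerance and the brute-force search are assumptions, with each clip represented as a list of per-frame map positions in millimeters.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def world_to_screen(p: Point, zoom: float, pan: Point, rot: float) -> Point:
    """Map millimeter coordinates to display pixels (clip/frame -> pixel)."""
    c, s = math.cos(rot), math.sin(rot)
    x, y = p
    return (zoom * (c * x - s * y) + pan[0], zoom * (s * x + c * y) + pan[1])

def screen_to_world(q: Point, zoom: float, pan: Point, rot: float) -> Point:
    """Inverse of world_to_screen: undo pan, then zoom, then rotation."""
    c, s = math.cos(rot), math.sin(rot)
    x, y = (q[0] - pan[0]) / zoom, (q[1] - pan[1]) / zoom
    return (c * x + s * y, -s * x + c * y)

def clip_frame_at(click: Point, clips: Dict[str, List[Point]], zoom: float,
                  pan: Point, rot: float,
                  tolerance_px: float = 5.0) -> Optional[Tuple[str, int]]:
    """Return (clip, frame index) nearest the click, or None if too far away."""
    wx, wy = screen_to_world(click, zoom, pan, rot)
    best = None
    for name, positions in clips.items():
        for frame, (px, py) in enumerate(positions):
            d = math.hypot(px - wx, py - wy)
            if best is None or d < best[0]:
                best = (d, name, frame)
    # Compare the distance in screen pixels, not millimeters.
    if best is None or best[0] * zoom > tolerance_px:
        return None
    return best[1], best[2]
```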
- Clips are associated with camera paths by drawing the clip paths on the map (or associating camera tracking data), associating the appropriate video asset with the clip and identifying its MarkIn and MarkOut points. These MarkIn and MarkOut points are recorded in the database.
- The steps for laying out clips in the map editor are illustrated in FIG. 14. It is important to understand several closely related concepts relating to video productions. Dramatic video is normal, lip-synched, non-navigable, linear video. Video assets, as usually used herein, refer both to dramatic video assets, which may also be included in a spatial video production, and to spatial video assets. Spatial video assets may or may not be associated with a camera path on a spatial video map. A spatial video asset is incorporated into a production when it, or a portion of it, is associated with a "clip." A "video clip" is a logical excerpt from a spatial video asset which has been associated with a map clip.
- A clip or map clip is a camera path line on the map of the invention with which a spatial video asset is to be associated.
- A clip group is a group of clips which share the same camera path location on the map but have different camera headings, such as a forward facing clip, a rearward facing clip, a left facing clip and a right facing clip. Consequently, in the edit system a clip may begin as a camera path line on a spatial video map. At the time it is created it is usually incomplete. Clips have a property which requires the association of a spatial video asset. Thus, a completed clip will always have a spatial video asset associated with it.
- STEP ONE Create a Map.
- The clip layout process is iterative, with the various paths being laid out on the map and the video clips being associated with their respective paths.
- The user creates a Map by clicking on the New Map command button on the Main menu and toolbar.
- The New Map button is the second button from the left on the primary screen and contains a symbol resembling a compass rose. Clicking it results in the allocation of a new ID and the creation of a new record in the MAP Table of the database.
- The Map Editor tool (see Attachment 1, page 7) is then displayed and initialized to edit the video clips for the newly created Map record.
- STEP TWO Draw the first clip in a group.
- The user then creates a new clip by clicking on the New Clip command button on the Main menu and toolbar of the Map Editor (Attachment 1, page 17).
- The map must be initialized with an empty Frame Position record to start drawing the camera path, so a new record for the video clip is added to the Frame Positions Table with null values (0) for X, Y, and Z and passed to the map tool.
- The map tool is then set in the "draw new clip" edit mode.
- The user clicks on the points to define the camera path and right-mouse-clicks to terminate drawing.
- All smooth curves used for the camera path sketches are preferably Catmull-Rom splines. Curves drawn with a drawing tool which generates Catmull-Rom splines are preferred because use of
- Where two straight line segments meet, the corner can be rounded to varying degrees as follows. First, map the degree of rounding desired into a fraction between 0 and 0.5, where zero means no rounding and one half means maximum rounding. Call this fraction R. As the two line segments converge, erase the part of each that is closest to the intersection; R determines the fraction of each line to erase. Fill in the resulting gap with a Catmull-Rom spline. Each such spline requires four control points; use the four endpoints of the (partially erased) line segments. Note that this technique is also used to generate curves within a continuous line at places other than line intersections.
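The corner-rounding procedure can be written directly from the description, using the uniform Catmull-Rom formulation with the four endpoints of the shortened segments as control points. The function names and the sampling density are hypothetical. R = 0 is handled as a special case: with coincident middle control points a Catmull-Rom segment does not collapse to a single point, so the hard corner is returned unchanged.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def catmull_rom(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Uniform Catmull-Rom segment: t=0 gives p1, t=1 gives p2."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3))

def round_corner(a: Point, corner: Point, b: Point, r: float,
                 steps: int = 8) -> List[Point]:
    """Replace the corner a-corner-b by shortened lines joined with a spline.

    r is the rounding fraction R in [0, 0.5]; 0 keeps the hard corner."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("R must lie between 0 and 0.5")
    if r == 0.0:
        return [a, corner, b]
    # Erase the fraction R of each segment nearest the intersection.
    a_end = tuple(pa + (1 - r) * (pc - pa) for pa, pc in zip(a, corner))
    b_start = tuple(pc + r * (pb - pc) for pc, pb in zip(corner, b))
    # The four endpoints of the shortened segments are the control points.
    curve = [catmull_rom(a, a_end, b_start, b, i / steps) for i in range(steps + 1)]
    return [a] + curve + [b]
```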
- STEP THREE Generate the Inverse Clip.
- An inverse clip can be generated by duplicating the first clip and then changing its properties.
- An inverse clip is a clip which shares the same definition points as another clip but in reverse order. For example, if clip1 starts at point A and goes to B facing point B, the inverse of clip1 starts at point B and goes to point A facing point A.
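Generating an inverse clip amounts to reversing the definition points and clearing the video association, since the backward-facing video must be chosen separately. A minimal sketch with a hypothetical `Clip` record, where 0 stands for "no video asset yet":

```python
from dataclasses import dataclass, replace
from typing import Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Clip:
    clip_id: int
    points: Tuple[Point, ...]  # definition points in travel order
    facing: str                # "forward", "backward", "left" or "right"
    video_asset: int = 0       # 0 means no video asset associated yet

def inverse_clip(clip: Clip, new_id: int) -> Clip:
    # Same definition points in reverse order; the video association is
    # cleared because the backward-facing video must be chosen and edited.
    return replace(clip, clip_id=new_id, points=tuple(reversed(clip.points)),
                   facing="backward", video_asset=0)
```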
- A video clip must be associated with the inverse clip. This process is described in detail below.
- The video clip selected should be the one which corresponds to the camera traveling opposite the forward direction.
- The appropriate (backward facing) video clip should be associated with the inverse clip and edited.
- STEP FOUR Generate Left Facing Clip.
- The left facing clip can also be generated by duplicating an existing clip in the group of clips and then changing its properties. To do so, select an appropriate member of the group (which has the same direction as the left facing video) from the list of clip members in the right mouse button pop-up menu and then select "Create Duplicate Clip" from the same pop-up menu. If the selected clip is not already in a group, this command allocates a groupID and adds a new record to the Clip Group Table. The original forward clip is then assigned as a member of the group.
- A new record is added to the Video Clip and Spatial Clip Tables for the duplicated clip, new FramePosition records are created and added to the Frame Position Table (based on the same frame positions as the source clip), and the new duplicate clip is set as the active (displayed) clip in the group.
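The bookkeeping performed by "Create Duplicate Clip" can be sketched as follows. In-memory dictionaries stand in for the Clip Group, Video Clip, Spatial Clip and Frame Position tables, and a simple counter stands in for the ID registry; all names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

_ids = itertools.count(1)  # stand-in for the master ID registry

@dataclass
class ClipRecord:
    clip_id: int
    frame_positions: List[Tuple[float, float, float]]
    group_id: Optional[int] = None

@dataclass
class ClipDatabase:
    clips: Dict[int, ClipRecord] = field(default_factory=dict)
    groups: Dict[int, List[int]] = field(default_factory=dict)
    active: Dict[int, int] = field(default_factory=dict)  # group -> displayed clip

    def add_clip(self, positions: Iterable[Tuple[float, float, float]]) -> ClipRecord:
        rec = ClipRecord(next(_ids), list(positions))  # copies the positions
        self.clips[rec.clip_id] = rec
        return rec

    def create_duplicate_clip(self, source_id: int) -> ClipRecord:
        src = self.clips[source_id]
        if src.group_id is None:
            # First duplicate: allocate a group and enrol the source clip.
            src.group_id = next(_ids)
            self.groups[src.group_id] = [src.clip_id]
        # New frame-position records based on the source clip's positions.
        dup = self.add_clip(src.frame_positions)
        dup.group_id = src.group_id
        self.groups[src.group_id].append(dup.clip_id)
        self.active[src.group_id] = dup.clip_id  # duplicate becomes displayed
        return dup
```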
- A video clip must be associated with the left facing clip. The video selected should be one which corresponds to the camera traveling on the same path as the forward clip but with the camera facing left.
- STEP FIVE Generate Right Facing Clip. Select an appropriate member of the group (which has the same direction as the right facing video) from the list of members in the right mouse button pop-up menu and then select "Create Duplicate Clip" from the same pop-up menu. If the selected clip is not already in a group, this command allocates a groupID and adds a new record to the Clip Group Table. The original forward clip is then assigned as a member of the group. A new record is added to the VideoClip and SpatialClip tables for the duplicated clip, new FramePosition records are created (based on the same frame positions as the source clip), and the new duplicate clip is set as the active (displayed) clip in the group. As in the other cases, the appropriate right facing video clip must be selected and associated with this clip in the now completed clip group.
- STEP SIX Are all paths done?
- The final step in the clip layout and video clip association process is to determine whether all members of the clip group are associated with the various paths. If all paths in the video environment are defined and the video clips are associated with each clip in each clip group, then this part of the process is complete. Otherwise, repeat the steps beginning at "Draw the first clip in a group," as shown in FIG. 14, until done.
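The completion check in STEP SIX can be expressed as a scan for group members that still lack a video asset. This sketch assumes every clip group is meant to have forward, backward, left and right members, which the text gives as typical rather than mandatory; the `unfinished` function and its input shape are hypothetical.

```python
from typing import Dict, List

def unfinished(groups: Dict[str, Dict[str, int]]) -> List[str]:
    """List 'group/facing' members that still lack a video asset.

    groups maps a group name to {facing: video_asset_id}; an id of 0 or a
    missing facing means no video has been associated yet."""
    facings = ("forward", "backward", "left", "right")
    missing = []
    for group, members in sorted(groups.items()):
        for facing in facings:
            if not members.get(facing):
                missing.append(f"{group}/{facing}")
    return missing
```

An empty result corresponds to the "done" branch in FIG. 14; anything else sends the user back to drawing or associating clips.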
- STEP ONE Select Clip. To select a clip on the map, left-mouse-click on the line on the edit system map which represents the clip. The edit system responds to a successful clip selection by positioning a small red block, which represents the camera position, on the path. Since the spatial video edit system only displays one clip in a group at a time, alternate members of the group may be selected by choosing the appropriate member from those displayed in the right-mouse-click pop-up menu.
- STEP TWO Invoke Property Sheet. After selecting the clip, invoke the Clip property sheet by right-mouse-clicking on the clip and selecting the Property command in the pop-up menu.
- STEP THREE Select Video Asset.
- The property sheet which is displayed by selecting the Property command shows all of the fields in the Video Clip table which correspond to the selected clip. See Attachment 1, page 19. In place of the video file field is a drop-down list representing all the video assets currently identified in the asset database. Select the appropriate video asset which corresponds to this path. When the video asset is selected, the other clip properties (MarkIn and MarkOut) are updated accordingly. Select the Apply or OK command, and the changes made in the property sheet are updated in the database.
- STEP FOUR Invoke Video Editor. To invoke the video editor (if one is not already displayed), right-mouse-click on the clip and select the Show Video command from the pop-up menu. See Attachment 1, page 20. This command displays the spatial video edit system's video editor.
- STEP FIVE Select In Frame. Use the video editor's Play, Stop, Rewind, and Step (forward or backward) command buttons to find the appropriate first frame in the clip which corresponds to this path. In this mode the video editor invokes the MPEG player to play the video, allowing inspection of the video and visual selection of the appropriate frame.
- STEP SIX MarkIn and Cut. Use the MarkIn button or the MarkIn dial to set the
- STEP SEVEN Select Out Frame. Use the video editor's Play, Stop, Rewind, Step
- STEP EIGHT MarkOut and Cut. Use the MarkOut button or the MarkOut dial to set the MarkOut to the currently displayed video frame. Use the CutAfter MarkOut button to set the MarkOut for the video clip relative to the video asset. This updates the database with the new MarkOut for the clip. This completes the process for associating and editing
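The MarkIn and MarkOut points define the clip as a window onto its video asset. The sketch below, using a hypothetical `VideoClip` class, validates the marks and translates a clip-relative frame number into the corresponding asset frame number; treating MarkOut as inclusive is our assumption.

```python
from dataclasses import dataclass

@dataclass
class VideoClip:
    asset_frames: int   # total number of frames in the associated video asset
    mark_in: int = 0
    mark_out: int = -1  # -1 means MarkOut has not been set yet

    def set_marks(self, mark_in: int, mark_out: int) -> None:
        if not 0 <= mark_in <= mark_out < self.asset_frames:
            raise ValueError("need 0 <= MarkIn <= MarkOut < asset length")
        self.mark_in, self.mark_out = mark_in, mark_out

    def asset_frame(self, clip_frame: int) -> int:
        """Translate a clip-relative frame number to an asset frame number."""
        if not 0 <= clip_frame <= self.mark_out - self.mark_in:
            raise IndexError("frame outside the clip")
        return self.mark_in + clip_frame
```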
- Intersections are points on the map where two or more video clips or groups of clips intersect and where it is desired to define video transitions which allow the player to traverse from one clip in the intersection to another during the spatial video production. Once such transitions are defined, the user of the final production has the option to execute the transition (usually a turn) at the intersection when playing the production.
- FIG. 16 depicts an intersection of a north-south clip group and an east-west clip group. In a true spatial video map, the four paths of the members of each group would be depicted by a single double-thickness line. In FIG. 16 the various clips in each clip group are expanded for purposes of illustration. A single clip group member could be accessed for editing or viewing as described above. The four component lines F (forward), L (left), R (right) and B (backward) are shown as four individual lines for each clip group for ease of comprehension.
- FIG. 13 shows an intersection 118 as it would appear on a map in the map editor.
- The spatial video that the edit system preferably uses records scenes captured by a moving camera. For example, one clip might show the camera's view when crossing a room from north to south, while another clip may show a walk around the perimeter of the same room.
- The map window in the spatial video edit system can represent each video clip by drawing the path that the camera took in capturing the clip. Of key importance to the production are the intersection points of these camera paths: places where one path crossed or intersected another. Each such point is an opportunity for the production to allow a smooth transition from one clip to another. If two camera paths intersect at right angles, as in the north-south and east-west example illustrated in FIG. 16, the process herein described can allow a player moving north along the first path to turn right and find himself going east along the second path. Of course, as seen in FIG. 16, there are a host of other turns which the editor could enable and allow a runtime user to choose.
- There are two types of clip exits. One is the intersection clip exit, where the player makes a turn transition from one member of one clip group to a member of a different group.
- The other is the roundabout clip exit, where the player makes a turn transition from one member of a clip group to another member of the same clip group.
- Examples of intersection turn transitions are the 2B-1B transition 110 and the 2F-1B transition 112 shown in FIG. 16.
- Examples of roundabouts are transitions 1F-1L 114 and 2B-2R 116.
- A turn transition, such as the 2F-1B right turn 112 in the example discussed above, will be executed only when the player has expressed a desire to turn right.
- Identifying Clip Intersections
- A number of approaches can be used for identifying the intersection points in a production. Since the camera paths are not constrained to be straight lines or simple curves, it is difficult to predict intersection locations without an exhaustive search. Quad tree technology would have allowed us to subdivide the area covered by the map into successively smaller rectangles, continuing the subdivision until each rectangle met one of the following criteria: the rectangle contained at most one camera path, or the rectangle was sufficiently small that all its points could be searched for intersections.
- This quad tree technology produces a comprehensive database of all intersections in the production, but it requires considerable compute time. Moreover the resulting database would be invalidated every time a new clip was added, or an existing clip was moved. Such an invalidation might necessitate a repetition of the time-consuming process of searching for intersections: an outcome that might inconvenience the system user.
- The preferred approach is a dynamic one: identify only those intersections currently on the screen, and do it quickly enough that the system user does not have to wait.
- When the user changes the view, repeat the search for intersections in this new view (again so quickly that the user does not wait).
- When a clip is moved, or a new clip is added, repeat the search again.
- The search is done by successively drawing each clip into off-screen memory, using an additive coloring process that ensures any pixel that was colored twice ends up a different color from those colored once or not at all.
- STEP ONE: Allocate Image Buffer. First, allocate an off-screen image buffer. The edit system preferably uses twice the current resolution of the screen, so if the current map window is 400 x 300 pixels, the edit system will allocate a memory buffer big enough to hold an image of 800 x 600 pixels.
- STEP TWO: Clear Image Buffer. Second, clear the image buffer to all zeros.
- STEP THREE: Draw First Path. Third, draw the first path visible in the current view into the buffer using the first color (say, color "1"). All pixels in the buffer should now be zero if they are not on the first path, or "1" if they are. For each remaining clip in the current view, repeat the following steps.
- STEP FOUR: Draw New Path. Draw the new path using a second color (say, "2"). Use an additive graphics mode that adds (or otherwise combines) the new color onto the old. The result should be that each pixel in the buffer is zero if on neither path, one if on only an old path, two if on only the new path, and a different value ("3" in this example) if on both old and new paths.
- STEP FIVE: Search Buffer. Search the image buffer for colors 2 and 3. Change all pixels that are 2 to 1 (they now represent an "old" path). Each pixel that is 3 indicates a new intersection: add a description of this intersection location to a list of intersections found, then change the pixel back to 1.
- STEP SIX: Repeat. Repeat steps four and five for the next path.
- The list generated by the above process will likely contain duplicates. Depending on the current drawing scale and the widths of the camera paths, the overlap between two paths may comprise several contiguous pixels. Duplicates can be pruned from the intersection list either at the end, or each time a new entry is added. Most of the compute time consumed by the above process is spent searching the image buffer for pixels of colors 2 and 3, so it is desirable to make this code efficient.
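The six steps and the duplicate pruning above can be sketched in Python as follows. This is a minimal illustration, not the edit system's code. It assumes each camera path has already been rasterized into a list of (x, y) pixel coordinates at buffer resolution (twice the screen resolution) and that the buffer uses one byte per pixel. Duplicates are pruned with a hypothetical `min_separation` distance, which keeps one entry per cluster of overlapping pixels for a given new path.

```python
def find_intersections(paths, width, height, min_separation=2):
    """Additive-color intersection search over rasterized camera paths.

    paths: list of paths, each a list of (x, y) pixels at buffer resolution
    (a hypothetical input format). Returns (x, y, new_path_index) tuples.
    """
    # STEP ONE and STEP TWO: allocate a buffer and clear it to zeros.
    buf = bytearray(width * height)
    found = []
    for i, path in enumerate(paths):
        # STEP THREE draws the first path in color 1; STEP FOUR draws each
        # later path in color 2, added onto whatever is already there.
        color = 1 if i == 0 else 2
        for x, y in set(path):  # set(): a path never adds to its own pixel twice
            buf[y * width + x] += color
        if i == 0:
            continue
        # STEP FIVE: 2 -> 1 (now an "old" path); each 3 marks an intersection.
        for offset, value in enumerate(buf):
            if value == 2:
                buf[offset] = 1
            elif value == 3:
                buf[offset] = 1
                x, y = offset % width, offset // width
                # Prune duplicates: skip pixels close to an earlier hit
                # recorded for this same new path.
                if all(fi != i or abs(x - fx) > min_separation
                       or abs(y - fy) > min_separation
                       for fx, fy, fi in found):
                    found.append((x, y, i))
        # STEP SIX: the outer loop moves on to the next path.
    return found
```

The per-byte Python loop in Step Five is the hot spot the text warns about. A production version would hand that scan to a fast built-in search.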
- Some processor architectures support fast searching for bytes or words of a particular value.
- In the Intel family, for example, the SCASB instruction performs such a quick search and is used in a preferred embodiment. SCASB stands for "scan string bytes." When combined with a repeat prefix such as REPNE, it scans consecutive bytes in memory looking for a specified value.
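In a high-level language the same idea is to hand the scan to a built-in routine instead of looping byte by byte. The Python sketch below assumes the one-byte-per-pixel buffer layout of Steps One through Six. It finds every intersection pixel (value 3) with `bytearray.find`, whose search loop runs in C in CPython and so plays a role similar to `REPNE SCASB`. It then performs the Step Five recoloring (2 to 1 and 3 to 1) in a single `translate` pass. The function name is hypothetical.

```python
def search_buffer(buf, width):
    """Return (x, y) of every pixel equal to 3, then recolor 2s and 3s to 1."""
    hits = []
    pos = buf.find(3)  # an int needle searches for that single byte value
    while pos != -1:
        hits.append((pos % width, pos // width))
        pos = buf.find(3, pos + 1)
    # Translation table: 0 -> 0, 1 -> 1, 2 -> 1, 3 -> 1, all other bytes unchanged.
    table = bytes([0, 1, 1, 1]) + bytes(range(4, 256))
    buf[:] = buf.translate(table)
    return hits
```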
- A pop-up menu is displayed, populated with all the reasonable choices of intersection transitions for all the video clips involved in the intersection. This list is computed as all pairwise combinations of each member clip in one group with each member clip in the other group.
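Building the pop-up menu's candidate list amounts to a Cartesian product of the two groups' members. The sketch below uses the clip labels of FIG. 16 (1F, 1L, 1R, 1B and 2F, 2L, 2R, 2B). Listing both directions, so that 2F-1B and 1B-2F appear as separate choices, is an assumption about how the menu is populated.

```python
from itertools import product

def candidate_transitions(group_a, group_b):
    """All directed (from_clip, to_clip) pairs that cross between two groups."""
    return list(product(group_a, group_b)) + list(product(group_b, group_a))

group_1 = ["1F", "1L", "1R", "1B"]
group_2 = ["2F", "2L", "2R", "2B"]
menu = candidate_transitions(group_1, group_2)
```

Same-group pairs such as 1F-1L are roundabouts, not intersection exits, so they are deliberately left out of this list.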
- The edit system looks in the ClipExits Table to determine whether an intersection already exists between these two clips at a nearby frame. If one does, a message box pops up asking whether the user wants to edit the existing intersection, add a new intersection, or cancel. If the user selects "Edit", the intersection editor is initialized with the corresponding clipexit. If the user selects "AddNew", or if no intersection exists, the intersection editor (called the ClipExit Editor in the preferred embodiment) is initialized to edit a new intersection using the requested two video clips.
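The check for an existing intersection at a nearby frame might look like the following sketch. The `ClipExit` record, its field names, and the 15-frame `tolerance` are hypothetical stand-ins for the ClipExits Table. The text does not say how close "nearby" is.

```python
from dataclasses import dataclass

@dataclass
class ClipExit:
    exit_id: int
    from_clip_id: int
    to_clip_id: int
    from_frame: int  # frame of the FROM clip at which the exit fires

def find_nearby_exit(clip_exits, from_clip_id, to_clip_id, frame, tolerance=15):
    """Return an existing exit between the two clips near `frame`, else None."""
    for ex in clip_exits:
        if (ex.from_clip_id == from_clip_id and ex.to_clip_id == to_clip_id
                and abs(ex.from_frame - frame) <= tolerance):
            return ex
    return None
```

A non-None result corresponds to the Edit/AddNew/Cancel prompt. None corresponds to opening the ClipExit Editor on a new intersection.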
- The ClipExit Editor is seeded with the two ClipIDs for the clips in the requested intersection, FromClipID and ToClipID.
- The ClipExit Editor queries the map tool to determine the frame numbers for the two clips at the intersection point.
- The frame numbers are used to initialize the ClipExit Editor, which displays the two video clips side by side at the "suggested" frames.
- There is a command button to swap the positioning of the two video frames so that the alignment of left or right turns produces a continuous image from one video to the next.
- The user can then fine-tune the suggested frames using positioning controls, so as to produce optimal image continuity.
- The user must also select the appropriate transition from the list of transitions. Typically this is the push left transition for left turns and the push right transition for right turns. The specific manner in which the runtime system creates this transition is disclosed in the co-pending Special Effects Applications.
- Teleportation-type jumps or other requirements create opportunities for using other transitions as well. While some exits are seamless transitions, teleportation jumps occur when there is no smooth visual transition, such as between two spatially unrelated frames.
- The user must also select the appropriate signal to trigger the clipexit. This is specified in the ExitDirection field. For example, if the field is set to ExitLeft, then in the production a left turn signal (given with the mouse, keyboard, or an onscreen control) will match this clipexit, and the clipexit will be activated when the From video frame number is encountered.
- The runtime system has a concept of a QueuedExitID. This is a variable which indicates whether a clipexit should be taken when it is encountered.
- When the user moves the mouse to the left or right of the screen, the runtime system constantly updates a global property named QueuedExitID.
- When the mouse is on the left of the screen, QueuedExitID is set to LEFT; when it is on the right, it is set to RIGHT; and when it is in the middle, it is set to DEFAULT.
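The interplay between QueuedExitID and a clipexit's ExitDirection can be sketched as follows. Several details are assumptions made for illustration: the quarter-screen margins that define "left" and "right", the dictionary layout of a clipexit, and the mapping from LEFT/RIGHT to ExitLeft/ExitRight.

```python
# Assumed mapping from the queued signal to an ExitDirection field value.
EXIT_DIRECTION_FOR = {"LEFT": "ExitLeft", "RIGHT": "ExitRight"}

def queued_exit_id(mouse_x, screen_width, margin=0.25):
    """Update rule for QueuedExitID based on the horizontal mouse position."""
    if mouse_x < screen_width * margin:
        return "LEFT"
    if mouse_x > screen_width * (1 - margin):
        return "RIGHT"
    return "DEFAULT"

def exit_to_take(clip_exits, current_frame, queued):
    """Return the ID of the clipexit to activate at this frame, if any."""
    wanted = EXIT_DIRECTION_FOR.get(queued)
    for ex in clip_exits:
        if ex["from_frame"] == current_frame and ex["exit_direction"] == wanted:
            return ex["exit_id"]
    return None
```

Because QueuedExitID is updated continuously while the video plays, the player only has to be steering in the right direction when the From frame arrives.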
- Each clipexit has a fixed unique ID and a common symbolic ID.
- The symbolic IDs refer to symbolic directions (left, right). These are usually mapped by flow definitions in the production from some user signal (e.g.
- Roundabouts are another type of clip exit.
- A roundabout is a location on the clip path where the player may execute a rotation.
- The player can pan anywhere within the 90-degree frame of a clip at any point along the path. However, should the player wish to rotate or pan further than 90 degrees, visual information is needed from a second clip.
- A roundabout is a clipexit transition from one clip of a group of clips to one of the two clips in that group which contain visual information immediately adjacent to that contained at the left or right margin of the clip being navigated.
- Roundabout transitions can occur from the forward clip to either the left or the right clip.
- Roundabout transitions from the backward clip can likewise occur to either the left or the right clip.
- For example, in a roundabout from the forward clip to the right clip, the forward clip will be panned fully to the right, and the first transitional frames will contain information from the far left margin of the matched right clip frame.
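Each of the four members of a clip group (F, R, B, L) covers roughly 90 degrees, so a roundabout target can be found by stepping around a ring of views. The sketch below assumes the views are ordered clockwise when seen from above. Under that assumption, panning right past a clip's right margin leads to the next clip clockwise, which is entered at its left margin, as in the forward-to-right example above. The function name is hypothetical.

```python
# Clockwise order of a clip group's views, seen from above (assumed layout).
RING = ["F", "R", "B", "L"]

def roundabout_target(current, pan):
    """Return (target_view, entry_margin) for a roundabout in direction `pan`."""
    i = RING.index(current)
    if pan == "right":
        # Panning past the right margin enters the next view at its left margin.
        return RING[(i + 1) % 4], "left"
    if pan == "left":
        # Panning past the left margin enters the previous view at its right margin.
        return RING[(i - 1) % 4], "right"
    raise ValueError("pan must be 'left' or 'right'")
```

This reproduces the text's observation that both the forward and the backward clip have the left and right clips as their roundabout neighbors.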
- FIG. 13 shows a roundabout 120 as it would appear on a map in a map editor of the preferred embodiment.
- A roundabout symbol is also visible on page 18 of Attachment 1.
- Intersection clip exits, by definition, can occur only at the intersection of at least two paths. Roundabout clip exits, in contrast, can occur anywhere along the clip path, provided the developer defines them.
- In either case, the actual transitions are performed according to the combination panning and push transition methods disclosed in the co-pending Special Effects Applications previously mentioned.
- Upcode or Recode Video
- Another task which is preferably accomplished at edit time, when MPEG or a similar reference-frame system of digital video is being used, is to upgrade all FROM frames in the video stream to I or P frames and to create an I frame copy of all TO frames. If strict conformance to MPEG standards is desired, FROM frames must be I or P frames because they are used as reference frames during the transitions. The actual frame in the video stream must therefore be modified to be I or P if it was initially encoded as a B frame. If strict conformance to MPEG standards is not necessary, the method set forth in the co-pending Special Effects Applications can be used.
- The copy of the TO frame that is present in the TO frame cache must be an I frame so that it can be decoded independently at the time the transition is to occur.
- The actual TO frame in the target video stream need not be changed, since the frame-accurate stream positioning technique described in the co-pending Special Effects Applications can be used.
- The TO frame for one transition is likely to be the FROM frame for another transition. This is true, for example, of all frames involved in roundabout exits or intersection exits.
- Consequently, some MPEG frames will need to be converted from B or P to I frames. This conversion process is called upcoding.
- Upcoding can be performed by decoding the selected frame into its luminance and chrominance values, then encoding those luminance and chrominance values as an I frame.
- The video stream may need to be fully or partially re-encoded. Changing one frame from B to I or P will change the reference frames used to decode all frames between the changed frame and the following I frame, so the frame cannot simply be modified and rewritten blindly.
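A toy model can show which frames are disturbed by upcoding, using a display-order GOP pattern such as "IBBPBBPBBI". The sketch follows the text in marking every frame after the changed one up to the next I frame. It also marks the B frames just before the changed frame. In MPEG a B frame is predicted from the nearest anchor (I or P) frames on either side, so those B frames would gain the new anchor as their forward reference. The model ignores decode-order reordering and open/closed GOP details, so it is not an encoder's actual dependency analysis. The function name is hypothetical.

```python
def frames_to_reencode(pattern, k):
    """Display-order indices affected when B frame `k` in `pattern` is upcoded."""
    if pattern[k] != "B":
        raise ValueError("frame k must be a B frame")
    affected = []
    # B frames back to the previous anchor now use frame k as forward reference.
    j = k - 1
    while j >= 0 and pattern[j] == "B":
        affected.append(j)
        j -= 1
    # Frames after k, up to the next I frame, sit in the changed prediction chain.
    j = k + 1
    while j < len(pattern) and pattern[j] != "I":
        affected.append(j)
        j += 1
    return sorted(affected)
```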
- A software re-encoder could analyze the stream, decode the appropriate frames, and re-encode them; alternatively, a list of FROM frames could be produced and the entire stream re-encoded by an encoder that accepts such a list.
- The preferred embodiment does neither of these, instead choosing the first of the three alternative implementations described below.
- The edit system could restrict the user from specifying a B frame as a FROM frame. This is not considered acceptable in most instances, since, given the desired B frame frequency, it would be virtually impossible to ensure that all four frames on all four clips involved in a roundabout or intersection would have I or P frames at the appropriate points.
- If the video clips are generated by rendering computer models of a three-dimensional scene, however, the exact position of each frame will be determined by the model and path used, so this technique could be applied.
- Alternatively, upcoded I frame copies of the FROM frames could be added to a FROM frame cache.
- The copy in the FROM frame cache could then be used for the transition instead of the actual frame.
- This approach greatly increases the memory storage required at runtime, the overhead time spent on the clip (since the cached upcoded FROM frames would have to be read from the stream), and the size of the stream on disk.
- The second and third alternatives are not preferred.
- Smooth transitions accomplished by the method of the co-pending Special Effects Applications require an MPEG player which can accept input from a buffer, rather than merely from a disk file. Further, they preferably are implemented through the use of a streamer, such as the MPEG streamer discussed above which is defined in Appendix C hereto.
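The streamer plays a transition in four steps: it chops the stream at the current point, then appends a GOP header, the cached I frame copy of the TO frame, and the transition's B frames. This procedure can be sketched as plain byte-buffer assembly. The function and parameter names are hypothetical, and `MPSChopStream` is modeled simply as truncating queued output. The byte strings in the test are placeholders, not valid MPEG data.

```python
def play_transition(output, current_point, gop_header, cached_to_frame, b_frames):
    """Build the 'synthetic' transition stream in the streamer's output buffer.

    output: bytearray holding stream data queued for the player.
    """
    # 1. Like MPSChopStream: drop anything queued past the current point.
    del output[current_point:]
    # 2. A valid GOP header so the player starts a new group of pictures.
    output += gop_header
    # 3. The cached I frame copy of the TO frame (decodable on its own).
    output += cached_to_frame
    # 4. The B frames that make up the transition effect.
    for frame in b_frames:
        output += frame
    return output
```

Because the player reads from a buffer rather than a file, it cannot tell this assembled data apart from an ordinary stream. That is why a buffer-fed MPEG player is required.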
- The MPEG Streamer function to play a transition is implemented in such a streamer as follows. First, the streamer executes an MPSChopStream call to terminate the MPEG stream at the current point. Next, a valid GOP header is copied into the output buffer area. Third, the cached TO frame data is copied into the output buffer area. Finally, the B frame data for the specified transition is copied into the output buffer area. This allows the streamer to create the "synthetic" MPEG stream of the transition as described in the co-pending Special Effects Applications.
- Hotspots
- Hotspots are an important way to support user interaction in the product generated by the spatial video edit system.
- A hotspot is a region of the screen that the player can click on to make something happen. (Click on the vending machine and it produces a gum ball; click on the statue and the genie appears.)
- Region hotspots are often used in the user interface outside the video window to define exit buttons, inventory buttons and the like. They can also be used within the video window for navigation purposes. See, for example, FIG. 27 and the text accompanying this figure.
- The screen navigation regions are essentially region hotspots which relate to panning and turn transitions. Unless specifically referred to as region hotspots, the references to hotspots in this specification are to spatial video hotspots.
- Hotspots are preferably rectangular in shape; however, hotspots can be any shape desired and remain within the invention.
- Triangular hotspots are an obvious variation.
- It is usually more computationally demanding to create hotspots of other shapes; therefore, unless the production demands it, rectangular hotspots are preferably used.
- If one wants to respond to clicks on the statue one defines a rectangular area that approximates the shape and size of the statue. Since the statue is not rectangular, this will result in small imperfections: either there will be small pieces of statue that are outside the hotspot, or there will be pieces of non-statue that are inside the hotspot (or both). In practice this seems to be unimportant: users tend to click the middle of the statue anyway.
- Alternatively, the spatial video edit system user can define a series of hotspot rectangles that collectively cover the desired shape.
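Hit-testing a hotspot made of several rectangles reduces to asking whether the click falls in any of them. In this sketch the `Rect` type, the inclusive edge test, and the dictionary mapping hotspot names to rectangle lists are assumptions for illustration.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        # Edges count as inside (an arbitrary choice for this sketch).
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def hit_hotspot(hotspots, x, y):
    """Return the name of the first hotspot whose rectangles contain (x, y)."""
    for name, rects in hotspots.items():
        if any(r.contains(x, y) for r in rects):
            return name
    return None

# A statue approximated by a body rectangle plus a wider rectangle for the arms.
hotspots = {"statue": [Rect(100, 50, 140, 200), Rect(80, 60, 160, 90)]}
```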
- The edit system user will want to indicate where in each scene the hotspots are, and what should happen (as defined in the act/scene editor) when the user clicks on each one. Since spatial video uses moving cameras, this task has the potential to be much more difficult than it is for still-camera productions.
- A moving camera produces video clips in which everything changes position and size from one frame to the next. As the camera approaches the statue, the statue grows in every frame. As the camera moves left to pass the statue, the image of the statue moves farther to the right in each frame. For a hotspot to track the statue, the hotspot must change in size and position in each video frame.
- The spatial video edit system user needs a simple way to make the hotspot follow the statue without having to position it manually in every video frame.
- Both the spatial video edit system and the binder engine support the same interpolation/extrapolation algorithms, which take one or more user-specified hotspot frames and "fill in" the missing ones to allow smooth size and position changes between the user-edited frames.
- The edit system user must manually position the hotspot in the first frame that it appears in.
- The edit system user must also identify the hotspot rectangle in the final frame (for the current clip) in which the hotspot appears. Editing of other frames for the hotspot is optional: the system will use linear interpolation for hotspot position and size between the user-edited frames.
- The edit system supports viewing the frames in which the hotspot appears, with the hotspot rectangle overlaid on the video in a bright color.
- When these frames are played successively (as motion video), it is easy to check visually the motion of the hotspot rectangle and see how well it tracks the clickable object (the statue in this example). If the rectangle wanders away from its target, the user can specify its position and size in any frame. Interpolation will then change to accommodate this new value.
- Typically, the spatial video edit system user edits only a small fraction of the hotspot frames, finding that interpolation produces satisfactory hotspot sizes and locations in the remaining frames.
- The binder performs the linear interpolation/extrapolation of the hotspots and creates a frame-by-frame list of hotspot locations and dimensions in the runtime data hunk.
- An alternate implementation lets the runtime system perform the interpolation rather than the binder. This requires increased CPU time during runtime, but reduces the memory needed for the list.
- The system allows the hotspot frames to be played backwards or forwards using the methods disclosed in the co-pending Special Effects Applications.
- Storage of rectangle positions and sizes are done using the coordinate space of the source video stream. For example, if the frames in the source are 352 x 240 pixels, then all rectangle measurements will run 0-351 in the X dimension, and 0-239 in the Y dimension. Relating the measurements back to the source (rather than to the current display, for example) produces numbers that are still useful when the video is later displayed at a different magnification.
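Rectangles are stored in source-video coordinates, so drawing them at another magnification is a simple per-axis scale. The sketch below assumes the display size is known and rounds to the nearest pixel. The function name is hypothetical.

```python
def source_to_display(rect, src_size, display_size):
    """Map (left, top, right, bottom) from source-video pixels to display pixels."""
    sx = display_size[0] / src_size[0]
    sy = display_size[1] / src_size[1]
    left, top, right, bottom = rect
    return (round(left * sx), round(top * sy), round(right * sx), round(bottom * sy))
```

For example, a rectangle stored for a 352 x 240 source maps to twice its coordinates when the video is shown at 704 x 480, and the stored values never change.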
- The edit system uses the SoftMotion player available from SAS Institute Incorporated.
- The edit system fetches from the database all information regarding hotspot rectangles currently defined in the clip. This information is kept in memory, where it is available for quick access any time the video editor is showing a frame that may contain hotspots.
- Whenever the video editor shows a frame, the in-memory copy of the hotspot data is searched for rectangles that need to be drawn.
- The position and size of any hotspot rectangle in the frame are determined in one of two ways. If the rectangle was edited by the user for that frame, then the exact values specified by the user will have been stored in the database and are available in the in-memory copy of that data mentioned above. Other frames require interpolation, which is explained below.
- The rectangle may be drawn on top of the video, optionally using one color for user-specified rectangles and a different color for interpolated rectangles.
- When the camera path of the clip is drawn in the map window, the color of that path may optionally highlight those frames that contain rectangles for the current hotspot. This is a straightforward operation, since the database records the first and last frame for each hotspot in each clip: each point in the camera path is drawn in one color if its frame number lies between the first and last hotspot frames, and in a different color otherwise.
- The video editor supports a variety of editing modes.
- The current mode may be changed by the user through the edit system's push buttons and menu system as described above and further illustrated in Appendix A.
- In the default edit mode, hotspots are not being edited. They are all drawn in the same color, regardless of whether their rectangles are interpolated or user-specified.
- In this mode, no new hotspot can be added.
- The user may select a hotspot by clicking on its rectangle.
- The name of the currently selected hotspot is shown in the window caption, and the segment of the camera path that contains that hotspot is colored in the map (if that option is enabled).
- A special edit mode is used for changing existing hotspots. For this discussion it will be called the "change hotspot" mode.
- In this mode, the system tracks one hotspot as the one currently being edited. All other hotspots are drawn in the same color as they were in the default edit mode.
- The rectangle for the selected hotspot is drawn in one color for a user-specified rectangle and a different color for an interpolated rectangle.
- The user can drag any edge or corner of the rectangle to change its size, or drag any interior point to move the entire rectangle without changing its size. Any rectangle that is so adjusted becomes a user-specified rectangle (not an interpolated rectangle).
- After such an adjustment, the database is updated to reflect the new size and position, and the color of the rectangle is changed (if necessary) to the color used for user-specified rectangles.
- The user can use the standard video player controls to move to any other frame of video, which allows him to see the hotspot rectangle in any frame. If the user positions to a frame outside the range of frames currently containing the hotspot, the hotspot's range will be extended to include that frame. The user can also play the frames as full motion video, with the hotspot rectangle overlaid in each frame. This allows the user to determine whether the rectangle moves appropriately to track the scene element that the user will click on.
- A third edit mode is selected to add a new hotspot to the video. For this discussion it will be called the "add hotspot" mode. It is entered when the user requests it with no hotspot currently selected.
- This mode is used to establish the first data point for a hotspot.
- The user drags out a rectangle on the video to define the hotspot location in the current frame.
- When the rectangle is completed, the user automatically drops into the previous ("change hotspot") mode, where he can adjust the new rectangle and extend the range of the hotspot to other frames.
- Interpolation is an important aspect of this process.
- Although the database contains descriptions of only the user-specified frames, the editor preferably keeps a table in memory with hotspot size and position information for all video frames. Because this table describes every frame in a hotspot's range, no new calculations need be done when drawing hotspot rectangles on a new video frame: the answers are already in the table. This quick response is important when playing full motion video.
- Each frame description in the table contains a boolean field to distinguish user-specified values from interpolated values. This is important because a single user edit operation (like moving a hotspot rectangle in one frame) could invalidate the table entries for a whole series of frames. After every such action, the edit system re-interpolates the hotspot data and updates its table in memory.
- The first step in the preferred re-interpolation process is to discard all values that were not user-specified (i.e., that were the results of previous interpolations).
- The edit system then processes the user-specified frames in pairs, considering the first and second, then the second and third, then the third and fourth, and so on. If the pair represents adjacent frames of video (frame numbers differ by one), then no intermediate values are required between them, and the system moves on to the next pair of user-specified frames. Otherwise the system generates values for all the frames between the current pair as follows: simple linear interpolation is used for the X- and Y-coordinates of the rectangle center, and also for the height and width.
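The re-interpolation procedure just described can be sketched as follows. The `HotspotFrame` record and the dictionary-keyed table are assumptions. The sketch takes from the text only the order of operations: drop interpolated entries, walk the user-specified frames in consecutive pairs, skip adjacent pairs, and linearly interpolate center X/Y, width, and height for the frames in between.

```python
from dataclasses import dataclass

@dataclass
class HotspotFrame:
    cx: float   # rectangle center X (source-video pixels)
    cy: float   # rectangle center Y
    w: float    # width
    h: float    # height
    user_specified: bool

def lerp(p, q, t):
    return p + (q - p) * t

def reinterpolate(table):
    """Rebuild a frame -> HotspotFrame table from its user-specified entries."""
    # Step 1: keep only user-specified frames, discarding old interpolations.
    keys = sorted(f for f, e in table.items() if e.user_specified)
    result = {f: table[f] for f in keys}
    # Step 2: consider user-specified frames in consecutive pairs.
    for a, b in zip(keys, keys[1:]):
        if b - a == 1:
            continue  # adjacent frames need no intermediate values
        ea, eb = table[a], table[b]
        for f in range(a + 1, b):
            t = (f - a) / (b - a)
            result[f] = HotspotFrame(lerp(ea.cx, eb.cx, t), lerp(ea.cy, eb.cy, t),
                                     lerp(ea.w, eb.w, t), lerp(ea.h, eb.h, t),
                                     user_specified=False)
    return result
```

Interpolating the center rather than a corner keeps a shrinking or growing rectangle centered on its target while its size changes.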
- The term "object" has been used herein according to its ordinary meaning, to describe objects in the video environment as well as objects overlaid on the video environment, such as M-views.
- The terms "object class" and "object instance" are used here as they are used in the preferred spatial video edit time and runtime systems, to describe properties of overlays. It is important to appreciate the difference between the concepts of object class, object instance and M-view.
- The edit time system supports the concept of an object class.
- An object class is a definition of a collection of variables that describe the attributes and behaviors of a user-defined object such as a "ball" or "inflatable object." Characteristics common to all members of the class are described in the object class definition in the edit time database. Such characteristics are described by the flows that control the object's behavior and the state variables that describe the object's current state.
- An object instance is created from the object class definition and, in addition to containing a reference to the object class, is represented by its own collection of variables which are distinct from any other object instance created from the same class. For example, there can be multiple ball instances created from the ball object class; each ball instance can have its own appearance, location and other attributes. While the flows associated with an object are generally contained in the object class definition, object instance attributes which may be unique to a particular object instance can be used by the object class flows to determine how the particular object instance reacts. All required object instances for a production are specified during editing time and space for their variables, including pointers to the object class, is allocated in the data hunk at bind time. The object class definition and flows associated therewith are in the data hunk and are available to the object instances.
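The class/instance split can be pictured with two small record types. Shared definitions live in the class. Each instance carries its own receptacle, M-view and state variables, plus a reference back to its class. The field names below are hypothetical. The sketch shows the variant in which class default values are copied into each instance when it is created.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectClass:
    name: str
    defaults: dict = field(default_factory=dict)  # initial state variables
    flows: dict = field(default_factory=dict)     # event name -> flow callable

@dataclass
class ObjectInstance:
    object_class: ObjectClass   # reference back to the shared definition
    receptacle: str             # which receptacle the instance is placed in
    m_view: str                 # which M-view shows its current appearance
    state: dict = field(default_factory=dict)

    def __post_init__(self):
        # Copy class defaults so each instance has its own state variables.
        for key, value in self.object_class.defaults.items():
            self.state.setdefault(key, value)

ball = ObjectClass("ball", defaults={"inflation": "inflated"})
ball_1 = ObjectInstance(ball, "table", "ball_inflated")
ball_2 = ObjectInstance(ball, "floor", "ball_inflated")
ball_1.state["inflation"] = "flat"   # changes only ball_1
```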
- Object instances can be placed in a receptacle.
- A state variable in the object instance tells which receptacle.
- Object instances have an appearance.
- Another state variable in the object instance tells which M-view represents the current appearance.
- Object instances may have other state variables defined by the editor, such as "current inflation status." Flows change state variables by assigning new values to them. For example, when two object instances interact, such as the ice pick in the user's "hand" and the basketball in the table receptacle, both the ice pick and the basketball have flows which run and cause one or both object instances to change state.
- Here, the ice pick may have a "sharpness" state that decreases, and the basketball's inflation state and appearance may change.
- Flows can also be associated with an object to govern various types of interactions with other objects or with receptacles. Flows can be triggered by synchronous or asynchronous events.
- An object instance is the individual object class member that may appear in and/or interact with the production.
- In one embodiment, an object instance may refer to the object class variable for various characteristics.
- Alternatively, the object class variable information is copied directly into the runtime object instance variable.
- An object instance is a created object with an assigned initial location, flows and other variables.
- Object instances consist of a collection of data items ("properties") which specify such things as the M-view associated with the object when it is displayed in the spatial video, cursors used to represent the object after the user has picked it up, flows to run when the object interacts with other objects, and so forth.
- The system allows the user to add data items associated with objects.
- An object instance variable, either at edit time or at runtime, has several properties associated with it, including the M-view that is used to represent its current state.
- The same action which causes a change in state could trigger a flow which changes the M-view associated with the object instance.
- A single M-view could be associated with several different object instances. For example, there could be several inflated basketballs at different locations in the environment. Each individual basketball would be an object instance, but a single inflated basketball M-view could be associated with all of these object instances. This also allows for computational efficiency.
- A different M-view may be associated with another object instance of the same general object.
- For example, the basketball object class could contain a blue basketball subclass and one or more blue basketball object instances, as well as an orange basketball subclass and one or more orange basketball object instances.
- Blue basketball M-views could be associated with the various states of the blue basketball object instances.
- Orange basketball M-views could be associated with the various states of the orange basketball object instances.
- As an example of how object instances are used, suppose a production has a basketball object instance in an inflated current state, with a current position on a table in a room of the production environment. The M-view associated with the object instance depicts an inflated basketball. Further, suppose the user has previously picked up an ice pick object instance. The user's cursor would be a representation of the ice pick. If the ice pick is clicked on the inflated basketball, the basketball should deflate.
- The user would hear a hissing sound, caused by a flow associated with the interaction of the ice pick and the basketball, which triggers the playing of an audio clip of a hissing sound stored as a .WAV file or other sound file; and the basketball object instance's M-view would be replaced by a burst basketball M-view.
- The change in the basketball's state would probably also need to be recorded elsewhere, most likely in a state variable in the basketball object instance.
- The basketball object instance would be changed in that its state would now be set to burst, and the M-view and cursor associated with that object instance would be correspondingly altered.
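The ice pick and basketball example can be sketched as a table of interaction flows keyed by the pair of objects involved. Everything here is illustrative and hypothetical: the `interact` dispatcher, the state keys, the M-view, cursor and sound names, and the `play_sound` stand-in, which only records the request. The sharpness decrement follows the earlier ice pick example.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    kind: str
    m_view: str
    cursor: str = ""
    state: dict = field(default_factory=dict)

sounds_played = []

def play_sound(filename):
    """Stand-in for the runtime's audio playback; records the request only."""
    sounds_played.append(filename)

def ice_pick_on_basketball(pick, ball):
    if ball.state.get("inflation") != "inflated":
        return  # an already-burst ball does not react again
    play_sound("hiss.wav")
    ball.state["inflation"] = "burst"
    ball.m_view = "basketball_burst"
    ball.cursor = "basketball_burst_cursor"
    pick.state["sharpness"] = pick.state.get("sharpness", 0) - 1

FLOWS = {("ice_pick", "basketball"): ice_pick_on_basketball}

def interact(held, target):
    """Run the flow registered for (held object, clicked object), if any."""
    flow = FLOWS.get((held.kind, target.kind))
    if flow is not None:
        flow(held, target)
```

Keying flows on the pair of kinds keeps each interaction in one place, so adding a new combination means registering one more entry.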
- An important feature of the spatial video production system is the ability to augment a video scene with bitmaps of objects. Footage of a bare tabletop can, at run time, show a table with a vase on it, or a frog, or any other bitmap image. Information about what object to draw, where to draw it, and how to draw it must be entered during editing so that it can be fed to the run time engine when needed. The entire operation would be simple were it not for the fact that spatial video uses moving cameras. As the camera approaches the table, the viewer expects the object on the table to grow in size on each frame, just as the image of the table grows in each frame. And as the camera begins to go past the table to one side, the angle from which the object is viewed must continually change.
- The moving camera makes the job more difficult in two ways. First, we must be able to draw each object in varying sizes and from varying points of view. This aspect is covered in another section of this document. Second, we must supply unique location, scale, and angle data for each object in each frame.
- The spatial video edit system supports the creation, review, and modification of this data.
- The term "receptacle" is used to refer to a location in a video clip where an M-view may be drawn.
- Both the spatial video edit system and the binder support the same interpolation algorithms, which take one or more user-specified receptacle frames and "fill in" the missing ones to allow smooth scale factor, position, and angle changes between the user-edited frames.
- The spatial video edit system user must manually position the receptacle in the first frame that it appears in.
- The spatial video edit system user must also identify the receptacle location in the final frame (for the current clip) in which the receptacle appears.
- Editing of other frames for the receptacle is optional: the system will use smooth interpolation for receptacle position, scale factor, and angle between the user-edited frames.
- the edit system supports viewing the frames in which the receptacle appears, with the receptacles represented in the video both by a rectangle, and by an arbitrary M-view object. When these frames are played successively (as motion video), it is easy to visually check the motion of the M-view object and see how well it tracks the background. If the M-view wanders with respect to the video background, the user can specify its position, scaling factor, and angle in any frame. Inte ⁇ olation will then change to accommodate these new values.
- the edit system user edits only a small fraction of the receptacle frames, finding that inte ⁇ olation produces satisfactory receptacle appearance in the remaining frames.
- A receptacle will be represented on the screen in two ways: both as a rectangle and as an M-view.
- The M-view lets the system user see the end result (what the ultimate product user will see).
- The rectangle gives a convenient graphical control for changing location and scaling factor. (The size of the rectangle is proportional to the current scaling factor.)
- The spatial video edit system provides these services:
- Scale is stored as an arbitrary multiplication factor, and angle is stored in degrees. It is important to choose for this operation a video player that can draw rectangles and M-views on top of each video frame, both while paused and while playing full-motion video.
- The preferred player is the SoftMotion player previously described.
- When a frame is drawn, the in-memory copy of the receptacle data is searched for rectangles and M-views that need to be drawn.
- The position, scaling factor, and angle of any receptacle in the frame will be determined in one of two ways. If the receptacle was edited by the user for that frame, then exact values, as specified by the user, will have been stored in the database and are available in the above-mentioned in-memory copy of that data. Other frames will require interpolation, which is explained below.
- The rectangle may then be drawn on top of the video, and an M-view may be rendered on the frame, using these values.
- If the camera path is shown on a map, the color of that path may optionally highlight portions of the path corresponding to frames that contain the current receptacle. Since the database records the first frame and last frame for each receptacle in each clip, this is a straightforward operation.
- The video editor of the system supports a variety of editing modes.
- The current mode may be changed by the user through the menu system.
- In the default edit mode it is assumed that receptacles are not being edited. Their rectangles are all drawn in the same color, regardless of whether their values are interpolated or user-specified. In the default mode no new receptacle can be added. In this mode the user may select a receptacle by clicking on its rectangle. The name of the currently selected receptacle is shown in the window caption, and the segment of the camera path that contains that receptacle is colored in the map (if that option is enabled).
- A special edit mode is used for changing existing receptacle positions and scaling factors.
- In this mode, the system tracks one receptacle as the one currently being edited. All other receptacles are drawn as they are in the default edit mode.
- The rectangle for the selected receptacle is drawn in one color for user-specified values and in a different color for interpolated values.
- The user can drag any edge or corner of the rectangle to change the receptacle's scaling factor, or drag any interior point to move the entire rectangle without changing its scaling factor. Any rectangle that is so adjusted becomes a user-specified rectangle (not an interpolated rectangle).
- The database is updated to reflect the new size and position, and the color of the rectangle is changed (if necessary) to the color used for user-specified rectangles.
- The user can use the standard video player controls to move to any other frame of video, and thus see the M-view and rectangle in any frame. If the user moves to a frame outside the range of frames currently containing the receptacle, the receptacle's range will be extended to include that frame. The user can also play the frames as full-motion video with the M-view and rectangle overlaid in each frame; this allows the user to determine whether the M-view moves appropriately to track the video background.
- A third edit mode is selected to change the angle of view.
- In this mode, the mouse is dragged to the left or right to select different angles of view for the M-view.
- The screen is updated accordingly so that the user can see the results.
- A fourth edit mode is selected to add a new receptacle to the video.
- In this mode, the first click on the video window is taken as the initial location for the receptacle in that clip.
- An arbitrary scaling factor will be chosen, and the user will automatically drop into the previous ("change") edit mode, where he can adjust the new rectangle and extend the range of the receptacle to other frames.
- Interpolation is an important aspect of this process.
- The spatial video edit system keeps a table in memory with the location, scale factor, and angle of view for all video frames. Because this table describes every frame in a receptacle's range, no new calculations need be done when drawing a new video frame: the answers are already in the table. This quick response is important when playing full-motion video.
- Each frame description in the table contains a boolean field to distinguish user-specified values from interpolated values. This is important because a single user edit operation (like moving a receptacle rectangle in one frame) could invalidate the table entries for a whole series of frames. After every such action, the spatial video edit system re-interpolates the receptacle data.
- The first step in re-interpolation is to discard all values that were not user-specified (i.e., that were the results of previous interpolations).
- The spatial video edit system then processes the user-specified frames in pairs, considering the first and second, then the second and third, then the third and fourth, and so on. If the pair represents adjacent frames of video (frame numbers differ by one), then no intermediate values are required between them.
- In that case, the spatial video edit system moves on to the next pair of user-specified frames.
- Otherwise, the spatial video edit system generates values for all the frames between the current pair as follows.
- Simple linear interpolation is used for the X- and Y-coordinates of the receptacle.
- Simple linear interpolation is also used for the scale factor and angle of view. For example, suppose the X-coordinate of the center moves from 11 to 21 when going from frame 100 to frame 105. The spatial video edit system allocates the total movement (10) evenly over the intermediate frames, producing values of 13, 15, 17, and 19, so the entire sequence for frames 100 to 105 is 11, 13, 15, 17, 19, and 21. While more advanced forms of interpolation (like splines) may produce smoother motion, linear interpolation requires less computation and produces good results. Linear interpolation is also an easy concept to explain to a new spatial video edit system user, and it does not have the (possibly surprising) side effects that spline-based interpolation would have.
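The re-interpolation procedure described above can be sketched in Python. This is a minimal illustration, not the edit system's actual code: the `ReceptacleFrame` record, the dictionary keyed by frame number, and the names `lerp` and `reinterpolate` are all hypothetical, and angles are interpolated linearly in degrees with no wrap-around handling.

```python
from dataclasses import dataclass


@dataclass
class ReceptacleFrame:
    """One row of the per-frame receptacle table (hypothetical layout)."""
    x: float
    y: float
    scale: float           # arbitrary multiplication factor
    angle: float           # degrees
    user_specified: bool   # True if the edit-system user set these values


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for 0 <= t <= 1."""
    return a + (b - a) * t


def reinterpolate(table: dict) -> dict:
    """Rebuild a receptacle's frame table from its user-specified frames."""
    # Step 1: discard every value produced by an earlier interpolation.
    keys = sorted(f for f, row in table.items() if row.user_specified)
    result = {f: table[f] for f in keys}
    # Step 2: walk the user-specified frames in consecutive pairs.
    for f0, f1 in zip(keys, keys[1:]):
        if f1 - f0 == 1:
            continue  # adjacent frames: nothing to fill in
        a, b = table[f0], table[f1]
        for f in range(f0 + 1, f1):
            t = (f - f0) / (f1 - f0)  # fraction of the way from f0 to f1
            result[f] = ReceptacleFrame(
                x=lerp(a.x, b.x, t),
                y=lerp(a.y, b.y, t),
                scale=lerp(a.scale, b.scale, t),
                angle=lerp(a.angle, b.angle, t),  # no wrap-around at 360
                user_specified=False,
            )
    return result


# Worked example from the text: X moves from 11 to 21 over frames 100-105.
table = {
    100: ReceptacleFrame(11, 40, 1.0, 0.0, True),
    103: ReceptacleFrame(0, 0, 0.0, 0.0, False),  # stale interpolated entry
    105: ReceptacleFrame(21, 40, 2.0, 10.0, True),
}
new_table = reinterpolate(table)
print([round(new_table[f].x) for f in range(100, 106)])  # [11, 13, 15, 17, 19, 21]
```

The stale entry for frame 103 is dropped in the first step and regenerated from the surrounding user-specified frames, which is why a single edit can safely invalidate a whole run of interpolated frames.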
- Flows are used in spatial video productions to allow the developer to specify execution logic for the product.
- For example, a flow may say what happens when the product user clicks on hotspot X, or when the video player gets to the end of clip Y.
- The logic in a flow is specified in a simple programming language that may be entered and changed in the logic editor portion of the Spatial Video Edit System. Once entered, the flow logic is scanned by the edit system for obvious errors or omissions. Further error checking is done by the system binder (discussed below) when the executable final production is generated. At bind time, the flow language is translated into its final form to allow efficient playback.
- The preferred spatial video edit system provides the user with the ability to define global variables for use within flows. These variables are called "properties" and are defined and initialized in the edit-time database.
- The preferred system allows four types of properties: numeric, boolean, string, and memo. Numeric properties are four-byte signed integers. Booleans are one-byte yes/no values, with 0 meaning no. String properties are preferably fixed-length 255-byte strings of characters; strings shorter than 255 bytes are padded to 255 bytes with blanks. Memo properties are variable-length strings preceded by a two-byte length code. The user of the system can write flows which refer to these properties.
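The four property layouts can be illustrated with Python's `struct` module. This is a sketch under stated assumptions: the text does not specify byte order, character encoding, or whether the memo length code is signed, so little-endian order, Latin-1 text, and an unsigned 16-bit length are assumed here, and `encode_property` is a hypothetical name.

```python
import struct

STRING_LEN = 255  # fixed length of string properties


def encode_property(kind: str, value) -> bytes:
    """Serialize one property; byte order and text encoding are assumptions."""
    if kind == "numeric":
        return struct.pack("<i", value)              # four-byte signed integer
    if kind == "boolean":
        return struct.pack("<B", 1 if value else 0)  # one byte, 0 means "no"
    if kind == "string":
        raw = value.encode("latin-1")
        if len(raw) > STRING_LEN:
            raise ValueError("string property exceeds 255 bytes")
        return raw.ljust(STRING_LEN, b" ")           # pad with blanks to 255 bytes
    if kind == "memo":
        raw = value.encode("latin-1")
        return struct.pack("<H", len(raw)) + raw     # two-byte length code, then text
    raise ValueError(f"unknown property type: {kind}")


print(encode_property("memo", "hi"))  # b'\x02\x00hi'
```

Fixed-length strings make every string property the same size, which keeps property offsets simple to compute; memo properties trade that for variable length.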
- The logic edit steps of the edit process are those which associate various logic flows with the components of the production.
- Because the logic programming is text-based, it can be edited with any simple text editor. Such an editor is built into the preferred Spatial Video Edit System.
- Embedded external programs. There are certain types of programming that are not efficiently written in the flow language.
- An alternative is to construct a separate special-purpose program which "owns" a region of the display screen for some period of time. The region "owned" by the special-purpose program may overlap with other visuals, and a priority means may determine which visual is visible at what time. Such programs can be debugged and tested separately from the overall production.
- Such a program might be a visual which portrays an automobile instrument cluster in which the speedometer and tachometer reflect changes in rpm and speed. This program could occupy a lower part of the display while a video window shows motion video through the "windshield" above the instrument cluster.
- Such special-purpose programs should be constructed with a well-defined interface which allows them to interact easily with the flow system.
- For example, such programs can be constructed as OCXs. These program types are documented in the Microsoft Developer Network Development Platform SDKs and Operating Systems documentation previously cited. Similarly, OCXs can be used for programs without a visual presentation, such as subroutine libraries.
- The bind step invokes the binder to create the final multimedia spatial video production.
- The binder is the component of the system which reads the database produced by the spatial video editor and writes disk files in a format that can be quickly and easily loaded and interpreted by the runtime system. It is similar in function to a compiler and linker in a traditional edit-compile-link-run software development environment.
- The binder reads the database and collects the data therein into three main data files, as well as some smaller auxiliary files.
- The three main files are known as the "data hunk," the "logic hunk," and the "clip hunk."
- FIG. 28 illustrates the overall binder process, containing the steps of data hunk creation, logic hunk creation, initialization file creation, clip hunk creation, and performing relocations.
- The clip hunk preferably is a single large MPEG file consisting of all the spatial video in all the clips in the production.
- Each clip is preferably preceded by a number of I frames which represent other frames that can be reached by means of clip exits present on the clip. These frames are present to allow the runtime system to cache them before beginning to play the clip.
- Each clip is preferably followed by a single copy of the last frame of the clip, but the copy is always an I frame (regardless of the actual type of the last frame).
- The data hunk preferably contains all the information the runtime system needs to set up the hotspots, receptacles, visuals, global properties, music, and asynchronous (keyboard, mouse, joystick, etc.) input for the production.
- The data hunk is derived from the edit-time database and is laid out by the binder in a fashion which makes it easy for the runtime system to allocate and initialize the memory needed to store and represent this data while the production is running.
- The binder must do some special work to assign each property an offset in the data area of the "runtime machine," which is described below.
- The property offsets are used to refer to the properties in the runtime flow language interpreter.
- FIG. 29 illustrates the steps of the data hunk creation process.
- The logic hunk (runtime logic file) is a collection of instructions in a low-level language that is interpretable by the runtime system, together with a list of events that should occur when a given frame of video is displayed (known as "frame events").
- The instructions implement general-purpose programming commands for moving and comparing data, performing arithmetic, playing audio and video, and so forth.
- The frame events tell the runtime system when to activate and deactivate hotspots and receptacles, when to play certain music files, when to activate flows and clip exits, and so forth.
- The binder generates the logic hunk by reading the flows written by the spatial video edit system user and generating the low-level runtime language instructions that implement the user's commands.
- The logic hunk creation process is illustrated in FIG. 30.
- The binder creates an initialization file which tells the runtime system where to find the other files involved in the production. For example, if cursors, music, digitized audio files, or dramatic video files are used, the initialization file gives their path names so that the runtime system can find these files.
- The binder also copies to the production certain files containing other assets used in the production, such as dramatic video files, audio files, and MIDI files. When instructed to do so, the binder can also copy these assets into the same directory tree in the file system that the binder-created files occupy. This removes all dependencies on files outside the binder-created directory tree, which makes it very convenient to create a CD-ROM that will run the production without problems due to different disk names or drive letters. When the binder runs in this mode, only path names relative to the program executable are used.
- The binder makes use of a technique commonly used in linker design called "relocation."
- Relocation allows the various parts of the data and logic hunks to refer to items in the other hunks by means of a numerical offset into the hunk, even though the offset is not known at the time the reference is written. This is accomplished by storing a list of positions in each hunk that are in fact references to another hunk, then rewriting the data at those positions after the other hunk has been created. For example, if a hotspot in the data hunk refers to a flow in the logic hunk, the binder cannot know, when writing the hotspot, the offset in the logic hunk at which the flow begins; therefore, a zero is written to the data hunk as a placeholder and its position is noted.
- Once the logic hunk has been created, the data hunk is reopened, a seek is used to position the data hunk file at the correct location, and the correct logic hunk offset is written to the file, overwriting the zero that was previously written as a placeholder.
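A minimal Python sketch of this placeholder-and-patch scheme follows. It uses an in-memory `io.BytesIO` buffer in place of the data hunk file (a real file opened in `r+b` mode supports the same `seek` and `write` calls), assumes 32-bit little-endian offsets, and the helper names, record bytes, and flow name are hypothetical.

```python
import io
import struct


def write_placeholder(hunk, fixups: list, target: str) -> None:
    """Write a zero offset and record its position and intended target."""
    fixups.append((hunk.tell(), target))
    hunk.write(struct.pack("<I", 0))


def apply_relocations(hunk, fixups: list, offsets: dict) -> None:
    """Seek to each placeholder and overwrite it with the now-known offset."""
    end = hunk.tell()
    for position, target in fixups:
        hunk.seek(position)
        hunk.write(struct.pack("<I", offsets[target]))
    hunk.seek(end)  # leave the stream positioned where it was


# Writing the data hunk: the hotspot's flow offset is not yet known.
data_hunk = io.BytesIO()
fixups = []
data_hunk.write(b"HOT1")                          # hypothetical record header
write_placeholder(data_hunk, fixups, "on_click")

# After the logic hunk has been created, its flow offsets are known.
logic_offsets = {"on_click": 0x40}
apply_relocations(data_hunk, fixups, logic_offsets)
print(data_hunk.getvalue())  # b'HOT1@\x00\x00\x00'
```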
- One portion of the bind process is that used for creating frame events on a specific clip:
- Step One: If the current clip has a MIDI file associated with it, generate a PLAY_MIDI frame event for the first frame of the clip.
- Step Two: For each clip exit which specifies the clip as its FROM clip, do the following: a. If the clip exit specifies a flow to run, generate a CLIP_EXIT event for the frame associated with the clip exit that calls the given flow; b. If the clip exit specifies a valid TO clip, do the following: (1) Generate a CACHE_FRAME event associated with the first frame of the clip that caches the TO clip target frame;
- Step Three: For each receptacle which is present on this clip, generate a RECEPTACLE_ACTIVATE and a RECEPTACLE_DEACTIVATE event for each time the receptacle becomes active or goes inactive. The events should be associated with the frame on which the receptacle becomes visible or becomes invisible, respectively. If the receptacle is visible when the clip starts, the RECEPTACLE_ACTIVATE event should be associated with the first frame of the clip; similarly, if it is still visible when the clip is exited, the RECEPTACLE_DEACTIVATE event should be associated with the last frame of the clip.
- Step Four: For each hotspot which is present on this clip, generate HOTSPOT_ACTIVATE and HOTSPOT_DEACTIVATE events similar to the RECEPTACLE_ACTIVATE and RECEPTACLE_DEACTIVATE events of Step Three.
- Step Five: Generate a CLIP_END event associated with the last frame.
- Step Six: Sort all frame events based on the frame with which they are associated. If two events are on the same frame, make sure that RUNTIME_TRANSITION events immediately precede their associated CLIP_EXIT event and that the CLIP_END event comes last.
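The six steps can be sketched in Python. The data layout (`Clip`, `ClipExit`, and visibility intervals given as pairs of the first visible frame and the frame on which the item becomes invisible) and the event tuples are hypothetical. Step Two's sub-steps beyond the CACHE_FRAME event are not reproduced in this excerpt, so RUNTIME_TRANSITION events are accepted as input, tagged with the index of their clip exit, rather than generated; the sort key then enforces the Step Six ordering.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClipExit:
    frame: int                      # frame on which the exit can be taken
    flow: Optional[str] = None      # flow to run, if any
    to_frame: Optional[int] = None  # target frame of a valid TO clip, if any


@dataclass
class Clip:
    first: int
    last: int
    midi: Optional[str] = None
    exits: list = field(default_factory=list)
    receptacles: dict = field(default_factory=dict)  # name -> [(on, off), ...]
    hotspots: dict = field(default_factory=dict)     # name -> [(on, off), ...]


def visibility_events(kind, name, spans, clip):
    """ACTIVATE/DEACTIVATE pairs, clamped to the clip's first and last frames."""
    events = []
    for on, off in spans:
        events.append((max(on, clip.first), f"{kind}_ACTIVATE", name))
        events.append((min(off, clip.last), f"{kind}_DEACTIVATE", name))
    return events


def sort_key(event):
    frame, kind, payload = event
    if kind == "CLIP_END":
        return (frame, 2, 0, 0)  # always last on its frame
    if kind in ("RUNTIME_TRANSITION", "CLIP_EXIT"):
        # payload is the clip exit index; each transition sorts just before its exit
        return (frame, 1, payload, 0 if kind == "RUNTIME_TRANSITION" else 1)
    return (frame, 0, 0, 0)


def frame_events(clip, extra_events=()):
    ev = list(extra_events)
    if clip.midi:                                      # Step One
        ev.append((clip.first, "PLAY_MIDI", clip.midi))
    for i, ex in enumerate(clip.exits):                # Step Two
        if ex.flow:
            ev.append((ex.frame, "CLIP_EXIT", i))      # flow is clip.exits[i].flow
        if ex.to_frame is not None:
            ev.append((clip.first, "CACHE_FRAME", ex.to_frame))
    for name, spans in clip.receptacles.items():       # Step Three
        ev += visibility_events("RECEPTACLE", name, spans, clip)
    for name, spans in clip.hotspots.items():          # Step Four
        ev += visibility_events("HOTSPOT", name, spans, clip)
    ev.append((clip.last, "CLIP_END", None))           # Step Five
    return sorted(ev, key=sort_key)                    # Step Six (stable sort)


clip = Clip(first=10, last=20, midi="theme.mid",
            exits=[ClipExit(frame=15, flow="turn_left", to_frame=500)],
            receptacles={"table": [(5, 12)]})
for event in frame_events(clip):
    print(event)
```

Because Python's `sorted` is stable, events that share a frame and have no ordering rule keep the order in which they were generated.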
- The binder could also change some basic parameters of the production.
- For example, the binder could, as part of the bind process, translate all MPEG video to AVI, Indeo, or some other format. If a new version of the runtime system were implemented using the new format, the user could generate working productions for both formats simply by binding the same database in two different ways.
- The binder could also bind the production differently for different target platforms. For example, alignment and byte ordering are different on computers based on the Motorola series of processors than on those based on the Intel processor series, but a binder running on an Intel platform could reorder bytes as appropriate and create a production that would be easily readable by a runtime system running on a Motorola platform. Neither of these translations, nor any others that the binder might implement, would require changes in the Spatial Video Edit System database.
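Byte reordering of this kind can be illustrated with Python's `struct` format prefixes: `<` for little-endian, as on Intel processors, and `>` for big-endian, as on Motorola 68000-series processors. The function name is hypothetical, and alignment padding, the other platform difference mentioned, is not handled in this sketch.

```python
import struct


def pack_words(values, target: str) -> bytes:
    """Pack unsigned 32-bit values in the target platform's byte order."""
    order = {"intel": "<", "motorola": ">"}[target]  # little- vs big-endian
    return struct.pack(f"{order}{len(values)}I", *values)


print(pack_words([1], "intel"))     # b'\x01\x00\x00\x00'
print(pack_words([1], "motorola"))  # b'\x00\x00\x00\x01'
```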
- The runtime system engine must be able to coordinate playing video with overlaying M-views on the video, playing sounds, and letting the user interact with the production by mouse or keyboard.
- The runtime system is preferably an interpreter-based system. While a compiled system would also work, an advantage of an interpreter system is its ability to represent known classes of complicated ideas very compactly. At runtime, this can provide significant efficiencies. Whichever type of system is used, it must be able to run flows and play video sequences with overlaid hotspots and receptacles.
- The runtime system also coordinates external events from the keyboard, mouse, or other input devices.
- The runtime system's primary interpreter is the flow interpreter. There is also a video interpreter, as will be described below.
- The flow interpreter and video interpreter work in parallel with the MPEG stream, in a logic stream which allows the runtime system to determine, on a frame-by-frame basis, whether anything should be overlaid on the frame about to be displayed.
- Each frame of video is decoded from the MPEG stream into a buffer. After decoding, the buffer holds a bitmap of the image that is about to be displayed. Before the image is displayed any pending frame events or other events are handled.
- The classes of events are described in this runtime section and the appendices incorporated herein, and are illustrated in the figures accompanying the text.
- Any active receptacles are then handled by determining, for each active receptacle, whether there is an M-view in the receptacle and, if so, determining the receptacle's position and the M-view's size and angle from the active receptacle list.
- The M-view is then rendered into the buffer holding the bitmap of the image to be displayed.
- Only the opaque parts of the M-view are drawn; the transparent parts are not. While the M-view could be rendered directly onto the display, the decompression, scaling, locating, and rendering of the M-view bitmap is not instantaneous, so drawing M-views directly on the display could result in screen flicker. Therefore, in the preferred embodiment, the M-views are rendered on the bitmap of the decoded video frame while it remains in the buffer, before it is displayed.
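The buffer compositing step can be sketched in Python, with frames and M-views represented as lists of pixel rows. This is an illustrative assumption, not the patent's implementation: `None` marks a transparent M-view pixel, the M-view face for the current angle is assumed to be already selected and decompressed, and nearest-neighbour sampling stands in for whatever scaling method the runtime actually uses.

```python
TRANSPARENT = None  # marker for transparent M-view pixels (assumption)


def render_mview(frame, mview, left, top, scale):
    """Draw the opaque pixels of an M-view into a decoded frame buffer in place.

    frame and mview are lists of pixel rows; scale must be positive.
    """
    src_h, src_w = len(mview), len(mview[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    for dy in range(dst_h):
        y = top + dy
        if not 0 <= y < len(frame):
            continue  # clip rows that fall outside the video frame
        sy = min(src_h - 1, int(dy / scale))  # nearest source row
        for dx in range(dst_w):
            x = left + dx
            if not 0 <= x < len(frame[y]):
                continue  # clip columns outside the frame
            pixel = mview[sy][min(src_w - 1, int(dx / scale))]
            if pixel is not TRANSPARENT:
                frame[y][x] = pixel  # opaque pixel overwrites the video
    return frame


video = [[0] * 4 for _ in range(4)]
ball = [[1, None], [None, 2]]  # 2x2 M-view with two transparent pixels
render_mview(video, ball, left=1, top=1, scale=1.0)
print(video)  # [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]]
```

All drawing happens in the off-screen buffer, so the displayed frame only ever shows the finished composite, which is the flicker-avoidance point made above.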
- The overall runtime system process is shown in FIG. 18.
- When the runtime system begins, it loads the data hunk using the steps described below and then begins interpreting the initial flow.
- The flows all come from a file called the logic hunk.
- The logic hunk contains a number of programs that control the course of the production. These programs are sometimes referred to as flows and are the programs created by the logic editor of the edit system.
- The data hunk contains information about the locations of hotspots and receptacles within frames.
- The data in the data hunk and the logic flows in the logic hunk were incorporated into the runtime data hunk and logic hunk during binding, as described above.
- An MPEG file, called the clip hunk, contains the spatial video footage.
- An M-view file contains the bitmaps (faces) of the objects which are inserted into the video. Other files include MIDI music, dramatic video sequences, and digitized sound.
- The initialization process preferably loads the data hunk into memory using the following steps.
- MIDI file information is loaded, and indexes are assigned based on the MIDI file information in the data hunk.
- Visual information is loaded: windows are loaded and indexes assigned, viewports are loaded and indexes assigned, pictures (visuals which frame bitmaps) are loaded and indexes assigned, and, finally, custom OCX visuals are loaded and assigned indexes. All index assignments are based on the order of the indexed data in the data hunk.
- Receptacle information is loaded. First, the receptacle lists and associated flows, if any, are loaded and indexes assigned. Then the list of receptacle positions, viewing angles, and scaled sizes on each active frame, and any associated flows, is loaded, and each list is associated with a receptacle. After receptacle information is loaded, hotspot information is loaded. First, the list of hotspots and their associated flows is loaded and assigned indexes based on their order in the data hunk. Then the list of hotspot positions and sizes on each active frame is loaded, and each list is associated with a hotspot.
- After the hotspot information, information on how to handle asynchronous user input is loaded; this information consists of a list of input events and a flow associated with each event.
- Global properties, constants, and objects are loaded.
- The logic hunk is much simpler to load, since it merely consists of a large block of logic instructions. It can be loaded in one pass by allocating a single block of memory and filling it with the contents of the logic hunk file. Because the logic offsets are included in the various pointers in the data hunk, no index need be constructed for the logic hunk. Execution proceeds by allowing the first logic hunk flow to run (see FIG. 18). In the preferred embodiment, this program builds a user interface and starts a video running.
- The user interface is built by loading a series of visuals and making them appear on the screen.
- The visual typically starts the first spatial video sequence.
- The interpreter manages the flows and the spatial video.
- The first events in a spatial video clip are those which load cached frames into memory. These are the TO frames for the various transitions permitted from different frames within the clip.
- These cached frames are preferably placed in the video stream at bind time, to be loaded when the video clip begins to be played.
- These cached frames are loaded into a cache frame buffer by the MPEG streamer in response to the DOEVENTS catch-up mode routine, accessed through the video operation code (op code) interpreter as illustrated in FIGS. 20 and 22 and discussed in the text accompanying these figures.
- The runtime system has two interpreters: a flow interpreter (a recursive op code interpreter) and a video op code interpreter.
- The video op code interpreter manages the playing of the frames but passes control to the flow interpreter when events call for any flows (see FIG. 20 and FIG. 19). These functions are discussed in greater detail below.
- Associated with the MPEG video is a section of the logic hunk that is keyed to particular frame numbers in the video.
- This logic hunk section is pre-sorted by the binder in ascending frame-number order and contains events for any receptacles and hotspots which were associated with those video frames at edit time.
- The runtime system refers to the location information loaded from the data hunk and overlays the hotspots and receptacles at the proper locations on the proper video frames.
- Each hotspot and receptacle can have a flow assigned to it for each mouse event (or other user input device action). For example, the associated flow logic could select different cursors to display when the mouse enters and leaves the hotspot. Another example would be to have the left mouse button execute a help program if clicked when the cursor is over the hotspot.
- The types of actions associated with mouse events in multimedia runtime systems are well known to those skilled in the art. However, the uses of a runtime system to track hotspots and receptacles in spatial video and to interpolate the size and angle of M-views associated with the spatial video are novel aspects of the system.
- An M-view that is associated with the object in the receptacle will be displayed at the location and size described by the data file for each subsequent frame of video with which that receptacle is associated.
- The data in this data file is based on the receptacle definition data created at edit time. However, the binder not only places the position and size data from the edit-time database into the runtime data file; it preferably also interpolates frame by frame at bind time and places the interpolated data in the data file to minimize the computational work to be done at runtime.
- Hotspots are handled similarly to receptacles and often overlap receptacles; the hotspot recognizes mouse events, such as a left mouse button click on the hotspot or the cursor entering or leaving the hotspot, and causes logic hunk programs to run.
- The appropriate logic hunk programs are created at edit time and are attached by the binder to the hotspot data so the runtime system can execute the flows (logic hunk programs) when needed.
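Detecting these hotspot events can be sketched in Python. The rectangle-shaped `Hotspot`, the event names, and the `flows` lookup table are hypothetical; the sketch only shows how enter, leave, and click events fall out of comparing the previous and current cursor positions against the active hotspots, with each event then selecting an attached flow.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    name: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def mouse_events(hotspots, prev_pos, pos, clicked=False):
    """Derive (event, hotspot name) pairs from two successive cursor positions."""
    events = []
    for h in hotspots:
        was_in = prev_pos is not None and h.contains(*prev_pos)
        is_in = h.contains(*pos)
        if is_in and not was_in:
            events.append(("ENTER", h.name))
        elif was_in and not is_in:
            events.append(("LEAVE", h.name))
        if is_in and clicked:
            events.append(("LEFT_CLICK", h.name))
    return events


def flows_to_run(events, flows):
    """Look up the flow attached to each (event, hotspot) pair, if any."""
    return [flows[e] for e in events if e in flows]


door = Hotspot("door", left=10, top=10, width=20, height=20)
flows = {("ENTER", "door"): "show_hand_cursor", ("LEFT_CLICK", "door"): "open_help"}
print(flows_to_run(mouse_events([door], (0, 0), (15, 15)), flows))  # ['show_hand_cursor']
```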
- The runtime engine calls the first logic flow through the logic flow interpreter, a recursive operation code interpreter.
- The process followed by the logic flow interpreter is described in FIG. 19.
- The logic flow interpreter parses the logic code by first getting the current program counter (pc) address (an instruction pointer) and then getting the op code designated by that pointer. This op code is read to determine whether it is a routine op code, which can be executed directly by the interpreter, or a non-routine op code, for which the flow interpreter calls the operating system or a special routine.
- Routine op codes are those which almost any interpreter would execute, such as math and boolean functions and other simple computational functions dealing with the basic handling of production logic, global variables, receptacles, and objects.
- Non-routine op codes deal with specialized functions more specific to the particular application, such as spatial video logic; loading, hiding, and showing visuals such as windows, viewports, and pictures; calling dramatic videos; and playing and muting sounds such as MIDI and WAV files.
- Non-routine op codes can also include custom codes created by the developer.
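The fetch-and-dispatch loop can be sketched as a small Python interpreter. The op code names (`PUSH`, `ADD`, `JUMP_IF_ZERO`, `RETURN`, `LOAD_VISUAL`) and the `(op, arg)` instruction format are invented for illustration; the actual op code list is in Appendix A. Routine op codes are executed inline, while every other op code is handed to a handler table standing in for the visual, video, sound, and developer-defined routines.

```python
def run_flow(code, handlers, stack=None):
    """Fetch/dispatch loop over (op, arg) instructions.

    Routine op codes run inline; any other op code is passed to a handler,
    which may itself call run_flow recursively (e.g. to run a nested flow).
    """
    stack = [] if stack is None else stack
    pc = 0  # program counter
    while pc < len(code):
        op, arg = code[pc]  # fetch the op code designated by the pc
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg
        elif op == "RETURN":
            break
        else:
            handlers[op](arg, stack)  # non-routine: visuals, video, sound, custom
    return stack


shown = []
handlers = {"LOAD_VISUAL": lambda arg, stack: shown.append(arg)}
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("LOAD_VISUAL", "instrument_cluster"), ("RETURN", None)]
print(run_flow(program, handlers), shown)  # [5] ['instrument_cluster']
```

Passing the same `stack` into a recursive call lets a handler run a nested flow that shares data with its caller, loosely mirroring the recursive structure of the flow interpreter.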
- The play dramatic opcode merits additional comment. While it is similar to the opcodes dealing with the playing of sound files, it differs in that it requires the spatial video being played to be interrupted. Thus, in the preferred embodiment of the system, the play dramatic opcode allows the runtime system to incorporate dramatic video smoothly with the spatial video. Although there is no requirement that the dramatic video match anything within the spatial video, the overall feel of the environment is improved when there is no visual difference in switching between the two types of video. Similarly to the matching steps described earlier in this document concerning turn and roundabout transitions, the sense of immersion in the environment is enhanced by matching scenes between the transition points in the spatial video and the ends of the dramatic video. The matching process is complicated by the fact that spatial video is typically played back in anamorphic (stretched) form, while the dramatic video is played back at the original aspect ratio (unstretched). Creating the desired matching requires several steps:
- When the runtime system encounters a play dramatic opcode, it first takes the offset information and pans the video frame until it is displaying the picture at the requested offset. It then invokes the MPEG decoder to play the dramatic sequence in the same position as the spatial video. Because the two frames have been matched, there is no apparent change to the user. Once the dramatic video has finished, the spatial video player resumes by playing the second frame of the two-frame clip. As previously discussed, where a hardware player is available, the dramatic video is played by the hardware player in step 7. A flow will also return to the software decoder once the dramatic video has finished playing.
- FIG. 25 illustrates the relationship between the flow interpreter, the routine op codes, and the visual and video op codes.
- FIG. 25 is a relationship diagram, not a flow chart. Therefore, while it shows the hierarchical relationship among the components of the runtime system, it does not show all of the recursive connections; the various flow charts referenced in FIG. 25 should be reviewed for this information. Similar op codes and routines exist for dramatic video sequences, sounds, and music (not shown). A list of op codes used in a preferred embodiment is contained in the flow language reference in Appendix A, incorporated herein by reference.
- If the op code is a routine op code, the interpreter executes it, updates the position counter, and repeats the loop. If the code is a visual load op code, the interpreter calls the visual load routine (FIG. 24). If the code is a video op code, the interpreter calls the video op code interpreter (FIG. 20). The flow interpreter passes to the video interpreter the current video clip ID, the beginning frame number of the current clip, the starting frame number for the clip, and the list of interesting frames (see below). The beginning frame number is the number of the frame at the beginning of the clip; this number is constant for any clip.
- The starting frame number is the frame number where the video clip is to start playing. This number changes from access to access depending on the user's navigational and other input.
- When the video interpreter exits back to the flow interpreter (see FIG. 20), it does so with an updated position counter value.
- Other non-routine op codes (not shown) are handled in a fashion similar to that shown for visual op codes.
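- The dispatch described above can be sketched as follows. This is a minimal Python sketch, not the runtime's actual code. The program representation (a list of `(kind, payload)` tuples), the kind names and the handler callables are all hypothetical. The real op code encoding is given in the Appendix A flow language reference.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical op code kinds; the real encoding is defined in the flow language reference.
ROUTINE, VISUAL_LOAD, VIDEO, END = "routine", "visual_load", "video", "end"

@dataclass
class FlowInterpreter:
    program: List[Tuple[str, object]]
    visual_load: Callable[[object], None]
    # The video handler returns an updated position counter, as the video interpreter does.
    video_interpreter: Callable[[object, int], int]
    trace: List[str] = field(default_factory=list)

    def run(self, position: int = 0) -> None:
        while position < len(self.program):
            kind, payload = self.program[position]
            if kind == END:
                break
            if kind == ROUTINE:
                # Routine op codes are executed directly; the counter advances by one.
                self.trace.append(f"routine:{payload}")
                position += 1
            elif kind == VISUAL_LOAD:
                self.visual_load(payload)
                position += 1
            elif kind == VIDEO:
                # Control passes to the video interpreter, which hands back the new position.
                position = self.video_interpreter(payload, position)
            else:
                raise ValueError(f"unknown op code kind: {kind}")
```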
- An 'Interesting' frame is one where something changes other than merely the frame number as the video is played. That is, an interesting frame is one in which the runtime system must do something in addition to simply playing the MPEG video clip.
- An interesting frame is one where either a hotspot begins or ends, a receptacle begins or ends, a transition/exit to another clip can happen (turn, roundabout or other) or other actions (music, sound) begin or end. Also, the last frame of a clip is 'Interesting'.
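- The sketch below shows one way to build and query a clip's interesting-frame list. It assumes, hypothetically, that the events for a clip are available as `(frame_number, description)` pairs. The last frame is always added, as the text requires. A binary search then finds the next interesting frame at or after the current one.

```python
import bisect
from typing import Iterable, List, Optional, Tuple

def interesting_frames(events: Iterable[Tuple[int, str]], last_frame: int) -> List[int]:
    """Sorted, de-duplicated frame numbers at which something other than playback happens.

    Events would be hotspot/receptacle begin or end, transition or exit points,
    sound or music start or stop. The clip's last frame is always interesting.
    """
    frames = {frame for frame, _ in events}
    frames.add(last_frame)
    return sorted(frames)

def next_interesting(frames: List[int], current: int) -> Optional[int]:
    """First interesting frame at or after `current`, or None past the end."""
    i = bisect.bisect_left(frames, current)
    return frames[i] if i < len(frames) else None
```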
- the video op code interpreter (FIG. 20) takes the information received from the flow interpreter, determines the next interesting frame in the clip and sets the current frame number to the starting frame number provided by the flow interpreter (see FIG. 19). It compares the frame at which it was instructed to begin playing (the current frame) with the beginning frame number of the clip. If it finds that the clip is starting somewhere other than the beginning, it calls the DOEVENTS routine (FIG. 22) in its catch-up mode. Once DOEVENTS has caught up, or if the clip is starting at the beginning, the video op code interpreter initiates the DOFRAMES routine (FIG. 21). When DOFRAMES returns to the video op code interpreter, it does so with a new current frame value.
- DOEVENTS may return a synchronous call; if so, the call is sent to the flow interpreter and the video interpreter calls DOFRAMES again. If DOEVENTS returns an asynchronous call
- the DOFRAMES routine is called yet again. As shown in FIG. 21, the DOFRAMES routine is entered with the "stopped" variable set to false. Its first task is to determine whether there are interesting frames that must be executed before more video can be played. If the current frame is an interesting frame, the DOFRAMES routine exits to the video op code interpreter (FIG. 20). If there are frames to be played before the next interesting frame, DOFRAMES continues until it either reaches the next interesting frame or determines that the frame is stopped. When the frame is stopped, the MPEG player returns a signal saying so, and DOFRAMES uses this signal to make the stopped-frame determination. The frame could be stopped because a flow signaled that it wants the frame stopped.
- the frame could also be stopped because the end of the clip has been reached, or because the user has paused on a particular frame or has reached the end of the clip without exiting. If the frame is stopped, DOFRAMES exits back to the video interpreter with a normal return code. This causes the current frame to be repeated and allows the "stopped" frame to be panned and all hotspots, receptacles and the like to be continually checked by the runtime engine.
- DOFRAMES exits with a call or GOTO return code. As shown in FIG. 20, the video op code interpreter reads this code and exits to the flow interpreter to call the appropriate routine. If none of these events has occurred, DOFRAMES draws the current frame into the buffer and gets the new current frame number from the MPEG player.
- DOFRAMES continuously checks the global variables to determine whether any asynchronous events have occurred (see FIGS. 23 and 25). If the DOFRAMES routine detects an asynchronous call (e.g., a mouse action), the video interpreter exits to the flow interpreter, where the call routine is performed. After processing the call routine, the flow interpreter returns to the video interpreter, which repeats the DOFRAMES routine, this time with its catch-up mode disabled.
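- The DOFRAMES checks above can be sketched as a loop, under stated assumptions. The MPEG player is represented by two hypothetical callables, `is_stopped` and `draw_and_advance`. The global variable table is a plain dictionary with hypothetical `async_call` and `async_goto` keys. The order of the checks follows the text: interesting frame, stopped frame, pending call or GOTO, then draw and advance.

```python
from enum import Enum, auto
from typing import Callable, Dict, Tuple

class FrameReturn(Enum):
    NORMAL = auto()  # interesting frame reached, or the frame is stopped
    CALL = auto()    # asynchronous call pending (e.g. a hotspot click)
    GOTO = auto()    # asynchronous GOTO pending (e.g. a transition to another clip)

def do_frames(current: int, next_interesting: int,
              is_stopped: Callable[[], bool],
              draw_and_advance: Callable[[int], int],
              globals_table: Dict[str, object]) -> Tuple[FrameReturn, int]:
    """Play frames until an interesting frame, a stop, or an asynchronous event."""
    while True:
        if current >= next_interesting:
            return FrameReturn.NORMAL, current
        if is_stopped():
            # A normal return makes the caller repeat this frame, so panning and
            # hotspot/receptacle checks keep running on the stopped frame.
            return FrameReturn.NORMAL, current
        if globals_table.get("async_call") is not None:
            return FrameReturn.CALL, current
        if globals_table.get("async_goto") is not None:
            return FrameReturn.GOTO, current
        # Draw the current frame into the buffer; the player reports the next frame number.
        current = draw_and_advance(current)
```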
- If the DOFRAMES routine detects an asynchronous GOTO (such as a transition to another video clip), the video interpreter exits the video clip to the new position counter position.
- If DOEVENTS returns a synchronous call, the video interpreter recursively calls the flow interpreter to process the call's flow routine. After the call is finished, or if there was no call, DOEVENTS is called in non-catch-up (regular) mode.
- the DOEVENTS routine is entered from the video interpreter either in catch-up mode, where a clip is being entered at a position other than the beginning, or in regular mode, where there are no interesting frames before the current frame.
- the difference in the DOFRAMES modes is that the display and exits are disabled during the catch-up mode.
- FIG. 22 illustrates the DOEVENTS routine.
- the DOEVENTS routine is entered when the DOFRAMES routine reaches an interesting frame. If the frame number of the next interesting event is greater than the current frame, the DOEVENTS routine determines whether any transition frames remain to be played and, if so, plays them. DOEVENTS then exits to the video interpreter, which in due course executes the DOFRAMES routine. If the current frame is an interesting frame, the DOEVENTS routine processes the events that make the frame interesting. These could be sound, music, cache frame, hotspot or receptacle activation or deactivation, transition, exit or clip end. Once the events at the current interesting frame are processed, the routine gets the next interesting frame position from the list. If this frame is not the current frame, the routine exits back to the video interpreter.
- the runtime system is set up to interleave the playing of transition frames with the placing of cache frames in the cache buffer.
- the system instructs the MPEG player to play the next frame of the transition. If all transition frames are finished before all cache frames are loaded, the remaining cache frames are loaded in the "cache frame?" step shown in FIG. 22. If there are no more frames to be cached but further transition frames remain to be played, the DOEVENTS routine plays the remaining transition frames before continuing to parse the frames in the clip.
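- The sketch below shows how event processing and catch-up behaviour might fit together. Several parts are assumptions: the event kinds, the `active` set that represents the active hotspot and receptacle lists, and the choice to suppress only exit and transition events in catch-up mode (based on the statement that display and exits are disabled during catch-up). The returned index lets the caller resume where the previous call stopped.

```python
from typing import Dict, List, Set, Tuple

# Event kinds suppressed while catching up (an assumption: exits are disabled in catch-up).
SUPPRESSED_IN_CATCH_UP = {"exit", "transition"}

def do_events(events: Dict[int, List[Tuple[str, str]]], frames: List[int], index: int,
              current: int, catch_up: bool, active: Set[str], log: List[str]) -> int:
    """Process events at interesting frames frames[index:] that are <= current.

    Returns the index of the next unprocessed interesting frame, which the
    caller (the video interpreter) keeps between calls.
    """
    while index < len(frames) and frames[index] <= current:
        frame = frames[index]
        for kind, target in events.get(frame, []):
            if kind == "activate":        # hotspot/receptacle becomes visible
                active.add(target)
            elif kind == "deactivate":    # hotspot/receptacle becomes invisible
                active.discard(target)
            elif catch_up and kind in SUPPRESSED_IN_CATCH_UP:
                continue                  # skipped while catching up
            else:
                log.append(f"{kind}:{target}@{frame}")  # stands in for performing it
        index += 1
    return index
```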
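- The interleaving order can be shown with a short scheduling sketch, assuming cache frames and transition frames are named by strings. Each cache-frame load is followed by one transition frame, and whichever list is longer is finished afterwards. `interleave` is a hypothetical helper, not part of the described system.

```python
from itertools import zip_longest
from typing import List, Tuple

def interleave(cache_frames: List[str], transition_frames: List[str]) -> List[Tuple[str, str]]:
    """Return a work order alternating cache-frame loads and transition frames."""
    schedule: List[Tuple[str, str]] = []
    for cache, trans in zip_longest(cache_frames, transition_frames):
        if cache is not None:
            schedule.append(("cache", cache))   # store a TO frame in the cache buffer
        if trans is not None:
            schedule.append(("play", trans))    # then show the next transition frame
    return schedule
```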
- the MPEG player could use one thread (or task) to accomplish the seek to the TO clip while the main program continues to play transition frames and any other events which could be executed prior to the availability of the TO clip. If the transition is finished before the TO clip seek is completed, the main program would wait for the seek thread to complete the seek. Once the TO clip is available any remaining transition frames could be played while the cache frames are loading as catch-up mode parses the TO frame. This would allow the transition sequence, which, in the preferred embodiment, does not require the TO clip, to mask seek time to the TO frame.
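- The multithreaded variant mentioned above can be sketched with Python's standard `threading` module. The `seek` and `play` callables are hypothetical stand-ins for the MPEG player's seek and frame output. The sketch shows only the ordering guarantee: the main thread plays transition frames while the seek runs, then waits for the seek before continuing.

```python
import threading
from typing import Callable, List

def transition_with_background_seek(seek: Callable[[], None],
                                    transition_frames: List[str],
                                    play: Callable[[str], None]) -> None:
    """Play transition frames on the main thread while a worker thread seeks."""
    worker = threading.Thread(target=seek)
    worker.start()
    for frame in transition_frames:
        play(frame)      # transition frames need no disk access, so they mask the seek
    worker.join()        # if the transition ends first, wait for the TO clip seek
```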
- the transition frames can be either generated on the fly or kept in memory, so no disk access is required to play them.
- Hotspot and Receptacle Activation: DOEVENTS activates hotspots and receptacles when they become visible and deactivates them when they become invisible, by adding them to or removing them from the active hotspot or active receptacle list. Adding them to the active list makes the frame-by-frame size and position information available to the MPEG player and the asynchronous event handler (see FIG. 23).
- End of clip events require the DOEVENTS routine to back up one frame. This continues to run the video and logic to allow the user to navigate or to execute any available exits or other actions.
- Cache Frame events signal the runtime system, which in turn instructs the MPEG streamer to store a given frame of video rather than send it to the MPEG player for decoding.
- all of the frames to be cached for any transitions possible from a particular clip are physically located at the beginning of that clip and cache frame events are generated to load them when the clip is accessed.
- These transition frames are preferably reference frames. If MPEG coding is used, the frames are preferably I frames.
- the MPEG streamer loads the cached MPEG frames into a cache frame buffer. The MPEG streamer sends the proper cached frame to the MPEG player future buffer at the beginning of the transition sequence.
- the MPEG streamer then sends the proper set of transition B frames, as described in the Special Effects Applications, to the decoder.
- the use of the cached TO frames and the transition B frame sequence, which can also be cached, in transitions from one clip to another allows the immediate construction of transition frames with both FROM and TO frame information for a meaningful transition. This can occur while the system is loading the cache frames for the TO clip or, if a multitasking operating system is used, while the system is seeking the TO clip or frame. Where a clip is entered from a previous clip through a transition the caching of frames can be interleaved with the playing of the transition sequence.
- the DOEVENTS routine instructs the MPEG streamer to play the next frame of the transition sequence before completing the loop to get the next interesting frame, which may be another cache frame.
- the transition frames are assigned one or more invalid frame numbers, such as negative numbers.
- DOEVENTS (FIG. 22) does not return to video until all cache frames are stored and all transition frames are shown.
- Transition events make smooth transitions from one clip to another by using a previously cached frame to hide the time spent seeking to and preparing to display the next clip. They allow smooth transitions of various types, as described elsewhere herein and in the copending Special Effects Applications.
- Clip Exit events return control to the flow interpreter, which allows the interpreter to cause various designated logic hunk programs to run. These programs can play sounds, set global state variables, keep score, and the like. By changing global state variables, the programs associated with a Clip Exit can change the currently playing video clip to another clip. An associated logic hunk program can also test global state variables to decide what to do. Further, Clip Exit events only call the associated logic hunk program if a particular global state variable, EXIT ID, matches the Clip Exit's ID. See FIG. 22. There are a number of possible values for the Clip Exit ID: Always Match, Left Turn Match, Right Turn Match and User Data Match.
- a Clip Exit with an ID of Always Match will always call the logic hunk program.
- a Clip Exit with an ID of Left or Right Turn Match will only call the logic hunk program if the runtime system has detected the user is trying to turn left or right.
- a Clip Exit with an ID of User Data Match will only call the logic hunk program if the global user data match variable matches the Clip Exit ID's value.
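- The Clip Exit ID test can be written as a small predicate. The string constants, the `ClipExit` record and the `user_value` field are hypothetical names chosen for this sketch. Only the four match kinds and their rules come from the text.

```python
from dataclasses import dataclass
from typing import Optional

ALWAYS, LEFT_TURN, RIGHT_TURN, USER_DATA = (
    "Always Match", "Left Turn Match", "Right Turn Match", "User Data Match")

@dataclass
class ClipExit:
    kind: str                         # one of the four match kinds above
    user_value: Optional[int] = None  # only used for User Data Match

def should_call(clip_exit: ClipExit, exit_id: Optional[str],
                user_data: Optional[int]) -> bool:
    """Decide whether a Clip Exit calls its associated logic hunk program."""
    if clip_exit.kind == ALWAYS:
        return True
    if clip_exit.kind in (LEFT_TURN, RIGHT_TURN):
        # The runtime sets ExitID to Left/Right Turn Match when it detects a turn.
        return exit_id == clip_exit.kind
    if clip_exit.kind == USER_DATA:
        return user_data is not None and user_data == clip_exit.user_value
    return False
```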
- the flow and video interpreters of the runtime system receive asynchronous inputs through a global variable table, a method well established in the art.
- the runtime system receives user input through an asynchronous event handler (FIG. 23), which detects user actions and communicates with the runtime system through a global variable list (see FIG. 25).
- the asynchronous event handler updates the global variable list in response to mouse events or other user input in the method illustrated in FIG. 23.
- the asynchronous event handler checks global variables regarding hotspot location and activation established by the flow interpreter. As shown in FIG. 23, the asynchronous events handler detects cursor moves created by a mouse (or any other input device) as well as clicks. When the asynchronous events handler detects a mouse move, it first sends the new mouse coordinates to the MPEG player panning control so that the screen can be adjusted accordingly.
- the handler sets a global exit variable for the proper transition.
- the handler checks the list of hotspots, receptacles and visuals that are active and determines whether the coordinates and mouse events it has detected require performance of a hotspot, receptacle or visual associated call or GOTO. If so, the async event handler sets an async call or GOTO in the global variable list and returns to the operating system.
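- The sketch below shows a hit test of the kind the asynchronous event handler performs. It assumes each active hotspot carries its rectangle for the current frame, a trigger kind and a call/GOTO target. All field names and the `async_call`/`async_goto` keys are hypothetical. Letting the first matching hotspot win is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ActiveHotspot:
    x: int          # left edge in the current frame
    y: int          # top edge in the current frame
    w: int
    h: int
    trigger: str    # hypothetical trigger kinds: "click" or "move"
    action: str     # "call" or "goto"
    target: str     # flow to call or position to jump to

def handle_mouse(px: int, py: int, event: str, active: List[ActiveHotspot],
                 globals_table: Dict[str, str]) -> bool:
    """Set an async call or GOTO for the first active hotspot hit by this event."""
    for hs in active:
        inside = hs.x <= px < hs.x + hs.w and hs.y <= py < hs.y + hs.h
        if inside and hs.trigger == event:
            key = "async_call" if hs.action == "call" else "async_goto"
            globals_table[key] = hs.target
            return True
    return False  # nothing hit; control simply returns to the operating system
```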
- the global variable list is checked by the flow interpreter and, depending on the nature of the event, the event is executed by the DOFRAMES or DOEVENTS routine (see FIG. 20).
- Visuals are handled by the flow interpreter (FIG. 19) by calling the visual load routine (FIG. 24).
- the visual load routine simply gets the relevant information about the visual from the visual table, creates the window needed and calls any flow routine associated with the visual.
- the visuals are rectangular areas of the display, sometimes called windows, in which data in the form of bitmaps, video or a viewport are displayed. Visuals can be placed within each other, so one can have a parent window and a child window. Hence, the visual load routine is called recursively to detect and load any child windows and run their associated flows. For example, a window visual may have a viewport visual playing video within it.
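- The recursive load of parent and child visuals can be sketched as a depth-first walk over a visual table. The `Visual` record and the lists that record opened windows and executed flows are hypothetical stand-ins for window creation and flow execution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Visual:
    name: str
    flow: Optional[str] = None                          # flow run when the visual loads
    children: List[str] = field(default_factory=list)   # names of child visuals

def load_visual(name: str, table: Dict[str, Visual],
                opened: List[str], flows_run: List[str]) -> None:
    """Open a visual, run its flow, then recursively load its children."""
    visual = table[name]
    opened.append(visual.name)         # stands in for creating the window
    if visual.flow:
        flows_run.append(visual.flow)  # stands in for calling the associated flow
    for child in visual.children:
        load_visual(child, table, opened, flows_run)
```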
- Navigation is handled through the asynchronous events handler.
- As shown in FIG. 23, in the preferred embodiment the screen is divided into seven regions for the purposes of navigation.
- Region 0 is in the center; while the cursor is in region 0, the user is not trying to turn or change viewing direction. If the cursor is in region 1 or 4, the asynchronous events handler provides these coordinates to the global variables table. The flow interpreter then initiates a flow that directs the MPEG player to pan the MPEG frames one pixel per frame to the left (in the case of 1) or right (in the case of 4). In regions 2 or 5 the instruction is to pan at the rate of 4 pixels per frame. In regions 3 or 6 the instruction is to pan by 16 pixels per frame. When the image is panned to the maximum to the left or right and the cursor remains in region 3 or 6, the global variable ExitID is set to Left Turn Match or Right Turn Match.
- the DOEVENTS routine calls the transition routine that is paired with the appropriate clip exit, and the newly entered clip is started. See FIG. 22. The various navigation regions could be considered region hotspots, but they are preferably recognized by the asynchronous events handler (FIG. 23), which places the appropriate panning instruction in the global variable list. Forward and reverse navigation can be handled in any logical manner. For example, there could be regions at the top and bottom of the screen which, when detected by the asynchronous events handler, could set a global variable for forward or reverse play.
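- The seven-region navigation scheme can be sketched as follows. The text gives the pan rates (1, 4 and 16 pixels per frame) and the turn rule, but not the region geometry. The equal-width bands, the sign convention (negative pan means left) and the helper names are therefore assumptions.

```python
from typing import Optional, Tuple

PAN_RATES = {1: 1, 2: 4, 3: 16}  # pixels per frame by distance band from the center

def nav_region(x: int, width: int) -> int:
    """Map a cursor x-coordinate to regions 0-6 (assumed equal-width bands).

    Region 0 is the center; 1-3 lie increasingly far to the left, 4-6 to the right.
    """
    half = width / 2
    d = (x - half) / half            # -1.0 at the left edge, +1.0 at the right edge
    band = min(3, int(abs(d) * 4))   # each band is a quarter of the half-width
    if band == 0:
        return 0
    return band if d < 0 else band + 3

def pan_step(region: int, pan: int, max_pan: int) -> Tuple[int, Optional[str]]:
    """Return the new pan offset and, at the pan limit, the ExitID to set."""
    if region == 0:
        return pan, None
    left = region <= 3
    rate = PAN_RATES[region if left else region - 3]
    new_pan = max(-max_pan, pan - rate) if left else min(max_pan, pan + rate)
    exit_id = None
    if region == 3 and new_pan == -max_pan:
        exit_id = "Left Turn Match"
    elif region == 6 and new_pan == max_pan:
        exit_id = "Right Turn Match"
    return new_pan, exit_id
```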
- FIG. 25 provides an overview of the runtime system. It shows the relationship between the flow interpreter, the video interpreter and the various routines, as well as the interaction between the asynchronous event handler and the flow interpreter through the global variable table.
- FIG. 26 is a table which contains excerpts from the code played along with the MPEG video stream.
- the first column is the code line number
- the second column is the source line, which, when the system is playing a video clip, will contain the clip's frame number.
- Code line 5f is associated with the display of frame 0.
- Code line 73, frame 20 is a hotspot activate event.
- the IDX:27 is an index to the hotspot table contained in the Data Hunk of the bound production. This table tells the system where the hotspot is in the following frames, until line 2a3 when the hotspot is deactivated.
- the hotspot was activated in frame 20 either because it came onto the screen at that frame, or the user was close enough to it to see it, or simply because the person editing the production decided that the hotspot should be activated at this place.
- a hotspot is deactivated for similar kinds of reasons.
- In step two, once the interesting clip file has been parsed to determine the next interesting frame, the video stream is played in the following manner.
- Each frame of video is decoded from the MPEG stream into a buffer. After decoding, the buffer holds a bitmap of the image that is about to be displayed. Before the image is displayed, any pending events (step 3 above) are handled. Then any active receptacles are handled. For each active receptacle, the video interpreter determines whether there is an M-view in the receptacle. If so, the interpreter determines the receptacle's position, size and angle from the active receptacle list. It then renders the M-view into the buffer holding the bitmap of the image to be displayed. When rendering the appropriate M-view bitmap, only the opaque parts of the M-view are drawn into the buffer; the transparent parts are not.
- the DOFRAMES routine draws the M-view into the display buffer, as shown in FIG. 21.
- the display buffer contains a frame which is to be displayed. The frame may already have other M-views rendered on it.
- the M-view associated with the active receptacle is drawn by using the active receptacle list and the receptacle by frame list which tracks receptacle size and angle by frame number.
- the particular M-view associated with the receptacle is also present as an object property variable. As discussed above, the variable is subject to change by flows responsive to game logic.
- the M-view draw routine looks to this variable to retrieve the appropriate M-view.
- the M-view's resolution and angle are retrieved from the M-view directory using the frame specific information in the receptacle by frame list.
- the following steps allow for the rendering of an M-view onto a buffered bitmap.
- a buffer to render into, which already contains a frame of video and may already contain other previously rendered M-views
- the appropriate angle has already been determined at edit time by the developer using the M-view editor.
- the interpolated sizes, positions and angles are converted to frame-specific sizes, coordinates and angles and placed in appropriate tables in the Data Hunk.
- the appropriate M-view face is stretched or shrunk and is copied to the buffer at the correct location of the video frame.
- the M-view rendering program decodes the run-length compressed, bit-packed bytes and the color look-up table associated with the M-view, which were created and placed in the production at Bind Time, and performs the following steps: STEP ONE: Normalize the angle. All angles in the M-view angle index are between 0 and 359 degrees, but the interpolator may produce a degree value outside this range, so the first step is to normalize the angle associated with the receptacle. This is done by adding 360 degrees to, or subtracting 360 degrees from, any angle not within the range. This step can be done on the fly at runtime. Alternatively, the binder can do it and enter the resulting angle into a table in the runtime Data Hunk.
- STEP TWO: Determine the scale. Next, the appropriate scale for the M-view face to be overlaid is determined by multiplying the size supplied by the receptacle by the size stored with the M-view. This allows M-views of objects with vastly different absolute sizes to be associated with the same receptacle. For instance, a user may wish to put either a fly or an elephant into a particular receptacle. The receptacle's size need only be set once at edit time to make either the fly or the elephant look correct in the scene; the other will be scaled accordingly.
- STEP THREE: Determine the position.
- Receptacles, like hotspots, are rectangular. They have an origin, or anchor point, at one of nine positions: top-left, top-center, top-right, middle-left, middle-center, middle-right, bottom-left, bottom-center, bottom-right.
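- Step One, together with the face selection used later in Step Four, can be sketched as below. Python's `%` operator already maps any angle into [0, 360), which is equivalent to repeatedly adding or subtracting 360. Measuring distance around the circle when choosing the closest stored face is an assumption of this sketch.

```python
from typing import List

def normalize_angle(angle: float) -> float:
    """Bring an interpolated angle into [0, 360) by adding or subtracting 360."""
    return angle % 360  # Python's % returns a non-negative result for a positive modulus

def nearest_face(angle: float, face_angles: List[float]) -> float:
    """Pick the stored M-view face angle closest to `angle`, wrapping at 360."""
    a = normalize_angle(angle)

    def circular_distance(f: float) -> float:
        diff = abs(a - f) % 360
        return min(diff, 360 - diff)

    return min(face_angles, key=circular_distance)
```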
- the origin that is assigned at edit time remains constant regardless of the scale computed in Step Two.
- the location of the rendered M-view bitmap within the receptacle varies with the origin. For example, if the receptacle origin is bottom-center, the center of the bottom edge of the M-view bitmap will be on the origin.
- the M-view rendering program uses the supplied position, origin, and scale to determine the size and location of the rectangular M-view bitmap which is to be displayed.
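- Steps Two and Three can be combined into one placement helper. The text does not say how the combined scale is applied to the face bitmap. This sketch assumes, hypothetically, that the product of the receptacle size and the M-view size multiplies the face's pixel width and height. The nine anchors are expressed as fractions of the bitmap's width and height.

```python
from typing import Tuple

# Fractions of the bitmap's width and height at which each anchor sits.
ANCHORS = {
    "top-left": (0.0, 0.0), "top-center": (0.5, 0.0), "top-right": (1.0, 0.0),
    "middle-left": (0.0, 0.5), "middle-center": (0.5, 0.5), "middle-right": (1.0, 0.5),
    "bottom-left": (0.0, 1.0), "bottom-center": (0.5, 1.0), "bottom-right": (1.0, 1.0),
}

def place(origin_xy: Tuple[int, int], anchor: str, receptacle_size: float,
          mview_size: float, face_w: int, face_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the scaled M-view bitmap."""
    scale = receptacle_size * mview_size               # Step Two
    w, h = round(face_w * scale), round(face_h * scale)
    fx, fy = ANCHORS[anchor]
    ox, oy = origin_xy
    # Step Three: the anchor point of the bitmap lands exactly on the origin.
    return round(ox - fx * w), round(oy - fy * h), w, h
```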
- STEP FOUR: Stretch or shrink the M-view face.
- For the M-view bitmap to be the right size, it may have to be stretched or shrunk. This is done to the bitmap of the M-view face that most closely corresponds to the normalized angle from Step One, to fit it into the receptacle from Step Three.
- the transparent pixels from the M-view are not copied into the buffer.
- Stretching or shrinking can use any of a number of well-known algorithms. In the preferred embodiment we use a very fast algorithm that sacrifices some quality of the final image for speed and takes advantage of run-length encoding of the transparency information.
- Runtime Logic: The spatial video production logic can also be illustrated and better understood by looking at examples of disassembled code from a production made with a preferred embodiment of the system. Such examples are contained in Tables 1 through 5, which are attached to and incorporated in this specification. Table 1 illustrates the code used in a preferred embodiment of a spatial video production to load the user interface. The 8-digit numbers in the left margin are simply offset numbers in the logic hunk; they were added by the debugger software used to disassemble this portion of the code. This section of code is executed by the flow interpreter.
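- The preferred embodiment uses a fast run-length-based method that is not detailed here. As a plainly swapped-in substitute, the sketch below uses nearest-neighbour sampling with a per-pixel transparency marker (`None`). It shows the two essential behaviours: the face is stretched or shrunk to the target rectangle, and transparent pixels leave the video frame underneath untouched.

```python
from typing import List, Optional

Pixel = Optional[int]  # None marks a transparent pixel in this sketch

def blit_scaled(src: List[List[Pixel]], dst: List[List[int]],
                left: int, top: int, w: int, h: int) -> None:
    """Nearest-neighbour stretch/shrink of `src` into a w x h rectangle of `dst`.

    Transparent (None) source pixels are skipped so the video frame shows through;
    destination pixels outside the buffer are clipped.
    """
    sh, sw = len(src), len(src[0])
    for y in range(h):
        dy = top + y
        if not 0 <= dy < len(dst):
            continue
        sy = y * sh // h                  # source row for this destination row
        for x in range(w):
            dx = left + x
            if not 0 <= dx < len(dst[0]):
                continue
            p = src[sy][x * sw // w]      # source column for this destination column
            if p is not None:
                dst[dy][dx] = p
```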
- Table 2 may provide additional appreciation for how logic execution and video frame playing run in parallel in spatial video.
- the 8 digit numbers in the column on the left margin are logic hunk offsets.
- the second column is the frame number, containing the abbreviation FRAM: followed by the frame number.
- the column after the frame number column shows the interesting events associated with the frame number.
- line 73, frame 20 is a HOTSPOT activate event.
- the IDX:27 is an index to the hotspot table produced by the binder in the Data Hunk.
- the hotspot table gives the location of the hotspot in the frames of the clip.
- Line 2a3, frame 318 shows that the hotspot is deactivated.
- Reasons for the activation of a hotspot vary. The decision is made by the developer at edit time.
- the developer could choose to activate a hotspot in a particular frame because the visual feature with which the hotspot is associated became visible in that frame, or because the feature became large enough to distinguish, or for other reasons.
- Activation of a hotspot could be associated with a flow. That is, the hotspot is not activated until the player succeeds in locating certain objects or otherwise meeting certain requirements of the game logic necessary to activate the hotspot.
- the FRAME POSITION listing shows the X, Y and Z coordinates of the camera for that frame. These coordinates are useful during edit time to construct maps and the like, as previously discussed. The coordinates can also be useful at runtime.
- the OCX which is used to show the player his position in the environment on the map visual may get frame position data to locate the pointer on the map.
- the level of the noise can be adjusted by an OCX that sets the level in the sound card. It does so using the position of the noise source and the FRAME POSITION data as the player approaches or retreats from the source.
- Table 3 shows more code in a video clip logic section of a production made with a preferred embodiment of the invention.
- the code in this table is at the start of a particular source video clip named "24 EastAcrossFront." Notice that the first events to occur are to cache destination or "TO" frames used in various transitions possible off of this video clip.
- This code section also shows a receptacle activate on frame 741, hotspot activations on frames 744 and 746. Notice, as previously discussed, that the RUNTIME TRANSITION is always paired with and immediately precedes CLIP EXIT, as a turn or roundabout transition always moves the player to another clip. In contrast, CLIP_EXITs are not always associated with transitions.
- Table 4a shows a flow in a production made using a preferred embodiment of the method of the present invention.
- the flow was typed by the developer in the logic editor portion of the spatial video editor. This particular flow is assigned to a hotspot's click event.
- the hotspot to which the flow is assigned appears over a poster of a baseball player in the spatial video clip.
- the flow causes a picture of the baseball player to be displayed and then waits for the player to click. Once the click occurs the flow removes the picture and returns to the video.
- Table 4b shows the actual code the binder generated from the above flow.
- Table 5 is a disassembly of part of the hotspot data from the Data Hunk of a production made with a preferred embodiment of the method of the invention. The disassembler used to create this table did not create data file offsets, but, of course, they exist. The difference between the disassemblers used for the logic hunk and the data hunk accounts for the lack of the first offset-number column in Table 5. The first listings in Table 5, those up to the first double space, show a directory of hotspot lists.
- the list includes information on region hotspots as well as spatial video hotspots. Notice that the offset number given for the region hotspot, 3890, is the same as that for the clip (video tracking) hotspot. This is because this particular production contained no region hotspots. This is confirmed by the listing after the double space showing the count of region hotspots as 0.
- the listing goes on to provide information regarding the clip hotspots in the production, providing offset location information on the first occurrence of the hotspot, the flow associated with the hotspot, the ExitID for the hotspot which goes to the global variable table if the hotspot is triggered and the type of trigger which activates the hotspot (e.g., mouse move, mouse click, etc.) and the way the cursor is depicted when it is over the hotspot.
- This information is followed by a series of listings providing the size and location of the hotspot on the various frames where it appears. These figures are calculated by the binder using the same extrapolation/interpolation methodology used at edit time. Only the first and last positions are given.
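- The binder's interpolation method is not specified beyond being the same one used at edit time. Assuming linear interpolation between key frames, filling in a hotspot rectangle for every frame could look like this. `interpolate_rects` is a hypothetical helper, and extrapolation beyond the key frames is not shown.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height

def interpolate_rects(keys: Dict[int, Rect]) -> Dict[int, Rect]:
    """Fill in a rectangle for every frame from the first to the last key frame.

    Linear interpolation between neighbouring key frames is an assumption.
    """
    if not keys:
        return {}
    frames = sorted(keys)
    out: Dict[int, Rect] = {}
    for f0, f1 in zip(frames, frames[1:]):
        r0, r1 = keys[f0], keys[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)      # 0.0 at f0, approaching 1.0 at f1
            out[f] = tuple(a + (b - a) * t for a, b in zip(r0, r1))
    out[frames[-1]] = keys[frames[-1]]
    return out
```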
- Flows are used in Video Reality productions to allow the product designer (VRED user) to specify execution logic for the product.
- a flow may say what happens when the product user clicks on hotspot X, or when the video player gets to the end of clip Y.
- the logic in a flow is specified in a simple programming language that may be entered and changed in VRED.
- the flow logic will be scanned by VRED for obvious errors or omissions. Further error checking will be done by the VR binder when the production executable is generated. At that time the flow logic will be translated into its final form to allow efficient playback.
- All flows start with a FLOW statement and end with an ENDFLOW statement. Between these is a list of executable statements like assignments, loops, conditionals, etc. Every executable statement starts with a keyword that identifies what kind of statement it is. These keywords are: assign, call, do, exit, for, if, jump, load, midi, return, select, set.
- Names (for variables, etc.) are sequences of alphabetic characters, digits and underscores. Names must begin with an alphabetic character (a-z) or underscore (_). Characters after the first may be alphabetic (a-z), numeric (0-9) or underscore (_). The maximum length for a name is 64 characters.
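- The naming rule above maps directly onto a regular expression. Flow names such as PlayRandomWav use capitals, so this sketch assumes upper-case letters are allowed even though the text lists only a-z.

```python
import re

# First character: letter or underscore; then up to 63 letters, digits or underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")

def is_valid_name(name: str) -> bool:
    """True if `name` follows the flow language naming rule (64 characters max)."""
    return NAME_RE.fullmatch(name) is not None
```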
- Database Names: The flow compiler recognizes names for many database objects. Names used for these objects must all be unique (for example, you can't have a bitmap and a clip exit with the same name), and the flow compiler automatically translates each one to the correct type of reference. You cannot refer to the name of an object that you haven't added to the database yet.
- Variables of these types may be declared via the Global Properties Editor in VRED. Note that all database objects (like receptacles, bitmaps, clip exits and so forth) are represented inside the flow language as four-byte integer values. This is not ideal, and a future version of the language will recognize more types to handle this problem.
- QueuedExitID - Sets up the next exit. Used when the user explicitly clicks on a hotspot associated with an exit, or when the user indicates a general direction like "Take the next right". You can assign a specific clip or flow exit by name, or you can assign one of the exit constants ExitLeft, ExitRight or ExitDefault.
- Speed - Sets the navigation speed as a percentage of normal speed: 100 is normal speed, 50 is a slow speed and 200 is a fast speed. Don't bother setting it above 200 or below 0.
- Acceleration - Sets the maximum rate of change of speed per frame. Not as useful as we thought it would be; set it to 9999 and leave it.
- Xpos, Ypos, Zpos - Current camera X, Y and Z position in millimeters relative to an arbitrary (0,0) point. Only available if the FRAME_POSITION option was used to generate the production. Read only.
- Act - Current act. Initialized to 0 when the production starts; should be maintained by your flows if you care about its value.
- Room - Current room. Set each time a new clip is jumped to or called, if that clip was associated with a room in the database. Read only.
- Clip - Database ID of the current clip. Can be used for comparison purposes only. Read only.
- Random - Returns a random integer from 0 to the largest four-byte integer
- Picture visuals have a numeric CONTENTS variable. Assign a bitmap to this variable to cause that bitmap to show up in the picture visual.
- call flow flowname - Executes the specified flow and returns.
- do statement-list until expression - Executes the statement or statements in statement-list until the expression is not true. The statements in the statement-list will be executed at least once.
- endflow
- the runtime system may be able to provide faster switchover from spatial to dramatic video if given some advance notice that the dramatic video will be required. Use a load dramatic statement from a flow to give that notice.
- the balance parameter indicates the desired left-right balance, with +100 indicating 100% on the right and
- the slot parameter specifies the MIDI slot to use for the file; 0 and 1 are valid MIDI slots.
- the looping parameter specifies whether to start the MIDI clip over again when it ends or to terminate it.
- the select statement is a way to choose different statement lists to execute based on the value of the select-expression
- the default mouse cursor is used when the mouse cursor is not over a hotspot or active receptacle; the held cursor is used when the mouse is over a hotspot or active receptacle.
- visual hide visual_name
- visual show visual_name
- visual load visual_name
- visual unload visual_name
- Example: play one of five random wav files.
    flow PlayRandomWav
      select case (random mod 5)
        case (0) call wav first_wav
        case (1) call wav second_wav
        case (2) call wav third_wav
        case (3) call wav fourth_wav
        case (4) call wav fifth_wav
      endselect
    endflow
- FLOW LANGUAGE GRAMMAR
- Assignment - identifier " " Operand expression
- SetCursorAssignment - identifier " " Operand zero or name of cursor entry in Library. Identifier must be one of: DefaultCursor, HeldCursor, HeldHotspotCursor
- the virtual machine is assumed to have 32 registers of each of the four basic operand types: numeric (four-byte integers), boolean (yes/no values), string (fixed 255-byte character strings) and memo (variable-length character strings with a two-byte length prefix). All operands in the tables below are four-byte integers unless otherwise specified.
- runtime system objects like receptacles, visuals, MIDI files and so forth are often referred to by means of an index number
- The binder assigns these index numbers. They are transmitted to the runtime system either via the initialization file that the binder creates, in the case of external assets such as MIDI files and dramatic video files, or inside the data hunk, in the case of receptacles and visuals.

| Opcode | Operands | Meaning |
|---|---|---|
| opSourceLine | Length, text (string of 'length' characters) | Used for debugging; the text string contains the text of the line of flow code that generated the following instructions |
| opCallLogicBlock | Logic offset | Pushes the current location on the call stack and branches to the specified offset in the logic section; execution continues from that point |
| opReturn | | Pops the call stack and branches to the location specified; execution continues from that point |
| opGoto | Logic offset | Jumps to the specified offset in the logic hunk and continues executing instructions |
| opFlowLogicStart | | Signals the beginning of a flow |
| opVideoLogicStar | | |
| | | Offset specified at offset 'Offset' |
| opLoadGlobal | Register, Offset | Loads the value at offset 'Offset' in the list of built-in variables into the register specified by 'Register' |
| opStoreGlobal | Register, Offset | Stores the value in register 'Register' into the offset in the built-in variable structure that is specified by 'Offset' |
| opRandom | Register | Generates a random number and puts it in the numeric register 'Register' |
| opLoadVisual | Register, VisualType, VisualIndex, Offset | Loads the value at offset 'Offset' in the visual identified by VisualType and VisualIndex into the register 'Register' |
| opStoreVisual | Register, VisualType, VisualIndex, Offset | Stores the value in 'Register' into the specified visual at offset 'Offset' |
| opCallDramatic | Register | Calls the dramatic video file whose index is in the numeric register 'Register' |
| opLoadDramatic | Register | Lo |
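The control-flow opcodes above can be sketched as a tiny interpreter loop. This is a hypothetical illustration, not the runtime system's code: instructions are `(opcode, *operands)` tuples, offsets are list indexes rather than byte offsets into the logic hunk, and `opEmit`/`opHalt` are made-up opcodes used only to observe execution. Pushing the index of the next instruction is one reading of "push current location", chosen so that `opReturn` does not re-execute the call.

```python
def run(logic: list, trace: list) -> None:
    """Execute a logic section given as (opcode, *operands) tuples.

    Assumptions: offsets are list indexes rather than byte offsets into the
    logic hunk, and 'opEmit'/'opHalt' are hypothetical opcodes added only so
    the example produces output and stops.
    """
    call_stack: list = []
    pc = 0
    while pc < len(logic):
        op, *args = logic[pc]
        if op == "opCallLogicBlock":
            call_stack.append(pc + 1)  # save where to resume after the block
            pc = args[0]               # branch to the logic block
            continue
        if op == "opReturn":
            pc = call_stack.pop()      # resume at the saved location
            continue
        if op == "opGoto":
            pc = args[0]
            continue
        if op == "opEmit":
            trace.append(args[0])
        elif op == "opHalt":
            return
        else:
            raise ValueError(f"unsupported opcode {op}")
        pc += 1
```

For example, a program that calls a block at index 4 and then jumps past it records `main`, `sub`, `end` in that order.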
| Opcode | Operands | Meaning |
|---|---|---|
| opMult | Register1, Register2 | Multiplies the value in numeric Register1 by the value in numeric Register2 and places the result in Register2 |
| opDiv | Register1, Register2 | Divides the value in numeric Register1 by the value in numeric Register2 and places the result in Register2 |
| | | Takes the boolean AND of the values in boolean Register1 and boolean Register2 and places the result in Register2 |
| opOr | Register1, Register2 | Takes the boolean OR of the values in boolean Register1 and boolean Register2 and places the result in Register2 |
| opNot | Register1, Register2 | |
| | | Concatenates the strings in string registers … Register2 |
| opNeg | Register1 | Negates the value in numeric register Register1 and leaves the result in Register1 |
| opCmpEQ | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 equals the value in Register3 |
| opCmpNE | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 is not equal to the value in Register3 |
| opCmpGT | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 is greater than the value in Register3 |
| opCmpLT | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 is less than the value in Register3 |
| opCmpGE | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 is greater than or equal to the value in Register3 |
| opCmpLE | Register1, Register2, Register3 | Sets boolean register Register1 to TRUE if the value in Register2 is less than or equal to the value in Register3 |
| opBranchTrue | Register1, Offset | Branches to the specified offset if boolean register Register1 is TRUE |
| opBranchFalse | Register1, Offset | Branches to the specified offset if boolean register Register1 is FALSE |
| opWaitForEvent | | Causes the system to pause until input of any kind is available to process. This input might include messages from the operating system, user input, mouse events, and so forth |
| opWaitForUser | | Causes the system to pause until the user types a keystroke or clicks the mouse. Useful for "Press any key to continue" messages |
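A sketch of how the arithmetic, comparison and branch opcodes might be executed against numeric and boolean register banks. It assumes that comparison operands are numeric registers, that `opDiv` truncates toward zero, and that offsets are instruction indexes; 32-bit overflow wrapping is omitted for brevity.

```python
import operator

# Comparison opcodes: Register1 (boolean) <- Register2 <op> Register3
COMPARISONS = {
    "opCmpEQ": operator.eq, "opCmpNE": operator.ne,
    "opCmpGT": operator.gt, "opCmpLT": operator.lt,
    "opCmpGE": operator.ge, "opCmpLE": operator.le,
}


def int_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (an assumption about opDiv)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def execute(program: list, numeric: list, boolean: list) -> None:
    """Run (opcode, *operands) tuples, mutating the register banks in place."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "opMult":
            r1, r2 = args
            numeric[r2] = numeric[r1] * numeric[r2]  # result goes to Register2
        elif op == "opDiv":
            r1, r2 = args
            numeric[r2] = int_div(numeric[r1], numeric[r2])
        elif op == "opNeg":
            (r1,) = args
            numeric[r1] = -numeric[r1]
        elif op in COMPARISONS:
            r1, r2, r3 = args
            boolean[r1] = COMPARISONS[op](numeric[r2], numeric[r3])
        elif op in ("opBranchTrue", "opBranchFalse"):
            r1, offset = args
            if boolean[r1] == (op == "opBranchTrue"):
                pc = offset
                continue
        else:
            raise ValueError(f"unsupported opcode {op}")
        pc += 1
```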
- The ClipExit table is used to define the exits from a given clip to another clip or flow. There is one record for each transition between videos, or for each flow which is called when a given video frame is encountered. Each ClipExit record specifies:
- FromClipFrameNumber the frame on which to trigger the clip exit
- ToClipID the destination clip ID, or nil if the clip exit exists only to call a flow
- ToClipFrameNumber the target frame number in the destination clip
- ToClipCallType either "jump" to the target video or "call" the target video
- TransitionCode a code identifying the type of transition to play between clips
- RoundaboutID the ID of the roundabout if this clip exit is part of a roundabout (otherwise nil)
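A ClipExit record and its use at playback time might look like the following sketch. The owning-clip field `from_clip_id`, the `Player` class, and the idea that a "call" later resumes at the calling clip and frame are assumptions for illustration; the table above only states that an exit either jumps to or calls the target video.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClipExit:
    from_clip_id: int                    # owning clip (hypothetical; not listed above)
    from_clip_frame_number: int          # FromClipFrameNumber
    to_clip_id: Optional[int]            # ToClipID; None (nil) for a flow-only exit
    to_clip_frame_number: int            # ToClipFrameNumber
    to_clip_call_type: str               # ToClipCallType: "jump" or "call"
    transition_code: int = 0             # TransitionCode
    roundabout_id: Optional[int] = None  # RoundaboutID


def exits_at(exits, clip_id: int, frame: int) -> list:
    """All clip exits that trigger on this frame of this clip."""
    return [e for e in exits
            if e.from_clip_id == clip_id and e.from_clip_frame_number == frame]


class Player:
    """Tracks the current clip and frame; a "call" remembers where to resume."""

    def __init__(self, clip_id: int, frame: int = 0) -> None:
        self.clip_id, self.frame = clip_id, frame
        self.return_stack: list = []

    def take(self, exit_: ClipExit) -> None:
        if exit_.to_clip_id is None:
            return  # flow-only exit: running the flow is outside this sketch
        if exit_.to_clip_call_type == "call":
            self.return_stack.append((self.clip_id, self.frame))
        self.clip_id, self.frame = exit_.to_clip_id, exit_.to_clip_frame_number

    def return_from_call(self) -> None:
        self.clip_id, self.frame = self.return_stack.pop()
```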
- The ClipGroup table is used to define the paths in a map. There is one record for each path.
- Records in the SpatialClip table point to records in this table in a many-to-one relation.
- Each record specifies ID, an ID for the group of clips.
- The Custom table is used to define all the instances of Custom visuals in the production. It is joined with the Visual table to determine each instance's geometry. Each record specifies:
- The CustomType table is used to define all the custom visual types in the production. Each record specifies:
- FileName a handle to the file system for the file which implements the custom visual type
- The Flow table is used to identify each of the flows in a production. Each record specifies:
- StartFlowIconID the ID of the starting token (FlowIcon) for this flow
TABLE FlowIcon
- The FlowIcon table is used to store the parse tree for each flow in the production. Each record specifies:
- IconType the token type for this FlowIcon in the flow's parse tree
- The FlowIconBooleanData table is used to define boolean data associated with a FlowIcon. Each record specifies:
- The FlowIconConnection table is used to define the parse-tree connectedness of the records in the FlowIcon table. Each record specifies:
- ParentIconID the parent of the child node
- ChildIconID the child of the parent node
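Because FlowIconConnection stores parent/child pairs, a flow's parse tree can be rebuilt by grouping children under their parents and descending from the flow's StartFlowIconID. A minimal sketch with hypothetical names, assuming children are ordered by record order (the table does not say how sibling order is stored):

```python
from collections import defaultdict


def build_tree(icon_types: dict, connections: list, root_id: int) -> tuple:
    """Return a nested (IconType, [children...]) tree.

    icon_types maps FlowIcon ID -> IconType; connections holds
    (ParentIconID, ChildIconID) pairs from FlowIconConnection.
    """
    children = defaultdict(list)
    for parent_id, child_id in connections:
        children[parent_id].append(child_id)

    def node(icon_id: int, path: tuple = ()) -> tuple:
        if icon_id in path:  # guard against malformed, cyclic data
            raise ValueError(f"cycle at FlowIcon {icon_id}")
        subtrees = [node(c, path + (icon_id,)) for c in children[icon_id]]
        return (icon_types[icon_id], subtrees)

    return node(root_id)
```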
- The FlowIconIDData table is used to define ID data associated with a FlowIcon. Each record specifies:
- The FlowIconMemoData table is used to define memo data (variable-length string data) associated with a FlowIcon. Each record specifies:
- The FlowIconNumericData table is used to define numeric data (32-bit integers) associated with a FlowIcon. Each record specifies:
- The FlowIconStringData table is used to define string data associated with a FlowIcon. Each record specifies:
- The FramePosition table is used to define the spatial coordinate data for each frame of a video clip. Because the camera is sometimes stationary, each record defines the frame at which the camera arrived at the point and the frame at which the camera departed from it. This table is also used to derive spline points from clips drawn on the map. One record exists for each video clip frame or spline point, in a one-to-many relation from VideoClip to FramePosition. Each record specifies:
- ClipID the ID of the video clip to which this FramePosition record belongs
- IsStraight for splines, a boolean indicating whether the point is rounded or straight
- RoundingFactor a factor used in the rounding computation
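A sketch of looking up the camera position for a frame from FramePosition-style records. The arrival/departure field names and the x/y coordinates are hypothetical (the excerpt does not list them), and straight-line interpolation between points stands in for the spline rounding controlled by IsStraight and RoundingFactor.

```python
from bisect import bisect_right
from typing import NamedTuple


class FramePosition(NamedTuple):
    arrival_frame: int    # frame at which the camera arrived (hypothetical name)
    departure_frame: int  # frame at which the camera departed (hypothetical name)
    x: float              # map coordinates of the point (hypothetical)
    y: float


def camera_position(points: list, frame: int) -> tuple:
    """Camera (x, y) at `frame`; `points` must be sorted by arrival_frame."""
    if not points:
        raise ValueError("no FramePosition records")
    arrivals = [p.arrival_frame for p in points]
    i = bisect_right(arrivals, frame) - 1
    if i < 0:
        return points[0].x, points[0].y  # before the first point: clamp
    p = points[i]
    if frame <= p.departure_frame or i == len(points) - 1:
        return p.x, p.y  # camera stationary at this point (or past the last one)
    q = points[i + 1]
    # Moving between p and q: interpolate from p's departure to q's arrival.
    t = (frame - p.departure_frame) / (q.arrival_frame - p.departure_frame)
    return p.x + t * (q.x - p.x), p.y + t * (q.y - p.y)
```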
- The HotSpotByClip table is used to define the video hotspots. One record exists for each hotspot function on each video clip in the production. Each record specifies:
- ID the ID for this hotspot
- Name a user-defined name for this hotspot
- ClipExit a clip exit to run when this hotspot is triggered
- CursorID the ID of the cursor to display when the mouse pointer is moved over this hotspot
- The HotSpotByClipLocns table is used to define the key frame locations of hotspots on each frame of video. One record exists for each hotspot on each frame. Each record specifies:
- HotSpotID the ID of the hotspot in the HotSpotByClip table
- ClipID the ID of the video clip which contains this hotspot
- FrameNumber the frame number to which this record applies
- IsFirstFrame an indicator of whether this is the first key frame
- WidthPixels the width of the hotspot area
- HeightPixels the height of the hotspot area
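Given key-frame records like those in HotSpotByClipLocns, a player has to work out the hotspot rectangle on intermediate frames and test the mouse against it. This sketch assumes linear interpolation between key frames, a hotspot that is inactive outside its first and last key frame, and hypothetical `x`/`y` fields for the top-left corner (only WidthPixels and HeightPixels appear above).

```python
from typing import NamedTuple, Optional


class HotspotKey(NamedTuple):
    frame_number: int   # FrameNumber
    x: int              # left pixel coordinate (hypothetical field)
    y: int              # top pixel coordinate (hypothetical field)
    width_pixels: int   # WidthPixels
    height_pixels: int  # HeightPixels


def hotspot_rect(keys: list, frame: int) -> Optional[tuple]:
    """(x, y, w, h) on `frame`, or None; `keys` sorted by frame_number."""
    if not keys or frame < keys[0].frame_number or frame > keys[-1].frame_number:
        return None
    for a, b in zip(keys, keys[1:]):
        if a.frame_number <= frame <= b.frame_number:
            span = b.frame_number - a.frame_number
            t = 0.0 if span == 0 else (frame - a.frame_number) / span
            # Interpolate x, y, width and height between the two key frames.
            return tuple(round(av + t * (bv - av)) for av, bv in zip(a[1:], b[1:]))
    k = keys[-1]  # only one key frame exists
    return (k.x, k.y, k.width_pixels, k.height_pixels)


def hit(rect: Optional[tuple], px: int, py: int) -> bool:
    """True if the mouse point lies inside the rectangle."""
    if rect is None:
        return False
    x, y, w, h = rect
    return x <= px < x + w and y <= py < y + h
```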
- The KeyboardInput table is used to define the flows which are executed when the user presses specific keys or key sequences on the keyboard. Each record specifies:
- ID the ID for this record
- NonGraphicKey a boolean indicating whether this is a "printable" ASCII character
- FlowID the flow to run when this key is pressed
- FlowAccessType the flow access type (jump or call)
- Enabled a boolean indicating whether this key response is enabled
- The Library table is used to define the file locations of all the assets used in the production. To construct the full record for each asset, it is necessary to join one-to-one with each of the asset tables (video clips whose viewport is nil, bitmaps, or Mview catalogs). Each record specifies:
- ID the ID for this library record
- string1 a descriptive name for this asset
- string2 descriptive information about this asset
- FileName the handle to the file system location of the asset
- MediaType the asset type this record represents
- MediaID the ID of the record in the corresponding media table
- ClipListIdx the index used for sorting in the user interface
- ThumbnailFilename the handle to the file which contains an image representing this asset
- ResourceIndex an index into the Mview catalog for Mviews and albums
- The Map table is used to define the maps in the production. Each map represents a unique coordinate space. Each record in the Map table specifies:
- ID the ID for the map record
- Name a user-defined name for this map
- OriginX the X coordinate of the origin
- OriginY the Y coordinate of the origin
- ViewportID the default viewport for clips in this map
- The MIDI table is used to store information about MIDI assets. This table is joined with the Library table to determine file locations. Each record specifies:
- ID the ID for this asset
- LibraryID the ID into the Library table for this asset
- The Object table is used to define the objects in a production. Each record specifies:
- ID the ID for this object
- Name a user-defined text name for this object
- SuperClassObjectID the parent class ID for the object
- IsVirtual a boolean which indicates whether this is an abstract class
- NumberBitmapSets the number of bitmaps in the Mview which represents this object
- MviewFileName the file name of the Mview catalog associated with this object
- MviewIndex the index in the Mview catalog associated with this object
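SuperClassObjectID links each object to its parent class, so an object's ancestry is found by following those links until a root is reached. A minimal sketch with hypothetical Python names; the cycle check is a defensive addition, not something the table requires:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectRecord:
    id: int
    name: str
    super_class_object_id: Optional[int]  # SuperClassObjectID; None for a root class
    is_virtual: bool = False              # IsVirtual: abstract class


def ancestry(objects: dict, object_id: int) -> list:
    """Return [object_id, parent_id, ..., root_id] by following SuperClassObjectID."""
    chain: list = []
    current: Optional[int] = object_id
    while current is not None:
        if current in chain:
            raise ValueError(f"class cycle at object {current}")
        chain.append(current)
        current = objects[current].super_class_object_id
    return chain


def is_a(objects: dict, object_id: int, class_id: int) -> bool:
    return class_id in ancestry(objects, object_id)


def can_instantiate(objects: dict, object_id: int) -> bool:
    # Abstract (virtual) classes are not meant to be placed directly.
    return not objects[object_id].is_virtual
```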
- The OnScreenReceptacle table is used to define the receptacles for objects in the video.
- One record exists for each unique geographical point in space which can hold an object.
- Each record specifies:
- ID the ID for this receptacle
- ObjectID the ID of the initial object in the receptacle
- Name the user-defined name for the receptacle
- MapID the ID of the map which contains this receptacle
- Alignment the alignment to be used when painting objects into this receptacle
- MapX the X coordinate in the map for this receptacle
- MapY the Y coordinate in the map for this receptacle
- MapZ the Z coordinate in the map for this receptacle
- The OnScreenReceptacleLocns table is used to define the key frame locations of receptacles on each frame of video.
- ReceptacleID the ID of the receptacle which this point defines
- ClipID the ID of the video clip which contains this receptacle
- FrameNumber the frame number in the video clip which contains this receptacle key definition
- IsFirstFrame a boolean which indicates whether this is the first key frame
- PixelsX the X pixel coordinate into the video for this receptacle
- PixelsY the Y pixel coordinate into the video for this receptacle
- DrawScale an arbitrary scale factor for applying to the rendering of objects in the receptacle
- The Picture table is used to define the picture frames for display of bitmaps.
- Pictures are a type of visual. Properties common to all visuals are stored in the Visual table and may be obtained by joining the Picture and Visual tables.
- LibrarylD the library record ID which contains file information for the initial bitmap
- The PropertyBoolean table is used to define the boolean properties of objects. Each record specifies:
- ID the ID for this property
- IsPerObject a boolean which indicates if the property is per object or global
- The PropertyMemo table is used to define the memo properties of objects. Each record specifies:
- ID the ID for this property
- Name a user-defined text name for this property
- IsPerObject a boolean which indicates whether the property is per object or global
- DefaultInitialValue the default initial value
- The PropertyNumeric table is used to define the numeric properties of objects. Each record specifies:
- ID the ID for this property
- The PropertyOfObjectBoolean table is used to define the initial instance values for each object's properties. Each record specifies:
- BooleanPropertyID the ID of the property in the respective property table
- ObjectID the ID of the object
- InitialValue the default initial value for the object instance property
- The PropertyOfObjectMemo table is used to define the initial instance values for each object's properties. Each record specifies:
- The PropertyOfObjectNumeric table is used to define the initial instance values for each object's properties. Each record specifies:
- The PropertyString table is used to define the string properties of objects. Each record specifies:
- ID the ID for this property
- Name a user-defined text name for this property
- IsPerObject a boolean which indicates if the property is per object or global
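The property tables separate a property's definition (with its DefaultInitialValue) from per-object initial values stored in the PropertyOfObject tables. One plausible way to combine them at startup is sketched below with hypothetical names; the rule that a per-object row overrides the default, and that global properties ignore per-object rows, is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PropertyDef:
    id: int
    name: str
    is_per_object: bool          # IsPerObject
    default_initial_value: Any   # DefaultInitialValue


def build_initial_state(properties: list, object_ids: list, overrides: dict) -> tuple:
    """Return (global_values, per_object_values) at production start.

    overrides maps (property_id, object_id) -> InitialValue, i.e. rows
    from a PropertyOfObject* table.
    """
    global_values: dict = {}
    per_object: dict = {oid: {} for oid in object_ids}
    for prop in properties:
        if not prop.is_per_object:
            global_values[prop.name] = prop.default_initial_value
            continue
        for oid in object_ids:
            per_object[oid][prop.name] = overrides.get(
                (prop.id, oid), prop.default_initial_value)
    return global_values, per_object
```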
- The Roundabout table is used to define the roundabouts (places where you can turn around on a path) on a map.
- There is one record for each clip in the clip group which contains the roundabout, e.g. four records if there is a forward, left-facing, right-facing and backwards clip.
- The actual turn is described by a collection of up to eight ClipExit records which reference a specific roundabout.
- Each record specifies:
- MapID the ID of the map which contains this roundabout
- ClipID the ID of the video clip which contains this roundabout
- The SpatialClip table is used to define properties of video clips which are specific to video used on a map (spatial video). Each record is left-joined one-to-one with the VideoClip table. Each record specifies:
- MapID the ID of the map which contains this clip
- GroupID the ID for the group to which this clip belongs or nil if it is a single clip
- The System table is used to define global properties of the production which are used by the Edit system software. Each record specifies:
- AllocID - a counter which indicates the last used ID in the production
- DebugMode - a boolean which is used by the binder to determine whether debug capabilities should be turned on in the production
- StandAlone - a boolean used by the binder to determine whether all software dependencies should be resolved or referenced
- StartWindow - an ID which indicates the startup visual
- The SystemWindow table is used to store the edit system software window positions from session to session. Each record specifies:
- WindowName the string name of the window
- left the left coordinate of the window in twips
- top the top coordinate of the window in twips
- width the width of the window in twips
- height the height of the window in twips
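SystemWindow stores positions in twips (1/1440 of an inch), so restoring a window requires a conversion to pixels. A minimal sketch with hypothetical function names; the 96 DPI default is an assumption about the display, not a value from the text.

```python
TWIPS_PER_INCH = 1440  # 1 twip = 1/20 point = 1/1440 inch


def twips_to_pixels(twips: int, dpi: int = 96) -> int:
    """Convert a twip measurement to whole pixels at the given DPI."""
    return round(twips * dpi / TWIPS_PER_INCH)


def window_rect_pixels(left: int, top: int, width: int, height: int, dpi: int = 96) -> tuple:
    """Convert a SystemWindow record's left/top/width/height to pixels."""
    return tuple(twips_to_pixels(v, dpi) for v in (left, top, width, height))
```

At 96 DPI one pixel corresponds to 15 twips, so a 9600 x 7200 twip window restores as 640 x 480 pixels.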
- The VideoClip table is used to define all video in the production.
- Each record represents a video asset when left-joined with the Library table.
- Each record represents a video clip on a map when left-joined with the SpatialClip table.
- Name a user-defined name which identifies this video
- FrameCount the number of frames in a video asset
- ViewPortID the ID of the viewport visual into which this clip is played
- LibraryID the Library ID for the file if this is a video asset
- FrameRate the frame rate of the video asset
- Width the X resolution of the video asset
- Height the Y resolution of the video asset
- Anamorphic a boolean which indicates whether the video asset was shot with an anamorphic lens
- SourceLeft the left coordinate of the source rectangle used for projection into the viewport
- SourceTop the top coordinate of the source rectangle used for projection into the viewport
- SourceRight the right coordinate of the source rectangle used for projection into the viewport
- SourceBottom the bottom coordinate of the source rectangle used for projection into the viewport
- PingPong a boolean which indicates that the video clip should be played forwards and backwards in a loop
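Two VideoClip fields lend themselves to a short sketch: PingPong playback order and the scale implied by projecting the Source rectangle into a viewport. The function names are hypothetical, and the sketch assumes a non-PingPong clip plays through once and that turnaround frames are not repeated; the anamorphic correction is left out because its factor is not specified.

```python
def frame_sequence(frame_count: int, ping_pong: bool):
    """Yield frame numbers in playback order.

    Without PingPong the clip plays once from first to last frame (an
    assumption); with PingPong it loops forward and backward indefinitely,
    without repeating the end frames at each turnaround.
    """
    forward = list(range(frame_count))
    if not ping_pong:
        yield from forward
        return
    cycle = forward + forward[-2:0:-1]  # e.g. 0 1 2 3 then 2 1
    while cycle:
        yield from cycle


def source_to_viewport_scale(left, top, right, bottom, viewport_width, viewport_height):
    """Scale factors that map the source rectangle onto the viewport."""
    return viewport_width / (right - left), viewport_height / (bottom - top)
```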
- The ViewPort table is used to define the viewports in a production. There is one record for each viewport, which is left-joined with the Visual table to determine all the properties of the viewport visual. Each record specifies:
- CancelTurnFlowID the ID of the flow to execute when the user signals to cancel a turn
- The Visual table is used to define the common properties for each of the visual types (Viewport, Window, Picture, TextLabel, etc.).
- ID the ID for the visual
- ContainerID the parent container for this visual
- MouseMoveFlowID the ID of the flow to execute when the user moves the mouse over this visual
- MouseUpFlowID the ID of the flow to execute when the user releases the mouse button
- The WallSegment table is used to define the decorative walls drawn on a map. There is one record for each wall segment. Each record specifies:
- MapID the ID of the map which contains this wall
- StartX
- The Windows table is used to define the windows in a production. Windows are the type of all topmost visual containers in the visual hierarchy. There is one record for each window, which is left-joined with the Visual table to determine all the properties of the window visual. Each record specifies:
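Because each visual records its ContainerID and windows are the topmost containers, the window that owns any visual can be found by walking up the container chain. A minimal sketch, assuming a hypothetical `visuals` mapping of visual ID to `(visual_type, container_id)`:

```python
def top_window(visuals: dict, visual_id: int) -> int:
    """Follow ContainerID links until the topmost container is reached.

    visuals maps visual ID -> (visual_type, container_id), with container_id
    set to None for a topmost visual (hypothetical representation).
    """
    seen = set()
    current = visual_id
    while True:
        if current in seen:
            raise ValueError(f"container cycle at visual {current}")
        seen.add(current)
        visual_type, container_id = visuals[current]
        if container_id is None:
            # Only windows may sit at the top of the visual hierarchy.
            if visual_type != "Window":
                raise ValueError(f"topmost visual {current} is not a Window")
            return current
        current = container_id
```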
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention relates to a navigable multimedia system which displays a representation of an environment based on navigable cinematic video (16). The environment comprises paths, with at least one clip associated with each path, and the system allows spontaneous navigation along the paths in response to user input. The system allows direct transition between a video clip of one path and an intersecting path, and between one field of view along a path and another field of view on that path. The system produces hotspots associated with features of the environment; bitmap representations of objects to be placed in receptacles associated with certain parts remain associated with the features of the environment, and the objects appear in proper perspective regardless of the viewer's position. The invention also relates to a method of creating a navigable cinematic video environment, which comprises capturing (13), in video clips having certain optical properties, the environmental field of view along the natural paths through the environment; capturing M-view sprites and other elements and correlating them with bitmap clips; adding and editing hotspots, receptacles and other multimedia production components; and binding everything together for the final production.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US1697596P | 1996-05-06 | 1996-05-06 | |
US60/016,975 | 1996-05-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997042601A1 (fr) | 1997-11-13 |
Family
ID=21780031
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1997/007359 (WO1997042601A1, fr) | 1996-05-06 | 1997-05-02 | Procede multimedia interactif integre (Integrated interactive multimedia method) |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1997042601A1 (fr) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5396583A (en) * | 1992-10-13 | 1995-03-07 | Apple Computer, Inc. | Cylindrical to planar image mapping using scanline coherence |
US5414801A (en) * | 1991-06-11 | 1995-05-09 | Virtus Corporation | Computerized method and apparatus using containment relationships to represent objects in a three-dimensional space, and for moving therethrough |
US5434592A (en) * | 1991-02-04 | 1995-07-18 | International Business Machines Corporation | Multimedia expansion unit |
US5444478A (en) * | 1992-12-29 | 1995-08-22 | U.S. Philips Corporation | Image processing method and device for constructing an image from adjacent images |
US5479597A (en) * | 1991-04-26 | 1995-12-26 | Institut National De L'audiovisuel Etablissement Public A Caractere Industriel Et Commercial | Imaging system for producing a sequence of composite images which combine superimposed real images and synthetic images |
US5495576A (en) * | 1993-01-11 | 1996-02-27 | Ritchey; Kurtis J. | Panoramic image based virtual reality/telepresence audio-visual system and method |
US5499146A (en) * | 1994-05-24 | 1996-03-12 | Texas Instruments Incorporated | Method and apparatus for recording images for a virtual reality system |
US5602564A (en) * | 1991-11-14 | 1997-02-11 | Hitachi, Ltd. | Graphic data processing system |
US5629732A (en) * | 1994-03-29 | 1997-05-13 | The Trustees Of Columbia University In The City Of New York | Viewer controllable on-demand multimedia service |
US5642477A (en) * | 1994-09-22 | 1997-06-24 | International Business Machines Corporation | Method and apparatus for selectably retrieving and outputting digitally stored multimedia presentations with real-time non-interrupting, dynamically selectable introduction of output processing |
US5644694A (en) * | 1994-12-14 | 1997-07-01 | Cyberflix Inc. | Apparatus and method for digital movie production |
US5650814A (en) * | 1993-10-20 | 1997-07-22 | U.S. Philips Corporation | Image processing system comprising fixed cameras and a system simulating a mobile camera |
- 1997-05-02: application PCT/US1997/007359 filed (WO1997042601A1, fr), status active
Non-Patent Citations (2)
Title |
---|
COMPUTER GRAPHICS, July 1995, CHEN et al., "QuickTime VR: An Image-Based Approach to Virtual Environment Navigation", pages 29-38. * |
COMPUTER GRAPHICS, July 1995, McMILLAN et al., "Plenoptic Modeling: An Image-Based Rendering System", pages 39-46. * |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1099343A4 (fr) * | 1998-05-13 | 2007-10-17 | Infinite Pictures Inc | Films panoramiques simulant un deplacement dans un espace multidimensionnel |
EP1099343A2 (fr) * | 1998-05-13 | 2001-05-16 | Infinite Pictures Inc. | Films panoramiques simulant un deplacement dans un espace multidimensionnel |
EP1132123A3 (fr) * | 2000-03-08 | 2003-06-18 | Sony Computer Entertainment Inc. | Méthode pour rejouer à un jeu, support d'enregistrement, programme, et système de divertissement |
US6724385B2 (en) | 2000-03-08 | 2004-04-20 | Sony Computer Entertainment Inc. | Method of replaying game, recording medium, program, and entertainment system |
EP1132123A2 (fr) * | 2000-03-08 | 2001-09-12 | Sony Computer Entertainment Inc. | Méthode pour rejouer à un jeu, support d'enregistrement, programme, et système de divertissement |
DE10030868A1 (de) * | 2000-06-23 | 2002-01-10 | Realspace Gmbh | Verfahren, Computerprogrammprodukt, Computersystem, Netzwerkserver und Netzwerkclient zur Erzeugung einer Darstellung eines Objektes mit interaktiven Flächen in veränderlicher Ansicht |
US7299417B1 (en) * | 2003-07-30 | 2007-11-20 | Barris Joel M | System or method for interacting with a representation of physical space |
US7430473B2 (en) | 2004-10-01 | 2008-09-30 | Bose Corporation | Vehicle navigation display |
WO2007018435A1 (fr) * | 2005-08-10 | 2007-02-15 | Telenor Asa | Procede de creation d'une animation a partir d'une serie d'images fixes preenregistrees |
US8896608B2 (en) | 2005-08-10 | 2014-11-25 | Movinpics As | Method for providing an animation from a prerecorded series of still pictures |
WO2007041696A3 (fr) * | 2005-10-04 | 2009-04-23 | Eugene J Alexander | Systeme et procede d'etalonnage d'un ensemble de dispositifs d'imagerie et calcul de coordonnees en 3d de caracteristiques detectees dans un systeme de coordonnees de laboratoire |
US8848035B2 (en) | 2005-10-04 | 2014-09-30 | Motion Analysis Corporation | Device for generating three dimensional surface models of moving objects |
US8223208B2 (en) | 2005-11-10 | 2012-07-17 | Motion Analysis Corporation | Device and method for calibrating an imaging device for generating three dimensional surface models of moving objects |
WO2010058334A1 (fr) * | 2008-11-21 | 2010-05-27 | Koninklijke Philips Electronics N.V. | Fusion d'une séquence vidéo et d'images fixes du même événement, sur la base de vecteurs mouvements globaux de cette vidéo |
US8649660B2 (en) | 2008-11-21 | 2014-02-11 | Koninklijke Philips N.V. | Merging of a video and still pictures of the same event, based on global motion vectors of this video |
JP2012509635A (ja) * | 2008-11-21 | 2012-04-19 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | 同じイベントの動画及び静止画をこのビデオのグローバル運動ベクトルに基づき合併すること |
CN102224545A (zh) * | 2008-11-21 | 2011-10-19 | 皇家飞利浦电子股份有限公司 | 相同事件的视频和静态画面基于该视频的全局运动矢量的合并 |
US9651412B2 (en) | 2011-01-31 | 2017-05-16 | Sage Vision Inc. | Bottle dispenser having a digital volume display |
WO2013076478A1 (fr) * | 2011-11-21 | 2013-05-30 | Martin Wright | Supports interactifs |
US11816856B2 (en) | 2012-06-15 | 2023-11-14 | Sage Vision Inc. | Absolute position detection |
US10176591B2 (en) | 2012-06-15 | 2019-01-08 | Sage Vision, Inc. | Absolute position detection |
US9712834B2 (en) | 2013-10-01 | 2017-07-18 | Dolby Laboratories Licensing Corporation | Hardware efficient sparse FIR filtering in video codec |
US10182235B2 (en) | 2013-10-01 | 2019-01-15 | Dolby Laboratories Licensing Corporation | Hardware efficient sparse FIR filtering in layered video coding |
EP2858369A1 (fr) * | 2013-10-01 | 2015-04-08 | Dolby Laboratories Licensing Corporation | Filtrage matériel FIR épars efficace dans un codec vidéo |
US11094105B2 (en) | 2016-12-16 | 2021-08-17 | Samsung Electronics Co., Ltd. | Display apparatus and control method thereof |
EP3336845A1 (fr) * | 2016-12-16 | 2018-06-20 | Samsung Electronics Co., Ltd. | Afficheur et son procédé de commande |
WO2019183676A1 (fr) * | 2018-03-27 | 2019-10-03 | Spacedraft Pty Ltd | Système de planification de contenu multimédia |
US11360639B2 (en) | 2018-03-27 | 2022-06-14 | Spacedraft Pty Ltd | Media content planning system |
GB2579760B (en) * | 2018-03-29 | 2023-03-22 | Displaylink Uk Ltd | Position error measurement in an extended reality mobile display device |
GB2579760A (en) * | 2018-03-29 | 2020-07-08 | Displaylink Uk Ltd | Position error measurement in an extended reality mobile display device |
US11294454B2 (en) | 2018-03-29 | 2022-04-05 | Displaylink (Uk) Limited | Position error measurement in an extended reality mobile display device |
US11641499B2 (en) | 2018-06-01 | 2023-05-02 | AT&T Intellectual Property I, L.P. | Field of view prediction in live panoramic video streaming |
US11190820B2 (en) | 2018-06-01 | 2021-11-30 | AT&T Intellectual Property I, L.P. | Field of view prediction in live panoramic video streaming |
US10623791B2 (en) | 2018-06-01 | 2020-04-14 | AT&T Intellectual Property I, L.P. | Field of view prediction in live panoramic video streaming |
US10812774B2 (en) | 2018-06-06 | 2020-10-20 | AT&T Intellectual Property I, L.P. | Methods and devices for adapting the rate of video content streaming |
US10616621B2 (en) | 2018-06-29 | 2020-04-07 | AT&T Intellectual Property I, L.P. | Methods and devices for determining multipath routing for panoramic video content |
US11019361B2 (en) | 2018-08-13 | 2021-05-25 | AT&T Intellectual Property I, L.P. | Methods, systems and devices for adjusting panoramic view of a camera for capturing video content |
US10708494B2 (en) | 2018-08-13 | 2020-07-07 | AT&T Intellectual Property I, L.P. | Methods, systems and devices for adjusting panoramic video content |
US11671623B2 (en) | 2018-08-13 | 2023-06-06 | AT&T Intellectual Property I, L.P. | Methods, systems and devices for adjusting panoramic view of a camera for capturing video content |
US11064175B2 (en) | 2019-12-11 | 2021-07-13 | AT&T Intellectual Property I, L.P. | Event-triggered video creation with data augmentation |
US11575867B2 (en) | 2019-12-11 | 2023-02-07 | AT&T Intellectual Property I, L.P. | Event-triggered video creation with data augmentation |
US20230343035A1 (en) * | 2022-04-22 | 2023-10-26 | George Mason University | Systems and methods for facilitating navigation in space |
Similar Documents
Publication | Title |
---|---|
WO1997042601A1 (fr) | Integrated interactive multimedia process |
US5872575A (en) | Method and system for the creation of and navigation through a multidimensional space using encoded digital video | |
US5790124A (en) | System and method for allowing a performer to control and interact with an on-stage display device | |
US6570581B1 (en) | On-location video assistance system with computer generated imagery overlay | |
US9367942B2 (en) | Method, system and software program for shooting and editing a film comprising at least one image of a 3D computer-generated animation | |
US6968973B2 (en) | System and process for viewing and navigating through an interactive video tour | |
US6268864B1 (en) | Linking a video and an animation | |
US6278466B1 (en) | Creating animation from a video | |
US6081278A (en) | Animation object having multiple resolution format | |
KR101203243B1 (ko) | Interactive viewpoint video system and process |
Miller et al. | The virtual museum: Interactive 3D navigation of a multimedia database |
US10970843B1 (en) | Generating interactive content using a media universe database | |
US20130321575A1 (en) | High definition bubbles for rendering free viewpoint video | |
US20030227453A1 (en) | Method, system and computer program product for automatically creating an animated 3-D scenario from human position and path data | |
US8457387B2 (en) | System and method for interactive environments presented by video playback devices | |
CN102576247A (zh) | Hyperlinked 3D video inserts for interactive television |
WO2007016055A2 (fr) | Processing of three-dimensional data |
US7554542B1 (en) | Image manipulation method and system | |
CN114327700A (zh) | Virtual reality device and screenshot image playback method |
US6924821B2 (en) | Processing pipeline responsive to input and output frame rates | |
EP1097568A2 (fr) | Creating animation from a video |
US11948257B2 (en) | Systems and methods for augmented reality video generation | |
Thorn | Learn Unity for 2D Game Development |
Reinhardt et al. | ADOBE FLASH CS3 PROFESSIONAL BIBLE (With CD) | |
Ichikari et al. | Mixed reality pre-visualization for filmmaking: On-set camera-work authoring and action rehearsal |
Legal Events
Code | Title | Description |
---|---|---|
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
122 | EP: PCT application non-entry in European phase | |