AU2003271320A1 - Multimedia playout system - Google Patents
Description
S&F Ref: 659982
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan
Actual Inventor(s): Sanjay Kumar Jha
Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Multimedia Playout System

ASSOCIATED PROVISIONAL APPLICATION DETAILS: [33] Country: AU; [31] Applic. No(s): 2003900668; [32] Application Date: 14 Feb 2003

The following statement is a full description of this invention, including the best method of performing it known to me/us:-

MULTIMEDIA PLAYOUT SYSTEM

Field

The present invention relates to communication of multimedia data, and in particular to a technique for real-time playout of multimedia audio/video packet information over a network.
Background

The discussion contained in this "Background" section relating to prior art arrangements is based to some degree on documents or devices which form public knowledge through their respective publication and use. Such discussion should not be interpreted as a representation that such documents or devices in any way form part of the common general knowledge in the art.
Fig. 1 shows a block diagram of a prior-art system 100 which supports multimedia communications. The system 100 consists of a transmitter, or sender 101, which communicates as depicted by an arrow 102 to a communication network 103. A diagonal arrow 104 is used to indicate the fact that communication performance across the network 103 can vary with time. The network 103 generally does not guarantee bounds on delay and jitter, and if the network 103 is connectionless, packets can take different routes and even arrive out of sequence. The communication from the sender 101 is directed from the network 103 as depicted by an arrow 105 to a receiver 106, which then directs the data to a display 108.
The increase in processing power of personal computers (PCs) and workstations, combined with the increasing available bandwidth of high speed networks has given rise to new real-time software applications which may involve interaction between the sender 101 and the receiver 106. Many such applications use a mix of different media, including video and audio, and this type of traffic is commonly referred to as multimedia traffic.
070203 608336.doc

Multi-media applications have performance requirements that are substantially different from those of other data-oriented applications such as file transfer. Examples of multimedia applications include desktop teleconferencing, multimedia mail and documents, surveillance, course authoring, audio and video editing, TV on-screen, and video on demand. Video playback generally needs sufficient CPU performance to provide for 30 repetitions (for NTSC systems) of the grab, decompress and display sequence every second (see Fig. 2). The compute time needed for each video frame within a stream generally varies, and CPU requirements for different video streams will thus differ. It is therefore difficult to predict the CPU requirements for video streams.
A wide variety of end-systems (exemplified by the receiver 106 and the display 108 in Fig. 1), communications architectures (exemplified by the network 103 in Fig. 1), and system software are used to support these multi-media applications. When applications are distributed geographically, it is often the case that many disparate receivers such as 106 exchange multi-media data with each other.
Multi-media applications typically demand a "smooth" presentation of audio/video, without gaps, breaks or variations in quality.
Although the bandwidth of communication networks and the processing power of end systems are, as noted, rapidly evolving, these capabilities are not yet generally sufficient to support the desired performance requirements of the noted multi-media applications. Exacerbating this fact, low-cost consumer equipment typically does not make use of the highest performance technology that is available. In addition, the increasing penetration of mobile devices presents yet another constraint to the use of high performance presentation technology, both because of size and space limits, and adds other network performance degradation effects associated with the mobile communication environment.
Interactive multi-media applications such as video conferencing have a limited tolerance for delay and jitter, where jitter is a term used to denote packet delay variance.
Jitter can be caused by a number of factors including variation in the delay imposed by the communication network 103, scheduling latencies in the operating systems underpinning the participating processing platforms 106, and variation in the time required to digitise, compress and decompress video and audio data.
Fig. 2 shows a functional block diagram of the sender 101 in Fig. 1. The video source 1000 produces a stream of video frames, which are sent as depicted by an arrow 1001 to a frame grabber/compressor 1002. The module 1002 compresses the aforementioned video frames, and sends them as depicted by an arrow 1003 to a processing module 1004. The processing module 1004 then emits the compressed and processed video frames as depicted by the arrow 102.
Video packets at 1003 in Fig. 2 are shown pictorially on a "grab" line 201 in Fig. 3. The video packets at 102 in Fig. 2 are shown pictorially on a "send" line 205 in Fig. 3. The video packets at 105 in Fig. 1 are shown pictorially on a "receive" line 207 in Fig. 3.
The video packets at 107 in Fig. 1, for the case where the receiver 106 makes use of a real-time operating system, e.g. LynxOS™ from LynuxWorks™, are shown on a "playout" line 209 in Fig. 3. The video packets at 107 in Fig. 1, for the case where the receiver 106 makes use of a non-real-time operating system such as Unix™, are shown pictorially on a "playout" line 213 in Fig. 3.
Fig. 3 illustrates how delay and delay variation (ie jitter) develop when the video data stream 1001 (see Fig. 2) is transmitted from the sender 101 to the display 108 in Fig. 1.
The "grab" line 201 depicts a number of video frames such as 202 which have been captured from the video source 1000 and compressed inside the frame grabber 1002 of Fig. 2. The grab line 201 depicts video frames at 1003 in Fig. 2. As depicted by arrows 203 and 204, it is apparent that time intervals between successive frames are variable.
The frames depicted on the grab line 201 (ie. at 1003 in Fig. 2) are processed in the processing module 1004, and are pictorially reproduced on a "send" line 205 in Fig. 3.
The "send" line 205 depicts packets at 102 in Figs. 1 and 2. It is apparent that the time intervals between successive frames on the send line 205 are the same as those on the grab line 201; however, the frames on the send line 205 have been delayed as depicted by an arrow 206. The delay 206 is introduced by the processing module 1004 in Fig. 2.
A "receive" line 207 depicts the video frames arriving, as depicted by the arrow 105, at the receiver 106 in Fig. 1. Delay imposed by the network 103 is depicted by an arrow 208. Jitter imposed by the network 103 is apparent in the change in inter-frame time intervals that is depicted on the receive line 207. It is noted that jitter can be very large in connectionless networks such as the Internet.
In order to accommodate the jitter introduced by the aggregate effects of the grabber 1002, the processor 1004 and the network 103, the receiver 106 delays each frame by an additional amount D, as depicted by an arrow 210 for a playout line 209 in Fig. 3. The playout line 209 depicts frames at 107 in Fig. 1 for the case where the operating system of the receiver 106 is a "real-time" operating system. The imposition of the delay D (ie 210) enables the frames on the playout line 209 to be played back at regular intervals R as depicted by arrows 211 and 212. It is seen that when the receiver 106 in Fig. 1 supports a real-time operating system, then the inter-frame delays 211 and 212 are equal and constant.
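The fixed-delay playout just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation; the function name and the sample values of D and R are hypothetical.

```python
# Minimal sketch of fixed-delay playout: frame i, sent at send_times[i], is
# presented at send_times[0] + D + i*R, so inter-frame gaps stay equal to R
# provided every frame arrives before its scheduled playout instant.
def playout_schedule(send_times, D, R):
    """Return a playout instant for each frame under a fixed initial delay D
    and a regular presentation interval R (all times in seconds)."""
    first = send_times[0]
    return [first + D + i * R for i in range(len(send_times))]

# Illustrative values: 33 ms frame spacing at the sender, 100 ms playout delay.
schedule = playout_schedule([0.0, 0.033, 0.066, 0.099], D=0.1, R=0.033)
```

A frame causes a playout gap only if it arrives after its scheduled instant, which is why D must cover the worst expected jitter.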
If the receiver 106, however, makes use of a non-real-time operating system, then the frames at 107 in Fig. 1 are depicted by a playout line 213 in Fig. 3. Accordingly, if a non real-time operating system, such as Unix is used in the receiver 106, then interframe intervals are depicted by respective arrows 217 and 216. These intervals can vary, 070203 608336.doc as shown in Fig. 3, between R+r 217) and R-r 216), where represents the variation in latency of the non real-time operating system of the receiver 106. The term (see 215 in Fig. 3) is the end-to-end latency of the first packet.
Reduction or elimination of delay jitter can often require services from the operating system of the receiver 106 and complementary services from the transport system of the network 103. These services are typically not provided in general purpose computing and networking environments.
Typically, video conferencing software lacks jitter control entirely, or makes use of very simplified mechanisms. Systems may adapt to the received video rate by discarding frames that have missed a specified arrival deadline, and may also buffer frames arriving before the deadline. Frame dropping (ie discarding) is a problematic technique with video traffic, because compression methods such as MPEG may use interframe dependent coding, in which case dropping of a single frame can cause propagation of video distortion to neighbouring frames.
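The deadline-based adaptation described above — buffering early frames and discarding late ones — can be sketched as follows; the helper name and data layout are assumptions for illustration only.

```python
# Hypothetical deadline filter: frames arriving before the playout deadline
# are buffered; frames that miss it are discarded. With inter-frame-dependent
# codecs (e.g. MPEG), each drop risks distorting neighbouring frames.
def filter_frames(frames, deadline):
    """frames: list of (frame_id, arrival_time) pairs."""
    buffered, dropped = [], []
    for fid, arrival in frames:
        (buffered if arrival <= deadline else dropped).append(fid)
    return buffered, dropped
```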
Frame buffering consumes large amounts of memory, and although memory is becoming increasingly available, limits on available memory are always present.
Accordingly, current multi-media systems have limited capability for coping with the jitter produced in network arrangements such as that shown in Fig. 1.
Summary of the Invention

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems by setting the playout time of multi-media packets using a composite adaptive scheme. A number of different metrics are used in the adaptive scheme, including the fill level of an input buffer in the receiver 106, user preferences, system attributes and traffic type.
According to a first aspect of the present disclosure, there is provided a method of communicating a multi-media packet data stream from a transmitter across a network to a receiver for presentation, said method comprising the steps of: establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; receiving data stream packets in a buffer in the receiver; identifying a type of the data stream packets; determining a fill level of the buffer; determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.
According to another aspect of the present disclosure, there is provided a system wherein a transmitter communicates a multi-media packet data stream across a network to a receiver for presentation, said system comprising: the transmitter, which is responsive to a control signal from the receiver; the receiver, which comprises: means for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; a buffer for receiving data stream packets; means for identifying a type of the data stream packets; means for determining a fill level of the buffer; means for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and means for sending the control signal, dependent upon the fill level, to the transmitter; wherein the rate at which the transmitter sends the data stream packets to the receiver is dependent upon the control signal.
According to another aspect of the present disclosure, there is provided a computer program for directing at least one processor to execute a method of communicating a multi-media packet data stream from a transmitter across a network to a receiver for presentation, said program being composed of at least one code module, and the program comprising: code for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; code for receiving data stream packets in a buffer in the receiver; code for identifying a type of the data stream packets; code for determining a fill level of the buffer; code for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and code for controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.
According to another aspect of the present disclosure, there is provided a system wherein a transmitter communicates a multi-media packet data stream across a network to a receiver for presentation, the system comprising: at least one memory for storing a program; and at least one processor for executing the program, said program being composed of at least one code module, and the program comprising: code for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; code for receiving data stream packets in a buffer in the receiver; code for identifying a type of the data stream packets; code for determining a fill level of the buffer; code for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and code for controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings

Some aspects of the prior art and one or more embodiments of the present invention are described in this specification, with reference to the drawings in which:

Fig. 1 is a block diagram of a prior art system for multi-media communication;
Fig. 2 is a functional block diagram of a prior art sender used in Fig. 1;
Fig. 3 depicts typical delay and delay variations (ie delay jitter) of video frames in the system of Fig. 1;
Fig. 4 is a functional block diagram of the disclosed multi-media playout system;
Fig. 5 is a functional block diagram of the multi-media sender in Fig. 4;
Fig. 6 is a functional block diagram of the multi-media receiver in Fig. 4;
Fig. 7 is a functional block diagram of the Quality of Service (QoS) module in the receiver of Fig. 6;
Fig. 8 is a process flowchart depicting the disclosed method of multi-media playout;
Fig. 9 depicts variation between playout time and display time for video frames;
Fig. 10 depicts data flow and computation sequencing in the process of Fig. 8; and
Fig. 11 shows one example of implementation of the disclosed multi-media playout system in an interconnected computer network.
Detailed Description including Best Mode

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.
The proposed network architecture

Fig. 4 shows a functional block diagram of a network arrangement that can support the disclosed multi-media playout technique. A sender 301 communicates multimedia traffic, as depicted by an arrow 302, over the network 103 to a receiver 303. The receiver 303 communicates the multi-media traffic data stream, as depicted by the arrow 107, to the display 108 for presentation. The receiver 303 communicates a control feedback signal, as depicted by a dashed arrow 304, back to the sender 301. A timing reference module 305 communicates respective timing signals to the sender 301 and the receiver 303, as depicted by respective dashed arrows 306 and 307. Although a centralised timing reference module 305 is depicted, other timing arrangements can be used. Thus for example, the sender 301 can distribute a timing signal together with the multimedia traffic, and the receiver 303 can extract the timing signal for its own use. The sender 301 and the receiver 303 as well as elements in the network 103 can use general purpose operating systems and general computing devices such as work-stations, Personal Computers (PCs), hand held computing devices, TV set top boxes and so on.
The sender

Fig. 5 is a functional block diagram of a multi-media sender that can be used with the disclosed multi-media playout technique. The sender 301 contains a video source 405, a frame grabber/compressor 406, and a frame processor 407, which are similar to those in the sender 101 of Fig. 2. However, the output from the frame processor 407 is communicated, as depicted by an arrow 408, to a sender buffer 401. The buffer 401 outputs packets, as depicted by the arrow 302, to the receiver 303 of Fig. 4. A controller/scheduler 403 controls the rate at which packets are output from the buffer 401 by means of a control signal depicted by a dashed arrow 402. The controller/scheduler 403 receives a feedback signal depicted by a dashed arrow 304 and a timing signal depicted by the dashed arrow 306. Output packets are time-stamped with their transmit time (shown as 219 in Fig. 3 for the first packet).
The feedback signal 304 directs the controller/scheduler 403 to vary the output rate of packets from the sender buffer 401. Furthermore, this feedback effect is propagated, as depicted by a dashed arrow 404, to the frame grabber/compressor 406 in order to vary the rate at which frames are produced by that module.
The receiver

Fig. 6 shows a functional block diagram of the multi-media receiver 303 for the disclosed playout technique. Data packets are received from the sender 301, as depicted by the arrow 302. These packets are input into a buffer 501, which sends the packets to a decoder 505 as depicted by an arrow 504. A controller/scheduler 503 controls the rate at which packets are output from the buffer 501 using a control signal depicted by a dashed arrow 502. A Quality of Service (QoS) module 507 controls the controller/scheduler module 503 using a control signal depicted by a dashed arrow 508. The QoS module 507 receives information on occupancy of the buffer 501, the information being depicted by a dashed arrow 506. The QoS module 507 receives the timing signal 307 from the timing reference 305, and provides the feedback signal 304 back to the sender 301. The decoder 505 outputs the video frames for display as depicted by the arrow 107.
The QoS module 507 therefore controls the playout rate from the buffer 501, and it performs this control function on the basis of interaction with a number of other modules as will be described in relation to Figs. 7-10. The QoS module 507 uses a composite adaptive scheme to adapt to changing network conditions in the network 103.
User preferences, specific implementation details of the sender 301 and the receiver 303 as well as attributes of the multi-media data stream itself all play a part in the operation of the QoS module 507.
The QoS Module

Fig. 7 is a functional block diagram of the QoS module 507 in Fig. 6. The heart of the QoS module 507 is a QoS decision engine 605.
The QoS decision engine 605 is the brain of the disclosed playout arrangement.
The engine 605 interacts with other modules in the receiver 303 to provide a smooth presentation of media streams to the end user. For example, based on pre-defined user input (ie presentation policy) or application characteristics (which are known a priori by the user at run time), the QoS engine 605 will set the playout mode to be either interactive or non-interactive. The engine 605 can set a degradation policy based on user policy, and can initiate commencement of feedback via 304 to the sender 301 if high packet loss is detected at the receiver buffer 501 by the buffer monitor 602. The QoS decision engine 605 also monitors the playout of streams. In layered encoding schemes, the engine 605 may skip the layers that are of least significance.
A buffer monitor 602 receives information on occupancy of the receiver buffer 501 as depicted by the dashed arrow 506. In turn, the buffer monitor 602 provides three types of information, as depicted by a single dashed arrow 609, to the QoS decision engine 605. In the first instance, the buffer monitor 602 determines the type of data in the receiver buffer 501 by examining header information in the data packets in the buffer 501.
Secondly, the buffer monitor 602 provides information on the buffer occupancy of the receiver buffer 501 by determining the average number of packets in the buffer 501, or some alternate measure. Thirdly, the buffer monitor 602 provides information on packet sequence numbers to the QoS decision engine 605. The buffer fill level can be monitored, for example, using a scheme proposed by Yuang et al, which uses an Interrupted Poisson Process to measure the buffer fill level (see M. Yuang, S.T. Liang, and Y.G. Chen, "Dynamic video play-out smoothing method for multimedia applications", Multimedia Tools and Applications, 6:47-60, June 1998).
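A simple stand-in for the occupancy measurement described above is a moving average of recent packet counts. The class below is an illustrative sketch only — it is not the Interrupted Poisson Process estimator of Yuang et al., and the class name and window size are assumptions.

```python
from collections import deque

# Illustrative buffer-fill monitor: averages the most recent occupancy
# samples to give a smoothed fill-level estimate for the receiver buffer.
class BufferMonitor:
    def __init__(self, window=16):
        self.samples = deque(maxlen=window)  # keeps only the last `window` samples

    def record(self, packets_in_buffer):
        """Record one occupancy sample (number of packets in the buffer)."""
        self.samples.append(packets_in_buffer)

    def fill_level(self):
        """Average occupancy over the most recent samples."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```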
The QoS decision engine 605 uses the buffer occupancy information in order to direct, as depicted by a dashed arrow 614, a feedback controller 601. The feedback controller 601 in turn generates the feedback signal, depicted by the dashed arrow 304, to the controller/scheduler 403 in the sender 301. The feedback signal can be generated, for example, using a scheme described by Rowe et al, which uses a penalty scheme as the feedback signal for a rate controller at the sender. In Rowe, if frames are queued (but not played) and two consecutive frames are missed, penalty information is sent to vary the sending rate (see L.A. Rowe and B.C. Smith, "A Continuous Media Player", 3rd International Workshop on Network and Operating System Support for Digital Audio and Video, pp 334-344, San Diego, California, 1992). The sender controller/scheduler 403 throttles back the output packets from the sender buffer 401 using the control signal depicted by a dashed arrow 402.
The QoS decision engine 605 makes an estimate of packet loss by considering sequence numbers of received packets, where this information is provided by the buffer monitor 602. Thus, for example, if the packets n+1, n+2 and n+3 have been received, while packet n is missing, then packet n is considered lost. If the packet loss exceeds a predetermined threshold, then the feedback controller 601 directs the controller/scheduler 403 of the sender 301, via the dashed arrow 304, to throttle back the output rate of packets from the sender buffer 401. From an implementation perspective, the packet loss information (ie the feedback signal 304) can be sent as a NACK (negative acknowledgment) packet, or using RTCP messages if the number of participants in the multi-media session is small. For large multicast conferences, however, the packet loss information (ie the feedback signal 304) can be sent periodically as a NACK packet, by sending the information once every 100 packets or at least once every two minutes, in order to avoid an uncontrollable increase in NACK traffic.
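The sequence-number loss estimate and the NACK trigger described above might be sketched as follows. The function names and the 5% threshold are illustrative assumptions, not values taken from the specification.

```python
# Illustrative loss estimate from sequence numbers: a packet n is counted as
# lost once packets with higher sequence numbers have arrived without it.
def estimate_loss(received_seq_nums):
    """Fraction of the contiguous sequence-number range that is missing."""
    if not received_seq_nums:
        return 0.0
    expected = set(range(min(received_seq_nums), max(received_seq_nums) + 1))
    return len(expected - set(received_seq_nums)) / len(expected)

def should_send_nack(received_seq_nums, loss_threshold=0.05):
    """True when the estimated loss rate warrants feedback to the sender."""
    return estimate_loss(received_seq_nums) > loss_threshold
```

For example, receiving sequence numbers 1, 2, 4 and 5 gives an estimated loss of one packet in five, which exceeds the assumed threshold and would trigger a NACK.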
The QoS decision engine 605 in the QoS module 507 also controls the controller/scheduler 503 in the receiver, as depicted by the dashed arrow 508. The controller/scheduler 503, in turn, controls the rate at which output packets are emitted from the receiver buffer 501.
The QoS decision engine 605 also controls the decoder 505 in the receiver 303 using a control signal depicted by a dashed arrow 509. This control signal can affect the mode of operation of the decoder 505, by reducing the resolution of the output of the decoder 505, for example.
A display time estimator 603 estimates a display cycle time for successive video frames, and provides information in regard thereto, as depicted by a dashed arrow 610 to the QoS decision engine 605.
The user of the multi-media playout arrangement can establish display policies 604, also referred to as presentation attributes. These display policies are communicated, as depicted by a dashed arrow 611, to the QoS decision engine 605. Accordingly, the user can influence, or even over-ride, the decision(s) made by the QoS engine 605. This user control can be based, for example, on the personal viewing preferences of the user, or can be based on information derived from monitoring the perceived Quality of Service of the presented session. For example, a user may prefer a higher frame rate at a lower resolution if computing and/or network resources are scarce. Another user may prefer higher resolution at lower frame rates. The user may also choose a lower quality for cost reasons, not being willing to bear the cost associated with a higher quality in a network that supports multiple grades of service. From a user interface perspective, the system may provide an interface such as a sliding bar to control the frame rate, and a pop-down menu to select the presentation window size. In addition, a user may select interactive versus down-load mode.
The user preference inputs can be provided to the QoS decision engine 605 from the policy module 604 in the form of the following policies, for example:

Policy 1: rate and resolution both equally weighted;
Policy 2: rate more important than resolution;
Policy 3: resolution more important than rate;
Policy 4: QoS decision engine to optimise.

Specific implementation details of the particular sender 301 and the particular receiver 303, such as CPU speed(s), also have a bearing on the performance and performance limits of the disclosed system. These implementation details, also referred to as system performance characteristics or system calibration parameters, are stored in a system calibration module 606, and the associated implementation information is provided, as depicted by a dashed arrow 612, to the QoS decision engine 605. From a practical perspective, the most significant implementation details in the system calibration module 606 relate to the implementation details of the receiver 303 rather than to those of the sender 301.
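One hypothetical way a decision engine could map these four policies to a (frame rate, resolution) operating point, given the maxima the end system supports, is sketched below. The trade-off fractions (0.75 and 0.5) are illustrative assumptions, not values from the specification.

```python
# Hypothetical policy-to-operating-point mapping. Policy numbers follow the
# list above; the fractional trade-offs are assumed for illustration.
def apply_policy(policy, max_rate_fps, max_res_lines):
    if policy == 1:   # rate and resolution both equally weighted
        return 0.75 * max_rate_fps, 0.75 * max_res_lines
    if policy == 2:   # rate more important than resolution
        return max_rate_fps, 0.5 * max_res_lines
    if policy == 3:   # resolution more important than rate
        return 0.5 * max_rate_fps, max_res_lines
    return max_rate_fps, max_res_lines  # policy 4: engine optimises freely
```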
The system calibration module 606 maintains historic data about the capability of the end systems 303/108. For example, based on types of CPU used, the calibration module 606 sets a maximum possible rate and resolution for compressed video. The QoS decision engine 605 takes these calibration values into account in defining the maximum achievable quality that can be supported by the end devices 303/108.
The buffer monitor 602 provides, as depicted by an arrow 608, information on the type of data in the receiver buffer 501 to a profile database module 607. The profile database module 607 stores a number of different display policies, which are used in accordance with the type of multi-media traffic being received. The particular policy selected in accordance with the type of data traffic detected by the buffer monitor 602 is communicated, as depicted by a dashed arrow 613, to the QoS decision engine 605.
The QoS decision engine 605 also receives the timing signal 307 from the timing reference module 305.
The multi-media playout process

Fig. 8 depicts process flowchart threads 700 and 700' for the receiver of Figs. 6 and 7.
PROVIDING FEEDBACK TO THE SENDER

The thread 700' relates to provision of the feedback signal 304 from the receiver 303 to the sender 301. A step 708 monitors packet sequence numbers for packets in the receiver buffer 501. This is performed by the receiver buffer 501 and the buffer monitor 602, as indicated by the reference numerals 501/602 at the top right corner of the step 708 in Fig. 8. A subsequent step 709 determines a feedback signal as a function of packet loss, which is determined by monitoring the sequence numbers as noted in regard to the step 708. The determination of the feedback signal as a function of packet loss is performed by the QoS decision engine 605. Thereafter, a step 710 sends the aforementioned feedback signal to the sender 301, and in particular to the controller/scheduler 403 of the sender. Sending of the feedback signal is performed by the QoS decision engine 605 together with the feedback controller 601. The process thread 700' then returns to the step 708.
READING FRAMES OUT FROM THE RECEIVER

The thread 700 relates to how frames are read out from the receiver buffer 501 in order to achieve the advantageous multi-media playout technique disclosed.
The frame delay variability of frames output by the receiver buffer 501 is controlled by controlling a "destination wait time" D which determines how long a frame remains in the buffer 501 before being output. For audio traffic, a new D is established at the beginning of each talk-spurt. The term "talk-spurt" is used to reflect the fact that people generally talk for a while, after which they pause, and then start talking again, thereby talking in "spurts". D is defined so that the receiver 303 has at least one frame in the buffer 501 for most of the time. This reduces the frequency and duration of gaps caused by late arrival of frames, but also results in increased latency in the receiver 303.
Selection of the value of D is thus extremely important. In the disclosed multi-media playout arrangement, the QoS module 507 and the controller/scheduler 503 attempt to play a frame from the buffer 501 each destination wait time interval D where D is the aggregate value of DA, jitter and any other receiver delays (see the next paragraph).
Turning to the process thread 700, a first step 701 determines transmit and arrival times (ti and ai respectively) for frames in the receiver buffer 501. This is performed by the buffer 501 together with the buffer monitor 602. Thereafter, a step 702 determines jitter j and desired playout delay DA (also referred to as the adapted delay value) in accordance with Equations [1] and [2] below. Determination of the jitter and the desired playout delay is performed by the controller/scheduler 503 in the receiver 303.
The necessary information is provided from the QoS decision engine 605 to the controller/scheduler 503 by means of a signal depicted by the dashed arrow 508. This is performed by the receiver buffer 501, the buffer monitor 602, the QoS decision engine 605 and the controller/scheduler 503 in the receiver 303.
DETERMINING JITTER AND ADAPTED DELAY

The adapted delay value DAi and jitter ji are estimated using the following equations:

DAi = α * DAi-1 + (1 - α) * di  [1]

ji = α * ji-1 + (1 - α) * |DAi - di|  [2]

where di = ai - ti; ai = arrival time of packet i; ti = transmit time of packet i. DAi and ji are updated for each packet but are used only at the beginning of each talkspurt. It is found that best results are achieved by using α = 0.125 for delay estimates and α = 0.875 for jitter estimates.
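A minimal Python sketch of the estimator of Equations [1] and [2]; the class name is hypothetical, and the α values follow the 0.125/0.875 figures quoted above.

```python
class DelayJitterEstimator:
    """EWMA estimates of adapted delay (DA) and jitter (j), per Eq. [1]-[2].

    Hypothetical helper; alpha values follow the text: 0.125 for delay
    estimates, 0.875 for jitter estimates.
    """

    def __init__(self, alpha_delay=0.125, alpha_jitter=0.875):
        self.alpha_delay = alpha_delay
        self.alpha_jitter = alpha_jitter
        self.da = None      # adapted delay DA_i
        self.jitter = 0.0   # jitter j_i

    def update(self, transmit_time, arrival_time):
        """Update on every packet; the values are *used* only at the
        beginning of each talkspurt."""
        d = arrival_time - transmit_time      # d_i = a_i - t_i
        if self.da is None:                   # first packet: seed the estimate
            self.da = d
        else:
            # Eq. [1]: DA_i = a * DA_{i-1} + (1 - a) * d_i
            self.da = self.alpha_delay * self.da + (1 - self.alpha_delay) * d
        # Eq. [2]: j_i = a * j_{i-1} + (1 - a) * |DA_i - d_i|
        self.jitter = (self.alpha_jitter * self.jitter
                       + (1 - self.alpha_jitter) * abs(self.da - d))
        return self.da, self.jitter
```

The seeding of DA with the first observed delay is an assumption; the specification does not state how the first estimate is initialised.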
A subsequent step 703 ascertains the send time Si for the packet being considered, this being derived from the time stamp in the packet header which is determined by the buffer monitor 602. Using this send time, together with the adapted delay value DAi and jitter ji previously determined in accordance with Equations [1] and [2], the desired playout times Pi and Pk are determined by the controller/scheduler 503 in accordance with Equations [3] and [4] as follows:

DETERMINING PLAYOUT TIME

The playout time for the first packet of the talkspurt, Pi, is calculated using:

Pi = Si + DAi + n * ji  [3]

where
Pi = playout time for frame i
Si = send time for frame i (this may be set equal to the transmit time ti)
ji = jitter to be used for frame i (this is a smoothed value based on historic data)
DAi = adapted delay value to be used for frame i (this is a smoothed value based on historic data)
n = 1, 2, ...

The larger the value of n, the more packets are played out, at the expense of longer playout delays. This parameter can be used to control the delay vs loss tradeoff in the playout algorithm.
The playout point of all subsequent packets Pk in a talkspurt is calculated using:

Pk = Pi + (Sk - Si)  [4]

where k = i+1, i+2, ...

The implementation of Equations [3] and [4] is performed by the combination of the receiver buffer 501, the buffer monitor 602, the QoS decision engine 605 and the receiver controller/scheduler 503.
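The playout computation of Equations [3] and [4] can be exercised with a short sketch; the helper below is hypothetical, and n = 2 is an arbitrary illustrative choice of the delay-vs-loss parameter.

```python
def talkspurt_playout_times(send_times, da, jitter, n=2):
    """Playout times for the frames of one talkspurt, per Eq. [3]-[4].

    Hypothetical helper. send_times holds the send time S of each frame,
    first frame first; da and jitter are the smoothed DA_i and j_i fixed
    at the start of the talkspurt; n trades loss against playout delay.
    """
    s_first = send_times[0]
    p_first = s_first + da + n * jitter               # Eq. [3]
    # Eq. [4]: subsequent packets keep their original spacing from S_i
    return [p_first + (s - s_first) for s in send_times]
```

Note how Equation [4] preserves the inter-packet spacing within the talkspurt: only the first packet's playout point depends on the delay and jitter estimates.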
Thereafter, in the step 704 the display time estimation module 603 ascertains the display time or display cycle DRi, this being based on an initial display time signal which is communicated, as depicted by a dashed line 510, from the decoder 505. This display cycle DRi is updated by an estimation process in accordance with Equations [5] to [14] in the display time estimation module 603, in order to determine an estimated (ie updated) display cycle time DRi.
ESTIMATING DISPLAY TIME

The display time estimator module 603 maintains an estimate of the display cycle time DRi of video frames. The aggregate of the decoding, decompressing and render time of a video frame is defined as the display cycle time DRi. The packet scheduler 503 needs this estimate to set up the playout point Pi as described in Equation [15]. The estimate of the display cycle DRi is performed using a Single Exponential Smoothing (SES) technique. This method is used because of its suitability for performing forecasting using a large number of historic data items. The computing resources used by this smoothing method are low in terms of both CPU and memory. The following equation is used to calculate the forecasted value:
Ft+1 = α * Xt + (1 - α) * Ft, 0 < α ≤ 1  [5]

where
Ft+1 = forecast value of DRi for the (t+1)th period
Xt = current value of DRi
α = parameter chosen by user

Equation [5] may be written in the following form:

Ft+1 = Ft + α * (Xt - Ft)  [6]

Ft+1 = Ft + α * et  [7]

where et = forecast error for period t.

Equation [7] shows that the forecast SES value is the previous forecast plus an adjustment for the error that occurred in that previous forecast. The closer the α value is to 1, the greater the adjustment for error in the previous forecast. On the other hand, the lower the value of α (ie the closer the value is to 0), the less the adjustment that is applied.
It should be noted that SES will always lag behind any trend in the actual data as it adjusts only a certain percentage of the most recent error. The best performance values are obtained for α = 0.4 to 0.6.
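The SES recursion of Equations [5] to [7] reduces to a few lines. This sketch is illustrative: α = 0.5 is chosen from the 0.4 to 0.6 range reported above, and seeding the first forecast with the first observation is an assumption.

```python
def ses_forecast(values, alpha=0.5, f0=None):
    """Single Exponential Smoothing of the display cycle time, Eq. [5]-[7].

    Illustrative sketch: alpha = 0.5 sits in the 0.4-0.6 range the text
    reports as best performing; the first forecast is seeded with the
    first observation unless f0 is supplied (the seeding rule is an
    assumption).
    """
    f = values[0] if f0 is None else f0
    for x in values:
        f = f + alpha * (x - f)   # Eq. [6]: F_{t+1} = F_t + alpha * e_t
    return f
```

For a steady display cycle the forecast converges to the observed value; for trending data it lags, which is exactly the limitation the text notes.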
Alternatively, the Adaptive-Response-Rate Single Exponential Smoothing Method (ARRSES) can be used to calculate the forecast value. This allows the value of α to change in response to changes in the data pattern. The α value is automatically changed from period to period to allow for changes in the structure of the data. Fluctuations in α can be controlled by putting an upper bound on how much α is allowed to change from one period to the next.
Ft+1 is calculated using equation [5], but with α replaced by an adaptive value αt, where:

αt+1 = |Et / Mt|  [8]

Et = β * et + (1 - β) * Et-1  [9]

Mt = β * |et| + (1 - β) * Mt-1, 0 < β ≤ 1  [10]

where β = parameter chosen by user, and et is the error term defined by equation [11]:

et = Xt - Ft  [11]

Another two-parameter method of forecasting can be used, which smoothes the trend values separately. This provides greater flexibility, since it allows the trend to be smoothed with a different parameter than that used on the original series. The forecast can be found using two smoothing constants (with values between 0 and 1) and the three equations:

St = α * Xt + (1 - α) * (St-1 + bt-1), 0 < α ≤ 1  [12]

bt = γ * (St - St-1) + (1 - γ) * bt-1, 0 < γ ≤ 1  [13]

Ft+1 = St + bt  [14]

where
St = smoothed value for the t th period
α, γ = parameters chosen by user

Equation [12] adjusts St for the trend of the previous period, bt-1, by adding it to the last smoothed value, St-1. This eliminates the lag in responding to changes in trend.
Equation [13] keeps the updated value of the trend. Finally, equation [14] is used to forecast by adding the trend bt to the smoothed value St.
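Hedged sketches of the ARRSES update (Equations [8] to [11]) and the two-parameter trend method (Equations [12] to [14]). The seed values, the default parameters and the clamping bounds on α are assumptions, since the specification does not state them.

```python
def arrses_forecast(values, beta=0.2):
    """Adaptive-Response-Rate SES, Eq. [8]-[11]: alpha adapts each period.

    beta = 0.2, the seed values and the clamping bounds on alpha are
    assumptions; the text only says alpha may be bounded to control
    its fluctuations.
    """
    f = values[0]        # seed forecast with first observation (assumed)
    alpha = beta         # initial alpha (assumed)
    e_smooth = 0.0       # E_t, smoothed error
    m_smooth = 0.0       # M_t, smoothed absolute error
    for x in values:
        e = x - f                                          # Eq. [11]
        e_smooth = beta * e + (1 - beta) * e_smooth        # Eq. [9]
        m_smooth = beta * abs(e) + (1 - beta) * m_smooth   # Eq. [10]
        f = alpha * x + (1 - alpha) * f                    # SES step with alpha_t
        alpha = abs(e_smooth / m_smooth) if m_smooth else beta  # Eq. [8]
        alpha = min(max(alpha, 0.05), 0.95)                # bound fluctuations
    return f


def holt_forecast(values, alpha=0.5, gamma=0.3):
    """Two-parameter trend smoothing, Eq. [12]-[14] (parameters assumed)."""
    s = values[0]
    b = values[1] - values[0] if len(values) > 1 else 0.0  # seed trend
    for x in values[1:]:
        s_prev = s
        s = alpha * x + (1 - alpha) * (s + b)       # Eq. [12]
        b = gamma * (s - s_prev) + (1 - gamma) * b  # Eq. [13]
    return s + b                                    # Eq. [14]
```

Because the trend is smoothed separately, the two-parameter method tracks a steadily growing display cycle without the lag inherent in plain SES.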
This estimated display cycle time DRi is communicated to the QoS decision engine 605, as depicted by a dashed arrow 610, which then communicates with the controller/scheduler 503 as depicted by the dashed arrow 508. Equations [15] to [17] are used to determine updated values for the desired playout times Pk as follows:

DETERMINING UPDATED VALUES FOR PLAYOUT TIME

For audio packets, the playout time Pi and "display time" (which in the case of audio refers to the audio decode time) are assumed to be the same, since the required time to play out the audio packet is very small. Video frames however are more complex, and may involve a substantial amount of CPU time to decode and display a frame.
Accordingly, upon receipt of the first video frame (in contrast to a packet in audio), the playout time P1 is estimated using the following formula:

P1 = S1 + DA + n * j - DR1  [15]

where DR1 = average value for the display cycle (this is derived from the historic database of variable values, since no value is available for the current session for the first frame).

In this approach the compressed video frames are stored in the receiver buffer 501 as they arrive. A delay factor DA + n * j is added to the send time S of the frame.
The playout time P for subsequent frames can be calculated in one of two ways. In the case of non-interactive sessions such as video on demand applications (where DA + n * j can be as large as one second) the playout times of subsequent frames are calculated as:

Pk = Sk + DA + n * j + IG - DRk  [16]

where
k = 2, 3, ...
IG = inter-frame gap (if the frame rate is 25 fps then IG = 1000/25 = 40 ms)
DA and j are smoothed values kept over a time period, and updated for every new packet.

For interactive sessions such as video-conferencing, where there is a maximum bound on end-to-end latency and IG is not known in advance, it is preferable to maintain the timing relationship between arriving frames. A modified equation [17] is thus used to calculate the playout time of subsequent frames as follows:

Pk = P1 + Sk - S1 - DRk  [17]

As shown in Fig. 9, a playout point 1201 is DR units (ie 1202) before the actual display time 1203. Display times are represented by dark black rectangles on the Display line 1204. If the estimate of DR is accurate, it should be possible to display a frame close to the display time 1203. In the case of continuous voice traffic, the factor DA + n * j is updated only between talkspurts. Video and music, however, do not have talkspurts or silence periods. Video thus requires a different approach which is compression algorithm dependent. For MPEG video streams, for example, the factor DA + n * j can be changed at the beginning of each Group Of Pictures (GOP).
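The video playout computation of Equations [15] to [17] can be sketched as follows. The function is hypothetical; the choice between the two forms is driven by whether the inter-frame gap IG is known in advance, as described above.

```python
def video_playout_times(send_times, display_cycles, da, jitter, n=1, ig=None):
    """Video frame playout times, per Eq. [15]-[17] (hypothetical helper).

    display_cycles holds the estimated DR for each frame; the playout
    point is set DR before the intended display instant. If the
    inter-frame gap ig is known (non-interactive sessions), Eq. [16] is
    used for subsequent frames; otherwise Eq. [17] preserves the timing
    relationship between arriving frames (interactive sessions).
    """
    s1, dr1 = send_times[0], display_cycles[0]
    p1 = s1 + da + n * jitter - dr1                        # Eq. [15]
    times = [p1]
    for sk, drk in zip(send_times[1:], display_cycles[1:]):
        if ig is not None:
            times.append(sk + da + n * jitter + ig - drk)  # Eq. [16]
        else:
            times.append(p1 + sk - s1 - drk)               # Eq. [17]
    return times
```

Subtracting DR moves the playout point earlier so that, after decoding and rendering, the frame appears close to its intended display time (the relationship shown in Fig. 9).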
Returning to Fig. 8, a subsequent step 705 accounts for information from the policy module 604, the profile module 607, and the calibration module 606, after which a step 706 finalises the playout time and applies it to the next packet in the receiver buffer 501. The process thread 700 then returns to the step 701.
Data flow and computation sequencing

Fig. 10 shows a data flow and computation sequence diagram 1100. Fig. 10 depicts the flow and distribution of processing, as exemplified by Equations [1] to [17], among the various functional modules in the described arrangement. It should, however, be apparent that other distributions of functionality can be used.
The buffer monitor 602 (see Fig. 7) determines packet sequence numbers n+1, n+2, ..., as depicted at 1101. These sequence numbers are communicated, as depicted by an arrow 1102, to the feedback controller module 601, which in turn produces the feedback signal 304.
The buffer monitor 602 also produces data "type" information at 1103, this being derived from the headers of packets in the receiver buffer 501. This "type" data is communicated, as depicted by an arrow 608 in Fig. 7, to the traffic profile module 607.
The buffer monitor 602 also provides, in respect of each packet arriving at the receiver buffer 501, an arrival time ai as depicted at 1104. The arrival times are communicated, as depicted by an arrow 1105, to the receiver controller/scheduler 503.
The buffer monitor 602 also provides, for every packet, and based on information in the packet header (see Fig. 10), a transmit time ti, as depicted at 1106. This transmit time ti is communicated, as depicted by an arrow 1107, to the controller/scheduler 503. The controller/scheduler 503 processes the aforementioned arrival time ai and transmit time ti in accordance with Equations [1] and [2] to thereby produce an adapted delay value DAi and jitter value ji, as depicted at 1112.
The buffer monitor 602 also provides, for every packet arriving at the receiver buffer 501, a packet send time Si derived from time-stamp information provided in the packet header. This is depicted at 1108. This send time Si is communicated, as depicted by an arrow 1109, to the controller/scheduler module 503. The send time Si, the adapted delay value DAi, and the jitter ji are processed in accordance with Equations [3] and [4] to thereby determine audio playout times Pk, as depicted at 1114.
A value for the display time, also referred to as the display cycle DRi, is provided, as depicted by a dashed arrow 510, from the decoder module 505 in the receiver 303, to the display time estimation module 603. The module 603 processes the display time DRi in accordance with Equations [5] to [14] to thereby output, as depicted at 1111, an updated (ie estimated) display cycle time DRi. This is communicated, as depicted by an arrow 1110, to the controller/scheduler module 503 for processing in accordance with Equations [15] to [17]. The processed output at 1117 represents the playout times Pi ... Pk for video frames.
An implementation example of the multi-media playout system

Fig. 11 shows how the disclosed multi-media playout technique can be practiced on a system of interconnected computer systems 900, wherein the receiver processes of Fig. 8, and the corresponding sender processes (not described in detail in this specification), may be implemented as software, such as application program(s) executing within the interconnected computers in the computer system 900. In particular, the steps of the method of multi-media playout are effected by software instructions that are carried out by the computer(s), with the sender and receiver software modules being depicted by respective shaded rectangles 926 and 923.
The respective instructions may be formed as one or more code modules, each for performing one or more particular tasks. The receiver code modules can be organised along functional lines as depicted in Figs. 5-7. Relevant software modules may also be divided into two separate parts, in which a first part performs the relevant aspects of the multi-media playout methods, and a second part manages a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer from the computer readable medium, and then executed by the computer. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer preferably effects an advantageous apparatus for performing multi-media playout.
The system of interconnected computers 900 is formed by two computers 922 and 925 communicating over a network 920. The computer 922 is depicted in some detail. The computer 925 is essentially the same as the computer 922, and is thus shown and described in less detail. It is noted that both computers incorporate audio/video frame grabbers as depicted in Fig. 2.
The computer 922 includes a computer module 901, input devices such as a keyboard 902 and mouse 903, and output devices including a printer 915, a display device 914 and loudspeakers 917. A Modulator-Demodulator (Modem) transceiver device 916 is used by the computer module 901 for communicating to and from a communications network 920, for example connectable via a telephone line 921 or other functional medium. The modem 916 is used to obtain access to the computer 925 and can also be used to obtain access to the Internet, and other network systems, such as a Local Area Network (LAN) or a Wide Area Network (WAN), and may be incorporated into the computer module 901 in some implementations.
The computer module 901 typically includes at least one processor unit 905, and a memory unit 906, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The software module(s) 923 implementing the receiver multi-media playout processes are shown to reside in the memory unit 906. The module 901 also includes a number of input/output interfaces including an audio-video interface 907 that couples to the video display 914 and loudspeakers 917, an I/O interface 913 for the keyboard 902 and mouse 903 and optionally a joystick (not illustrated), and an interface 908 for the modem 916 and printer 915. In some implementations, the modem 916 may be incorporated within the computer module 901, for example within the interface 908. A storage device 909 is provided and typically includes a hard disk drive 910 and a floppy disk drive 911. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 912 is typically provided as a non-volatile source of data. The components 905 to 913 of the computer module 901 typically communicate via an interconnected bus 904 and in a manner which results in a conventional mode of operation of the computer system 900 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, or like computer systems evolved therefrom.
Typically, the receiver multi-media playout application program is resident on the hard disk drive 910 and read and controlled in its execution by the processor 905.
Intermediate storage of the program and any data fetched from the network 920 may be accomplished using the semiconductor memory 906, possibly in concert with the hard disk drive 910. Fig. 11 shows the multi-media playout receiver application 923 in an intermediate storage state in the memory module 906. In some instances, the application program(s) may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 912 or 911, or alternatively may be read by the user from the network 920 via the modem device 916. Still further, the software can also be loaded into the computer system 900 from other computer readable media. The term "computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to the computer system 900 for execution and/or processing. Examples of storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 901. Examples of transmission media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The method of multi-media playout may alternatively be implemented in dedicated hardware module(s) such as one or more integrated circuits performing the functions or sub-functions of multi-media playout. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories, and the sender/receiver modules can be incorporated into the respective computers 925 and 922 as desired.
Industrial Applicability

It is apparent from the above that the arrangements described are applicable to the communication industry.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
Claims (4)
1. A method of communicating a multi-media packet data stream from a transmitter across a network to a receiver for presentation, said method comprising the steps of: establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; receiving data stream packets in a buffer in the receiver; identifying a type of the data stream packets; determining a fill level of the buffer; determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.
2. A system wherein a transmitter communicates a multi-media packet data stream across a network to a receiver for presentation, said system comprising: the transmitter, which is responsive to a control signal from the receiver; the receiver, which comprises: means for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; a buffer for receiving data stream packets; means for identifying a type of the data stream packets; means for determining a fill level of the buffer; means for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and means for sending the control signal, dependent upon the fill level, to the transmitter; wherein the rate at which the transmitter sends the data stream packets to the receiver is dependent upon the control signal.
3. A computer program for directing at least one processor to execute a method of communicating a multi-media packet data stream from a transmitter across a network to a receiver for presentation, said program being composed of at least one code module, and the program comprising: code for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; code for receiving data stream packets in a buffer in the receiver; code for identifying a type of the data stream packets; code for determining a fill level of the buffer; code for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and code for controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.
4. A system wherein a transmitter communicates a multi-media packet data stream across a network to a receiver for presentation, the system comprising: at least one memory for storing a program; and
at least one processor for executing the program, said program being composed of at least one code module, and the program comprising: code for establishing performance characteristics for at least one of the transmitter and the receiver, and a presentation policy for the receiver; code for receiving data stream packets in a buffer in the receiver; code for identifying a type of the data stream packets; code for determining a fill level of the buffer; code for determining, based upon (i) the fill level, (ii) the packet type, (iii) the performance characteristics and (iv) the presentation policy, a rate at which the packets are output from the buffer for presentation; and code for controlling, dependent upon the fill level, a rate at which the transmitter sends the data stream packets to the receiver.

5. A method of communicating a multi-media packet data stream substantially as described herein with reference to Figs. 3-11.

6. A system for communicating a multi-media packet data stream across a network substantially as described herein with reference to Figs. 3-11.

7. A computer program for directing a processor to execute a method of communicating a multi-media packet data stream substantially as described herein with reference to Figs. 3-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003271320A AU2003271320A1 (en) | 2003-02-14 | 2003-12-22 | Multimedia playout system |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003900668A AU2003900668A0 (en) | 2003-02-14 | 2003-02-14 | Multimedia playout system |
AU2003900668 | 2003-02-14 | ||
AU2003271320A AU2003271320A1 (en) | 2003-02-14 | 2003-12-22 | Multimedia playout system |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2003271320A1 true AU2003271320A1 (en) | 2004-09-02 |
Family
ID=34378349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2003271320A Abandoned AU2003271320A1 (en) | 2003-02-14 | 2003-12-22 | Multimedia playout system |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2003271320A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012154156A1 (en) * | 2011-05-06 | 2012-11-15 | Google Inc. | Apparatus and method for rendering video using post-decoding buffer |
US10103999B2 (en) | 2014-04-15 | 2018-10-16 | Dolby Laboratories Licensing Corporation | Jitter buffer level estimation |
- 2003-12-22 AU AU2003271320A patent/AU2003271320A1/en not_active Abandoned
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7652994B2 (en) | Accelerated media coding for robust low-delay video streaming over time-varying and bandwidth limited channels | |
Sen et al. | Online smoothing of variable-bit-rate streaming video | |
US9544602B2 (en) | Wireless video transmission system | |
US7657672B2 (en) | Packet scheduling for data stream transmission | |
US7898950B2 (en) | Techniques to perform rate matching for multimedia conference calls | |
US8356327B2 (en) | Wireless video transmission system | |
US8018850B2 (en) | Wireless video transmission system | |
US7490342B2 (en) | Content provisioning system and method | |
US8351762B2 (en) | Adaptive media playout method and apparatus for intra-media synchronization | |
US20070067480A1 (en) | Adaptive media playout by server media processing for robust streaming | |
Saparilla et al. | Optimal streaming of layered video | |
US20100091888A1 (en) | Multi-Rate Encoder with GOP Alignment | |
CA2803449C (en) | Adaptive frame rate control for video in a resource limited system | |
Su et al. | Smooth control of adaptive media playout for video streaming | |
US20110010625A1 (en) | Method for Manually Optimizing Jitter, Delay and Synch Levels in Audio-Video Transmission | |
KR100924309B1 (en) | Quality adaptive streaming method using temporal scalability and system thereof | |
JP2002290974A (en) | Transmission rate control method | |
CN112073751B (en) | Video playing method, device, equipment and readable storage medium | |
Feamster | Adaptive delivery of real-time streaming video | |
Xie et al. | Rate-distortion optimized dynamic bitstream switching for scalable video streaming | |
AU2003271320A1 (en) | Multimedia playout system | |
Bouras et al. | Streaming multimedia data with adaptive QoS characteristics | |
CN105306970B (en) | A kind of control method and device of live streaming media transmission speed | |
Luo et al. | Video streaming over the internet with optimal bandwidth resource allocation | |
Laraspata et al. | A scheduling algorithm for interactive video streaming in umts networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MK1 | Application lapsed section 142(2)(a) - no request for examination in relevant period |