JP6105717B2

JP6105717B2 - Improved block request streaming system for handling low latency streaming

Info

Publication number: JP6105717B2
Application number: JP2015509146A
Authority: JP
Inventors: マイケル・ジー・ルビー; マーク・ワトソン; ロレンツォ・ヴィシザーノ; パヤム・パクザド; ビン・ワン; イン・チェン; トーマス・ストックハンマー; ジャバー・モハンマド・ボラン
Original assignee: クアルコム，インコーポレイテッド
Priority date: 2012-04-26
Filing date: 2013-04-25
Publication date: 2017-03-29
Anticipated expiration: 2033-04-25
Also published as: KR101741484B1; TWI492598B; TW201408020A; CN104221390B; BR112014026741A2; BR112014026741B1; CA2869311A1; MY166917A; RU2629001C2; PH12014502203B1; KR20150003296A; BR112014026741A8; RU2014147463A; JP2015519813A; HK1203015A1; WO2013163448A1; PH12014502203A1; CN104221390A; EP2842336A1; CA2869311C

Description

The present invention relates to improved media streaming systems and methods, and more particularly to systems and methods that adapt to network and buffer conditions in order to optimize the presentation of streamed media, and that allow efficient and concurrent, or timely distributed, delivery of streamed media data.

Streaming media delivery may become increasingly important as it becomes more common for high-quality audio and video to be delivered over packet-based networks such as the Internet, cellular and wireless networks, powerline networks, and other types of networks. The quality with which delivered streaming media can be presented may depend on a number of factors, including the resolution (or other attributes) of the original content, the encoding quality of the original content, the capabilities of the receiving devices to decode and present the media, and the timeliness and quality of the signal received at the receivers. To create a perceived good streaming media experience, the transport and timeliness of the signal received at the receivers may be especially important. Good transport may provide fidelity of the stream received at the receiver relative to what the sender sends, while timeliness may represent how quickly a receiver can start playing out the content after an initial request for that content.

A media delivery system may be characterized as a system having media sources, media destinations, and channels (in time and/or space) separating the sources and destinations. Typically, a source includes a transmitter with access to media in electronically manageable form, and a receiver with the ability to electronically control receipt of the media (or an approximation thereof) and provide it to a media consumer (e.g., a user having a display device coupled in some way to the receiver, a storage device or element, another channel, etc.).

While many variations are possible, in a common example a media delivery system has one or more servers with access to media content in electronic form, and one or more client systems or devices make requests for media to the servers. The servers convey the media using a transmitter that is part of the server, transmitting to a receiver at the client so that the received media can be consumed by the client in some way. In a simple example, there is one server and one client for a given request and response, but that need not be the case.

Traditionally, media delivery systems may be characterized as following either a "download" model or a "streaming" model. The "download" model may be characterized by timing independence between the delivery of the media data and the playout of the media to the user or recipient device.

As an example, media is downloaded well in advance of when it is needed or will be used, and when it is used, as much as is needed is already available at the recipient. Delivery in the download context is often performed using a file transport protocol such as HTTP, FTP, or File Delivery over Unidirectional Transport (FLUTE), and the delivery rate may be determined by an underlying flow and/or congestion control protocol such as TCP/IP. The operation of the flow or congestion control protocol may be independent of the playout of the media to the user or destination device, and playout may take place concurrently with the download or at some other time.

The "streaming" mode may be characterized by a tight coupling between the timing of the delivery of the media data and the timing of the playout of the media to the user or recipient device. Delivery in this context is often performed using a streaming protocol, such as the Real Time Streaming Protocol (RTSP) for control and the Real Time Transport Protocol (RTP) for the media data. The delivery rate may be determined by a streaming server and often matches the playout rate of the data.

L. Rizzo, "Effective Erasure Codes for Reliable Computer Communication Protocols", Computer Communication Review, 27(2):24-36 (April 1997)
Bloemer et al., "An XOR-Based Erasure-Resilient Coding Scheme", Technical Report TR-95-48, International Computer Science Institute, Berkeley, California (1995)

Some disadvantages of the "download" model follow from the timing independence of delivery and playout. Either media data may not be available when it is needed for playout (for example, because the available bandwidth is less than the media data rate), causing playout to stop momentarily ("stalling") and resulting in a poor user experience; or media data may have to be downloaded far in advance of playout (for example, because the available bandwidth is greater than the media data rate), consuming storage resources on the receiving device, which may be scarce, and consuming valuable network resources for a delivery that may be wasted if the content is never played out or otherwise used.

An advantage of the "download" model may be that the technology needed to perform such downloads, for example HTTP, is very mature, widely deployed, and applicable across a wide range of applications. Download servers and solutions for massive scalability of such file downloads (for example, HTTP web servers and content delivery networks) may be readily available, making deployment of services based on this technology simple and low in cost.

Some disadvantages of the "streaming" model may be that the rate of delivery of media data generally does not adapt to the bandwidth available on the connection from server to client, and that specialized streaming servers or more complex network architectures providing bandwidth and delay guarantees are required. Although streaming systems exist that support variation of the delivery data rate according to available bandwidth (for example, Adobe Flash Adaptive Streaming), these are generally not as efficient as download transport flow control protocols such as TCP at utilizing all of the available bandwidth.

Recently, new media delivery systems based on a combination of the "streaming" and "download" models have been developed and deployed. An example of such a model is referred to herein as a "block-request streaming" model, in which a media client requests blocks of media data from a serving infrastructure using a download protocol such as HTTP. A concern in such systems may be the ability to start playing out a stream: for example, decoding and rendering received audio and video streams using a personal computer, displaying the video on a computer screen and playing the audio through built-in speakers; or, as another example, decoding and rendering received audio and video streams using a set-top box, displaying the video on a television display device and playing the audio through a stereo system.

Other concerns include being able to decode the source blocks fast enough to keep up with the source streaming rate, minimizing decoding latency, and reducing the use of available CPU resources. Another concern is to provide a robust and scalable streaming delivery solution that allows components of the system to fail without adversely affecting the quality of the streams delivered to receivers. Further problems may arise from rapidly changing information about a presentation as it is being distributed. It is therefore desirable to have improved processes and apparatus.

A block-request streaming system provides improvements in the user experience and bandwidth efficiency of such systems. It typically uses an ingestion system that generates data in a form to be served by a conventional file server (HTTP, FTP, or the like); the ingestion system takes in content and prepares it as files or data elements to be served by the file server, which may or may not include a cache.

According to one embodiment, a media server of a block-request streaming system enables low-latency streaming of media presentation content. Relatively large segment files for live profile streaming can be aggregated from relatively small media fragments for low-latency streaming. The media segments and the media fragments are encoded according to the same encoding protocol.

The following detailed description, together with the accompanying drawings, provides a better understanding of the nature and advantages of the present invention.

A diagram depicting elements of a block-request streaming system according to embodiments of the present invention.
A diagram of the block-request streaming system of FIG. 1, showing in greater detail the elements of a client system that is coupled to a block serving infrastructure ("BSI") to receive data processed by a content ingestion system.
A diagram of a hardware/software implementation of an ingestion system.
A diagram of a hardware/software implementation of a client system.
A diagram of possible structures of the content store shown in FIG. 1, including segments and media presentation description ("MPD") files, and a breakdown of segments, timing, and other structure within an MPD file.
A diagram showing details of a typical source segment, as might be stored in the content store illustrated in FIGS. 1 and 5.
A diagram of simple and hierarchical indexing within files.
A diagram of simple and hierarchical indexing within files.
A diagram of variable block sizing with seek points aligned across a plurality of versions of a media stream.
A diagram of variable block sizing with seek points not aligned across a plurality of versions of a media stream.
A diagram of a metadata table.
A diagram of the transmission of blocks and the metadata table from server to client.
A diagram of blocks that are independent of RAP boundaries.
A diagram of continuous and discontinuous timing across segments.
A diagram of an aspect of scalable blocks.
A graphical representation of the evolution of certain variables within a block-request streaming system over time.
Another graphical representation of the evolution of certain variables within a block-request streaming system over time.
A diagram of a cell grid of states as a function of threshold values.
A flowchart of a process that might be performed in a receiver that can request single blocks and multiple blocks per request.
A flowchart of a flexible pipeline process.
An example of a candidate set of requests, their priorities, and the connections on which they can be issued, at a certain time.
An example of a candidate set of requests, their priorities, and the connections on which they can be issued, as changed from one time to another.
A flowchart of consistent caching proxy server selection based on a file identifier.
A syntax definition for a suitable expression language.
An example of a suitable hash function.
Examples of file identifier construction rules.
A diagram of bandwidth fluctuations of TCP connections.
A diagram of multiple HTTP requests for source data and repair data.
A diagram of example channel zapping times with and without FEC.
A diagram showing details of a repair segment generator that, as part of the ingestion system shown in FIG. 1, generates repair segments from source segments and control parameters.
A diagram of the relationships between source blocks and repair blocks.
A diagram of a procedure for live services at different times at the client.
A diagram of the relationship between media fragments for low-latency streaming.

In the figures, like items are referenced with like numbers, and sub-indices are provided in parentheses to indicate multiple instances of like or identical items. Unless otherwise indicated, the final sub-index (e.g., "N" or "M") is not intended to be limiting to any particular value, and the number of instances of one item can differ from the number of instances of another item even when the same number is illustrated and the sub-index is reused.

As described herein, a goal of a streaming system is to move media from its storage location (or the location where it is being generated) to a location where it is being consumed, i.e., presented to a user or otherwise "used up" by a human or electronic consumer. Ideally, the streaming system can provide uninterrupted playback (or, more generally, uninterrupted "consumption") at the receiving end and can begin playing a stream or a collection of streams shortly after a user has requested the stream or streams. For efficiency reasons, it is also desirable that each stream be halted once the user indicates that the stream is no longer needed, such as when the user switches from one stream to another or when the user follows the presentation of a stream, e.g., a "subtitle" stream. If a media component such as video continues to be presented but a different stream is selected to present it, it is often preferable to occupy the limited bandwidth with the new stream and stop the old stream.

A block-request streaming system according to embodiments described herein provides many benefits. It should be understood that a viable system need not include all of the features described herein, because some applications may provide a suitably satisfying experience with fewer than all of those features.

HTTP Streaming
HTTP streaming is a specific type of streaming. With HTTP streaming, the sources may be standard web servers and content delivery networks (CDNs) and may use standard HTTP. The technique may involve stream segmentation and the use of multiple streams, all within the context of standardized HTTP requests. Media such as video may be encoded at multiple bit rates to form different versions, or representations. The terms "version" and "representation" are used synonymously in this document. Each version or representation may be broken into smaller pieces, perhaps on the order of a few seconds each, to form segments. Each segment may then be stored on a web server or CDN as a separate file.

On the client side, requests may then be made, using HTTP, for individual segments that are seamlessly spliced together by the client. The client may switch to different data rates based on available bandwidth. The client may also request multiple representations, each presenting a different media component, and may present the media in these representations jointly and synchronously. Triggers for switching may include, for example, buffer occupancy and network measurements. When operating in the steady state, the client may pace requests to the server to maintain a target buffer occupancy.
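
The switching behavior described above can be sketched as a small rate-selection rule. This is only an illustration under stated assumptions: the function name `choose_bitrate`, the safety factor, the 10-second target buffer, and the linear scaling by buffer fill are hypothetical and are not taken from this specification.

```python
from typing import Sequence


def choose_bitrate(bitrates_bps: Sequence[int],
                   throughput_bps: float,
                   buffer_s: float,
                   target_buffer_s: float = 10.0,  # assumed target occupancy
                   safety: float = 0.8) -> int:    # assumed headroom factor
    """Pick the highest representation bitrate that fits the rate budget."""
    # Fraction of the target buffer currently filled, capped at 1.0.
    fill = min(max(buffer_s, 0.0) / target_buffer_s, 1.0)
    # Be more conservative when the buffer is below target (hypothetical rule).
    budget = throughput_bps * safety * (0.5 + 0.5 * fill)
    candidates = [b for b in sorted(bitrates_bps) if b <= budget]
    # Fall back to the lowest representation if nothing fits the budget.
    return candidates[-1] if candidates else min(bitrates_bps)


ladder = [500_000, 1_000_000, 2_000_000]
full_buffer_choice = choose_bitrate(ladder, 3_000_000, buffer_s=10.0)
empty_buffer_choice = choose_bitrate(ladder, 3_000_000, buffer_s=0.0)
```

With the same measured throughput, the sketch selects a lower representation when the buffer is empty than when it is at target, which is one simple way to combine the two triggers named in the text.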

Advantages of HTTP streaming may include bit-rate adaptation, fast startup and seek, and minimal unnecessary delivery. These advantages come from controlling delivery so that it runs only a short time ahead of playout, making maximum use of available bandwidth (through variable bit rate media), and optimizing stream segmentation and intelligent client procedures.

A media presentation description may be provided to an HTTP streaming client so that the client can use a collection of files (for example, in formats specified by 3GPP, herein called 3gp segments) to provide a streaming service to the user. A media presentation description, and possibly updates of this media presentation description, describe a media presentation that is a structured collection of segments, each containing media components, such that the client can present the included media in a synchronized manner and can provide advanced features such as seeking, switching bit rates, and joint presentation of media components in different representations. The client may use the media presentation description information in different ways for the provisioning of the service. In particular, from the media presentation description, the HTTP streaming client may determine which segments in the collection can be accessed so that the data is useful to the client capability and the user within the streaming service.

In some embodiments, the media presentation description may be static, although segments might be created dynamically. The media presentation description may be as compact as possible to minimize access and download time for the service. Other dedicated server connectivity, for example regular or frequent timing synchronization between client and server, may be minimized.

The media presentation may be constructed to permit access by terminals with different capabilities, such as access to different access network types, different current network conditions, display sizes, access bit rates, and codec support. The client may then extract the appropriate information to provide the streaming service to the user.

The media presentation description may also permit deployment flexibility and compactness according to the requirements.

In the simplest case, each Alternative Representation may be stored in a single 3GP file, i.e., a file conforming to the definition in 3GPP TS 26.244, or in any other file that conforms to the ISO base media file format as defined in ISO/IEC 14496-12 or derived specifications (such as the 3GP file format described in 3GPP Technical Specification 26.244). In the remainder of this document, when referring to a 3GP file, it should be understood that all described features can be mapped to the more general ISO base media file format as defined in ISO/IEC 14496-12 or any derived specification. The client may then request an initial portion of the file to learn the media metadata (which is typically stored in the Movie header box, also referred to as the "moov" box) together with movie fragment times and byte offsets. The client may then issue HTTP partial GET requests to obtain movie fragments as required.
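
As a minimal sketch of the partial GET step, the snippet below builds (but does not send) an HTTP request whose `Range` header covers one movie fragment, given its byte offset and size. The helper name `fragment_request`, the URL, and the offset and size values are hypothetical; in practice they would come from the metadata the client has already fetched.

```python
import urllib.request


def fragment_request(url: str, offset: int, size: int) -> urllib.request.Request:
    """Build an HTTP partial GET for `size` bytes starting at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    last_byte = offset + size - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical fragment: 1000 bytes starting at byte 4096 of the file.
req = fragment_request("http://example.com/rep1.3gp", 4096, 1000)
range_header = req.get_header("Range")
```

A server that honors the range would normally answer with status 206 (Partial Content) and only the requested bytes.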

In some embodiments, it may be desirable to split each representation into several segments. Where the segment format is based on the 3GP file format, segments contain non-overlapping time slices of the movie fragments, referred to as "time-wise splitting". Each of these segments may contain multiple movie fragments, and each may be a valid 3GP file in its own right. In another embodiment, the representation is split into an initial segment containing the metadata (typically the Movie header "moov" box) and a set of media segments, each containing media data, such that the concatenation of the initial segment and any media segment forms a valid 3GP file, and the concatenation of the initial segment and all media segments of one representation also forms a valid 3GP file. The entire representation may be formed by playing out each segment in turn and mapping the local timestamps within the file to the global presentation time according to the start time of each representation.
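
The timestamp mapping mentioned above can be illustrated with a short sketch. It assumes, hypothetically, that local timestamps are integer tick counts on a per-file timescale (ticks per second) and that the start time of the representation is known in seconds; the function name and the example values are not from this specification.

```python
from fractions import Fraction


def to_presentation_time(local_ts: int, timescale: int,
                         start_time_s: Fraction) -> Fraction:
    """Map a local timestamp (in ticks) to global presentation time (seconds)."""
    if timescale <= 0:
        raise ValueError("timescale must be positive")
    # Exact rational arithmetic avoids rounding drift across many segments.
    return start_time_s + Fraction(local_ts, timescale)


# A sample 90000 ticks into a file with a 90 kHz timescale, in a
# representation that starts 30 seconds into the presentation.
global_time = to_presentation_time(90_000, 90_000, Fraction(30))
```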

Note that throughout this description, references to a "segment" should be understood to include any data object that is fully or partially constructed or read from a storage medium, or otherwise obtained as a result of a file download protocol request, including for example an HTTP request. For example, in the case of HTTP, the data objects may be stored in actual files residing on a disk or other storage medium connected to or forming part of an HTTP server, or the data objects may be constructed by a CGI script or other dynamically executed program that runs in response to the HTTP request. The terms "file" and "segment" are used synonymously in this document unless otherwise specified. In the case of HTTP, the segment may be considered the entity body of an HTTP request response.

The terms "presentation" and "content item" are used synonymously in this document. In many examples, the presentation is an audio, video, or other media presentation that has a defined "playout" time, but other variations are possible.

The terms "block" and "fragment" are used synonymously in this document unless otherwise specified, and generally refer to the smallest aggregation of data that is indexed. Based on the available indexing, a client can request different portions of a fragment in different HTTP requests, or can request one or more consecutive fragments or portions of fragments in one HTTP request. Where segments based on the ISO base media file format or the 3GP file format are used, a fragment typically refers to a movie fragment, defined as the combination of a movie fragment header ("moof") box and a media data ("mdat") box.
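
To make the "moof" plus "mdat" pairing concrete, the sketch below walks the top-level boxes of an ISO base media file format byte string and reports the byte range of each fragment. It relies on the standard box header layout (32-bit big-endian size, four-character type, optional 64-bit size when the 32-bit size is 1). The helper names and the synthetic sample data are hypothetical, and the sketch assumes each "mdat" immediately follows its "moof".

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in `data`."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - pos
        if size < header:
            raise ValueError("malformed box header")
        yield box_type.decode("ascii", "replace"), pos, size
        pos += size


def fragments(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, size) of each moof box plus the mdat right after it."""
    boxes = list(iter_boxes(data))
    return [(off1, size1 + size2)
            for (t1, off1, size1), (t2, _off2, size2) in zip(boxes, boxes[1:])
            if t1 == "moof" and t2 == "mdat"]


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the example (not real media data)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (make_box(b"styp", b"\0" * 4)
          + make_box(b"moof", b"\0" * 8)
          + make_box(b"mdat", b"\0" * 20))
fragment_ranges = fragments(sample)
```

The resulting (offset, size) pairs are the kind of values a client could place in a byte-range request to fetch one fragment or several consecutive fragments.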

Herein, a network carrying data is assumed to be packet-based in order to simplify the descriptions, with the recognition that, after reading this disclosure, one skilled in the art can apply the embodiments of the present invention described herein to other types of transmission networks, such as continuous bit-stream networks.

Herein, FEC codes are assumed to provide protection against long and variable delivery times of data in order to simplify the description, with the recognition that, after reading this disclosure, one skilled in the art can apply embodiments of the present invention to other types of data transmission issues, such as bit-flip corruption of data. For example, without FEC, if the last portion of a requested fragment arrives much later than, or with greater variance in arrival time than, earlier portions of the fragment, the content zapping time can be large and variable. With FEC and parallel requests, only the majority of the data requested for a fragment needs to arrive before the fragment can be recovered, which reduces both the content zapping time and its variability. In this description, it may be assumed that the data to be encoded (e.g., the source data) has been broken into equal-length “symbols”, which may be of any length (down to a single bit), but symbols could be of different lengths for different parts of the data; for example, different symbol sizes might be used for different blocks of data.

In this description, in order to simplify the description, it is assumed that FEC is applied to a “block” or “fragment” of data at a time, i.e., a “block” is a “source block” for FEC encoding and decoding purposes. A client device can use the segment indexing described herein to help determine the source block structure of a segment. One skilled in the art can apply embodiments of the present invention to other types of source block structures; for example, a source block may be a portion of a fragment, or may encompass one or more fragments or portions of fragments.

The FEC codes considered for use with block-request streaming are typically systematic FEC codes, i.e., the source symbols of the source block may be included as part of the encoding of the source block, and thus the source symbols are transmitted. As one skilled in the art will recognize, the embodiments described herein apply equally well to FEC codes that are not systematic. A systematic FEC encoder generates some number of repair symbols from a source block of source symbols, and the encoded symbols sent over the channel to represent the source block are a combination of at least some of the source and repair symbols. Some FEC codes are useful for efficiently generating as many repair symbols as needed, such as “information additive codes” or “fountain codes”; examples of these codes include “chain reaction codes” and “multi-stage chain reaction codes”. Other FEC codes, such as Reed-Solomon codes, can in practice generate only a limited number of repair symbols for each source block.
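The idea of a systematic code can be sketched with a toy single-parity erasure code: the source symbols are sent unchanged and one XOR repair symbol is appended. This is a deliberately simple stand-in chosen for illustration; it is not a Reed-Solomon or chain reaction code, it repairs at most one missing symbol, and all function names are assumptions of this example.

```python
from functools import reduce
from typing import List, Optional

def split_into_symbols(data: bytes, symbol_size: int) -> List[bytes]:
    """Split non-empty data into equal-length symbols, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % symbol_size)
    return [padded[i:i + symbol_size] for i in range(0, len(padded), symbol_size)]

def xor_symbols(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_systematic(source: List[bytes]) -> List[bytes]:
    """Encoded symbols = the source symbols unchanged, then one XOR repair symbol."""
    return source + [reduce(xor_symbols, source)]

def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the source symbols when at most one encoded symbol (None) is missing."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can repair only one erasure")
    symbols = list(received)
    if missing:
        present = [s for s in symbols if s is not None]
        # The XOR of all other encoded symbols equals the missing one.
        symbols[missing[0]] = reduce(xor_symbols, present)
    return symbols[:-1]  # drop the repair symbol

source = split_into_symbols(b"block request", 4)
encoded = encode_systematic(source)
damaged = list(encoded)
damaged[2] = None  # simulate one lost source symbol
restored = recover(damaged)
```

Because the code is systematic, a receiver that gets every source symbol needs no decoding at all; the repair symbol matters only when something is missing.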

In many of these examples, it is assumed that a client is coupled to a media server or a plurality of media servers, and that the client requests streaming media over a channel or a plurality of channels from the media server or servers. However, more involved arrangements are also possible.

Example Benefits
With block-request streaming, the media client maintains a coupling between the timing of these block requests and the timing of the media playout to the user. This model retains the advantages of the “download” model described above while avoiding some of the disadvantages that stem from the usual decoupling of media playout from data delivery. The block-request streaming model makes use of the rate and congestion control mechanisms available in transport protocols such as TCP to ensure that the maximum available bandwidth is used for media data. Additionally, the division of the media presentation into blocks allows each block of encoded media data to be selected from a set of multiple available encodings.

This selection may be based on any number of criteria, including matching the media data rate to the available bandwidth, even when the available bandwidth is changing over time; matching the media resolution or decoding complexity to client capabilities or configuration; or matching to user preferences, such as languages. The selection may also include the download and presentation of auxiliary components, such as accessibility components, closed captioning, subtitles, sign language video, and the like. Examples of existing systems using the block-request streaming model include Move Networks™, Microsoft Smooth Streaming, and the Apple iPhone™ Streaming Protocol.
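One of the criteria named above, matching the media data rate to the available bandwidth, might be sketched as follows. The `Representation` type, its fields, and the "highest rate that fits" rule are assumptions of this example; the description leaves the actual selection policy open.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Representation:
    name: str
    bandwidth_bps: int  # bandwidth required to stream this encoding

def select_representation(reps: Sequence[Representation], available_bps: int) -> Representation:
    """Pick the highest-rate encoding that fits, else fall back to the lowest-rate one."""
    affordable = [r for r in reps if r.bandwidth_bps <= available_bps]
    if affordable:
        return max(affordable, key=lambda r: r.bandwidth_bps)
    return min(reps, key=lambda r: r.bandwidth_bps)

reps = [
    Representation("low", 300_000),
    Representation("mid", 1_000_000),
    Representation("high", 3_000_000),
]
choice = select_representation(reps, available_bps=1_500_000)
```

Since the choice is made per block, calling the function again with a new bandwidth estimate before each request lets the client follow bandwidth that changes over time.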

Commonly, each block of media data may be stored on a server as an individual file, and a protocol such as HTTP is then used, in conjunction with HTTP server software executing on the server, to request the file as a unit. Typically, the client is provided with metadata files, which may for example be in Extensible Markup Language (XML) format, playlist text format, or binary format, and which describe features of the media presentation, such as the available encodings (for example, required bandwidth, resolutions, encoding parameters, media type, language), typically referred to as “representations” in this document, and the manner in which the encodings have been divided into blocks. For example, the metadata may include a Uniform Resource Locator (URL) for each block. The URL itself may provide a scheme, such as being prefixed with the string “http://”, to indicate that the protocol to be used to access the documented resource is HTTP. Another example is “ftp://”, to indicate that the protocol to be used is FTP.
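The way a block URL carries its own access protocol can be shown with Python's standard `urllib.parse.urlsplit`. The metadata dictionary and the example.com URLs below are hypothetical and serve only to illustrate per-block URLs.

```python
from urllib.parse import urlsplit

# Hypothetical metadata: one URL per block of a single representation.
block_urls = {
    1: "http://example.com/rep1/block1.3gp",
    2: "http://example.com/rep1/block2.3gp",
    3: "ftp://example.com/rep1/block3.3gp",
}

def protocol_for(url: str) -> str:
    """Return the URL scheme, which names the protocol used to fetch the block."""
    return urlsplit(url).scheme

protocols = {block: protocol_for(url) for block, url in block_urls.items()}
```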

In other systems, for example, the media blocks may be constructed “on the fly” by the server in response to a request from the client that indicates the portion of the media presentation, in time, that is requested. For example, in the case of HTTP with the scheme “http://”, executing the request for this URL yields a request response that contains some specific data in its entity body. How this request response is generated within the network can vary considerably, depending on the implementation of the server servicing such requests.

Typically, each block may be independently decodable. For example, in the case of video media, each block may begin with a “seek point”. In some coding schemes, a seek point is referred to as a “Random Access Point” or “RAP”, although not all RAPs need be designated as seek points. Similarly, in other coding schemes, a seek point starts at an “Instantaneous Decoding Refresh” frame, or “IDR”, in the case of H.264 video encoding, although not all IDRs need be designated as seek points. A seek point is a position in video (or other) media at which a decoder can begin decoding without requiring any data about prior frames, data, or samples, as might be the case where a frame or sample being decoded was encoded not in a stand-alone fashion but as, for example, the difference between the current frame and the prior frame.

A concern in such systems may be the ability to start playing out a stream, for example decoding and rendering received audio and video streams using a personal computer, displaying the video on a computer screen and playing the audio through built-in speakers, or, as another example, decoding and rendering received audio and video streams using a set-top box, displaying the video on a television display device and playing the audio through a stereo system. A primary concern may be to minimize the delay between when a user decides to watch new content delivered as a stream and takes an action that expresses that decision (for example, clicking on a link within a browser window or pressing the play button of a remote control device) and when the content starts being displayed on the user's screen, hereinafter called the “content zapping time”. Each of these concerns can be addressed by elements of the enhanced system described herein.

An example of content zapping is when a user is watching first content delivered via a first stream, then decides to watch second content delivered via a second stream and initiates an action to start watching the second content. The second stream may be sent from the same set of servers as the first stream or from a different set. Another example of content zapping is when a user is visiting a website and decides to start watching first content delivered via a first stream by clicking on a link within the browser window. In a similar manner, a user may decide to start playing the content not from the beginning but from some time within the stream. The user instructs the client device to seek to a time position, and may expect the selected time to be rendered instantaneously. Minimizing content zapping time is important for video watching, because it gives users a high-quality, fast content-surfing experience when searching and sampling a wide range of available content.

Recently, it has become common practice to consider using forward error correction (FEC) codes to protect streaming media during transmission. When sent over a packet network, examples of which include the Internet and wireless networks such as those standardized by groups such as 3GPP, 3GPP2, and DVB, the source stream is placed into packets as it is generated or made available, and thus the packets may be used to carry the source or content stream to receivers in the order in which it is generated or made available.

In a typical application of FEC codes to these types of scenarios, an encoder may use an FEC code in the creation of repair packets, which are then sent in addition to the original source packets containing the source stream. The repair packets have the property that, when source packet loss occurs, received repair packets may be used to recover the data contained in the lost source packets. Repair packets can be used to recover the content of source packets that are lost entirely, but they may also be used to recover from partial packet loss, using either entirely received repair packets or even partially received repair packets. Thus, wholly or partially received repair packets can be used to recover wholly or partially lost source packets.

In yet other examples, other types of corruption may occur to the sent data, for example the values of bits may be flipped, and repair packets may then be used to correct such corruption and to provide as accurate as possible a recovery of the source packets. In other examples, the source stream is not necessarily sent in discrete packets, but instead may be sent, for example, as a continuous bit stream.

There are many examples of FEC codes that can be used to provide protection of a source stream. Reed-Solomon codes are well-known codes for error and erasure correction in communication systems. For erasure correction over, for example, packet data networks, a well-known efficient implementation of Reed-Solomon codes uses Cauchy or Vandermonde matrices, as described in L. Rizzo, “Effective Erasure Codes for Reliable Computer Communication Protocols”, Computer Communication Review, 27(2):24-36 (April 1997) (hereinafter “Rizzo”), and Bloemer et al., “An XOR-Based Erasure-Resilient Coding Scheme”, Technical Report TR-95-48, International Computer Science Institute, Berkeley, California (1995) (hereinafter “XOR-Reed-Solomon”), or elsewhere.

Other examples of FEC codes include LDPC codes, chain reaction codes such as those described in Luby I, and multi-stage chain reaction codes such as those in Shokrollahi I.

Examples of the FEC decoding process for variants of Reed-Solomon codes are described in Rizzo and XOR-Reed-Solomon. In those examples, decoding may be applied after sufficient source and repair data packets have been received. The decoding process can be computationally intensive and, depending on the CPU resources available, may take considerable time to complete relative to the length of time spanned by the media in the block. The receiver may take into account this length of time required for decoding when calculating the delay required between the start of reception of the media stream and playout of the media. This delay due to decoding is perceived by the user as a delay between the request for a particular media stream and the start of playback. It is therefore desirable to minimize this delay.

In many applications, packets may be further subdivided into symbols on which the FEC process is applied. A packet can contain one or more symbols (or less than one symbol, but usually symbols are not split across groups of packets unless the error conditions among groups of packets are known to be highly correlated). A symbol can have any size, but often the size of a symbol is at most equal to the size of the packet. Source symbols are those symbols that encode the data that is to be transmitted. Repair symbols are symbols generated, directly or indirectly, from source symbols, and are in addition to the source symbols (i.e., the data to be transmitted can be entirely recovered if all of the source symbols are available and none of the repair symbols are available).

Some FEC codes may be block-based, in that the encoding operations depend on the symbols that are in a block and can be independent of the symbols not in that block. With block-based encoding, an FEC encoder can generate repair symbols for a block from the source symbols in that block, then move on to the next block, without needing to refer to source symbols other than those for the current block being encoded. In a transmission, a source block comprising source symbols may be represented by an encoded block comprising encoded symbols (which might be some source symbols, some repair symbols, or both). With the presence of repair symbols, not all of the source symbols are required in every encoded block.

For some FEC codes, notably Reed-Solomon codes, the encoding and decoding time may grow impractically long as the number of encoded symbols per source block grows. Thus, in practice, there is often a practical upper bound (255 is an approximate practical limit for some applications) on the total number of encoded symbols that can be generated per source block, especially in the typical case where the Reed-Solomon encoding or decoding process is performed by custom hardware. For example, the MPE-FEC processes using Reed-Solomon codes, included as part of the DVB-H standard to protect streams against packet loss, are implemented in specialized hardware within a cell phone that is limited to a total of 255 Reed-Solomon encoded symbols per source block. Since symbols are often required to be placed into separate packet payloads, this places a practical upper bound on the maximum length of the source block being encoded. For example, if a packet payload is limited to 1024 bytes or less and each packet carries one encoded symbol, then an encoded source block can be at most 255 kilobytes, which is also, of course, an upper bound on the size of the source block itself.
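The 255-kilobyte figure follows directly from the two limits given in the example, as this short calculation shows. The function name is an assumption of this sketch, and a kilobyte is taken to mean 1024 bytes, which is what makes the stated figure exact.

```python
def max_encoded_block_bytes(max_symbols: int, payload_bytes: int) -> int:
    """Upper bound on encoded block size with one symbol per packet payload."""
    return max_symbols * payload_bytes

limit_bytes = max_encoded_block_bytes(max_symbols=255, payload_bytes=1024)
limit_kib = limit_bytes // 1024  # 255 symbols x 1024 bytes, expressed in 1024-byte units
```

Because some of the 255 encoded symbols would normally be repair symbols, the source block itself is usually smaller than this bound.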

Other concerns, such as being able to decode the source blocks fast enough to keep up with the source streaming rate, minimizing the decoding latency introduced by FEC decoding, and using only a small fraction of the CPU available on the receiving device at any point in time during FEC decoding, are addressed by elements described herein.

What is needed is a robust and scalable method of stream delivery in which components of the system can fail without adversely affecting the quality of the streams delivered to receivers.

A block request streaming system needs to support changes to the structure or metadata of the presentation, for example changes to the number of available media encodings, changes to parameters of the media encodings such as bit rate, resolution, aspect ratio, audio or video codecs, or codec parameters, or changes to other metadata such as the URLs associated with the content files. Such changes may be required for a number of reasons, including editing together content from different sources, such as advertising or different segments of a larger presentation, and modification of URLs or other parameters that becomes necessary as a result of changes in the serving infrastructure, for example due to configuration changes, equipment failures, recovery from equipment failures, or other reasons.

Methods exist in which a presentation may be controlled by a continuously updated playlist file. Since this file is continuously updated, at least some of the changes described above can be made within these updates. A disadvantage of the conventional method is that client devices must continually retrieve the playlist file, a practice also referred to as “polling”, which places a load on the serving infrastructure, and that this file cannot be cached for longer than the update interval, which makes the task of the serving infrastructure much more difficult. This is addressed by elements herein, so that updates of the kind described above are provided without the need for continuous polling of the metadata file by clients.

Another problem, typically known from broadcast distribution and especially present in live services, is that the user cannot view content that was broadcast before the time the user joined the program. Typically, local personal recording either consumes unnecessary local storage, or is not possible because the client was not tuned to the program, or is prohibited by content protection rules. Network recording and time-shift viewing are preferred, but require individual connections of the user to the server and a delivery protocol and infrastructure separate from those of the live service, resulting in duplicated infrastructure and significant server costs. This is also addressed by elements described herein.

System Overview
One embodiment of the invention is described with reference to FIG. 1, which shows a simplified diagram of a block-request streaming system embodying the invention.

FIG. 1 shows a block-streaming system 100 comprising a block serving infrastructure (“BSI”) 101, which in turn comprises an ingestion system 103 for ingesting content 102, preparing that content, and packaging it for service by an HTTP streaming server 104 by storing it into a content store 110 that is accessible to both the ingestion system 103 and the HTTP streaming server 104. As shown, the system 100 might also include an HTTP cache 106. In operation, a client 108, such as an HTTP streaming client, sends requests 112 to the HTTP streaming server 104 and receives responses 114 from the HTTP streaming server 104 or the HTTP cache 106. In each case, the elements shown in FIG. 1 might be implemented, at least in part, in software comprising program code that is executed on a processor or other electronics.

The content might comprise movies, audio, 2D planar video, 3D video, other types of video, images, timed text, timed metadata, or the like. Some content might involve data that is to be presented or consumed in a timed manner, such as data for presenting auxiliary information (station identification, advertising, stock quotes, Flash™ sequences, etc.) along with other media being played out. Other hybrid presentations might also be used that combine other media and/or go beyond merely audio and video.

As illustrated in FIG. 2, media blocks may be stored within a block serving infrastructure 101(1), which could be, for example, an HTTP server, a content delivery network device, an HTTP proxy, an FTP proxy or server, or some other media server or system. The block serving infrastructure 101(1) is connected to a network 122, which could be, for example, an Internet Protocol (“IP”) network such as the Internet. A block-request streaming system client is shown having six functional components: a block selector 123, which is provided with the metadata described above and performs the function of selecting the blocks or partial blocks to be requested from among the plurality of available blocks indicated by the metadata; a block requestor 124, which receives request instructions from the block selector 123 and performs the operations necessary to send a request for the specified block, portions of a block, or multiple blocks to the block serving infrastructure 101(1) over the network 122 and to receive the data comprising the block in return; a block buffer 125; a buffer monitor 126; a media decoder 127; and one or more media transducers 128 that facilitate media consumption.

Block data received by the block requestor 124 is passed, for temporary storage, to the block buffer 125, which stores the media data. Alternatively, the received block data can be stored directly into the block buffer 125, as illustrated in FIG. 1. The media decoder 127 is provided with media data by the block buffer 125 and performs on this data such transformations as are necessary to provide suitable input to the media transducers 128, which render the media in a form suitable for user or other consumption. Examples of media transducers include visual display devices, such as those found in mobile telephones, computer systems, or televisions, and might also include audio rendering devices, such as speakers or headphones.

An example of a media decoder would be a function that transforms data in the format described in the H.264 video coding standard into analog or digital representations of video frames, such as a YUV-format pixel map with an associated presentation timestamp for each frame or sample.

The buffer monitor 126 receives information concerning the contents of the block buffer 125 and, based on this information and possibly other information, provides input to the block selector 123, which uses that input to determine the selection of blocks to request, as described herein.

In the terminology used herein, each block has a "playout time" or "duration" that represents the amount of time it would take the receiver to play the media included in that block at normal speed. In some cases, the playout of the media within a block may depend on data already received from previous blocks. In rare cases, the playout of some of the media in a block may depend on a subsequent block. In that case, the playout time of the block is defined with respect to the media that can be played out within the block without reference to the subsequent block, and the playout time of the subsequent block is increased by the playout time of the media within this block that can only be played out after the subsequent block has been received. Because it is rare for a block to include media that depends on subsequent blocks, the remainder of this disclosure assumes that the media in one block does not depend on subsequent blocks, although those skilled in the art will recognize that this variant can easily be added to the embodiments described below.

Receivers may have controls such as "pause", "fast forward", and "rewind". These may result in blocks being consumed by playout at a different rate, but if the receiver can obtain and decode each consecutive sequence of blocks in an aggregate time equal to or less than the aggregate playout time of the sequence excluding its last block, the receiver can present the media to the user without stalling. In some descriptions herein, a particular position in the media stream is referred to as a particular "time" in the media, corresponding to the time that would have elapsed between the beginning of media playout and the moment the particular position in the video stream is reached. The time or position in a media stream is a conventional concept. For example, where the video stream comprises 24 frames per second, the first frame could be said to have a position or time of t=0.0 seconds and the 241st frame could be said to have a position or time of t=10.0 seconds. Naturally, in a frame-based video stream, position or time need not be continuous, because each of the bits in the stream from the first bit of the 241st frame to just before the first bit of the 242nd frame may all have the same time value.
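The frame-to-time example above can be checked with a short calculation. The following is a minimal sketch, assuming a constant frame rate and 1-based frame numbering; the function name `frame_time` is illustrative and does not come from the patent.

```python
from fractions import Fraction


def frame_time(frame_number: int, frames_per_second: int = 24) -> Fraction:
    """Return the media time, in seconds, of a 1-based frame number.

    Assumes a constant frame rate, so frame 1 sits at t = 0 and every
    later frame is offset by (frame_number - 1) / frames_per_second.
    """
    if frame_number < 1:
        raise ValueError("frame numbers are 1-based")
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return Fraction(frame_number - 1, frames_per_second)


if __name__ == "__main__":
    print(float(frame_time(1)))    # first frame
    print(float(frame_time(241)))  # 241st frame
```

Exact fractions are used so that frame rates which do not divide evenly do not accumulate floating-point error.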

Adopting the above terminology, a block-request streaming system (BRSS) comprises one or more clients that make requests to one or more content servers (for example, HTTP servers, FTP servers, etc.). An ingestion system comprises one or more ingestion processors, wherein an ingestion processor receives content (in real time or not), processes the content for use by the BRSS, and stores it in storage accessible to the content servers, possibly together with metadata generated by the ingestion processor.

The BRSS may also contain content caches that coordinate with the content servers. The content servers and content caches may be conventional HTTP servers and HTTP caches that receive requests for files or segments in the form of HTTP requests that include a URL and that may also include a byte range, in order to request less than all of the file or segment indicated by the URL. The clients may include a conventional HTTP client that makes requests of HTTP servers and handles the responses to those requests, where the HTTP client is driven by a novel client system that formulates requests, passes them to the HTTP client, obtains responses from the HTTP client, and processes those responses (for example, by storing or transforming them) in order to provide them to a presentation player for playout by a client device. Typically, the client system does not know in advance what media will be needed (because the needs may depend on user input, changes in user input, and so on), so this is said to be a "streaming" system in that the media is "consumed" as soon as it is received, or shortly thereafter. As a result, response delays and bandwidth constraints can cause delays in a presentation, such as a pause in the presentation when playout catches up with the data received so far.

In order to provide a presentation that is perceived to be of good quality, a number of details can be implemented in the BRSS, at the client end, at the ingestion end, or both. In some cases, the details that are implemented are chosen in consideration of, and to deal with, the client-server interface at the network. In some embodiments, both the client system and the ingestion system are aware of the enhancement, whereas in other embodiments only one side is aware of it. In some such cases, the entire system benefits from the enhancement even though one side is not aware of it; in others, the benefit accrues only if both sides are aware of it, but the system still operates without failing when one side is not.

As illustrated in FIG. 3, the ingestion system may be implemented as a combination of hardware and software components, according to various embodiments. The ingestion system may comprise a set of instructions that can be executed to cause the system to perform any one or more of the methodologies discussed herein. The system may be realized as a specific machine in the form of a computer. The system may be a server computer, a personal computer (PC), or any system capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that system. Further, while only a single system is illustrated, the term "system" shall also be taken to include any collection of systems that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The ingestion system may include an ingestion processor 302 (for example, a central processing unit (CPU)), a memory 304 that can store program code during execution, and disk storage 306, all of which communicate with each other via a bus 300. The system may further include a video display unit 308 (for example, a liquid crystal display (LCD) or cathode ray tube (CRT)). The system may also include an alphanumeric input device 310 (for example, a keyboard) and a network interface device 312 for receiving content from a content source and delivering it to content storage.

The disk storage unit 306 may include a machine-readable medium on which may be stored one or more sets of instructions (for example, software) embodying any one or more of the methodologies or functions described herein. The instructions may also reside, completely or at least partially, within the memory 304 and/or within the ingestion processor 302 during their execution by the system, with the memory 304 and the ingestion processor 302 also constituting machine-readable media.

As illustrated in FIG. 4, the client system may be implemented as a combination of hardware and software components, according to various embodiments. The client system may comprise a set of instructions that can be executed to cause the system to perform any one or more of the methodologies discussed herein. The system may be realized as a specific machine in the form of a computer. The system may be a server computer, a personal computer (PC), or any system capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that system. Further, while only a single system is illustrated, the term "system" shall also be taken to include any collection of systems that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The client system may include a client processor 402 (for example, a central processing unit (CPU)), a memory 404 that can store program code during execution, and disk storage 406, all of which communicate with each other via a bus 400. The system may further include a video display unit 408 (for example, a liquid crystal display (LCD) or cathode ray tube (CRT)). The system may also include an alphanumeric input device 410 (for example, a keyboard) and a network interface device 412 for sending requests and receiving responses.

The disk storage unit 406 may include a machine-readable medium on which may be stored one or more sets of instructions (for example, software) embodying any one or more of the methodologies or functions described herein. The instructions may also reside, completely or at least partially, within the memory 404 and/or within the client processor 402 during their execution by the system, with the memory 404 and the client processor 402 also constituting machine-readable media.

Usage of the 3GPP File Format
The 3GPP file format, or any other file format based on the ISO base media file format, such as the MP4 file format or the 3GPP2 file format, may be used as the container format for HTTP streaming with the following features. A segment index may be included in each segment to signal time offsets and byte ranges, so that the client can download the appropriate pieces of files or media segments as required. The global presentation timing of the entire media presentation and the local timing within each 3GP file or media segment may be accurately aligned. Tracks within one 3GP file or media segment may be accurately aligned. Tracks across representations may also be aligned by assigning each of them to the global timeline, so that switching between representations can be seamless and the joint presentation of media components in different representations can be synchronous.

The file format may contain a profile for adaptive streaming with the following properties. All movie data may be contained in movie fragments, and the "moov" box may contain no sample information. Audio and video sample data may be interleaved, with requirements similar to those for the progressive download profile as specified in TS 26.244. The "moov" box may be placed at the start of the file, followed by fragment offset data, also referred to as a segment index, which contains time and byte range offset information for each fragment, or for at least a subset of the fragments, in the containing segment.

It may also be possible for the Media Presentation Description to reference files that follow the existing progressive download profile. In this case, the client may use the Media Presentation Description simply to select the appropriate alternative version from among multiple available versions. Clients may also use HTTP partial GET requests with files compliant with the progressive download profile to request subsets of each alternative version and thereby implement a less efficient form of adaptive streaming. In this case, the different representations containing the media in the progressive download profile may still adhere to a common global timeline to enable seamless switching across representations.

Overview of Advanced Methods
In the following sections, methods for improved block-request streaming systems are described. It should be understood that some of these improvements can be used with or without others of these improvements, depending on the needs of the application. In general operation, a receiver makes requests of a server or other transmitter for specific blocks, or portions of blocks, of data. Files, also called segments, may contain multiple blocks and are associated with one representation of a media presentation.

Preferably, indexing information, also called "segment indexing" or a "segment map", is generated that provides a mapping from playout or decode times to the byte offsets of the corresponding blocks or fragments within a segment. This segment indexing may be included within the segment, typically at its beginning (at least some of the segment map is at the beginning), and is often small. The segment index may also be provided in a separate index segment or file. Especially where the segment index is contained in the segment, the receiver may download some or all of this segment map quickly and subsequently use it to determine the mapping between time offsets and the corresponding byte positions of the fragments associated with those time offsets within the file.
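The time-to-byte mapping described above can be sketched as a sorted table with a binary search. This is a minimal illustration under stated assumptions, not the patent's index format: the `IndexEntry` record, the `locate_fragment` helper, and the sample numbers are all hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Sequence, Tuple


class IndexEntry(NamedTuple):
    """One hypothetical segment-map row: where a fragment starts."""
    start_time: float  # playout time of the fragment, in seconds
    byte_offset: int   # offset of the fragment's first byte in the segment


def locate_fragment(index: Sequence[IndexEntry], time_offset: float,
                    segment_size: int) -> Tuple[int, int]:
    """Return (first_byte, last_byte) of the fragment covering time_offset.

    The index must be sorted by start_time. The last fragment is assumed
    to run to the end of the segment.
    """
    starts = [entry.start_time for entry in index]
    position = bisect_right(starts, time_offset) - 1
    if position < 0:
        raise ValueError("time offset precedes the first fragment")
    first_byte = index[position].byte_offset
    if position + 1 < len(index):
        last_byte = index[position + 1].byte_offset - 1
    else:
        last_byte = segment_size - 1
    return first_byte, last_byte


if __name__ == "__main__":
    segment_map = [IndexEntry(0.0, 1000), IndexEntry(0.5, 51000),
                   IndexEntry(1.0, 98000)]
    print(locate_fragment(segment_map, 0.7, segment_size=150000))
```

Because only the start offsets are stored, each fragment's length is implied by the next entry, which keeps the table small.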

A receiver can use the byte offset to request data from the fragments associated with particular time offsets, without having to download all of the data associated with other fragments that are not associated with the time offsets of interest. In this way, the segment map or segment indexing can greatly improve the ability of a receiver to directly access the portions of the segment that are relevant to the current time offsets of interest, with benefits including improved content zapping times, the ability to change quickly from one representation to another as network conditions vary, and less network capacity wasted on downloading media that is never played out at the receiver.
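One way a client might turn a fragment's byte positions into a request is an HTTP `Range` header. The sketch below only builds the request object with Python's standard `urllib.request` and does not contact a server; the helper name `build_fragment_request`, the URL, and the byte positions are assumptions made for illustration.

```python
import urllib.request


def build_fragment_request(url: str, first_byte: int,
                           last_byte: int) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for one inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    # HTTP byte ranges are inclusive on both ends: "bytes=first-last".
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Hypothetical segment URL and fragment position.
    request = build_fragment_request("http://example.com/segment1.3gp",
                                     51000, 97999)
    print(request.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would send it; a server that honours the range answers with status 206 (Partial Content) and only the requested bytes.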

Where switching from one representation (referred to herein as the "switch-from" representation) to another representation (referred to herein as the "switch-to" representation) is considered, the segment index may also be used to identify the start time of a random access point in the switch-to representation and thereby the amount of data to be requested in the switch-from representation. This ensures that seamless switching is possible, in the sense that media in the switch-from representation is downloaded up to a presentation time such that playout of the switch-to representation can start seamlessly from the random access point.
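The switching rule above can be illustrated with a small helper that picks the switch time from the random access points of the switch-to representation. This is a sketch under the assumption that the segment index yields a sorted list of RAP start times; the name `choose_switch_time` and the sample values are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def choose_switch_time(switch_to_rap_times: Sequence[float],
                       earliest_switch_time: float) -> Optional[float]:
    """Return the first RAP start time at or after earliest_switch_time.

    The switch-from representation would then be downloaded up to the
    returned presentation time, and the switch-to representation from it.
    Returns None when no suitable RAP remains.
    """
    position = bisect_left(switch_to_rap_times, earliest_switch_time)
    if position == len(switch_to_rap_times):
        return None
    return switch_to_rap_times[position]


if __name__ == "__main__":
    rap_times = [0.0, 2.0, 4.0, 6.0]  # seconds, hypothetical
    print(choose_switch_time(rap_times, 3.1))
```

Here `earliest_switch_time` stands for the presentation time up to which switch-from media has already been requested, so nothing already downloaded is discarded.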

Those blocks represent segments of the video media or other media that the requesting receiver needs in order to generate output for the user of the receiver. The receiver of the media may be a client device, such as when the receiver receives content from a server that transmits the content. Examples include set-top boxes, computers, game consoles, specially equipped televisions, handheld devices, specially equipped mobile phones, or other client receivers.

Many advanced buffer management methods are described herein. For example, a buffer management method enables clients to request blocks of the highest media quality that can be received in time to be played out with continuity. A variable block size feature improves compression efficiency. The ability to have multiple connections for transmitting blocks to the requesting device, while limiting the frequency of requests, provides improved transmission performance. Partially received blocks of data can be used to continue the media presentation. A connection can be reused for multiple blocks without having to commit the connection at the start to a particular set of blocks. Consistency in the selection of servers from among multiple possible servers by multiple clients is improved, which reduces the frequency of duplicate content in nearby servers and improves the probability that a server contains an entire file. Clients can request media blocks based on metadata (such as available media encodings) that is embedded in the URLs of the files containing the media blocks. A system can provide for the calculation and minimization of the amount of buffering time required before playout of the content can begin without incurring subsequent pauses in media playout. Available bandwidth can be shared among multiple media blocks and adjusted as the playout time of each block approaches, so that, if necessary, a greater share of the available bandwidth can be allocated to the block with the nearest playout time.

HTTP streaming may employ metadata. Presentation-level metadata includes, for example, stream duration, available encodings (bit rates, codecs, spatial resolutions, frame rates, language, media types), pointers to stream metadata for each encoding, and content protection (digital rights management (DRM) information). Stream metadata may be URLs for the segment files.

Segment metadata may include byte-range versus time information for requests within a segment, and identification of random access points (RAPs) or other seek points, where some or all of this information may be part of the segment indexing or segment map.

Streams may comprise multiple encodings of the same content. Each encoding may then be broken into segments, where each segment corresponds to a storage unit or file. In the case of HTTP, a segment is typically a resource that can be referenced by a URL, and a request for such a URL results in the segment being returned as the entity body of the response message. Segments may comprise multiple groups of pictures (GoPs). Each GoP may further comprise multiple fragments, where the segment indexing provides time/byte-offset information for each fragment; that is, the unit of indexing is a fragment.

Fragments or portions of fragments may be requested over parallel TCP connections in order to improve throughput. This can mitigate problems that arise when sharing connections on a bottleneck link or when connections are lost due to congestion, thus increasing the overall speed and reliability of delivery, which can substantially improve the speed and reliability of content zapping. Bandwidth can be traded for latency by over-requesting, but care should be taken to avoid making requests too far into the future, which can increase the risk of starvation.

Multiple requests for segments on the same server may be pipelined (that is, the next request is made before the current request completes) to avoid repeated TCP startup delays. Requests for consecutive fragments may be aggregated into one request.
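Aggregating requests for consecutive fragments can be sketched as merging adjacent byte ranges before any request is issued. The following is a minimal illustration, assuming inclusive `(first_byte, last_byte)` pairs; the function name `coalesce_ranges` is hypothetical.

```python
from typing import Iterable, List, Tuple

ByteRange = Tuple[int, int]  # inclusive (first_byte, last_byte)


def coalesce_ranges(ranges: Iterable[ByteRange]) -> List[ByteRange]:
    """Merge byte ranges that touch or overlap into single ranges."""
    merged: List[ByteRange] = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Adjacent to or overlapping the previous range: extend it.
            previous_first, previous_last = merged[-1]
            merged[-1] = (previous_first, max(previous_last, last))
        else:
            merged.append((first, last))
    return merged


if __name__ == "__main__":
    # Two consecutive fragments and one that is not contiguous with them.
    print(coalesce_ranges([(0, 99), (100, 199), (300, 399)]))
```

Each merged range would then correspond to one request rather than one request per fragment.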

Some CDNs prefer large files and may trigger a background fetch of an entire file from the origin server when first seeing a range request. Most CDNs will, however, serve range requests from cache if the data is available. It may therefore be advantageous for some portion of the client requests to be for a whole segment file. These requests can be cancelled later if necessary.

Valid switch points may be seek points in the target stream, specifically RAPs, for example. Different implementations are possible, such as fixed GoP structures or alignment of RAPs across streams (based on the beginning of the media or on the GoPs).

In one embodiment, segments and GoPs may be aligned across streams of different rates. In this embodiment, GoPs may be of variable size and may contain multiple fragments, but fragments are not aligned between the streams of different rates.

In some embodiments, file redundancy may be employed to advantage. In these embodiments, an erasure code is applied to each fragment to generate redundant versions of the data. Preferably, the source formatting is not changed by the use of FEC, and additional repair segments containing FEC repair data are generated and made available, for example as a dependent representation of the original representation, as an additional step in the ingestion system. A client that is able to reconstruct a fragment using only the source data for that fragment may request only the source data for the fragment within the segment from the servers. If the servers are unavailable or the connection to the servers is slow, which can be determined either before or after the request for source data, additional repair data may be requested for the fragment from the repair segment. This decreases the time needed to reliably deliver enough data to recover the fragment, possibly using FEC decoding on a combination of received source and repair data to recover the source data of the fragment. Furthermore, additional repair data can be requested to allow recovery of a fragment if the fragment becomes urgent, that is, if its playout time becomes imminent. This increases the data share for that fragment on a link but is more efficient than closing other connections on the link to free up bandwidth. It may also mitigate the risk of starvation arising from the use of parallel connections.
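To make the idea of repair data concrete, the sketch below uses a deliberately simple single-parity XOR code. This is a stand-in chosen for brevity, not the erasure code the patent has in mind: it yields one repair symbol and can restore at most one missing source symbol. The function names and the sample symbols are hypothetical.

```python
from typing import List, Optional


def xor_parity(symbols: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length symbols."""
    if not symbols:
        raise ValueError("at least one symbol is required")
    length = len(symbols[0])
    if any(len(symbol) != length for symbol in symbols):
        raise ValueError("all symbols must have the same length")
    parity = bytearray(length)
    for symbol in symbols:
        for i, value in enumerate(symbol):
            parity[i] ^= value
    return bytes(parity)


def recover_fragment(received: List[Optional[bytes]],
                     repair: bytes) -> List[bytes]:
    """Fill in at most one missing source symbol (None) using the repair symbol."""
    missing = [i for i, symbol in enumerate(received) if symbol is None]
    if len(missing) > 1:
        raise ValueError("single parity can restore only one missing symbol")
    restored = list(received)
    if missing:
        present = [symbol for symbol in received if symbol is not None]
        # XOR of the repair symbol with every received symbol leaves the lost one.
        restored[missing[0]] = xor_parity(present + [repair])
    return restored


if __name__ == "__main__":
    source = [b"abcd", b"efgh", b"ijkl"]
    repair = xor_parity(source)  # would be stored in the repair segment
    print(recover_fragment([b"abcd", None, b"ijkl"], repair))
```

In this toy setting the client would fetch the repair symbol only when a source symbol is late or lost, mirroring the request pattern described above.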

The fragment format may be a stored stream of Real-time Transport Protocol (RTP) packets, with audio/video synchronization achieved through the Real-time Transport Control Protocol (RTCP).

The segment format may also be a stored stream of MPEG-2 TS packets, with audio/video synchronization achieved through MPEG-2 TS internal timing.

Using Signaling and/or Block Creation to Make Streaming More Efficient
A number of features may or may not be used in a block-request streaming system to provide improved performance. Performance can relate to the ability to play out a presentation without stalling, to obtaining media data within bandwidth constraints, and/or to doing so within limited processor resources at the client, the server, and/or the ingestion system. Some of these features are now described.

Indexing Within Segments
In order to formulate partial GET requests for movie fragments, the client may be informed of the byte offset and the start time, in decoding or presentation time, of all media components contained in the fragments within the file or segment, and also of which fragments begin with or contain a random access point (and so are suitable for use as switch points between alternative representations). This information is often referred to as the segment indexing or segment map. The start time in decoding or presentation time may be expressed directly or may be expressed as a delta relative to a reference time.

This time and byte offset indexing information may require at least 8 bytes of data per movie fragment. As an example, for a two-hour movie contained within a single file, with 500 ms movie fragments, this would be a total of about 112 kilobytes of data. Downloading all of this data when starting a presentation may result in a significant additional startup delay. However, the time and byte offset data can be hierarchically encoded, so that the client can quickly find the small chunk of time and offset data relevant to the point in the presentation at which it wishes to start. The information may also be distributed within a segment, such that some refinements of the segment index may be located interleaved with the media data.

Note that if a representation is segmented in time into multiple segments, this hierarchical coding may not be necessary, because the complete time and offset data for each segment may already be small enough. For example, if the segments are one minute long instead of two hours as in the example above, the time and byte offset indexing information is around 1 kilobyte of data, which typically fits within a single TCP/IP packet.
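The two size estimates above can be checked with a short calculation. The following is only an illustrative sketch; the function name `index_size_bytes` is hypothetical, and the 8 bytes per entry is the lower bound stated above for a flat (non-hierarchical) index.

```python
def index_size_bytes(duration_s: float, fragment_s: float, bytes_per_entry: int = 8) -> int:
    """Size of a flat time/byte-offset index with one entry per movie fragment."""
    fragments = round(duration_s / fragment_s)
    return fragments * bytes_per_entry


two_hour_file = index_size_bytes(2 * 3600, 0.5)   # 14,400 fragments of 500 ms
one_minute_segment = index_size_bytes(60, 0.5)    # 120 fragments of 500 ms

print(two_hour_file, two_hour_file / 1024)  # 115200 bytes, i.e. 112.5 KiB
print(one_minute_segment)                   # 960 bytes, roughly 1 kilobyte
```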

Different options are possible for adding fragment time and byte offset data to a 3GPP file.

First, the Movie Fragment Random Access Box ("MFRA") may be used for this purpose. The MFRA provides a table that may assist readers in finding random access points in a file that uses movie fragments. In support of this function, the MFRA incidentally contains the byte offsets of the MFRA boxes that contain random access points. The MFRA may be placed at or near the end of the file, but this is not necessarily the case. By scanning from the end of the file for a Movie Fragment Random Access Offset Box and using the size information in it, one may be able to locate the beginning of a Movie Fragment Random Access Box. However, placing the MFRA at the end of the file for HTTP streaming typically requires at least 3 to 4 HTTP requests to access the desired data: at least one to request the MFRA from the end of the file, one to obtain the MFRA, and finally one to obtain the desired fragment in the file. Placing the MFRA at the beginning may therefore be desirable, since it can then be downloaded together with the first media data in a single request. Using the MFRA for HTTP streaming may also be inefficient, because none of the information in the "MFRA" is needed apart from the time and moof_offset, and specifying offsets instead of lengths may require more bits.

Second, the Item Location Box ("ILOC") may be used. The "ILOC" provides a directory of metadata resources in this or other files by identifying the file that contains each metadata resource, the offset of the resource within that file, and its length. For example, a system might consolidate all externally referenced metadata resources into one file, readjusting file offsets and file references accordingly. However, the "ILOC" is intended to give the location of metadata, so it may be difficult for it to coexist with the actual metadata.

Last, and perhaps most suitable, is the specification of a new box, referred to as the Time Index Box ("TIDX"), dedicated specifically to efficiently providing exact fragment times or durations and byte offsets. This is described in more detail in the next section. An alternative box with the same functionality is the Segment Index Box ("SIDX"). Herein, unless otherwise indicated, the two may be interchangeable, since both boxes provide the ability to efficiently provide exact fragment times or durations and byte offsets. The differences between TIDX and SIDX are given below. Because both boxes implement a segment index, it should be apparent how to interchange TIDX boxes and SIDX boxes.

Segment Indexing
A segment has an identified start time and an identified number of bytes. Multiple fragments may be concatenated into a single segment, and clients may issue requests that identify the specific byte range within the segment corresponding to the required fragment or subset of fragments. For example, when HTTP is used as the request protocol, the HTTP Range header may be used for this purpose. This approach requires that the client have access to a "segment index" of the segment that specifies the positions of the different fragments within the segment. This "segment index" may be provided as part of the metadata. Compared with an approach in which every block is kept in a separate file, far fewer files need to be created and managed. Managing the creation, transfer, and storage of very large numbers of files (which could extend to many thousands for a one-hour presentation, say) can be complex and error-prone, so a reduction in the number of files is an advantage.
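As a minimal sketch of such a byte-range request, the following builds (but does not send) a partial GET using Python's standard `urllib.request`. The function name `fragment_request` and the URL are hypothetical; the offset and length would come from the segment index.

```python
import urllib.request


def fragment_request(url: str, first_byte: int, length: int) -> urllib.request.Request:
    """Build, without sending, a partial GET for `length` bytes starting at `first_byte`."""
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = first_byte + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


# Hypothetical URL; a 50,245-byte fragment at the start of the segment.
req = fragment_request("http://example.com/segment.3gp", 0, 50245)
print(req.get_header("Range"))  # bytes=0-50244
```

A server that honours the range would normally answer with status 206 (Partial Content) and only the requested bytes.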

If a client knows only the desired start time of a smaller portion of a segment, it might request the whole file and then read it through to determine the appropriate playout starting position. To improve bandwidth usage, segments can include an index file as metadata, where the index file maps the byte ranges of individual blocks to the time ranges that the blocks correspond to; this is called segment indexing or a segment map. This metadata can be formatted as XML data, or it can be binary, for example following the atom and box structure of the 3GPP file format. The indexing can be simple, wherein the time and byte ranges of each block are absolute relative to the start of the file, or it can be hierarchical, wherein some blocks are grouped into parent blocks (and those into grandparent blocks, and so on) and the time and byte range for a given block is expressed relative to the time and/or byte range of the block's parent block.
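The hierarchical case can be sketched as a tree in which each block stores offsets relative to its parent, and absolute values are recovered by summing along the path from the root. The `Block` class, its field names, and the sample values below are assumptions made for illustration, not a format defined in this description.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Block:
    # Offsets are relative to the parent block (or to the file start for the root).
    rel_time_ms: int
    rel_byte: int
    children: List["Block"] = field(default_factory=list)


def leaves_absolute(block: Block, base_time: int = 0, base_byte: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (absolute start time, absolute byte offset) for every leaf block."""
    t = base_time + block.rel_time_ms
    b = base_byte + block.rel_byte
    if not block.children:
        yield (t, b)
    for child in block.children:
        yield from leaves_absolute(child, t, b)


root = Block(0, 0, [
    Block(0, 0, [Block(0, 0), Block(500, 40_000)]),
    Block(1000, 90_000, [Block(0, 0), Block(500, 45_000)]),
])
print(list(leaves_absolute(root)))
# [(0, 0), (500, 40000), (1000, 90000), (1500, 135000)]
```

A client that only needs one region of the presentation would descend a single branch rather than enumerate every leaf as this example does.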

Exemplary Indexing Map Structure
In one embodiment, the original source data for one representation of a media stream may be contained in one or more media files, herein called "media segments", wherein each media segment contains the media data used to play back a continuous time segment of the media, for example 5 minutes of media playback.

FIG. 6 shows an exemplary overall structure of a media segment. Within each segment, either at the beginning or spread throughout the source segment, there can also be indexing information, which comprises a time/byte-offset segment map. The time/byte-offset segment map in one embodiment may be a list of time/byte-offset pairs (T(0), B(0)), (T(1), B(1)), ..., (T(i), B(i)), ..., (T(n), B(n)), where T(i-1) is the start time within the segment for playback of the i-th fragment of media, relative to the initial start time of the media among all media segments; T(i) is the end time of the i-th fragment (and thus the start time of the next fragment); the byte offset B(i-1) is the byte index, relative to the beginning of the source segment, at which the i-th fragment of media starts within the data of this source segment; and B(i) is the corresponding end byte index of the i-th fragment (and thus the index of the first byte of the next fragment). If the segment contains multiple media components, T(i) and B(i) may be provided for each component in the segment in an absolute way, or they may be expressed relative to another media component that serves as a reference media component.

In this embodiment, the number of fragments in the source segment is n, where n may vary from segment to segment.

In another embodiment, the time offset in the segment index for each fragment may be determined from the absolute start time of the first fragment and the durations of the fragments. In this case, the segment index may document the start time of the first fragment and the durations of all fragments included in the segment. The segment index may also document only a subset of the fragments. In that case, the segment index documents the duration of a subsegment, defined as one or more consecutive fragments, ending either at the end of the containing segment or at the beginning of the next subsegment.
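In this duration-based form, the start times are a running sum. A minimal sketch follows, assuming millisecond units; the function name `start_times` is hypothetical, and the durations after the first are made-up values.

```python
from itertools import accumulate
from typing import List


def start_times(first_start: int, durations: List[int]) -> List[int]:
    """Start time of each fragment, followed by the end time of the last fragment."""
    return list(accumulate(durations, initial=first_start))


# First fragment starts at 20 s and lasts 485 ms; the other durations are illustrative.
print(start_times(20_000, [485, 515, 500]))  # [20000, 20485, 21000, 21500]
```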

For each fragment, there may also be a value that indicates whether the fragment starts at or contains a seek point, that is, a point after which no media depends on any media before that point, so that the media from that fragment forward can be played out independently of previous fragments. Seek points are, in general, points in the media where playout can start independently of all previous media. FIG. 6 also shows a simple example of possible segment indexing for a source segment. In that example, the time offset values are in units of milliseconds; the first fragment of this source segment starts 20 seconds from the beginning of the media and has a playout time of 485 milliseconds. The byte offset of the start of the first fragment is 0, and the byte offset of the end of the first fragment, which is also the start of the second fragment, is 50,245, so the first fragment is 50,245 bytes in size. If a fragment or subsegment does not start with a random access point but contains one, the decoding time or presentation time difference between the start time and the actual RAP time may be given. This enables a client that is switching to this media segment to know exactly the time up to which the representation being switched from needs to be presented.
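One way a client might use such a map is sketched below: a binary search over the T values finds the fragment covering a requested time, and the client then backs up to the nearest fragment that starts at a seek point. This is an assumption-laden illustration, not the method of this description. The function name, the 0-based fragment numbering (fragment i spans times[i] to times[i+1]), and all values other than 20000, 20485, 0 and 50245 are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def find_fragment(times: List[int], offsets: List[int], seek: List[bool], t: int) -> Tuple[int, int, int]:
    """Return (fragment index, first byte, last byte) of the fragment to start from.

    times and offsets hold T(0..n) and B(0..n); seek[i] is True when 0-based
    fragment i starts at a seek point.
    """
    if not times[0] <= t < times[-1]:
        raise ValueError("time lies outside this segment")
    i = bisect_right(times, t) - 1        # fragment whose time range contains t
    while i >= 0 and not seek[i]:         # back up to the nearest earlier seek point
        i -= 1
    if i < 0:
        raise LookupError("no seek point at or before t in this segment")
    return i, offsets[i], offsets[i + 1] - 1


times = [20_000, 20_485, 21_000, 21_500]   # milliseconds
offsets = [0, 50_245, 101_000, 150_000]    # bytes from the start of the segment
seek = [True, False, True]

print(find_fragment(times, offsets, seek, 20_700))  # (0, 0, 50244)
print(find_fragment(times, offsets, seek, 21_200))  # (2, 101000, 149999)
```

In the first call the requested time falls in the second fragment, which does not start at a seek point, so playback would have to begin from the first fragment.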

In addition to, or instead of, simple or hierarchical indexing, daisy-chained indexing and/or hybrid indexing could be used.

Because the sample durations for different tracks might not be the same (for example, video samples might be displayed for 33 ms, whereas an audio sample might last 80 ms), the different tracks in a movie fragment might not begin and end at precisely the same time; that is, the audio may begin slightly before or slightly after the video, with the opposite being true of the preceding fragment, to compensate. To avoid ambiguity, the timestamps specified in the time and byte offset data may be specified relative to a particular track, and this may be the same track for each representation. Usually this will be the video track. This allows the client to identify exactly the next video frame when it is switching representations.

Care may be taken during presentation to maintain a strict relationship between track timescales and presentation time, to ensure smooth playout and maintenance of audio/video synchronization despite the above issue.

FIG. 7 illustrates some examples, such as a simple index 700 and a hierarchical index 702.

Two specific examples of a box that contains a segment map are provided below, one referred to as the Time Index Box ("TIDX") and one referred to as the Segment Index Box ("SIDX"). The definitions follow the box structure of the ISO base media file format. Other designs for such boxes, defining similar syntax with the same semantics and functionality, should be apparent to the reader.

Time Index Box

Definition

Box type: 'tidx'

Container: File

Mandatory: No

Quantity: Zero or one

The Time Index Box may provide a set of time and byte offset indices that associate certain regions of the file with certain time intervals of the presentation. The Time Index Box may include a targettype field, which indicates the type of the referenced data. For example, a Time Index Box with targettype "moof" provides an index to the media fragments contained in the file in terms of both time and byte offsets. A Time Index Box whose targettype is Time Index Box can be used to construct a hierarchical time index, allowing users of the file to navigate quickly to the required portion of the index.

The segment index may, for example, contain the following syntax.

aligned(8) class TimeIndexBox extends FullBox('frai') {
    unsigned int(32) targettype;
    unsigned int(32) time_reference_track_ID;
    unsigned int(32) number_of_elements;
    unsigned int(64) first_element_offset;
    unsigned int(64) first_element_time;
    for (i = 1; i <= number_of_elements; i++)
    {
        bit(1) random_access_flag;
        unsigned int(31) length;
        unsigned int(32) deltaT;
    }
}

Semantics

targettype: the type of the box data referenced by this Time Index Box. This can be either Movie Fragment Header ("moof") or Time Index Box ("tidx").

time_reference_track_ID: indicates the track with respect to which the time offsets in this index are specified.

number_of_elements: the number of elements indexed by this Time Index Box.

first_element_offset: the byte offset from the start of the file of the first indexed element.

first_element_time: the start time of the first indexed element, using the timescale specified in the Media Header box of the track identified by time_reference_track_ID.

random_access_flag: 1 if the start time of the element is a random access point; 0 otherwise.

length: the length of the indexed element in bytes.

deltaT: the difference, in the timescale specified in the Media Header box of the track identified by time_reference_track_ID, between the start time of this element and the start time of the next element.
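A reader for this layout could be sketched as follows. The sketch parses only the fields that follow the FullBox header, and it makes two assumptions that the text does not state: that targettype is read as a four-character code, and that indexed elements are contiguous, so each element's offset is the previous offset plus the previous length. The names `Element` and `parse_tidx_payload` are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    random_access: bool
    offset: int      # absolute byte offset from the start of the file
    length: int      # bytes
    start_time: int  # in the timescale of the time reference track
    delta_t: int


def parse_tidx_payload(payload: bytes) -> Tuple[bytes, int, List[Element]]:
    """Parse the Time Index Box fields that follow the FullBox header."""
    head = struct.Struct(">4sIIQQ")  # targettype, track ID, count, first offset, first time
    targettype, track_id, count, offset, time = head.unpack_from(payload, 0)
    elements: List[Element] = []
    pos = head.size
    for _ in range(count):
        word, delta_t = struct.unpack_from(">II", payload, pos)
        pos += 8
        length = word & 0x7FFFFFFF            # low 31 bits
        elements.append(Element(bool(word >> 31), offset, length, time, delta_t))
        offset += length                      # assumption: elements are contiguous
        time += delta_t
    return targettype, track_id, elements


payload = (struct.pack(">4sIIQQ", b"moof", 1, 2, 0, 20_000)
           + struct.pack(">II", (1 << 31) | 50_245, 485)
           + struct.pack(">II", 40_000, 515))
print(parse_tidx_payload(payload))
```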

Segment Index Box

The Segment Index Box ("sidx") provides a compact index of the movie fragments and other Segment Index Boxes in a segment. There are two loop structures in the Segment Index Box. The first loop documents the first sample of the subsegment, that is, the sample in the first movie fragment referenced by the second loop. The second loop provides an index of the subsegment. The container for the "sidx" box is the file or segment directly.

Syntax

aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
    unsigned int(32) reference_track_ID;
    unsigned int(16) track_count;
    unsigned int(16) reference_count;
    for (i = 1; i <= track_count; i++)
    {
        unsigned int(32) track_ID;
        if (version == 0)
        {
            unsigned int(32) decoding_time;
        } else
        {
            unsigned int(64) decoding_time;
        }
    }
    for (i = 1; i <= reference_count; i++)
    {
        bit(1) reference_type;
        unsigned int(31) reference_offset;
        unsigned int(32) subsegment_duration;
        bit(1) contains_RAP;
        unsigned int(31) RAP_delta_time;
    }
}

Semantics

reference_track_ID: provides the track_ID of the reference track.

track_count: the number of tracks indexed in the following loop (1 or greater).

reference_count: the number of elements indexed by the second loop (1 or greater).

track_ID: the ID of a track for which a track fragment is included in the first movie fragment identified by this index; exactly one track_ID in this loop is equal to the reference_track_ID.

decoding_time: the decoding time of the first sample in the track identified by track_ID in the movie fragment referenced by the first item in the second loop, expressed in the timescale of the track (as documented in the timescale field of the Media Header Box of the track).

reference_type: when set to 0, indicates that the reference is to a movie fragment ("moof") box; when set to 1, indicates that the reference is to a segment index ("sidx") box.

reference_offset: the distance in bytes from the first byte following the containing Segment Index Box to the first byte of the referenced box.

subsegment_duration: when the reference is to a Segment Index Box, this field carries the sum of the subsegment_duration fields in the second loop of that box. When the reference is to a movie fragment, this field carries the sum of the sample durations of the samples in the reference track, in the indicated movie fragment and subsequent movie fragments up to either the first movie fragment documented by the next entry in the loop or the end of the subsegment, whichever is earlier. The duration is expressed in the timescale of the track (as documented in the timescale field of the Media Header Box of the track).

contains_RAP: when the reference is to a movie fragment, this bit may be 1 if the track fragment within that movie fragment for the track with track_ID equal to reference_track_ID contains at least one random access point; otherwise this bit is set to 0. When the reference is to a segment index, this bit is set to 1 only if any of the references in that segment index have this bit set to 1, and to 0 otherwise.

RAP_delta_time: if contains_RAP is 1, provides the presentation (composition) time of a random access point (RAP); reserved with the value 0 if contains_RAP is 0. The time is expressed as the difference between the decoding time of the first sample of the subsegment documented by this entry and the presentation (composition) time of the random access point in the track with track_ID equal to reference_track_ID.
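The two-loop layout above can be read as sketched below. The sketch follows the syntax given in this description, which is not the same field layout as the 'sidx' box later published in the ISO base media file format. It parses only the fields after the FullBox header and takes the version as a parameter; the function name and dictionary keys are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def parse_sidx_payload(payload: bytes, version: int) -> Tuple[int, Dict[int, int], List[dict]]:
    """Parse the Segment Index Box fields that follow the FullBox header."""
    ref_track_id, track_count, ref_count = struct.unpack_from(">IHH", payload, 0)
    pos = 8
    track_fmt = ">II" if version == 0 else ">IQ"   # decoding_time is 32 or 64 bits
    tracks: Dict[int, int] = {}
    for _ in range(track_count):                   # first loop: one entry per track
        track_id, decoding_time = struct.unpack_from(track_fmt, payload, pos)
        pos += struct.calcsize(track_fmt)
        tracks[track_id] = decoding_time
    refs: List[dict] = []
    for _ in range(ref_count):                     # second loop: one entry per reference
        w1, duration, w2 = struct.unpack_from(">III", payload, pos)
        pos += 12
        refs.append({
            "to_sidx": bool(w1 >> 31),             # reference_type: 0 = moof, 1 = sidx
            "reference_offset": w1 & 0x7FFFFFFF,
            "subsegment_duration": duration,
            "contains_RAP": bool(w2 >> 31),
            "RAP_delta_time": w2 & 0x7FFFFFFF,
        })
    return ref_track_id, tracks, refs


payload = (struct.pack(">IHH", 1, 1, 2)
           + struct.pack(">II", 1, 20_000)
           + struct.pack(">III", 0, 485, 1 << 31)
           + struct.pack(">III", 50_245, 515, 0))
print(parse_sidx_payload(payload, version=0))
```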

Differences between TIDX and SIDX
The TIDX and the SIDX provide the same functionality with respect to indexing. The first loop of the SIDX additionally provides global timing for the first movie fragment, but the global timing may also be contained in the movie fragment itself, either absolutely or relative to the reference track.

The second loop of the SIDX implements the functionality of the TIDX. Specifically, the SIDX permits a mixture of reference targets, with the target of each entry indicated by reference_type, whereas the TIDX references either only TIDX boxes or only MOOF boxes. The number_of_elements in TIDX corresponds to the reference_count in SIDX; the time_reference_track_ID in TIDX corresponds to the reference_track_ID in SIDX; the first_element_offset in TIDX corresponds to the reference_offset in the first entry of the second loop; the first_element_time in TIDX corresponds to the decoding_time of the reference track in the first loop; the random_access_flag in TIDX corresponds to contains_RAP in SIDX, with the additional freedom that in the SIDX the RAP need not be located at the start of the fragment, which is why RAP_delta_time is needed; the length in TIDX corresponds to the reference_offset in SIDX; and finally the deltaT in TIDX corresponds to the subsegment_duration in SIDX. The functionalities of the two boxes are therefore equivalent.

Variable Block Sizing and Sub-GoP Blocks
For video media, the relationship between the video encoding structure and the block structure for requests can be important. For example, if each block begins with a seek point, such as a random access point ("RAP"), and each block represents an equal period of video time, then the positioning of at least some seek points in the video media is fixed, and seek points occur at regular intervals within the video encoding. As is well known to those of skill in the art of video encoding, compression efficiency may be improved if seek points are placed according to relationships between video frames, and in particular if they are placed at frames that have little in common with previous frames. The requirement that blocks represent equal amounts of time thus places a restriction on the video encoding, such that compression may be suboptimal.

It is desirable to allow the position of seek points within a video presentation to be chosen by the video encoding system, rather than requiring seek points at fixed positions. Allowing the video encoding system to choose the seek points results in improved video compression, so a higher quality of video media can be provided using a given available bandwidth, resulting in an improved user experience. Current block-request streaming systems may require that all blocks be of the same duration (in video time) and that each block begin with a seek point, and this is thus a disadvantage of existing systems.

A novel block-request streaming system that provides advantages over the above is now described. In one embodiment, the video encoding process for a first version of the video component may be configured to choose the positions of seek points so as to optimize compression efficiency, subject to a requirement that there is a maximum on the duration between seek points. This latter requirement restricts the choice of seek points by the encoding process and thus reduces compression efficiency. However, provided the maximum on the duration between seek points is not too small (for example, greater than around one second), the reduction in compression efficiency is small compared with that incurred if regular fixed positions are required for the seek points. Furthermore, if the maximum on the duration between seek points is a few seconds, the reduction in compression efficiency compared with completely free positioning of seek points is generally very small.

In many embodiments, including this embodiment, some RAPs might not be seek points; that is, there may be a frame that is a RAP lying between two consecutive seek points that is not chosen to be a seek point, for example because the RAP is too close in time to the surrounding seek points, or because the amount of media data between the RAP and the seek point preceding or following it is too small.

The position of seek points within all other versions of the media presentation may be constrained to be the same as the seek points in a first (for example, the highest media data rate) version. This reduces the compression efficiency for these other versions compared with allowing the encoder a free choice of seek points.

The use of seek points typically requires a frame to be independently decodable, which generally results in low compression efficiency for that frame. Frames that are not required to be independently decodable can be encoded with reference to data in other frames, which generally increases compression efficiency for that frame by an amount that depends on the amount of commonality between the frame to be encoded and the reference frames. Efficient choice of seek point positioning preferentially chooses as a seek point frame a frame that has low commonality with previous frames, thereby minimizing the compression efficiency penalty incurred by encoding the frame in an independently decodable way.

However, the level of commonality between a frame and potential reference frames is highly correlated across different representations of the content, since the original content is the same. As a result, constraining the seek points in other variants to the same positions as the seek points in the first variant does not make a large difference in compression efficiency.

The seek point structure is preferably used to determine the block structure. Preferably, each seek point determines the start of a block, and there may be one or more blocks that encompass the data between two consecutive seek points. Because the duration between seek points is not fixed when encoding for good compression, not all blocks are required to have the same playout duration. In some embodiments, blocks are aligned between versions of the content; that is, if there is a block spanning a specific group of frames in one version of the content, then there is a block spanning the same group of frames in another version of the content. The blocks for a given version of the content do not overlap, and every frame of the content is contained within exactly one block of each version.

An enabling feature that allows the efficient use of variable durations between seek points, and thus variable-duration GoPs, is the segment indexing or segment map, which can be included in a segment or provided to a client by other means. This is metadata associated with this segment in this presentation, comprising the start time and duration of each block of the presentation. The client can use this segment indexing data to determine the block at which to start the presentation when the user has requested that the presentation start at a particular point within a segment. If such metadata is not provided, the presentation can begin only at the beginning of the content, or at a random or approximate point close to the desired point (for example, by choosing the starting block by dividing the requested starting point (in time) by the average block duration to give the index of the starting block).
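The difference between an index-based lookup and the fallback estimate can be sketched as follows. This is a minimal illustration only: the function names, the list-of-start-times layout, and the example values are assumptions, and the estimate measures the requested time from the start of the segment.

```python
from bisect import bisect_right

def start_block_from_index(block_start_times, target_time):
    """With a segment index: last block whose start time is <= target_time."""
    i = bisect_right(block_start_times, target_time) - 1
    return max(i, 0)

def start_block_estimate(target_time, segment_start_time, avg_block_duration):
    """Without an index: divide the requested offset by the average block duration."""
    return int((target_time - segment_start_time) // avg_block_duration)

# Hypothetical start times (seconds) for five variable-length blocks.
starts = [20.0, 20.3, 20.6, 21.0, 23.5]
exact = start_block_from_index(starts, 22.0)   # block index 3 actually covers t = 22.0
guess = start_block_estimate(22.0, 20.0, 0.5)  # the estimate lands on index 4
```

With variable block durations the estimate can land on the wrong block, which is the inaccuracy the segment index avoids.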

In one embodiment, each block may be provided as a separate file. In another embodiment, multiple consecutive blocks may be aggregated into a single file to form a segment. In this second embodiment, metadata for each version may be provided comprising the start time and duration of each block and the byte offset within the file at which the block begins. This metadata may be provided in response to an initial protocol request, i.e., be available separately from the segment or file, or may be contained within the same file or segment as the blocks themselves, for example at the beginning of the file. As will be clear to those of skill in the art, this metadata may be encoded in a compressed form, such as gzip or delta encoding, or in binary form, in order to reduce the network resources required to transport the metadata to the client.
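As one illustration of the compact encodings mentioned above, the following sketch delta-encodes a list of block byte offsets, packs the deltas as binary integers, and gzips the result. The wire layout (big-endian unsigned 32-bit deltas) and the function names are assumptions made for this example; the description does not prescribe a particular layout.

```python
import gzip
import struct

def encode_offsets(byte_offsets):
    """Delta-encode offsets, pack as big-endian uint32, then gzip."""
    deltas = [byte_offsets[0]] + [b - a for a, b in zip(byte_offsets, byte_offsets[1:])]
    raw = struct.pack(f">{len(deltas)}I", *deltas)
    return gzip.compress(raw)

def decode_offsets(blob):
    """Inverse of encode_offsets: gunzip, unpack, and accumulate the deltas."""
    raw = gzip.decompress(blob)
    deltas = struct.unpack(f">{len(raw) // 4}I", raw)
    offsets, total = [], 0
    for d in deltas:
        total += d
        offsets.append(total)
    return offsets
```

Start times and durations could be handled the same way after conversion to integer timescale units.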

FIG. 6 shows an example of segment indexing in which the blocks are of variable size and the scope of a block is a partial GoP, i.e., a partial amount of the media data between one RAP and the next RAP. In this example, the seek points are indicated by the RAP indicator: a RAP indicator value of 1 indicates that the block starts with or contains a RAP, or seek point, and a RAP indicator value of 0 indicates that the block does not contain a RAP or seek point. In this example, the first three blocks, i.e., bytes 0 through 157,033, comprise the first GoP, which has a presentation duration of 1.623 seconds, with a presentation time running from 20 seconds to 21.623 seconds into the content. The first of these three blocks comprises 0.485 seconds of presentation time and the first 50,245 bytes of the media data in the segment. Blocks 4, 5, and 6 comprise the second GoP, blocks 7 and 8 comprise the third GoP, and blocks 9, 10, and 11 comprise the fourth GoP. Note that there may be other RAPs in the media data that are not designated as seek points and are thus not signaled as RAPs in the segment map.

Referring again to FIG. 6, if the client or receiver wants to access content starting at a time offset of approximately 22 seconds into the media presentation, the client can first use other information, such as the MPD described in more detail later, to determine that the relevant media data is within this segment. The client can download the first portion of the segment to obtain the segment indexing, which in this case is just a few bytes, for example using an HTTP byte range request. Using the segment indexing, the client can determine that the first block it should download is the first block that has a time offset of at most 22 seconds and that starts with a RAP, i.e., is a seek point. In this example, although block 5 has a time offset smaller than 22 seconds (its time offset is 21.965 seconds), the segment indexing indicates that block 5 does not start with a RAP. Based on the segment indexing, the client therefore chooses to download block 4 instead, since its start time is at most 22 seconds (its time offset is 21.623 seconds) and it starts with a RAP. Thus, based on the segment indexing, the client makes an HTTP range request starting at byte offset 157,034.
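The selection described above can be sketched as follows. This is a hedged illustration, not the patented implementation: the `IndexEntry` type and function names are invented for the example, the rule is read here as "the latest RAP-starting block at or before the target time", and only some of the table values come from the text. The values for blocks 1, 2, 4, and 5 are taken or derived from the figures quoted in this description (block 2 follows from block 1's 0.485 seconds and 50,245 bytes, and block 5's offset from block 4's offset plus its 42,011 bytes); the values for block 3 are placeholders.

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    block: int
    start_time: float   # seconds into the presentation
    byte_offset: int    # offset of the block within the segment
    rap: bool           # True if the block starts with a RAP (seek point)

INDEX = [
    IndexEntry(1, 20.000, 0, True),
    IndexEntry(2, 20.485, 50245, False),
    IndexEntry(3, 21.000, 100000, False),   # placeholder values, not from the text
    IndexEntry(4, 21.623, 157034, True),
    IndexEntry(5, 21.965, 199045, False),
]

def pick_start_block(index, target_time):
    """Latest block that starts with a RAP and begins at or before target_time."""
    candidates = [e for e in index if e.rap and e.start_time <= target_time]
    if not candidates:
        raise ValueError("no seek point at or before the requested time")
    return max(candidates, key=lambda e: e.start_time)

def range_header(entry):
    """Open-ended HTTP Range header starting at the block's byte offset."""
    return {"Range": f"bytes={entry.byte_offset}-"}

entry = pick_start_block(INDEX, 22.0)   # selects block 4, skipping block 5 (no RAP)
headers = range_header(entry)           # {"Range": "bytes=157034-"}
```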

If segment indexing were not available, the client might have to download all of the preceding 157,034 bytes of data before downloading this data, leading to a much longer startup time, or channel zapping time, and to wasteful downloading of data that is not useful. Alternatively, if segment indexing were not available, the client might estimate where the desired data starts within the segment, but the estimate might be inaccurate, so the client may miss the appropriate point and have to go backward, which again increases the startup delay.

In general, each block encompasses a portion of the media data that, together with the previous blocks, can be played out by a media player. Thus, the blocking structure, and the signaling of the segment indexing blocking structure to the client, either contained within the segment or provided to the client through other means, can significantly improve the ability of the client to provide fast channel zapping and seamless playout in the face of network variations and disruptions. Support for variable-duration blocks, and for blocks that encompass only portions of a GoP, as enabled by the segment indexing, can significantly improve the streaming experience. For example, referring again to FIG. 6 and the example above in which the client wants to start playout at approximately 22 seconds into the presentation, the client can request the data within block 4 through one or more requests, and then feed it into the media player as soon as it is available to start playback. Thus, in this example, playout begins as soon as the 42,011 bytes of block 4 are received at the client, enabling a fast channel zapping time. If instead the client needed to request the entire GoP before playout could begin, the channel zapping time would be longer, since the GoP is 144,211 bytes of data.

In other embodiments, RAPs or seek points may also occur in the middle of a block, and there may be data in the segment indexing that indicates where that RAP or seek point is within the block or fragment. In other embodiments, the time offset may be the decode time of the first frame within the block instead of the presentation time of the first frame within the block.

FIGS. 8(a) and 8(b) illustrate examples of variable block sizing with seek point structures across a plurality of versions or representations. FIG. 8(a) illustrates variable block sizing with seek points aligned across a plurality of versions of a media stream, while FIG. 8(b) illustrates variable block sizing with seek points not aligned across a plurality of versions of a media stream.

Time is shown across the top in seconds, and the blocks and seek points of the two segments for the two representations are shown from left to right according to their timing on this timeline. The length of each block shown is therefore proportional to its playout time, not to the number of bytes in the block. In this example, the segment indexing for the segments of both representations has the same time offsets for the seek points, but potentially different numbers of blocks or fragments between seek points, and different byte offsets to blocks because of the different amounts of media data in each block. In this example, if the client wants to switch from representation 1 to representation 2 at a presentation time of approximately 23 seconds, the client can request up through block 1.2 in the segment for representation 1 and start requesting the segment for representation 2 beginning at block 2.2. The switch then occurs at the presentation time coinciding with seek point 1.2 in representation 1, which is the same time as seek point 2.2 in representation 2.

As should be clear from the foregoing, the block request streaming system described does not constrain the video encoding to place seek points at specific positions within the content, and this mitigates one of the problems of existing systems.

In the embodiments described above, the seek points for the various representations of the same content presentation are organized so that they are aligned. In many cases, however, it is preferable to relax this alignment requirement. For example, the encoding tools used to generate the representations may not be capable of generating seek-point-aligned representations. As another example, the content presentation may be encoded into different representations independently, with no seek point alignment between the representations. As another example, a representation may contain more seek points because it has a lower rate and needs to be switched to more often, or because it contains more frequent seek points to support trick modes such as fast forward, rewind, or fast seeking. It is therefore desirable to provide methods that allow a block request streaming system to handle, efficiently and seamlessly, seek points that are not aligned across the various representations of a content presentation.

In this embodiment, the positions of seek points across representations need not align. Blocks are constructed such that a new block starts at each seek point, and thus there may be no alignment between the blocks of different versions of the presentation. An example of such a non-aligned seek point structure between different representations is shown in FIG. 8(b). Time is shown across the top in seconds, and the blocks and seek points of the two segments for the two representations are shown from left to right according to their timing on this timeline. The length of each block shown is therefore proportional to its playout time, not to the number of bytes in the block. In this example, the segment indexing for the segments of the two representations has potentially different time offsets for the seek points, potentially different numbers of blocks or fragments between seek points, and different byte offsets to blocks because of the different amounts of media data in each block.
In this example, if the client wants to switch from representation 1 to representation 2 at a presentation time of approximately 25 seconds, the client can request up through block 1.3 in the segment for representation 1 and start requesting the segment for representation 2 beginning at block 2.3. The switch then occurs at the presentation time coinciding with seek point 2.3 in representation 2, which is in the middle of the playout of block 1.3 in representation 1, and thus some of the media for block 1.2 is not played out (although the media data for the frames of block 1.3 that are not played out may have to be loaded into the receiver buffer in order to decode other frames of block 1.3 that are played out).

In this embodiment, the operation of block selector 123 may be modified such that, whenever it is required to select a block from a representation different from the previously selected version, it chooses the latest block whose first frame is not later than the frame subsequent to the last frame of the last selected block.
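The selection rule for block selector 123 can be sketched as a lookup over the block start times of the new representation. This is an illustrative reading of the rule, not the actual block selector: the function name and the example times are hypothetical, and frame positions are expressed as presentation times in seconds.

```python
from bisect import bisect_right

def select_switch_block(new_rep_block_starts, next_frame_time):
    """Index of the latest block in the new representation whose first frame
    is not later than next_frame_time, i.e. the time of the frame following
    the last frame of the last selected block.

    new_rep_block_starts must be sorted in ascending order."""
    i = bisect_right(new_rep_block_starts, next_frame_time) - 1
    if i < 0:
        raise ValueError("new representation has no block starting early enough")
    return i

# Hypothetical, non-aligned block start times for the new representation.
rep2_starts = [20.0, 22.5, 24.6, 27.0]
# The last selected block of the old representation ends at 26.0 s, so the
# block starting at 24.6 s (index 2) is chosen and playout overlaps slightly.
chosen = select_switch_block(rep2_starts, 26.0)
```

When seek points are aligned, the same rule simply returns the block that starts exactly at the switch time.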

This last-described embodiment can eliminate the requirement to constrain the positions of seek points within versions other than the first version. It thereby increases the compression efficiency for these versions, resulting in a higher-quality presentation for a given available bandwidth and an improved user experience. A further consideration is that video encoding tools that perform seek point alignment across multiple encodings (versions) of the content may not be widely available; an advantage of this last-described embodiment is therefore that currently available video encoding tools can be used. Another advantage is that the encoding of different versions of the content can proceed in parallel without any need for coordination between the encoding processes for the different versions. Another advantage is that additional versions of the content can be encoded and added to the presentation at a later time, without having to provide the encoding tools with lists of specific seek point positions.

In general, where pictures are encoded as groups of pictures (GoPs), the first picture in the sequence can be a seek point, but that need not always be the case.

Optimal Block Partitioning
One issue of concern in a block request streaming system is the interaction between the structure of the encoded media, for example video media, and the block structure used for block requests. As is well known to those of skill in the art of video encoding, the number of bits required for the encoded representation of each video frame often varies, sometimes substantially, from frame to frame. As a result, the relationship between the amount of data received and the duration of the media encoded by that data may not be straightforward. Furthermore, the division of media data into blocks within a block request streaming system adds a further dimension of complexity. In particular, in some systems the media data of a block cannot be played out until the entire block has been received; for example, the arrangement of media data within a block, or dependencies between media samples within a block that uses erasure codes, may result in this property. As a result of these complex interactions between block size and block duration, and the possible need to receive an entire block before beginning playout, it is common for client systems to adopt a conservative approach in which media data is buffered before playout begins. Such buffering results in a long channel zapping time and thus a poor user experience.

Pakzad describes "block partitioning methods," which are new and efficient methods for determining how to partition a data stream into contiguous blocks based on the underlying structure of the data stream, and further describes several advantages of these methods in the context of a streaming system. A further embodiment of the invention, which applies the block partitioning methods of Pakzad to a block request streaming system, is now described. This method may comprise arranging the media data to be presented in approximate presentation time order, such that the playout time of any given element of media data (for example, a video frame or audio sample) differs from that of any adjacent media data element by less than a provided threshold. The media data so ordered may be considered a data stream in the language of Pakzad, and any of the methods of Pakzad applied to this data stream identifies block boundaries within the data stream. The data between any pair of adjacent block boundaries is considered a "block" in the language of this disclosure, and the methods of this disclosure are applied to provide for presentation of the media data within a block request streaming system.
As will be clear to those of skill in the art upon reading this disclosure, several advantages of the methods disclosed in Pakzad can then be realized in the context of a block request streaming system.

As described in Pakzad, the determination of the block structure of a segment, including blocks that encompass partial GoPs or portions of more than one GoP, can affect the ability of the client to achieve fast channel zapping times. Pakzad provides methods that, given a target startup time, yield a block structure and a target download rate that ensure the following: if the client starts downloading the representation at any seek point and starts playout after the target startup time has elapsed, then playout continues seamlessly as long as, at each point in time, the amount of data the client has downloaded is at least the target download rate times the time elapsed since the beginning of the download. It is advantageous for the client to have access to the target startup time and the target download rate, as this gives the client a means to determine the earliest point in time at which to start playing out the representation, and allows the client to continue playing out the representation as long as the download meets the condition described above. Thus, the methods described later provide a means for including the target startup time and the target download rate in the Media Presentation Description, so that they can be used for the purposes described above.
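The download condition stated above can be written as a simple check on the client side. The sketch below is an assumption-laden illustration: the function names are hypothetical, the rate is taken in bytes per second, and the progress samples are (elapsed seconds, cumulative bytes) pairs measured from the start of the download.

```python
def playout_start_time(download_start, target_startup_time):
    """Earliest time at which playout may begin under the stated guarantee."""
    return download_start + target_startup_time

def download_on_track(samples, target_rate):
    """True if, at every sampled instant, the bytes downloaded so far are at
    least target_rate * elapsed time since the start of the download.

    samples: iterable of (elapsed_seconds, cumulative_bytes_downloaded)."""
    return all(got >= target_rate * elapsed for elapsed, got in samples)

# Hypothetical progress samples against a target of 125,000 bytes/s.
ok = download_on_track([(1.0, 130_000), (2.0, 260_000)], 125_000)
late = download_on_track([(1.0, 130_000), (2.0, 240_000)], 125_000)
```

A client that sees the check fail could rebuffer or switch to a lower-rate representation; the description leaves that policy open.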

Media Presentation Data Model
FIG. 5 illustrates possible structures of the content store shown in FIG. 1, including segments and media presentation description ("MPD") files, and a breakdown of segments, timing, and other structure within an MPD file. Details of possible implementations of MPD structures or files are now described. In many examples the MPD is described as a file, but non-file structures can be used as well.

As illustrated there, content store 110 holds a plurality of source segments 510, MPDs 500, and repair segments 512. An MPD may comprise period records 501, which in turn may comprise representation records 502 that contain segment information 503, such as references to initialization segments 504 and media segments 505.

FIG. 9(a) illustrates an example metadata table 900, while FIG. 9(b) illustrates an example of how an HTTP streaming client 902 obtains metadata table 900 and media blocks 904 over a connection to an HTTP streaming server 906.

In the methods described herein, a "Media Presentation Description" is provided that comprises information about the representations of the media presentation that are available to the client. Representations may be alternatives, in the sense that the client selects one out of the different alternatives, or they may be complementary, in the sense that the client selects several of the representations, each possibly also from a set of alternatives, and presents them jointly. The representations may advantageously be assigned to groups, with the client programmed or configured to understand that the representations within one group are alternatives to one another, whereas representations from different groups are to be presented jointly. In other words, if there is more than one representation in a group, the client picks one representation from that group, one representation from the next group, and so on, to form a presentation.

Information describing the representations may advantageously include details of the applied media codecs, including the profiles and levels of those codecs that are required to decode the representation, as well as video frame rates, video resolutions, and data rates. A client receiving the Media Presentation Description can use this information to determine in advance whether a representation is suitable for decoding or presentation. This is an advantage because, if the differentiating information were contained only in the binary data of the representation, it would be necessary to request the binary data from all representations and to parse and extract the relevant information in order to discover its suitability. These multiple requests and the parsing and extraction of the data could take some time, resulting in a long startup time and therefore a poor user experience.
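A client-side selection that uses only information from the Media Presentation Description might look like the sketch below. It is illustrative only: the `Representation` fields, the codec strings, and the "highest bandwidth among decodable candidates" policy are assumptions, not part of the described MPD format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Representation:
    rep_id: str
    group: int        # representations in the same group are alternatives
    codec: str        # hypothetical codec identifier
    height: int       # video height in pixels (0 for audio)
    bandwidth: int    # data rate in bits per second

def choose_per_group(reps, supported_codecs, max_height):
    """Pick one decodable representation per group (highest data rate wins),
    using only described metadata, without fetching any media data."""
    chosen = {}
    for r in reps:
        if r.codec not in supported_codecs or r.height > max_height:
            continue  # not suitable for decoding or presentation on this client
        best = chosen.get(r.group)
        if best is None or r.bandwidth > best.bandwidth:
            chosen[r.group] = r
    return chosen

REPS = [
    Representation("v1", 1, "avc1", 720, 2_000_000),
    Representation("v2", 1, "avc1", 1080, 5_000_000),
    Representation("v3", 1, "hvc1", 720, 1_500_000),
    Representation("a1", 2, "mp4a", 0, 128_000),
]
picked = choose_per_group(REPS, {"avc1", "mp4a"}, max_height=720)
```

Here the client ends up with one video and one audio representation to present jointly, one from each group.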

In addition, the Media Presentation Description may comprise information restricting client requests based on the time of day. For example, for a live service the client may be restricted to requesting parts of the presentation that are close to the "current broadcast time." This is an advantage because, for live broadcast, it may be desirable to purge from the serving infrastructure the data for content that was broadcast more than a provided threshold before the current broadcast time. This may be desirable for the reuse of storage resources within the serving infrastructure. It may also be desirable depending on the type of service offered. For example, in some cases a presentation may be made available only live because of a certain subscription model of the receiving client devices, whereas other media presentations may be made available both live and on demand, and still other presentations may be made available only live to a first class of client devices, only on demand to a second class of client devices, and as a combination of live or on demand to a third class of client devices.
The methods described in the Media Presentation Data Model section (below) allow the client to be informed of such policies, so that the client can adjust its offerings to the user and avoid making requests for data that may not be available in the serving infrastructure. As an alternative, for example, the client may present a notification to the user that this data is not available.
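A time-based availability check of this kind can be sketched as follows. The function name, the `retention` parameter (the threshold after which the serving infrastructure may purge data), and the three-way result are assumptions for illustration; the description does not define how the policy is expressed.

```python
def segment_availability(segment_end, now, retention):
    """Classify a live segment relative to the current broadcast time.

    segment_end: broadcast time at which the segment is complete (seconds)
    now:         current broadcast time (seconds)
    retention:   how long after broadcast the data is kept (seconds)"""
    if segment_end > now:
        return "not yet available"
    if now - segment_end > retention:
        return "expired"   # may already have been purged by the server
    return "available"

# With a 60-second retention window at broadcast time 1000:
state_recent = segment_availability(990, 1000, 60)   # "available"
state_old = segment_availability(900, 1000, 60)      # "expired"
```

A client could use the "expired" result to show a notification instead of issuing a request that is likely to fail.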

In a further embodiment of the invention, the media segments may conform to the ISO base media file format described in ISO/IEC 14496-12 or to derived specifications (such as the 3GP file format described in 3GPP Technical Specification 26.244). The Usage of 3GPP File Format section (above) describes novel enhancements to the ISO base media file format that permit efficient use of the data structures of this file format within a block-request streaming system. As described in that reference, information may be provided within the file that permits fast and efficient mapping between time segments of the media presentation and byte ranges within the file. The media data itself may be structured according to the movie fragment construction defined in ISO/IEC 14496-12. This information, which provides time and byte offsets, may be structured hierarchically or as a single block of information, and may be provided at the start of the file. Providing this information with an efficient encoding, as described in the Usage of 3GPP File Format section, allows the client to retrieve it quickly, for example by using HTTP partial GET requests when the file download protocol used by the block-request streaming system is HTTP. This shortens startup, seek, and stream-switch times and therefore improves the user experience.
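As an illustration of the partial GET retrieval mentioned above, the following minimal Python sketch builds a byte-range request using the standard library. The URL and the assumption that the time/byte-offset information occupies the first 1024 bytes of the file are hypothetical and chosen only for the example.

```python
from urllib.request import Request


def build_partial_get(url: str, first_byte: int, last_byte: int) -> Request:
    """Build (without sending) an HTTP GET request for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    # The Range header asks the server for only the bytes first_byte..last_byte.
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


# Hypothetical segment URL; the first 1024 bytes are assumed to hold the index.
request = build_partial_get("http://example.com/rep1/segment1.3gp", 0, 1023)
print(request.get_header("Range"))  # bytes=0-1023
```

A client would pass such a request to `urllib.request.urlopen` and expect a 206 (Partial Content) response from a server that supports range requests.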

The representations in a media presentation are synchronized on a global timeline to ensure seamless switching across representations, which are typically alternatives, and to ensure synchronized presentation of two or more representations. Therefore, the sample timing of the media contained in the representations within an adaptive HTTP streaming media presentation may be related to a continuous global timeline across multiple segments.

Blocks of encoded media that contain multiple types of media, for example audio and video, may have different presentation end times for the different media types. In a block-request streaming system, such media blocks may be played out consecutively in such a way that each media type is played continuously, so that media samples of one type from one block may be played before media samples of another type from the preceding block. This is referred to herein as “continuous block splicing”. As an alternative, such media blocks may be played in such a way that the earliest sample of any type in one block is played after the latest sample of any type in the preceding block. This is referred to herein as “discontinuous block splicing”. Continuous block splicing may be appropriate when both blocks contain media from the same content item and the same representation, encoded in sequence, or in other cases. Typically, within one representation, continuous block splicing may be applied when splicing two blocks. This is advantageous because existing encoding can be applied and segmentation can be done without needing to align the media tracks at block boundaries. This is illustrated in FIG. 10, where video stream 1000 comprises block 1202 and other blocks, with RAPs such as RAP 1204.
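The difference between the continuous and discontinuous ways of joining blocks described above can be sketched as follows. This is a simplified model, not the patent's procedure: each block is reduced to the presentation end time of each of its media types, and the function name and sample values are assumptions made for illustration.

```python
from typing import Dict


def next_block_start_times(prev_end: Dict[str, float], continuous: bool) -> Dict[str, float]:
    """Return, per media type, when the next block's samples start playing.

    prev_end maps a media type (e.g. "audio") to the presentation end time,
    in seconds, of that type in the preceding block.
    """
    if continuous:
        # Each media type continues exactly where the same type ended.
        return dict(prev_end)
    # Otherwise every type waits for the latest sample of any type.
    latest = max(prev_end.values())
    return {media_type: latest for media_type in prev_end}


prev_block_end = {"audio": 10.02, "video": 10.0}
continuous = next_block_start_times(prev_block_end, continuous=True)
discontinuous = next_block_start_times(prev_block_end, continuous=False)
```

In the continuous case the next block's video starts at 10.0 s, before the preceding block's audio has finished at 10.02 s; in the discontinuous case both types start at 10.02 s.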

Media Presentation Description
A media presentation may be viewed as a structured collection of files on an HTTP streaming server. The HTTP streaming client can download sufficient information to present the streaming service to the user. Alternative representations may consist of one or more 3GP files, or parts of 3GP files, that conform to the 3GPP file format or at least to a well-defined set of data structures that can easily be converted from or to a 3GP file.

A media presentation may be described by a media presentation description. The Media Presentation Description (MPD) may contain metadata that the client can use to construct appropriate file requests, for example HTTP GET requests, to access the data at the appropriate time and to provide the streaming service to the user. The media presentation description may provide sufficient information for the HTTP streaming client to select the appropriate 3GPP files and pieces of files. The units that are signaled to the client as accessible are referred to as segments.

Among other things, a media presentation description may contain the following elements and attributes.

MediaPresentationDescription element
An element that encapsulates the metadata used by the HTTP streaming client to provide the streaming service to the end user. The MediaPresentationDescription element may contain one or more of the following attributes and elements.

Version: The version number of the protocol, to ensure extensibility.

PresentationIdentifier: Information such that the presentation can be uniquely identified among other presentations. May also contain private fields or names.

UpdateFrequency: The update frequency of the media presentation description, that is, how often the client may reload the actual media presentation description. If absent, the media presentation may be static. Updating the media presentation may mean that the media presentation cannot be cached.

MediaPresentationDescriptionURI: URI for dating the media presentation description.

Stream: Describes the type of the stream or media presentation: video, audio, or text. A video stream type may contain audio and may contain text.

Service: Describes the service type, with additional attributes. Service types may be live and on-demand. This may be used to inform the client that seeking and access beyond some current time are not permitted.

MaximumClientPreBufferTime: The maximum amount of time for which the client may pre-buffer the media stream. This timing may differentiate streaming from progressive download if the client is restricted from downloading beyond this maximum pre-buffer time. The value may be absent, indicating that no restriction on pre-buffering applies.

SafetyGuardIntervalLiveService: Information on the maximum turnaround time of a live service at the server. Provides the client with an indication of what information is already accessible at the current time. This information may be necessary if the client and the server are expected to operate on UTC time and no tight time synchronization is provided.

TimeShiftBufferDepth: Information on how far back the client may move in a live service relative to the current time. Extending this depth may permit time-shift viewing and catch-up services without specific changes in service provisioning.

LocalCachingPermitted: This flag indicates whether the HTTP client may cache the downloaded data locally after it has been played.

LivePresentationInterval: Contains the time intervals during which the representation may be available, by specifying StartTime and EndTime. StartTime indicates the start time of the service and EndTime indicates the end time of the service. If EndTime is not specified, the end time is unknown at the current time, and UpdateFrequency may ensure that the client gets access to the end time before the actual end time of the service.

OnDemandAvailabilityInterval: The presentation interval indicates the availability of the service on the network. Multiple presentation intervals may be provided. The HTTP client may not be able to access the service outside any specified time window. By provisioning the OnDemand Interval, additional time-shift viewing may be specified. This attribute may also be present for a live service. If it is present for a live service, the service can be accessed as an on-demand service during all provided availability intervals. Therefore, the LivePresentationInterval may not overlap with any OnDemandAvailabilityInterval.

MPDFileInfoDynamic: Describes the default dynamic construction of files in the media presentation. More details are provided below. The default specification at MPD level may save unnecessary repetition if the same rules are used for several or all alternative representations.

MPDCodecDescription: Describes the main default codecs in the media presentation. More details are provided below. The default specification at MPD level may save unnecessary repetition if the same codecs are used for several or all representations.

MPDMoveBoxHeaderSizeDoesNotChange: A flag indicating whether the MoveBox Header changes in size among the individual files within the entire media presentation. This flag may be used to optimize the download and may be present only for specific segment formats, especially those in which segments contain the moov header.

FileURIPattern: A pattern used by the client to generate request messages for files within the media presentation. The different attributes permit generation of a unique URI for each of the files within the media presentation. The base URI may be an HTTP URI.

Alternative Representation: Describes a list of representations.

Alternative Representation element

An XML element that encapsulates all metadata for one representation. The Alternative Representation element may contain the following attributes and elements.

RepresentationID: A unique ID for this specific Alternative Representation within the media presentation.

FilesInfoStatic: Provides an explicit list of the start times and URIs of all files of one alternative presentation. The static provisioning of the list of files may provide the advantage of an exact timing description of the media presentation, but it may not be as compact, especially if the alternative representation contains many files. In addition, the files may have arbitrary names.

FilesInfoDynamic: Provides an implicit way to construct the list of start times and URIs of one alternative presentation. The dynamic provisioning of the list of files may provide the advantage of a more compact representation. If only the sequence of start times is provided, then the timing advantages also hold here, but the file names are constructed dynamically based on the FilePatternURI. If only the duration of each segment is provided, then the representation is compact and may be suited for use within live services, but the generation of the files may be governed by global timing.

APMoveBoxHeaderSizeDoesNotChange: A flag indicating whether the MoveBox Header changes in size among the individual files within the Alternative Description. This flag may be used to optimize the download and may be present only for specific segment formats, especially those in which segments contain the moov header.

APCodecDescription: Describes the main codecs of the files in the alternative presentation.

Media Description element
MediaDescription: An element that may encapsulate all metadata for the media contained in this representation. Specifically, it may contain information about the tracks in this alternative presentation as well as, where applicable, the recommended grouping of tracks. The MediaDescription attribute contains the following attributes.

TrackDescription: An XML attribute that encapsulates all metadata for the media contained in this representation. The TrackDescription attribute contains the following attributes.

TrackID: A unique ID for the track within the alternative representation. This may be used when the track is part of a grouping description.

Bitrate: The bitrate of the track.

TrackCodecDescription: An XML attribute that contains all attributes of the codec used in this track. The TrackCodecDescription attribute contains the following attributes.

MediaName: An attribute defining the media type. The media types include “audio”, “video”, “text”, “application”, and “message”.

Codec: The codec type, including profile and level.

LanguageTag: The language tag, if applicable.

MaxWidth, MaxHeight: For video, the height and width of the contained video in pixels.

SamplingRate: For audio, the sampling rate.

GroupDescription: An attribute that provides the client with recommendations for appropriate grouping based on different parameters.

GroupType: A type on the basis of which the client may decide how to group tracks.

The information in the media presentation description is advantageously used by an HTTP streaming client to perform requests for files/segments, or parts thereof, at appropriate times, selecting segments from adequate representations that match its capabilities, for example in terms of access bandwidth, display capabilities, and codec capabilities, as well as user preferences such as language. Furthermore, because the media presentation description describes representations that are time-aligned and mapped to a global timeline, the client may also use the information in the MPD during an ongoing media presentation to initiate appropriate actions to switch across representations, to present representations jointly, or to seek within the media presentation.
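One way a client might use this metadata to pick a representation is sketched below in Python. The `Representation` class is a hypothetical, flattened stand-in for a few of the attributes listed above (RepresentationID, Bitrate, Codec, LanguageTag, MaxWidth), and the policy of taking the highest bitrate that fits is an assumption, not something the description prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Representation:
    representation_id: str
    bitrate: int        # bits per second
    codec: str
    language: str
    max_width: int      # pixels


def select_representation(
    representations: List[Representation],
    access_bandwidth: int,
    supported_codecs: Set[str],
    language: str,
    display_width: int,
) -> Optional[Representation]:
    """Pick the highest-bitrate representation the client can actually use."""
    usable = [
        r for r in representations
        if r.bitrate <= access_bandwidth
        and r.codec in supported_codecs
        and r.language == language
        and r.max_width <= display_width
    ]
    # default=None covers the case where nothing matches the client's limits.
    return max(usable, key=lambda r: r.bitrate, default=None)


reps = [
    Representation("low", 500_000, "avc1", "en", 640),
    Representation("high", 2_000_000, "avc1", "en", 1280),
    Representation("hd", 5_000_000, "hvc1", "en", 1920),
]
choice = select_representation(reps, 3_000_000, {"avc1"}, "en", 1920)
```

Here `choice` is the "high" representation: "hd" exceeds the available bandwidth and uses a codec the client does not list as supported.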

Signaling Segment Start Time
A representation may be split, in time, into multiple segments. An inter-track timing issue exists between the last fragment of one segment and the next fragment of the next segment. In addition, another timing issue exists when segments of constant duration are used.

Using the same duration for every segment may have the advantage that the MPD is both compact and static. However, every segment may still begin at a random access point. Thus, either the video encoding may be constrained to provide random access points at these specific points, or the actual segment durations may not be precisely as specified in the MPD. It may be desirable that the streaming system not place unnecessary restrictions on the video encoding process, so the second option may be preferred.

Specifically, if the file duration is specified in the MPD as d seconds, then the n-th file may begin with the random access point at or immediately following time (n-1)d.
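The rule above can be sketched in Python, assuming the times of the encoder's random access points are available as a sorted list. The function name and the sample values are illustrative only.

```python
from bisect import bisect_left
from typing import Sequence


def file_start_time(n: int, d: float, rap_times: Sequence[float]) -> float:
    """Start time of the n-th file (n >= 1): first RAP at or after (n-1)*d.

    rap_times must be sorted in ascending order.
    """
    nominal = (n - 1) * d
    index = bisect_left(rap_times, nominal)  # first RAP with time >= nominal
    if index == len(rap_times):
        raise ValueError("no random access point at or after the nominal time")
    return rap_times[index]


raps = [0.0, 4.0, 10.5, 14.0, 20.0, 26.0]
starts = [file_start_time(n, 10.0, raps) for n in (1, 2, 3)]
print(starts)  # [0.0, 10.5, 20.0]
```

The second file nominally starts at 10 s but actually starts at 10.5 s, which is the mismatch between MPD timing and actual timing that the signaling options below address.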

In this approach, each file may include information about the exact start time of the segment in terms of global presentation time. Three possible ways to signal this include the following.

(1) First, restrict the start time of each segment to the exact timing specified in the MPD. However, the media encoder then has no flexibility in the placement of IDR frames, and special encoding may be required for file streaming.

(2) Second, add the exact start time for each segment to the MPD. For the on-demand case, the compactness of the MPD may be reduced. For the live case, this may require regular updates of the MPD, which may reduce scalability.

(3) Third, add the global time, or the exact start time relative to the announced start time of the representation or to the announced start time of the segment in the MPD, to the segment itself, in the sense that the segment contains this information. This may be added to a new box dedicated to adaptive streaming. This box may also include information such as that provided by the “TIDX” or “SIDX” box. A consequence of this third approach is that, when seeking to a particular position near the beginning of one of the segments, the client may, based on the MPD, choose the segment subsequent to the one that contains the requested seek point. A simple response in this case may be to move the seek point forward to the start of the retrieved segment (that is, to the next random access point after the seek point). Usually, random access points are provided at least every few seconds (and there is often little encoding gain from making them less frequent), so in the worst case the seek point may be moved to a few seconds later than specified. Alternatively, the client may determine, upon retrieving the header information for the segment, that the requested seek point is in fact in the previous segment, and request that segment instead. This may occasionally increase the time required to execute the seek operation.
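The two seek responses described for the third approach can be sketched as follows. This is a minimal model under stated assumptions: segments are indexed from 0, the MPD gives only the nominal duration d, and `actual_start_of` is a hypothetical callable standing in for reading the exact start time from a segment's header.

```python
import math
from typing import Callable, Tuple


def resolve_seek(
    seek_time: float,
    d: float,
    actual_start_of: Callable[[int], float],
    fetch_previous: bool = False,
) -> Tuple[int, float]:
    """Return (segment index, playback start time) for a seek request."""
    index = math.floor(seek_time / d)      # segment chosen from the MPD alone
    actual_start = actual_start_of(index)  # learned from the segment header
    if seek_time >= actual_start:
        return index, seek_time            # the MPD-based guess was right
    if fetch_previous and index > 0:
        # The seek point really lies in the previous segment: request it.
        return index - 1, seek_time
    # Simple response: move the seek point forward to the segment start.
    return index, actual_start


actual_starts = {0: 0.0, 1: 10.5, 2: 20.0}
moved = resolve_seek(10.2, 10.0, actual_starts.__getitem__)
exact = resolve_seek(10.2, 10.0, actual_starts.__getitem__, fetch_previous=True)
```

With these sample values, `moved` is `(1, 10.5)` (the seek point shifts 0.3 s later) and `exact` is `(0, 10.2)` (one extra segment request, but the requested position is honored).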

List of Accessible Segments
The media presentation comprises a set of representations, each providing some different version of encoding of the original media content. Advantageously, the representations themselves contain information on the differentiating parameters of the representation compared with other parameters. They also contain, either explicitly or implicitly, a list of accessible segments.

Segments may be differentiated into timeless segments, which contain metadata only, and media segments, which primarily contain media data. The Media Presentation Description (“MPD”) advantageously identifies and assigns different attributes to each of the segments, either implicitly or explicitly. Attributes advantageously assigned to each segment comprise the period during which the segment is accessible and the resources and protocols through which the segment is accessible. In addition, media segments are advantageously assigned attributes such as the start time of the segment in the media presentation and the duration of the segment in the media presentation.

Where the media presentation is of type “on-demand”, as advantageously indicated by an attribute in the media presentation description such as OnDemandAvailabilityInterval, the media presentation description typically describes all of the segments and also indicates when the segments are accessible and when they are not. The start times of the segments are advantageously expressed relative to the start of the media presentation, so that two clients that start playback of the same media presentation at different times can use the same media presentation description and the same media segments. This advantageously improves the ability to cache the segments.

Where the media presentation is of type “live”, as advantageously indicated by an attribute in the media presentation description such as the attribute Service, the segments comprising the media presentation beyond the actual time of day are generally not generated, or at least not accessible, even though the segments are fully described in the MPD. However, given the indication that the media presentation service is of the “live” type, the client may produce a list of accessible segments, along with their timing attributes, for a client-internal time NOW in wall-clock time, based on the information contained in the MPD and the download time of the MPD. The server advantageously operates in the sense that it makes resources accessible such that a reference client operating with the instance of the MPD at wall-clock time NOW can access the resources.

Specifically, the reference client produces a list of accessible segments, along with their timing attributes, for a client-internal time NOW in wall-clock time, based on the information contained in the MPD and the download time of the MPD. As time advances, the client uses the same MPD to create a new accessible segment list that can be used to play out the media presentation continuously. Therefore, the server can announce segments in an MPD before these segments are actually accessible. This is advantageous because it reduces frequent updating and downloading of the MPD.

Assume that a list of segments, each with start time tS, is described either explicitly by a playlist in elements such as FileInfoStatic or implicitly by using an element such as FileInfoDynamic. An advantageous method for generating a segment list using FileInfoDynamic is described below. Based on this construction rule, the client has access to a list of URIs for each representation r, referred to herein as FileURI(r,i), and a start time tS(r,i) for each segment with index i.

The use of information in the MPD to create the accessible time window of segments may be performed using the following rules.

For a service of type “on-demand”, as advantageously indicated by an attribute such as Service, if the current wall-clock time NOW at the client is within any range of availability, advantageously expressed by an MPD element such as OnDemandAvailabilityInterval, then all described segments of this on-demand presentation are accessible. If the current wall-clock time NOW at the client is outside every range of availability, then none of the described segments of this on-demand presentation are accessible.
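The on-demand rule reduces to an interval membership test, sketched below in Python. Times are plain seconds on a common clock and the interval endpoints are treated as inclusive; both choices are assumptions made for the example.

```python
from typing import Iterable, Tuple


def on_demand_accessible(now: float, availability: Iterable[Tuple[float, float]]) -> bool:
    """True if NOW falls inside any (start, end) availability interval.

    When this returns True, all described segments are accessible;
    when it returns False, none of them are.
    """
    return any(start <= now <= end for start, end in availability)


intervals = [(100.0, 200.0), (500.0, 900.0)]
inside = on_demand_accessible(150.0, intervals)
outside = on_demand_accessible(300.0, intervals)
```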

For a service of type “live”, as advantageously indicated by an attribute such as Service, the start time tS(r,i) advantageously expresses the time of availability in wall-clock time. The availability start time may be derived as a combination of the live service time of the event and some turnaround time at the server for capturing, encoding, and publishing. The time for this process may, for example, be specified in the MPD using a safety guard interval tG, specified for example as SafetyGuardIntervalLiveService in the MPD. This minimizes the difference between UTC time and the availability of the data on the HTTP streaming server. In another embodiment, the MPD explicitly specifies the availability time of the segment in the MPD, without providing the turnaround time, as the difference between the event live time and the turnaround time. In the following description, it is assumed that any global times are specified as availability times. One of ordinary skill in the art of live media broadcasting can, after reading this description, derive this information from suitable information in the media presentation description.
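A minimal sketch of deriving an availability start time from the live time of the event and the safety guard interval tG follows. Treating the combination as a simple sum is an assumption for illustration; the function name and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def availability_start(event_live_time: datetime, t_g_seconds: float) -> datetime:
    """Live time of the media plus the server's turnaround allowance tG."""
    return event_live_time + timedelta(seconds=t_g_seconds)


live = datetime(2013, 4, 25, 12, 0, 0, tzinfo=timezone.utc)
available = availability_start(live, 10.0)
```

With these values, media captured at 12:00:00 UTC is treated as available from 12:00:10 UTC.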

If the current wall-clock time NOW at the client is outside every range of the live presentation interval, advantageously expressed by an MPD element such as LivePresentationInterval, then none of the described segments of this live presentation are accessible. If the current wall-clock time NOW at the client is within the live presentation interval, then at least certain segments of the described segments of this live presentation may be accessible.

The restriction of the accessible segments is governed by the following values.

The wall-clock time NOW (as available to the client).

The permitted time-shift buffer depth tTSB, for example specified as TimeShiftBufferDepth in the media presentation description.

A client at relative event time t₁ may be allowed to request only segments with start times tS(r,i) in the interval from (NOW-tTSB) to NOW, or in an interval such that the end time of a segment with duration d is also included, resulting in the interval from (NOW-tTSB-d) to NOW.
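The live window can be sketched in Python as a filter over the segment start times tS(r,i) of one representation. Inclusive endpoints and plain-second times are assumptions; passing a non-zero `d` gives the wider (NOW-tTSB-d) variant.

```python
from typing import List, Sequence


def accessible_segments(
    start_times: Sequence[float], now: float, t_tsb: float, d: float = 0.0
) -> List[int]:
    """Indices i whose start time lies in [now - t_tsb - d, now]."""
    earliest = now - t_tsb - d
    return [i for i, t_s in enumerate(start_times) if earliest <= t_s <= now]


t_s = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
narrow = accessible_segments(t_s, now=35.0, t_tsb=20.0)
wide = accessible_segments(t_s, now=35.0, t_tsb=20.0, d=10.0)
print(narrow, wide)  # [2, 3] [1, 2, 3]
```

Calling the function again with a later `now` and the same start times yields the next accessible list, which mirrors how a client can keep using one MPD as time advances.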

Updating the MPD
In some embodiments, the server does not know in advance the file or segment locators and the start times of the segments, for example because the server location will change, or the media presentation includes some advertisement from a different server, or the duration of the media presentation is unknown, or the server wants to obfuscate the locator for the following segments.

In such embodiments, the server might describe only segments that are already accessible or that become accessible shortly after this instance of the MPD has been published. Furthermore, in some embodiments, the client advantageously consumes media close to the media described in the MPD, so that the user experiences the contained media program as close as possible to the generation of the media content. As soon as the client anticipates that it is reaching the end of the media segments described in the MPD, it advantageously requests a new instance of the MPD to continue continuous play-out, in the expectation that the server has published a new MPD describing new media segments. The server advantageously generates new instances of the MPD and updates the MPD such that clients can rely on the procedures for continuous updates. The server may adapt its MPD update procedures, along with segment generation, to the procedures of a reference client that behaves as a typical client would.
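A minimal sketch of the client-side refresh decision follows. The `lead_time` threshold is a hypothetical tuning parameter standing in for "anticipates that it is reaching the end"; the description does not specify how far ahead the client should act.

```python
def should_request_new_mpd(
    playout_time: float, described_end_time: float, lead_time: float
) -> bool:
    """True once play-out is within lead_time seconds of the last described media."""
    return described_end_time - playout_time <= lead_time


# The MPD describes media up to 60 s; the client refreshes 15 s before the end.
early = should_request_new_mpd(30.0, 60.0, 15.0)
due = should_request_new_mpd(46.0, 60.0, 15.0)
```

A larger `lead_time` makes stalls less likely when MPD publication is delayed, at the cost of requests that may return an unchanged MPD.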

If a new instance of the MPD describes only a short time ahead, clients need to request new instances of the MPD frequently. This can cause scalability problems and unnecessary uplink and downlink traffic due to needlessly frequent requests.

Therefore, it is relevant, on the one hand, to describe segments as far into the future as possible without necessarily making them accessible yet, and, on the other hand, to enable unforeseen updates of the MPD in order to express new server locations, permit the insertion of new content such as advertisements, or provide changes in codec parameters.

Furthermore, in some embodiments, the duration of the media segments may be small, such as in the range of several seconds. The duration of media segments is advantageously flexible, so that it can be adjusted to suitable segment sizes that can be optimized for delivery or caching properties, to compensate for end-to-end delay in live services or other aspects that deal with storage or delivery of segments, or for other reasons. Especially when the segments are small compared to the media presentation duration, a significant number of media segment resources and start times need to be described in the media presentation description. As a result, the media presentation description may be large, which may adversely affect its download time and therefore the start-up delay of the media presentation and the bandwidth usage on the access link. It is therefore advantageous to permit not only the description of a list of media segments using playlists, but also a description using templates or URL construction rules. "Template" and "URL construction rule" are used synonymously in this description.
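As an illustration of why a URL construction rule keeps the description small, the following minimal sketch expands one template into any number of segment locators. The `{index}` placeholder syntax, the function name, and the example URL are assumptions made for illustration only; the description does not define a template syntax here.

```python
from typing import List


def segment_urls(template: str, first_index: int, count: int) -> List[str]:
    """Expand a hypothetical URL construction rule into `count` segment URLs.

    The "{index}" placeholder is an assumed convention, not one taken
    from the description.
    """
    return [template.format(index=i) for i in range(first_index, first_index + count)]


# One short rule stands in for an arbitrarily long playlist of locators.
urls = segment_urls("http://example.com/rep1/seg-{index}.3gp", 1, 3)
for url in urls:
    print(url)
```

With a rule of this kind the size of the description does not grow with the number of segments, whereas an explicit playlist grows by one entry per segment.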

In addition, templates may advantageously be used to describe segment locators in live cases beyond the current time. In such cases, updates of the MPD are not needed merely to supply locators, because the locators and the segment list are described by the templates. However, unforeseen events may still occur that require changes to the description of the representations or segments. Changes to an adaptive HTTP streaming media presentation description may be needed when content from multiple different sources is spliced together, for example when advertising is inserted. Content from different sources may differ in a variety of ways. Another reason is that, during a live presentation, it may be necessary to change the URLs used for content files in order to fail over from one live origin server to another.

In some embodiments, it is advantageous that, if the MPD is updated, the update is carried out such that the updated MPD is compatible with the previous MPD in the following sense: the reference client, and therefore any implemented client, generates from the updated MPD a list of accessible segments that is functionally identical to the list it would have generated from the previous instance of the MPD, for any time up to the validity time of the previous MPD. This requirement ensures that (a) clients may immediately begin using the new MPD without synchronization with the old MPD, since it is compatible with the old MPD before the update time; and (b) the update time need not be synchronized with the time at which the actual change to the MPD takes place. In other words, updates to the MPD can be advertised in advance, and the server can replace the old instance of the MPD once new information is available, without having to maintain different versions of the MPD.

Two possibilities may exist for media timing across an MPD update for a set of representations or for all representations: (a) the existing global timeline continues across the MPD update (referred to herein as a "continuous MPD update"), or (b) the current timeline ends and a new timeline begins with the segment following the change (referred to herein as a "discontinuous MPD update").

The difference between these options may be evident when considering that the tracks of a media fragment, and hence of a segment, generally do not begin and end at the same time because of differing sample granularity across tracks. During normal presentation, samples of one track of a fragment may be rendered before some samples of another track of the previous fragment. That is, there is some kind of overlap between fragments, although there may be no overlap within a single track.

The difference between (a) and (b) is whether such overlap may be enabled across an MPD update. When the MPD update is due to splicing of completely separate content, such overlap is generally difficult to achieve, because the new content would need new encoding to be spliced with the previous content. It is therefore advantageous to provide the ability to discontinuously update the media presentation by restarting the timeline for certain segments, and possibly also to define a new set of representations after the update. Also, if the content has been independently encoded and segmented, this avoids having to adjust timestamps to fit within the global timeline of the previous piece of content.

When the update is for lesser reasons, such as only adding new media segments to the list of described media segments, or when the location of the URLs is changed, overlap and continuous updates may be allowed.

In the case of a discontinuous MPD update, the timeline of the last segment of the previous representation ends at the latest presentation end time of any sample in the segment. The timeline of the next representation (or, more accurately, the first presentation time of the first media segment of the new part of the media presentation, also referred to as the new period) typically and advantageously begins at this same instant as the end of the presentation of the last period, so that seamless and continuous playout is ensured.

The two cases are illustrated in FIG. 11.

It is preferred and advantageous to restrict MPD updates to segment boundaries. The rationale for restricting such changes or updates to segment boundaries is as follows. First, changes to the binary metadata for each representation, typically the Movie Header, may take place at least at segment boundaries. Second, the Media Presentation Description may contain pointers (URLs) to the segments. In a sense, the MPD is the "umbrella" data structure that groups together all the segment files associated with the media presentation. To maintain this containment relationship, each segment may be referenced by a single MPD, and when the MPD is updated, it is advantageously updated only at a segment boundary.

Segment boundaries are not generally required to be aligned. However, for content spliced from different sources, and for discontinuous MPD updates generally, it makes sense to align the segment boundaries (specifically, the last segment of each representation may end at the same video frame and may not contain audio samples with a presentation start time later than the presentation time of that frame). A discontinuous update may then start a new set of representations at a common time instant, referred to as a period. The start time at which this new set of representations becomes valid is provided, for example, by a period start time. The relative start time of each representation is reset to zero, and the start time of the period places the set of representations of this new period in the global media presentation timeline.

For continuous MPD updates, segment boundaries are not required to be aligned. Each segment of each alternative representation may be governed by a single Media Presentation Description. Consequently, update requests for a new instance of the Media Presentation Description, which are generally triggered by the anticipation that no additional media segments are described in the operating MPD, may take place at different times depending on the consumed set of representations, including the set of representations that are anticipated to be consumed.

To support updates of MPD elements and attributes in a more general case, any element, not only representations or sets of representations, may be associated with a validity time. Thus, if certain elements of the MPD need to be updated, for example where the number of representations is changed or the URL construction rules are changed, each of these elements may be updated individually at specified times by providing multiple copies of the element with disjoint validity times.

Validity is advantageously associated with the global media time, such that the described element associated with a validity time is valid in a period of the global timeline of the media presentation.

As discussed above, in one embodiment, validity times are added only to a full set of representations. Each full set then forms a period, and the validity time forms the start time of the period. In other words, in this specific use of the validity element, a full set of representations may be valid for a period in time, indicated by a global validity time for the set of representations. The validity time of a set of representations is referred to as a period. At the start of a new period, the validity of the previous set of representations expires and the new set of representations becomes valid. Note again that the validity times of periods are preferably disjoint.
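The period model can be sketched as a lookup on the global timeline: each period is identified by its start time, the periods are disjoint, and media time inside a period restarts at zero. The function name and the representation of periods as a sorted list of start times are assumptions made for this sketch.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def locate_period(period_starts: Sequence[float], t: float) -> Optional[Tuple[int, float]]:
    """Map a global media time to (period index, time relative to period start).

    `period_starts` must be sorted ascending. A period is taken to be valid
    from its own start time until the start of the next period, so the
    validity intervals are disjoint. Returns None before the first period.
    """
    i = bisect_right(period_starts, t) - 1
    if i < 0:
        return None
    return i, t - period_starts[i]


# Three periods, e.g. main content, an inserted advertisement, main content again.
starts = [0.0, 370.5, 400.5]
print(locate_period(starts, 380.0))
```

A time that falls exactly on a period start belongs to the new period, which matches the statement that the validity of the previous set of representations expires when the new period begins.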

As noted above, changes to the Media Presentation Description take place at segment boundaries, so for each representation a change to an element actually takes place at the next segment boundary. The client can then form a valid MPD that includes a list of segments for each instant of time within the presentation time of the media.

Discontinuous block splicing may be appropriate where the blocks contain media data from different representations or from different content, for example from a segment of content and an advertisement, or in other cases. A block-request streaming system may require that changes to presentation metadata take place only at block boundaries. This may be advantageous for implementation reasons, because updating media decoder parameters within a block may be more complex than updating them only between blocks. In this case, it may advantageously be specified that the validity intervals described above are interpreted as approximate, such that an element is considered valid from the first block boundary not earlier than the start of the specified validity interval to the first block boundary not earlier than the end of the specified validity interval.
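The approximate interpretation of a validity interval can be sketched as follows: both ends of the specified interval are moved to the first block boundary that is not earlier than them. The function name and the use of a sorted list of boundary times are assumptions made for this sketch.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def effective_validity(
    block_boundaries: Sequence[float], start: float, end: float
) -> Tuple[Optional[float], Optional[float]]:
    """Snap a specified validity interval to block boundaries.

    `block_boundaries` must be sorted ascending. Each end of the interval is
    replaced by the first boundary that is not earlier than it; None is
    returned for an end that lies beyond the last known boundary.
    """

    def first_not_earlier(t: float) -> Optional[float]:
        i = bisect_left(block_boundaries, t)  # first index with boundary >= t
        return block_boundaries[i] if i < len(block_boundaries) else None

    return first_not_earlier(start), first_not_earlier(end)


# Blocks start every 10 seconds; an element specified as valid over [12, 25)
# is treated as valid over [20, 30).
print(effective_validity([0, 10, 20, 30], 12, 25))
```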

An example embodiment of the above novel enhancements to a block-request streaming system is described in the section presented later, titled "Changes to Media Presentations".

Segment Duration Signaling
Discontinuous updates effectively divide the presentation into a series of disjoint intervals, referred to as periods. Each period has its own timeline for media sample timing. The media timing of representations within a period may advantageously be indicated by specifying a separate compact list of segment durations for each period or for each representation in a period.

An attribute associated with elements within the MPD, called for example the period start time, may specify the validity time of certain elements within the media presentation time. This attribute may be added to any element of the MPD (attributes that may be assigned a validity may be changed to elements).

For discontinuous MPD updates, the segments of all representations may end at the discontinuity. This generally implies at least that the last segment before the discontinuity has a different duration from the preceding ones. Signaling segment duration may involve indicating either that all segments have the same duration or a separate duration for every segment. It may be desirable to have a compact representation for a list of segment durations that is efficient when many of them have the same duration.

The durations of the segments in one representation or a set of representations may advantageously be expressed with a single string that specifies all segment durations for a single interval from the start of the discontinuous update, i.e., the start of the period, until the last media segment described in the MPD. In one embodiment, the format of this element is a text string conforming to a production that contains a list of segment duration entries, where each entry contains a duration attribute dur and an optional multiplier mult of the attribute, indicating that the representation contains <mult> segments of the duration <dur> of the first entry, followed by <mult> segments of the duration <dur> of the second entry, and so on.

Each duration entry specifies the duration of one or more segments, with the duration given in seconds. If the <dur> value is followed by the "*" character and a number, this number specifies the number of consecutive segments with this duration. If the multiplier sign "*" is absent, the number of segments is one. If the "*" is present with no following number, all subsequent segments have the specified duration and there may be no further entries in the list. For example, the string "30*" means that all segments have a duration of 30 seconds. The string "30*12 10.5" indicates that 12 segments of duration 30 seconds are followed by one segment of duration 10.5 seconds.
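The duration string format can be sketched as a small parser. This is a minimal sketch under stated assumptions: entries are taken to be separated by whitespace with no spaces around "*", and the function name and the returned (explicit durations, open-ended duration) pair are choices made for illustration rather than part of the description.

```python
from typing import List, Optional, Tuple


def parse_segment_durations(text: str) -> Tuple[List[float], Optional[float]]:
    """Parse a segment duration string such as "30*12 10.5" or "30*".

    Returns the list of explicitly counted segment durations in seconds and,
    if the last entry is open-ended ("<dur>*"), the duration that applies to
    all subsequent segments (otherwise None).
    """
    durations: List[float] = []
    tail: Optional[float] = None
    entries = text.split()
    for pos, entry in enumerate(entries):
        dur_text, star, mult_text = entry.partition("*")
        dur = float(dur_text)
        if not star:
            durations.append(dur)                      # no "*": exactly one segment
        elif mult_text:
            durations.extend([dur] * int(mult_text))   # "<dur>*<mult>"
        else:
            if pos != len(entries) - 1:
                raise ValueError("an open-ended entry must be the last entry")
            tail = dur                                 # "<dur>*": all remaining segments
    return durations, tail


explicit, open_ended = parse_segment_durations("30*12 10.5")
print(len(explicit), sum(explicit), open_ended)
```

For "30*12 10.5" the parser yields 13 segments covering 370.5 seconds in total, and for "30*" it yields no explicit segments and an open-ended duration of 30 seconds.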

If segment durations are specified separately for each alternative representation, the sum of the segment durations within each interval may be the same for each representation. In the case of video tracks, the interval may end on the same frame in each alternative representation.

Those of ordinary skill in the art, upon reading this disclosure, may find similar or equivalent ways to express segment durations compactly.

In another embodiment, the duration of a segment is signaled by a single duration attribute <duration> as being constant for all segments in the representation except the last one. The last segment before a discontinuous update may be shorter, as long as the start point of the next discontinuous update or the start of a new period is provided, because this implies that the last segment extends up to the start of the next period.
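Under this alternative, a client can derive every segment duration in a period from three values. The sketch below assumes the period start, the next period start, and the constant duration are all given in seconds; the function name and the small tolerance used to suppress a zero-length final segment are assumptions for illustration.

```python
from typing import List


def durations_in_period(period_start: float, next_period_start: float, duration: float) -> List[float]:
    """Derive segment durations when all but the last segment share one duration.

    The last segment is shortened so that it ends exactly at the start of
    the next period.
    """
    length = next_period_start - period_start
    if length <= 0 or duration <= 0:
        raise ValueError("period length and segment duration must be positive")
    full = int(length // duration)          # number of full-length segments
    remainder = length - full * duration    # what is left for the last segment
    result = [duration] * full
    if remainder > 1e-9:
        result.append(remainder)
    return result


# A 370.5 second period with 30 second segments: 12 full segments and a 10.5 second tail.
print(durations_in_period(0.0, 370.5, 30.0))
```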

Changes and Updates to Representation Metadata
Indicating changes to binary-coded representation metadata, such as changes to the Movie Header "moov", may be accomplished in different ways: (a) there may be one moov box for all representations in a separate file referenced in the MPD; (b) there may be one moov box for each alternative representation in a separate file referenced in each Alternative Representation; (c) each segment may contain a moov box and is therefore self-contained; (d) there may be a moov box for all representations in one 3GP file together with the MPD.

In the cases of (a) and (b), the single "moov" may advantageously be combined with the validity concept above, in the sense that more "moov" boxes may be referenced in an MPD as long as their validity times are disjoint. For example, with the definition of a period boundary, the validity of the "moov" in the old period may expire with the start of the new period.

In the case of option (a), the reference to the single moov box may be assigned a validity element. Multiple Presentation headers may be allowed, but only one may be valid at a time. In another embodiment, the validity time of the entire set of representations in a period, or of the entire period as defined above, may be used as the validity time for this representation metadata, typically provided as the moov header.

In the case of option (b), the reference to the moov box of each representation may be assigned a validity element. Multiple Representation headers may be allowed, but only one may be valid at a time. In another embodiment, the validity time of the entire representation, or of the entire period as defined above, may be used as the validity time for this representation metadata, typically provided as the moov header.

In the case of option (c), no signaling may be added in the MPD, but additional signaling may be added in the media stream to indicate whether the moov box will change for any of the upcoming segments. This is explained further below under "Signaling Updates Within Segment Metadata".

Signaling Updates Within Segment Metadata
To avoid frequently updating the media presentation description merely to learn about potential updates, it is advantageous to signal any such updates along with the media segments. An additional element or elements may be provided within the media segments themselves, indicating that updated metadata, such as the media presentation description, is available and has to be accessed within a certain amount of time in order to continue successfully creating accessible segment lists. In addition, such elements may provide a file identifier, such as a URL, or information that may be used to construct a file identifier, for the updated metadata file. The updated metadata file may include metadata equal to that provided in the original metadata file for the presentation, modified to indicate validity intervals, together with additional metadata also accompanied by validity intervals. Such an indication may be provided in the media segments of all the available representations for a media presentation. Upon detecting such an indication within a media block, a client accessing a block-request streaming system may use a file download protocol or other means to retrieve the updated metadata file. The client is thereby provided with information about changes in the media presentation description and the time at which they will occur or have occurred. Advantageously, each client requests the updated media presentation description only once, when such a change occurs, rather than "polling" and receiving the file many times for possible updates or changes.

Examples of changes include the addition or removal of representations; changes to one or more representations, such as a change in bit rate, resolution, aspect ratio, included tracks, or codec parameters; and changes to URL construction rules, for example a different origin server for an advertisement. Some changes may affect only the initialization segment, such as the Movie Header ("moov") atom associated with a representation, whereas other changes may affect the Media Presentation Description (MPD).

In the case of on-demand content, these changes and their timing may be known in advance and may be signaled in the Media Presentation Description.

For live content, changes may not be known until the point at which they occur. One solution is to allow the Media Presentation Description available at a specific URL to be updated dynamically, and to require clients to request this MPD regularly in order to detect changes. This solution has disadvantages in terms of scalability (origin server load and cache efficiency). In a scenario with large numbers of viewers, caches may receive many requests for the MPD after the previous version has expired from the cache and before the new version has been received, and all of these requests may be forwarded to the origin server. The origin server may need to constantly process requests from caches for each updated version of the MPD. Also, the updates may not easily be time-aligned with changes in the media presentation.

Since one of the advantages of HTTP streaming is the ability to use standard web infrastructure and services for scalability, a preferred solution may involve only "static" (i.e., cacheable) files and not rely on clients "polling" files to see whether they have changed.

Solutions are discussed and proposed for handling updates of metadata in an adaptive HTTP streaming media presentation, including the media presentation description and binary representation metadata such as "moov" atoms.

For live content, the points at which the MPD or "moov" may change might not be known when the MPD is constructed. Since frequent "polling" of the MPD to check for updates should generally be avoided for bandwidth and scalability reasons, updates to the MPD may be indicated "in band" in the segment files themselves; that is, each media segment may have the option of indicating updates. Depending on the segment formats (a) to (c) above, different updates may be signaled.

In general, the following indication may advantageously be provided in a signal within the segment: an indicator that the MPD may be updated before requesting the next segment within this representation, or any next segment that has a start time greater than the start time of the current segment. The update may be announced in advance, indicating that the update need only take place at some segment later than the next one. This MPD update may also be used to update binary representation metadata, such as the Movie Header, in case the locator of the media segment is changed. Another signal may indicate that, with the completion of this segment, no further segments that advance time should be requested.

Where segments are formatted according to segment format (c), i.e., each media segment may contain self-initializing metadata such as the movie header, yet another signal may be added indicating that the subsequent segment contains an updated Movie Header (moov). This advantageously allows the movie header to be included in the segment, while the Movie Header needs to be requested by the client only if the previous segment indicates a Movie Header update, or in the case of seeking or random access when switching representations. In other cases, the client may issue a byte range request for the segment that excludes the movie header from the download, advantageously saving bandwidth.
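The client-side decision can be sketched as follows. This is a sketch under stated assumptions, not the described system's implementation: it assumes the movie header occupies the leading bytes of each self-initializing segment and that the client already knows its size, neither of which is specified in the text. The flag value 0x00000001 is the "Movie Header update follows" value from the Streaming Information Box semantics in this description; the function names are hypothetical.

```python
from typing import Optional

MOVIE_HEADER_UPDATE = 0x00000001  # "Movie Header update follows" flag value


def needs_movie_header(prev_segment_flags: int, seeking: bool, switching: bool) -> bool:
    """Decide whether the movie header of the next segment must be downloaded."""
    return bool(prev_segment_flags & MOVIE_HEADER_UPDATE) or seeking or switching


def range_header(movie_header_size: int, fetch_header: bool) -> Optional[str]:
    """Return an HTTP Range header value that skips the leading movie header.

    Assumes (hypothetically) that the header is the first `movie_header_size`
    bytes of the segment. Returns None when the whole segment is wanted.
    """
    if fetch_header or movie_header_size <= 0:
        return None
    return f"bytes={movie_header_size}-"


# Steady-state playback with no update signaled: skip the first 1024 bytes.
print(range_header(1024, needs_movie_header(0, seeking=False, switching=False)))
```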

In yet another embodiment, if an MPD update indication is signaled, the signal may also contain a locator, such as a URL, for the updated Media Presentation Description. The updated MPD may describe the presentation both before and after the update, using validity attributes such as a new and an old period in the case of discontinuous updates. This may advantageously be used to permit time-shift viewing, as described further below, and also advantageously allows the MPD update to be signaled at any time before the changes it contains take effect. The client may immediately download the new MPD and apply it to the ongoing presentation.

In a specific realization, the signaling of any change to the media presentation description, of the moov header, or of the end of the presentation may be contained in a streaming information box that is formatted according to the rules of the segment format, using the box structure of the ISO base media file format. This box may provide a specific signal for any of the different updates.

Streaming Information Box

Definition

Box type: "snif"
Container: None
Mandatory: No
Quantity: 0 or 1

The Streaming Information Box contains information about the streaming presentation that the file is part of.

Syntax

aligned(8) class StreamingInformationBox
extends FullBox('sinf'){
unsigned int(32) streaming_information_flags;

/// The following are optional fields
string mpd_location
}

Semantics

streaming_information_flags contains the logical OR of zero or more of the following:

0x00000001 Movie Header update follows

0x00000002 Presentation Description update

0x00000004 End of presentation

mpd_location is present if and only if the Presentation Description update flag is set, and provides a Uniform Resource Locator for the new Media Presentation Description.
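The flag semantics above can be illustrated with a small decoding sketch. It is not taken from the specification: it assumes the bytes handed to the function are the box payload after the FullBox header, that the 32-bit field is big-endian, and that mpd_location is a null-terminated UTF-8 string. The names StreamingInfoFlags and parse_streaming_info_payload are hypothetical.

```python
import struct
from enum import IntFlag


class StreamingInfoFlags(IntFlag):
    """Flag values as listed in the semantics above."""
    MOVIE_HEADER_UPDATE_FOLLOWS = 0x00000001
    PRESENTATION_DESCRIPTION_UPDATE = 0x00000002
    END_OF_PRESENTATION = 0x00000004


def parse_streaming_info_payload(payload: bytes):
    """Decode streaming_information_flags and the optional mpd_location.

    Assumption: payload starts right after the FullBox header, the flags
    field is big-endian, and the string is null-terminated UTF-8.
    """
    (raw,) = struct.unpack_from(">I", payload, 0)
    flags = StreamingInfoFlags(raw & 0x7)  # keep only the three defined bits
    mpd_location = None
    if flags & StreamingInfoFlags.PRESENTATION_DESCRIPTION_UPDATE:
        end = payload.find(b"\x00", 4)
        if end == -1:  # tolerate a missing terminator
            end = len(payload)
        mpd_location = payload[4:end].decode("utf-8")
    return flags, mpd_location
```

A client could call this once per segment and fetch the new MPD only when mpd_location is not None.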

Example use case for MPD updates for live services

Suppose that a service provider wants to provide a live football event using the improved block-request streaming described herein. Perhaps millions of users might want to access the presentation of the event. The live event is sporadically interrupted by breaks when a time-out is called or during other lulls in the action, during which advertisements might be added. Typically, there is little or no advance notice of the exact timing of the breaks.

The service provider might need to provide redundant infrastructure (e.g., encoders and servers) to enable a seamless switch-over in case any of the components fail during the live event.

Suppose that a user, Anna, accesses the service on a bus with her mobile device, and the service is available immediately. Next to her sits another user, Paul, who watches the event on his laptop. A goal is scored and both celebrate this event at the same time. Paul tells Anna that the first goal in the game was even more exciting, and Anna uses the service so that she can view the event 30 minutes back in time. After having seen the goal, she goes back to the live event.

To address that use case, the service provider should be able to update the MPD, signal to the clients that an updated MPD is available, and permit clients to access the streaming service such that they can present the data close to real time.

Updating of the MPD is feasible in a manner asynchronous to the delivery of segments, as explained elsewhere herein. The server can provide guarantees to the receiver that an MPD is not updated for some time. The server may rely on the current MPD. However, no explicit signaling is needed when the MPD is updated before some minimum update period.

Completely synchronous playout is hardly achieved, because clients may operate on different MPD update instances and may therefore drift relative to one another. Using MPD updates, the server can convey changes, and clients can be alerted to changes even during a presentation. In-band signaling on a segment-by-segment basis can be used to indicate an MPD update, so updates might be limited to segment boundaries, but that should be acceptable in most applications.

An MPD element can be added that provides the publish time of the MPD in wall-clock time, together with an optional MPD update box that is added at the beginning of segments to signal that an MPD update is required. The updates can be done hierarchically, as with the MPD.

The MPD "publish time" provides a unique identifier for the MPD and indicates when the MPD was issued. It also provides an anchor for the update procedures.

The MPD update box might be found in the MPD after the "styp" box and may be defined by Box Type = "mupe"; it needs no container, is not mandatory, and has a quantity of zero or one. The MPD update box contains information about the media presentation that the segment is part of.

An example syntax is as follows.

aligned(8) class MPDUpdateBox
extends FullBox('mupe'){
unsigned int(3) mpd_information_flags;
unsigned int(1) new_location_flag;
unsigned int(28) latest_mpd_update_time;

/// The following are optional fields
string mpd_location
}

The semantics of the various objects of the class MPDUpdateBox might be as follows:

mpd_information_flags: the logical OR of zero or more of the following:
0x00 Media Presentation Description update now
0x01 Media Presentation Description update ahead
0x02 End of presentation
0x03-0x07 Reserved

new_location_flag: if set to 1, the new Media Presentation Description is available at a new location specified by mpd_location.

latest_mpd_update_time: specifies the time (in ms) by which the MPD update is necessary, relative to the MPD issue time of the latest MPD. The client may choose to update the MPD at any time between now and then.

mpd_location: present if and only if new_location_flag is set; if present, it provides a Uniform Resource Locator for the new Media Presentation Description.
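As an illustration of the 3 + 1 + 28 bit layout described above, the following sketch unpacks the three fields from a single 32-bit word. It rests on assumptions not stated in the text: fields are packed most-significant-bit first (the usual convention for this box syntax), the word is big-endian, the payload begins after the FullBox header, and mpd_location is a null-terminated UTF-8 string. MPDUpdateInfo and parse_mpd_update_payload are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class MPDUpdateInfo:
    mpd_information_flags: int      # 3 bits
    new_location_flag: bool         # 1 bit
    latest_mpd_update_time_ms: int  # 28 bits, milliseconds
    mpd_location: Optional[str]     # only when new_location_flag is set


def parse_mpd_update_payload(payload: bytes) -> MPDUpdateInfo:
    """Unpack the MPD update box fields (assumed MSB-first, big-endian)."""
    (word,) = struct.unpack_from(">I", payload, 0)
    info_flags = (word >> 29) & 0x7
    new_location = bool((word >> 28) & 0x1)
    update_time = word & 0x0FFFFFFF
    location = None
    if new_location:
        end = payload.find(b"\x00", 4)
        if end == -1:  # tolerate a missing terminator
            end = len(payload)
        location = payload[4:end].decode("utf-8")
    return MPDUpdateInfo(info_flags, new_location, update_time, location)
```

With 28 bits, the update time can express at most 2^28 - 1 ms, which is roughly three days.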

If the bandwidth used by updates is an issue, the server may offer MPDs for certain device capabilities such that only those parts are updated.

Time-shift viewing and network PVR

When time-shift viewing is supported, it may happen that two or more MPDs or Movie Headers are valid over the lifetime of the session. In this case, by updating the MPD when necessary, but adding a validity mechanism or the period concept, a valid MPD may exist for the entire time window. This means that the server may ensure that any MPD and Movie Header are announced for any period of time that is within the valid time window for time-shift viewing. It is up to the client to ensure that its available MPD and the metadata for its current presentation are valid. Migration of a live session to a network PVR session using only minor MPD updates may also be supported.

Special media segments

An issue when the file format of ISO/IEC 14496-12 is used within a block-request streaming system is that, as described above, it may be advantageous to store the media data for a single version of the presentation in multiple files, arranged in consecutive time segments. Furthermore, it may be advantageous to arrange that each file begins with a random access point. Further, it may be advantageous to choose the positions of the seek points during the video encoding process and to segment the presentation into multiple files, each beginning with a seek point, based on the choice of seek points made during the encoding process; each random access point may or may not be placed at the beginning of a file, but each file begins with a random access point. In one embodiment with the properties described above, the presentation metadata, or Media Presentation Description, may contain the exact duration of each file, where duration is taken, for example, to mean the difference between the start time of the video media of a file and the start time of the video media of the next file. Based on this information in the presentation metadata, the client is able to construct a mapping between the global timeline for the media presentation and the local timeline for the media within each file.

In another embodiment, the size of the presentation metadata may advantageously be reduced by specifying instead that every file or segment has the same duration. However, in this case, and where media files are constructed according to the method above, the duration of each file may not be exactly equal to the duration specified in the media presentation description, because a random access point may not exist at the point that is exactly the specified duration from the start of the file.

A further embodiment of the invention is now described that provides for correct operation of the block-request streaming system despite the discrepancy mentioned above. In this method, an element may be provided within each file that specifies the mapping of the local timeline of the media within the file (meaning the timeline starting from timestamp zero, against which the decoding and composition timestamps of the media samples in the file are specified according to ISO/IEC 14496-12) to the global presentation timeline. This mapping information may comprise a single timestamp in global presentation time that corresponds to timestamp zero in the local file timeline. The mapping information may alternatively comprise an offset value that specifies the difference between the global presentation time corresponding to timestamp zero in the local file timeline and the global presentation time corresponding to the start of the file according to the information provided in the presentation metadata.
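The two alternative forms of mapping information can be sketched as a single conversion function. This is only an illustration under assumptions: all times are in the same unit (seconds here), and the parameter names global_time_of_local_zero, nominal_file_start, and offset are hypothetical labels for the quantities named in the paragraph above.

```python
from typing import Optional


def local_to_global(
    local_ts: float,
    *,
    global_time_of_local_zero: Optional[float] = None,
    nominal_file_start: Optional[float] = None,
    offset: Optional[float] = None,
) -> float:
    """Map a local file timestamp onto the global presentation timeline.

    Either pass the global time that corresponds to local timestamp zero,
    or pass the file start taken from the presentation metadata together
    with the offset carried in the file.
    """
    if global_time_of_local_zero is None:
        if nominal_file_start is None or offset is None:
            raise ValueError("need an absolute timestamp or start plus offset")
        # offset = (global time of local zero) - (nominal file start)
        global_time_of_local_zero = nominal_file_start + offset
    return global_time_of_local_zero + local_ts
```

Both forms yield the same result when the offset is consistent with the absolute timestamp; the offset form simply lets the presentation metadata keep advertising equal nominal durations.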

Examples of such boxes may be, for example, the track fragment decode time ("tfdt") box, or the track fragment adjustment box ("tfad") together with the track fragment media adjustment ("tfma") box.

Example client including segment list generation

An example client will now be described. It might be used as a reference client for the server to ensure proper generation and updates of the MPD.

An HTTP streaming client is guided by the information provided in the MPD. It is assumed that the client has access to the MPD that it received at time T, i.e., the time at which it was able to successfully receive an MPD. Determining successful reception may include the client obtaining an updated MPD, or the client verifying that the MPD has not been updated since the previous successful reception.

An example client behavior is introduced. To provide a continuous streaming service to the user, the client first parses the MPD and creates a list of accessible segments for each representation for the client-local time at the current system time, taking into account the segment list generation procedures described below, possibly using playlists or URL construction rules. The client then selects one or more representations based on the information in the representation attributes and other information, e.g., available bandwidth and client capabilities. Depending on grouping, representations may be presented standalone or jointly with other representations.

For each representation, the client acquires the binary metadata, such as the "moov" header for the representation, if present, and the media segments of the selected representations. The client accesses the media content by requesting segments or byte ranges of segments, possibly using the segment list. The client may initially buffer media before starting the presentation and, once the presentation has started, the client continues consuming the media content by continuously requesting segments or parts of segments, taking into account the MPD update procedures.

The client may switch representations, taking into account updated MPD information and/or updated information from its environment, e.g., a change in available bandwidth. With any request for a media segment containing a random access point, the client may switch to a different representation. When moving forward, i.e., as the current system time (referred to as the "NOW time" to represent the time relative to the presentation) advances, the client consumes the accessible segments. With each advance in the NOW time, the client possibly expands the list of accessible segments for each presentation according to the procedures specified herein.

If the end of the media presentation has not yet been reached, and the current playback time comes within a threshold at which the client anticipates running out of the media described in the MPD for any presentation that it is consuming or is about to consume, then the client may request an update of the MPD, with a new fetch-time reception time T. Once the update is received, the client takes into account the possibly updated MPD and the new time T when generating the accessible segment lists. FIG. 29 illustrates a procedure for live services at different times at the client.

Generation of accessible segment lists

Assume that the HTTP streaming client has access to an MPD and may want to generate an accessible segment list for a wall-clock time NOW. The client is synchronized to a global time reference with a certain precision, but advantageously no direct synchronization to the HTTP streaming server is required.

The accessible segment list for each representation is preferably defined as a list of pairs of a segment start time and a segment locator, where the segment start time may be defined as being relative to the start of the representation, without loss of generality. The start of the representation may be aligned with the start of a period, if this concept is applied. Otherwise, the representation start can be at the beginning of the media presentation.

The client uses URL construction rules and timing, for example, as defined further herein. Once a list of described segments is obtained, this list is further restricted to the accessible segments, which may be a subset of the segments of the complete media presentation. The construction is governed by the current value of the clock at the client's NOW time. Generally, segments are available only for a time NOW that lies within a set of availability times. For times NOW outside this window, no segments are available. In addition, for live services, assume that some time, checktime, provides information on how far into the future the media is described. The checktime is defined on the media time axis documented by the MPD. When the client's playback time reaches checktime, the client advantageously requests a new MPD.

Then, the segment list is further restricted by the checktime together with the MPD attribute TimeShiftBufferDepth, such that the only media segments available are those for which the sum of the start time of the media segment and the representation start time falls in the interval between NOW minus timeShiftBufferDepth minus the duration of the last described segment, and the smaller of checktime and NOW.
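The restriction just described can be written as a short filter. This is a sketch under assumptions, not a normative procedure: all times are expressed on one common axis in seconds, the interval is treated as closed at both ends, and the tuple layout (start, locator, duration) and the function name accessible_segments are hypothetical.

```python
from typing import List, Tuple

# (start time relative to representation start, locator, duration)
Segment = Tuple[float, str, float]


def accessible_segments(
    described: List[Segment],
    representation_start: float,
    now: float,
    time_shift_buffer_depth: float,
    checktime: float,
) -> List[Tuple[float, str]]:
    """Return (start time, locator) pairs for segments accessible at NOW."""
    if not described:
        return []
    last_duration = described[-1][2]
    lower = now - time_shift_buffer_depth - last_duration
    upper = min(checktime, now)
    return [
        (start, locator)
        for start, locator, _duration in described
        if lower <= representation_start + start <= upper
    ]
```

For instance, with 10-second segments, a representation start of 1000, NOW = 1045, a time-shift buffer depth of 20, and a checktime of 1100, the interval is [1015, 1045], so only the segments starting at 20, 30, and 40 are accessible.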

Scalable blocks

Sometimes the available bandwidth falls so low that the block or blocks currently being received at a receiver become unlikely to be completely received in time to be played out without pausing the presentation. The receiver might detect such situations in advance. For example, the receiver might determine that it is receiving blocks encoding 5 units of media every 6 units of time and has a buffer of 4 units of media, so that the receiver might expect to have to stall, or pause, the presentation about 24 units of time later. With sufficient notice, the receiver can react to such a situation by, for example, abandoning the current stream of blocks and starting to request a block or blocks from a different representation of the content, such as one that uses less bandwidth per unit of playout time. For example, if the receiver switched to a representation in which blocks encoded at least 20% more video time for the same block size, the receiver might be able to eliminate the need to stall until the bandwidth situation improved.
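The figure of about 24 units of time follows from simple rate arithmetic: playout consumes 1 unit of media per unit of time while only 5/6 of a unit arrives, so the buffer drains at 1/6 of a unit per unit of time, and a buffer of 4 units lasts 4 / (1/6) = 24 units of time. The sketch below reproduces this estimate; the function name time_until_stall is hypothetical, and constant rates are assumed.

```python
from fractions import Fraction
from typing import Optional


def time_until_stall(
    buffered_media: int, media_received: int, elapsed_time: int
) -> Optional[Fraction]:
    """Estimate the time until the playout buffer runs dry.

    Assumes playout consumes one unit of media per unit of time and that
    media keeps arriving at the observed constant rate. Returns None when
    the buffer is not draining.
    """
    fill_rate = Fraction(media_received, elapsed_time)
    drain_rate = 1 - fill_rate
    if drain_rate <= 0:
        return None
    return Fraction(buffered_media) / drain_rate


stall_in = time_until_stall(buffered_media=4, media_received=5, elapsed_time=6)
```

Under the same assumptions, blocks that carry 20% more media time for the same size deliver 5 x 1.2 = 6 units of media every 6 units of time, so the drain rate becomes zero and the function returns None.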

However, it might be wasteful to have the receiver entirely discard the data already received from the abandoned representation. In an embodiment of a block-streaming system described herein, the data within each block can be encoded and arranged in such a way that certain prefixes of the data within the block can be used to continue the presentation without the remainder of the block having been received. For example, the well-known techniques of scalable video encoding may be used. Examples of such video encoding methods include H.264 Scalable Video Coding (SVC) or the temporal scalability of H.264 Advanced Video Coding (AVC). Advantageously, this method allows the presentation to continue based on the portion of a block that has been received, even when reception of a block or blocks might be abandoned, for example due to changes in the available bandwidth. Another advantage is that a single data file may be used as the source for multiple different representations of the content. This is possible, for example, by making use of HTTP partial GET requests that select the subset of a block corresponding to the required representation.

One improvement detailed herein is an enhanced segment, a scalable segment map. The scalable segment map contains the locations of the different layers in the segment, so that the client can access the corresponding parts of the segment and extract the layers. In another embodiment, the media data in the segment is ordered such that the quality of the segment increases as data is downloaded progressively from the beginning of the segment. In another embodiment, the gradual increase in quality is applied to each block or fragment contained in the segment, such that fragment requests can be made in a way that matches the scalable approach.

FIG. 12 is a figure showing an aspect of scalable blocks. In that figure, a transmitter 1200 outputs metadata 1202, scalable layer 1 (1204), scalable layer 2 (1206), and scalable layer 3 (1208), with the later-listed items being delayed. A receiver 1210 can then use metadata 1202, scalable layer 1 (1204), and scalable layer 2 (1206) to present media presentation 1212.

Independent scalability layers

As explained above, it is undesirable for a block-request streaming system to have to stall when the receiver is unable to receive the requested blocks of a specific representation of the media data in time for playout, because that creates a poor user experience. Stalls can be avoided, reduced, or mitigated by restricting the data rate of the chosen representations to be much less than the available bandwidth, so that it becomes very unlikely that any given portion of the presentation would not be received in time. This strategy, however, has the disadvantage that the media quality is necessarily much lower than could in principle be supported by the available bandwidth. A presentation of lower quality than is possible may also be perceived as a poor user experience. Thus, the designer of a block-request streaming system faces a choice in the design of the client procedures, the programming of the client, or the configuration of hardware: either request a content version that has a much lower data rate than the available bandwidth, in which case the user may suffer poor media quality, or request a content version that has a data rate close to the available bandwidth, in which case the user may suffer a high probability of pauses during the presentation as the available bandwidth changes.

To handle such situations, the block-streaming systems described herein might be configured to handle multiple scalability layers independently, such that a receiver can make layered requests and a transmitter can respond to layered requests.

In such embodiments, the encoded media data for each block may be partitioned into multiple disjoint pieces, referred to herein as "layers", such that a combination of layers comprises the whole of the media data for a block, and such that a client that has received certain subsets of the layers may perform decoding and presentation of a representation of the content. In this approach, the ordering of the data in the stream is such that contiguous ranges increase in quality, and the metadata reflects this.

An example of a technique that may be used to generate layers with the property above is Scalable Video Coding, for example as described in ITU-T Standard H.264/SVC. Another example of a technique that may be used to generate layers with the property above is temporal scalability layers, for example as provided in ITU-T Standard H.264/SVC.

In these embodiments, metadata might be provided in the MPD or in the segment itself that enables the construction of requests for individual layers of any given block, and/or combinations of layers, and/or a given layer of multiple blocks, and/or a combination of layers of multiple blocks. For example, the layers comprising a block might be stored within a single file, and metadata might be provided specifying the byte ranges within the file that correspond to the individual layers.

A file download protocol capable of specifying byte ranges, for example HTTP 1.1, may be used to request individual layers or multiple layers. Furthermore, as will be clear to one of skill in the art upon reviewing this disclosure, the techniques described above pertaining to the construction, request, and download of blocks of variable size and of variable combinations of blocks may be applied in this context as well.
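As a sketch of how per-layer byte-range metadata could be turned into an HTTP 1.1 request, the function below builds the value of a Range header for a chosen set of layers, merging ranges that are adjacent in the file. The layer numbering, the inclusive (first_byte, last_byte) layout of the metadata, and the name range_header are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def range_header(
    layer_ranges: Dict[int, Tuple[int, int]], wanted_layers: Iterable[int]
) -> str:
    """Build an HTTP Range header value covering the requested layers.

    layer_ranges maps a layer number to an inclusive (first_byte, last_byte)
    pair within the file; adjacent or overlapping ranges are merged.
    """
    spans = sorted(layer_ranges[layer] for layer in wanted_layers)
    merged: List[List[int]] = []
    for first, last in spans:
        if merged and first <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in merged)


# Hypothetical layout: three layers stored back to back in one file.
layers = {1: (0, 999), 2: (1000, 2499), 3: (2500, 3999)}
base_plus_one = range_header(layers, [1, 2])
```

Because the layers here are stored in order of increasing quality, requesting layers 1 and 2 collapses into the single contiguous range `bytes=0-2499`, which matches the prefix-style ordering described above.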

Combinations

A number of embodiments are now described that may be advantageously employed by a block-request streaming client in order to achieve an improvement in the user experience and/or a reduction in serving infrastructure capacity requirements, compared to existing techniques, through the use of media data partitioned into layers as described above.

In a first embodiment, the known techniques of a block-request streaming system may be applied with the modification that different versions of the content are in some cases replaced by different combinations of the layers. That is to say, where an existing system might provide two distinct representations of the content, the improved system described here might provide two layers, where one representation of the content in the existing system is similar in bit rate, quality, and possibly other metrics to the first layer in the improved system, and the second representation of the content in the existing system is similar in bit rate, quality, and possibly other metrics to the combination of the two layers in the improved system. As a result, the storage capacity required within the improved system is reduced compared to that required in the existing system. Furthermore, whereas clients of the existing system may issue requests for blocks of one representation or the other, clients of the improved system may issue requests for either the first layer or both layers of a block. As a result, the user experience in the two systems is similar. Furthermore, improved caching is achieved, because common segments are used even for different qualities, and these segments are therefore cached with higher likelihood.

In a second embodiment, a client in an improved block-request streaming system that uses the layer method described here may maintain a separate data buffer for each of several layers of the media encoding. As will be clear to those skilled in the art of data management within client devices, these "separate" buffers may be implemented by allocating physically or logically separate memory regions to the separate buffers, or by other techniques in which the buffered data is stored in one or more memory regions and the separation of data from different layers is achieved logically, through data structures that contain references to the storage locations of the data from each layer. In what follows, the term "separate buffers" should therefore be understood to include any method by which the data of distinct layers can be separately identified.

The client issues requests for individual layers of each block based on the occupancy of each buffer. For example, the layers may be ordered in a priority order such that a request for data from one layer may not be issued if the occupancy of any buffer for a lower layer in the priority order is below a threshold for that lower layer. In this method, priority is given to receiving data from the lower layers in the priority order, so that if the available bandwidth falls below the bandwidth required to also receive the higher layers, only the lower layers are requested. Furthermore, the thresholds associated with different layers may differ, such that, for example, lower layers have higher thresholds. If the available bandwidth changes such that the data for a higher layer cannot be received before the playout time of the block, the data for the lower layers will necessarily already have been received, so the presentation can continue with the lower layers alone. Thresholds for buffer occupancy may be defined in terms of bytes of data, playout duration of the data contained in the buffer, number of blocks, or any other suitable measure.
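The priority rule of the second embodiment can be illustrated with a minimal sketch. The class `LayerBuffer`, the function `layers_allowed`, and the choice of seconds as the occupancy unit are assumptions made for illustration; the text permits bytes, playout duration, block counts, or other measures.

```python
from dataclasses import dataclass


@dataclass
class LayerBuffer:
    name: str
    occupancy_s: float  # buffered playout duration for this layer, in seconds
    threshold_s: float  # occupancy this layer needs before higher layers may be requested


def layers_allowed(buffers):
    """Return the layers that may be requested, lowest layer first.

    `buffers` is ordered from the lowest layer upward. A layer may be
    requested only if every lower layer is at or above its own threshold.
    """
    allowed = []
    for buf in buffers:
        allowed.append(buf.name)
        if buf.occupancy_s < buf.threshold_s:
            break  # this layer is short of data, so no higher layer is requested
    return allowed


starved = [LayerBuffer("base", 2.0, 4.0), LayerBuffer("enh1", 0.0, 2.0)]
healthy = [LayerBuffer("base", 5.0, 4.0), LayerBuffer("enh1", 0.0, 2.0),
           LayerBuffer("enh2", 0.0, 1.0)]
print(layers_allowed(starved))  # only the base layer
print(layers_allowed(healthy))  # base and the first enhancement layer
```

In this sketch the lowest layer is always requestable, which matches the stated intent that the lower layers are received first when bandwidth is scarce.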

In a third embodiment, the methods of the first and second embodiments may be combined, such that multiple media presentations are provided, each comprising a subset of the layers (as in the first embodiment), and such that the second embodiment is applied to a subset of the layers within a representation.

In a fourth embodiment, the methods of the first, second, and/or third embodiments may be combined with embodiments in which multiple independent representations of the content are provided, such that, for example, at least one of the independent representations comprises multiple layers to which the techniques of the first, second, and/or third embodiments are applied.

Advanced Buffer Manager
In combination with buffer monitor 126 (see FIG. 2), an advanced buffer manager can be used to optimize the client-side buffer. A block-request streaming system wants to ensure that media playout can start quickly and continue smoothly, while simultaneously providing the maximum media quality to the user or destination device. This may require that the client request blocks that have the highest media quality, but that can also be started quickly and then received in time to be played out without forcing a pause in the presentation.

In embodiments that use the advanced buffer manager, the manager determines which blocks of media data to request and when to make those requests. The advanced buffer manager might, for example, be provided with a set of metadata for the content to be presented, this metadata including a list of the representations available for the content and metadata for each representation. Metadata for a representation may include information about the data rate of the representation and other parameters, such as video, audio, or other codecs and codec parameters, video resolution, decoding complexity, audio language, and any other parameters that might affect the choice of representation at the client.

Metadata for a representation may also include identifiers for the blocks into which the representation has been segmented, these identifiers providing the information the client needs to request a block. For example, where the request protocol is HTTP, the identifier might be an HTTP URL, possibly together with additional information identifying a byte range or time span within the file identified by the URL, this byte range or time span identifying the specific block within that file.
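As a small illustration of such an identifier, the following sketch pairs a URL with an inclusive byte range and renders it as a standard HTTP `Range` header. The class name `BlockId`, its fields, and the example URL are hypothetical; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockId:
    url: str         # file (segment) containing the block
    first_byte: int  # offset of the first byte of the block
    last_byte: int   # offset of the last byte of the block (inclusive)

    def request_headers(self):
        # HTTP byte ranges are inclusive on both ends, e.g. "bytes=0-499".
        return {"Range": f"bytes={self.first_byte}-{self.last_byte}"}


block = BlockId("http://example.com/rep1/segment1.bin", 0, 499)
print(block.url, block.request_headers())
```

A client would send a GET request for `block.url` with these headers; the network call itself is omitted here so that the sketch stays self-contained.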

In a specific implementation, the advanced buffer manager determines when the receiver makes a request for new blocks and may itself handle sending the requests. In a novel aspect, the advanced buffer manager makes requests for new blocks according to the value of a balancing ratio that balances between using too much bandwidth and running out of media during streaming playout.

The information that buffer monitor 126 receives from block buffer 125 may include indications of each event: when media data is received, how much has been received, when playout of media data has started or stopped, and the speed of media playout. Based on this information, buffer monitor 126 may calculate a variable B_current representing the current buffer size. In these examples, B_current represents the amount of media contained in one or more buffers of the client or other device, and may be measured in units of time, so that B_current represents the time it would take to play out all of the media represented by the blocks or partial blocks stored in the buffer or buffers if no additional blocks or partial blocks were received. Thus, B_current represents the "playout duration", at normal playout speed, of the media data that is available at the client but has not yet been played.

As time passes, the value of B_current decreases as media is played out and may increase each time new data for a block is received. For the purposes of this explanation, a block is assumed to be received when the entire data of that block is available at block requestor 124, but other measures might be used instead, for example to take into account the reception of partial blocks. In practice, reception of a block may take place over a period of time.

FIG. 13 illustrates the variation of the value of B_current over time as media is played out and blocks are received. As shown in FIG. 13, the value of B_current is zero for times less than t₀, indicating that no data has been received. At t₀, the first block is received and the value of B_current increases to equal the playout duration of the received block. At this time, playout has not yet begun, so the value of B_current remains constant until time t₁, at which a second block arrives and B_current increases by the size of this second block. At this time, playout begins, and the value of B_current begins to decrease linearly until time t₂, at which a third block arrives.

The progression of B_current continues in this "sawtooth" manner, increasing stepwise each time a block is received (at times t₂, t₃, t₄, t₅, and t₆) and decreasing smoothly as data is played out in between. In this example, playout proceeds at the normal playout rate for the content, so the slope of the curve between block receptions is exactly -1, meaning that one second of media data is played for each second of real time that passes. With frame-based media played out at a given number of frames per second, for example 24 frames per second, the slope of -1 is approximated by small step functions that indicate the playout of each individual frame of data, for example steps of -1/24 of a second as each frame is played.
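The sawtooth behavior can be reproduced with a short sketch that steps B_current up at each block arrival and drains it at slope -1 once playout has begun. The function names and the sample arrival times are illustrative assumptions, not values taken from FIG. 13; times are assumed to be non-negative and the buffer is clamped at zero.

```python
def _drain(level, start, end, playout_start):
    """Reduce `level` by the real time spent playing between `start` and `end`."""
    played = max(0.0, end - max(start, playout_start))
    return max(0.0, level - played)  # the buffer cannot go below empty


def b_current(arrivals, playout_start, t):
    """Buffered playout duration at time t.

    `arrivals` is a list of (arrival_time, playout_duration) pairs; a block
    counts as received only once it has arrived in full.
    """
    level, clock = 0.0, 0.0
    for when, duration in sorted(arrivals):
        if when > t:
            break
        level = _drain(level, clock, when, playout_start) + duration  # step up
        clock = when
    return _drain(level, clock, t, playout_start)


# Two-second blocks arrive at t=1, 2, and 4; playout begins with the second block.
arrivals = [(1.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
for t in (0.5, 1.5, 3.0, 4.0):
    print(t, b_current(arrivals, playout_start=2.0, t=t))
```

The sketch models continuous playout rather than the per-frame steps mentioned above, which is the idealization the slope of -1 describes.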

FIG. 14 shows another example of the evolution of B_current over time. In that example, the first block arrives at t₀ and playout begins immediately. Block arrival and playout continue until time t₃, at which the value of B_current reaches zero. When that happens, no further media data is available for playout, forcing a pause in the media presentation. At time t₄, a fourth block is received and playout can resume. This example therefore shows a case where the fourth block was received later than desired, resulting in a pause in playout and thus a poor user experience. A goal of the advanced buffer manager and other features is therefore to reduce the probability of this event while simultaneously maintaining high media quality.

Buffer monitor 126 may also calculate another metric, B_ratio(t), which is the ratio of the media received in a given time period to the length of that time period. More specifically, B_ratio(t) is equal to T_received / (T_now - t), where T_received is the amount of media (measured by its playout time) received in the time period from t, some time earlier than the current time, up to the current time T_now.

B_ratio(t) can be used to measure the rate of change of B_current. B_ratio(t) = 0 is the case where no data has been received since time t; assuming media is playing out, B_current will have been reduced by (T_now - t) since that time. B_ratio(t) = 1 is the case where media is received in the same amount as it is being played out over the period (T_now - t); B_current will have the same value at time T_now as at time t. B_ratio(t) > 1 is the case where more data has been received than is needed for playout over the period (T_now - t); B_current will have increased from time t to time T_now.
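The three cases follow from one relation: while media plays continuously, B_current(T_now) - B_current(t) = T_received - (T_now - t) = (B_ratio(t) - 1)(T_now - t). A minimal sketch of the ratio, assuming blocks are recorded as hypothetical (arrival_time, playout_duration) pairs and that a block arriving exactly at t is not counted:

```python
def b_ratio(arrivals, t, t_now):
    """T_received / (T_now - t) for blocks that arrived in the interval (t, t_now]."""
    if t_now <= t:
        raise ValueError("t must be earlier than t_now")
    t_received = sum(duration for when, duration in arrivals if t < when <= t_now)
    return t_received / (t_now - t)


arrivals = [(1.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
print(b_ratio(arrivals, 0.0, 4.0))  # more media received than played: ratio above 1
print(b_ratio(arrivals, 2.0, 4.0))  # received exactly as much as played: ratio of 1
print(b_ratio(arrivals, 2.5, 3.5))  # nothing received in the window: ratio of 0
```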

Buffer monitor 126 further calculates a value State, which may take a discrete number of values. Buffer monitor 126 further has a function NewState(B_current, B_ratio) which, given the current value of B_current and values of B_ratio for t < T_now, provides a new State value as output. Whenever B_current and B_ratio cause this function to return a value different from the current value of State, the new value is assigned to State and this new State value is indicated to block selector 123.

The function NewState may be evaluated with reference to the space of all possible values of the pair (B_current, B_ratio(T_now - T_x)), where T_x may be a fixed (configured) value, may be derived from B_current, for example by a configuration table that maps values of B_current to values of T_x, or may depend on the previous value of State. Buffer monitor 126 is supplied with one or more partitionings of this space, where each partitioning comprises a set of disjoint regions and each region is annotated with a State value. Evaluation of NewState then comprises identifying a partitioning and determining the region into which the pair (B_current, B_ratio(T_now - T_x)) falls. The return value is the annotation associated with that region. In the simple case, only one partitioning is provided. In more complex cases, the partitioning may depend on the pair (B_current, B_ratio(T_now - T_x)) at the time of the previous evaluation of NewState, or on other factors.

In a specific embodiment, the partitioning described above may be based on a configuration table containing a number of threshold values for B_current and a number of threshold values for B_ratio. Specifically, let the threshold values for B_current be B_thresh(0) = 0, B_thresh(1), ..., B_thresh(n₁), B_thresh(n₁+1) = ∞, where n₁ is the number of non-zero threshold values for B_current. Let the threshold values for B_ratio be B_r-thresh(0) = 0, B_r-thresh(1), ..., B_r-thresh(n₂), B_r-thresh(n₂+1) = ∞, where n₂ is the number of threshold values for B_ratio. These threshold values define a partitioning comprising an (n₁+1) by (n₂+1) grid of cells, where the i-th cell of the j-th row corresponds to the region in which B_thresh(i-1) <= B_current < B_thresh(i) and B_r-thresh(j-1) <= B_ratio < B_r-thresh(j). Each cell of the grid is annotated with a state value, for example by being associated with a particular value stored in memory, and the function NewState then returns the state value associated with the cell indicated by the values B_current and B_ratio(T_now - T_x).
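A grid lookup of this kind can be sketched with a binary search over each threshold list. The threshold values and state annotations below are invented for illustration and are not those of FIG. 15; the function name `new_state` and the row-major `grid` layout are likewise assumptions. Cell indices here are zero-based, so index 0 corresponds to the first cell in the text.

```python
from bisect import bisect_right


def new_state(b_current, b_ratio, b_thresh, r_thresh, grid):
    """Look up the state annotated on the grid cell containing the pair.

    `b_thresh` and `r_thresh` hold only the finite, non-zero thresholds in
    ascending order. `grid[j][i]` annotates the cell whose B_current lies in
    [b_thresh[i-1], b_thresh[i]) and whose B_ratio lies in
    [r_thresh[j-1], r_thresh[j]), with 0 and infinity as the outer bounds.
    """
    i = bisect_right(b_thresh, b_current)  # number of thresholds <= b_current
    j = bisect_right(r_thresh, b_ratio)
    return grid[j][i]


B_THRESH = [2000, 6000]  # milliseconds (illustrative)
R_THRESH = [800, 1200]   # permille (illustrative)
GRID = [
    ["Low", "Low", "Stable"],      # B_ratio below 800
    ["Low", "Stable", "Full"],     # 800 <= B_ratio < 1200
    ["Stable", "Full", "Full"],    # B_ratio of 1200 or more
]
print(new_state(1000, 1500, B_THRESH, R_THRESH, GRID))
print(new_state(7000, 500, B_THRESH, R_THRESH, GRID))
```

`bisect_right` is used because the cell intervals are closed on the left: a value exactly equal to a threshold belongs to the cell above it.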

In a further embodiment, a hysteresis value may be associated with each threshold value. In this enhanced method, evaluation of the function NewState may be based on a temporary partitioning constructed using a set of temporarily modified threshold values, as follows. For each B_current threshold value that is less than the B_current range corresponding to the cell chosen on the last evaluation of NewState, the threshold value is reduced by subtracting the hysteresis value associated with that threshold. For each B_current threshold value that is greater than the B_current range corresponding to the cell chosen on the last evaluation of NewState, the threshold value is increased by adding the hysteresis value associated with that threshold. For each B_ratio threshold value that is less than the B_ratio range corresponding to the cell chosen on the last evaluation of NewState, the threshold value is reduced by subtracting the hysteresis value associated with that threshold. For each B_ratio threshold value that is greater than the B_ratio range corresponding to the cell chosen on the last evaluation of NewState, the threshold value is increased by adding the hysteresis value associated with that threshold. The modified threshold values are used to evaluate the value of NewState, and the threshold values are then returned to their original values.
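The effect of the temporary threshold shift can be seen on a single axis. In this sketch, which uses hypothetical names and invented numbers, thresholds that bound the previously chosen cell from below are lowered and all others are raised, so a value hovering near a boundary does not flip cells on every evaluation. It treats the lower boundary of the last cell as "less than" that cell's range and assumes the hysteresis values are small enough that the shifted thresholds stay in ascending order.

```python
from bisect import bisect_right


def adjust_thresholds(thresholds, hysteresis, last_cell):
    """Return temporarily shifted thresholds; the originals are left untouched.

    Zero-based cell c spans [thresholds[c-1], thresholds[c]), so thresholds
    with index k < last_cell lie below the last cell and are lowered, while
    the rest lie above it and are raised.
    """
    return [th - h if k < last_cell else th + h
            for k, (th, h) in enumerate(zip(thresholds, hysteresis))]


def cell_index(value, thresholds, hysteresis, last_cell):
    """Cell index for `value` on one axis, using the shifted thresholds."""
    return bisect_right(adjust_thresholds(thresholds, hysteresis, last_cell), value)


B_THRESH, B_HYST = [2000, 6000], [200, 500]  # milliseconds (illustrative)
# The same reading of 1900 ms maps to different cells depending on history:
print(cell_index(1900, B_THRESH, B_HYST, last_cell=1))  # stays in cell 1
print(cell_index(1900, B_THRESH, B_HYST, last_cell=0))  # stays in cell 0
```

Applying the same function to the B_ratio axis and indexing the annotated grid with both results gives the full hysteresis-aware NewState evaluation.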

Other ways of defining partitionings of the space will be apparent to those skilled in the art upon reading this disclosure. For example, a partitioning may be defined by inequalities based on linear combinations of B_ratio and B_current, for example linear inequality thresholds of the form α1 · B_ratio + α2 · B_current ≤ α0 for real values α0, α1, and α2, each of which defines a half-space within the overall space, with each disjoint set defined as the intersection of a number of such half-spaces.
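A region defined as an intersection of half-spaces can be tested directly from the inequalities. The sketch below is illustrative only; the function name, the coefficient ordering (α0, α1, α2), and the sample coefficients are assumptions.

```python
def in_region(b_ratio, b_current, half_spaces):
    """True if the point satisfies a1*b_ratio + a2*b_current <= a0 for every half-space."""
    return all(a1 * b_ratio + a2 * b_current <= a0 for a0, a1, a2 in half_spaces)


# Region: B_ratio + B_current <= 10 and B_ratio >= 0 (written as -B_ratio <= 0).
region = [(10.0, 1.0, 1.0), (0.0, -1.0, 0.0)]
print(in_region(1.0, 5.0, region))
print(in_region(1.0, 9.5, region))
```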

The above description illustrates the basic process. As will be clear to those skilled in the art of real-time programming upon reading this disclosure, efficient implementations are possible. For example, each time new information is provided to buffer monitor 126, it is possible to calculate the future time at which NewState will transition to a new value if, for example, no further data for blocks is received. A timer is then set for this time and, in the absence of further input, expiry of this timer causes the new State value to be sent to block selector 123. As a result, computations need to be performed only when new information is provided to buffer monitor 126 or when a timer expires, rather than continuously.

Suitable values of State might be "Low", "Stable", and "Full". An example of a suitable set of threshold values and the resulting cell grid is shown in FIG. 15.

In FIG. 15, the B_current thresholds are shown on the horizontal axis in milliseconds, with hysteresis values shown below them as "+/- value". The B_ratio thresholds are shown on the vertical axis in permille (that is, multiplied by 1000), with hysteresis values shown below them as "+/- value". State values are annotated in the grid cells as "L", "S", and "F" for "Low", "Stable", and "Full", respectively.

Block selector 123 receives a notification from block requestor 124 whenever there is an opportunity to request a new block. As described above, block selector 123 is provided with information about the multiple blocks available and with metadata for those blocks, including, for example, information about the media data rate of each block.

Information about the media data rate of a block may comprise the actual media data rate of the specific block (that is, the block size in bytes divided by the playout time in seconds), the average media data rate of the representation to which the block belongs, a measure of the available bandwidth required to play out the representation to which the block belongs on a sustained basis without pauses, or a combination of the above.

Block selector 123 selects blocks based on the State value last indicated by buffer monitor 126. When this State value is "Stable", block selector 123 selects a block from the same representation as the previously selected block. The block selected is the first block (in playout order) that contains media data for a time period for which no media data has previously been requested.

When the State value is "Low", block selector 123 selects a block from a representation with a lower media data rate than that of the previously selected block. A number of factors can influence the exact choice of representation in this case. For example, block selector 123 might be provided with an indication of the aggregate rate of incoming data and may choose a representation with a media data rate lower than that value.

When the State value is "Full", block selector 123 selects a block from a representation with a higher media data rate than that of the previously selected block. A number of factors can influence the exact choice of representation in this case. For example, block selector 123 may be provided with an indication of the aggregate rate of incoming data and may choose a representation with a media data rate no higher than that value.
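The three cases can be summarized in a small sketch. The function name `select_rate`, the use of a single number per representation, and the tie-breaking rules (take the highest eligible rate, fall back to the lowest rate or the current rate when nothing qualifies) are assumptions for illustration; the text leaves the exact choice open.

```python
def select_rate(state, current_rate, rates, incoming_rate):
    """Pick the media data rate of the representation for the next block.

    `rates` lists the media data rates of the available representations and
    `incoming_rate` is the aggregate rate of incoming data, in the same units.
    """
    rates = sorted(rates)
    if state == "Stable":
        return current_rate  # stay on the same representation
    if state == "Low":
        # Step down to a rate below both the current rate and the incoming rate.
        candidates = [r for r in rates if r < current_rate and r < incoming_rate]
        return candidates[-1] if candidates else rates[0]  # assumed fallback: lowest rate
    if state == "Full":
        # Step up, but not beyond the aggregate incoming rate.
        candidates = [r for r in rates if current_rate < r <= incoming_rate]
        return candidates[-1] if candidates else current_rate  # assumed fallback: no change
    raise ValueError(f"unknown state: {state!r}")


RATES = [500, 1000, 2000, 4000]  # kbit/s (illustrative)
print(select_rate("Low", 2000, RATES, incoming_rate=1200))
print(select_rate("Full", 1000, RATES, incoming_rate=3000))
```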

A number of additional factors may further influence the operation of block selector 123. In particular, the frequency with which the media data rate of the selected block is increased may be limited, even if buffer monitor 126 continues to indicate the "Full" state. Furthermore, it is possible that block selector 123 receives a "Full" state indication but no blocks of higher media data rate are available (for example, because the last selected block was already of the highest available media data rate). In this case, block selector 123 may delay the selection of the next block by a time chosen such that the total amount of media data buffered in block buffer 125 is bounded above.

Additional factors may influence the set of blocks considered during the selection process. For example, the available blocks may be limited to those from representations whose encoding resolution falls within a specific range provided to block selector 123.

Block selector 123 may also receive input from other components that monitor other aspects of the system, such as the availability of computational resources for media decoding. If such resources become scarce, block selector 123 may choose blocks whose decoding is indicated in the metadata to be of lower computational complexity (for example, representations with lower resolution or frame rate generally have lower decoding complexity).

The embodiment described above brings a substantial advantage in that the use of the value B_ratio in the evaluation of the NewState function within buffer monitor 126 allows a faster increase in quality at the start of the presentation than a method that considers only B_current. Without considering B_ratio, a large amount of buffered data may accumulate before the system is able to select blocks with a higher media data rate and hence higher quality. However, when the B_ratio value is large, this indicates that the available bandwidth is much higher than the media data rate of the previously received blocks and that, even with relatively little buffered data (that is, a low value of B_current), it remains safe to request blocks of higher media data rate and hence higher quality. Equally, if the B_ratio value is low (for example, less than 1), this indicates that the available bandwidth has dropped below the media data rate of the previously requested blocks and thus that, even if B_current is high, the system should switch to a lower media data rate and hence lower quality, for example to avoid reaching the point where B_current = 0 and media playout stalls. This improved behavior may be especially important in environments where network conditions, and thus delivery speeds, may vary quickly and dynamically, for example for users streaming to mobile devices.

Another advantage is conferred by the use of configuration data to specify the partitioning of the space of values of (B_current, B_ratio). Such configuration data can be provided to buffer monitor 126 as part of the presentation metadata or by other dynamic means. In practical deployments, the behavior of user network connections can vary greatly between users and over time for a single user, so it may be difficult to predict partitionings that will work well for all users. The possibility of providing such configuration information to users dynamically allows good configuration settings to be developed over time according to accumulated experience.

Variable Request Sizing
A high frequency of requests may be required if each request is for a single block and each block encodes a short media segment. If the media blocks are short, video playout moves from block to block quickly, which gives the receiver more frequent opportunities to adjust or change its selected data rate by changing the representation, improving the likelihood that playout can continue without stalling. However, a downside of a high frequency of requests is that it may not be sustainable on certain networks in which the available bandwidth from the client to the server network is constrained, for example wireless WAN networks such as 3G and 4G wireless WANs, where the capacity of the data link from the client to the network is limited or can become limited for short or long periods of time due to changes in radio conditions.

A high frequency of requests also implies a high load on the serving infrastructure, which brings associated costs in terms of capacity requirements. It would therefore be desirable to have some of the benefits of a high frequency of requests without all of the disadvantages.

In some embodiments of a block streaming system, the flexibility of a high request frequency is combined with less frequent requests. In this embodiment, blocks may be constructed as described above and aggregated into segments containing multiple blocks, also as described above. At the beginning of the presentation, the processes described above, in which each request references a single block or multiple concurrent requests are made to request parts of a block, are applied to ensure a fast channel zapping time and therefore a good user experience at the start of the presentation. Subsequently, when certain conditions described below are met, the client may issue requests that encompass multiple blocks in a single request. This is possible because the blocks have been aggregated into larger files or segments and can be requested using byte or time ranges. Consecutive byte or time ranges can be aggregated into a single larger byte or time range, resulting in a single request for multiple blocks, and even non-contiguous blocks can be requested in one request.
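The aggregation of byte ranges described above can be sketched as follows. The function names are hypothetical and the ranges are invented; byte ranges are treated as inclusive, as in the HTTP `Range` header, which also allows several comma-separated ranges in one request.

```python
def merge_ranges(ranges):
    """Merge inclusive (first_byte, last_byte) ranges that are directly adjacent."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], last)  # extend the previous range
        else:
            merged.append((first, last))
    return merged


def range_header(ranges):
    """Build one Range header value covering all requested blocks."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in merge_ranges(ranges))


# Three blocks: the first two are contiguous, the third is not.
blocks = [(0, 99), (100, 249), (400, 499)]
print(merge_ranges(blocks))
print(range_header(blocks))
```

Whether a given server honors multi-range requests is deployment-specific, so a client might fall back to one request per merged range.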

One basic configuration that can be driven by deciding whether to request a single block (or a partial block) or multiple consecutive blocks bases the decision on whether or not the requested blocks are likely to be played out. For example, if it is likely that there will soon be a need to change to another representation, it is better for the client to make requests for single blocks, i.e., small amounts of media data. One reason for this is that, if a request for multiple blocks is made when a switch to another representation might be imminent, the switch might occur before the last few blocks of the request are played out. Downloading those last few blocks might then delay delivery of the media data of the representation being switched to, which could cause media playout stalls.

However, requests for single blocks result in a higher frequency of requests. On the other hand, if it is unlikely that there will soon be a need to change to another representation, it may be preferable to make requests for multiple blocks, since all of those blocks are likely to be played out. This results in a lower frequency of requests, which can substantially lower request overhead, especially when it is typical that no change in representation is imminent.

In conventional block aggregation systems, the amount requested in each request is not dynamically adjusted: typically each request is for an entire file, or each request is for approximately the same amount of the file of a representation (sometimes measured in time, sometimes in bytes). Consequently, if all requests are small, the request overhead is high, whereas if all requests are large, the chance of media stall events increases, and/or media playout quality is lower if a lower-quality representation is chosen to avoid having to change representations quickly as network conditions vary.

One example of a condition that, when met, may cause subsequent requests to reference multiple blocks is a threshold on the buffer size, B_current. If B_current is below the threshold, each request issued references a single block. If B_current is greater than or equal to the threshold, each request issued references multiple blocks. When requests referencing multiple blocks are issued, the number of blocks requested in each single request may be determined in one of several possible ways. For example, the number may be a constant, such as two. Alternatively, the number of blocks requested in a single request may depend on the buffer state, and in particular on B_current. For example, a number of thresholds may be set, with the number of blocks requested in a single request derived from the highest of those thresholds that is less than B_current.
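The threshold rule can be sketched in Python as follows. This is only an illustration: the threshold values, the block counts attached to them, and the function name `blocks_for_buffer` are assumptions made for the example and do not come from the description.

```python
from bisect import bisect_right

# Hypothetical thresholds on B_current (seconds of buffered media) and the
# number of blocks per request associated with each threshold.
# These numbers are illustrative assumptions, not values from the text.
THRESHOLDS = [4.0, 8.0, 16.0]
BLOCKS_PER_REQUEST = [2, 3, 4]


def blocks_for_buffer(b_current: float) -> int:
    """Return how many blocks the next request should reference."""
    # Count the thresholds that B_current has reached. Equality is treated as
    # "reached", matching the single-threshold rule (B_current >= threshold).
    reached = bisect_right(THRESHOLDS, b_current)
    if reached == 0:
        return 1  # below the lowest threshold: single-block requests
    # Use the value attached to the highest threshold that was reached.
    return BLOCKS_PER_REQUEST[reached - 1]
```

With a single threshold and a single associated value of 2, the same function reduces to the constant-number variant mentioned above.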

Another example of a condition that, when met, may cause requests to reference multiple blocks is the value of the variable State described above. For example, when State is "Stable" or "Full", requests may be issued for multiple blocks, but when State is "Low", all requests may be for a single block.

Another embodiment is shown in FIG. 16. In this embodiment, when the next request is to be issued (determined in step 1300), the current value of State and B_current are used to determine the size of the next request. If the current value of State is "Low", or if the current value of State is "Full" and the current representation is not the highest available (determined in step 1310, answer "Yes"), the next request is chosen to be short, for example just for the next block (block determined and request made in step 1320). The rationale is that these are conditions under which a change of representation is likely quite soon. If the current value of State is "Stable", or if the current value of State is "Full" and the current representation is the highest available (determined in step 1310, answer "No"), the duration of the consecutive blocks requested in the next request is chosen to be proportional to an α-fraction of B_current for some fixed α < 1 (blocks determined in step 1330, request made in step 1340). For example, with α = 0.4, if B_current = 5 seconds the next request might be for approximately 2 seconds of blocks, and if B_current = 10 seconds the next request might be for approximately 4 seconds of blocks. One rationale is that under these conditions a switch to a new representation is unlikely to occur for an amount of time proportional to B_current.
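The decision of FIG. 16 might be sketched as below. This is a minimal illustration under stated assumptions: all blocks are assumed to have the same duration, the requested duration is rounded to a whole number of blocks, and the names `State`, `next_request_duration` and `block_duration` are hypothetical. Only the three State values named in this passage are modeled.

```python
from enum import Enum


class State(Enum):
    LOW = "Low"
    STABLE = "Stable"
    FULL = "Full"


def next_request_duration(state, b_current, at_highest_representation,
                          block_duration, alpha=0.4):
    """Return the media duration (seconds) covered by the next request."""
    # Step 1310: a representation change is likely soon, so keep the request short.
    if state is State.LOW or (state is State.FULL and not at_highest_representation):
        return block_duration  # step 1320: just the next block
    # Steps 1330/1340: request roughly alpha * B_current seconds of blocks.
    # Rounding to whole blocks (at least one) is an assumption of this sketch.
    n_blocks = max(1, round(alpha * b_current / block_duration))
    return n_blocks * block_duration
```

With 1-second blocks and α = 0.4 this reproduces the figures in the text: about 2 seconds for B_current = 5 seconds and about 4 seconds for B_current = 10 seconds.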

Flexible pipelining
A block streaming system may use a file request protocol that has a particular underlying transport protocol, for example TCP/IP. At the beginning of a TCP/IP or other transport protocol connection, it may take considerable time to achieve full utilization of the available bandwidth. This can result in a "connection startup penalty" every time a new connection is started. For example, in the case of TCP/IP, the connection startup penalty arises from both the time taken for the initial TCP handshake to establish the connection and the time taken for the congestion control protocol to achieve full utilization of the available bandwidth.

In this case, it may be desirable to issue multiple requests over a single connection in order to reduce how often the connection startup penalty is incurred. However, some file transport protocols, for example HTTP, provide no mechanism to cancel a request other than closing the transport-layer connection altogether, which incurs a connection startup penalty when a new connection is established in place of the old one. An issued request may need to be cancelled if it is determined that the available bandwidth has changed and a different media data rate is required instead, i.e., there is a decision to switch to a different representation. Another reason for cancelling an issued request may be that the user has requested that the media presentation be ended and a new presentation begun (perhaps of the same content item at a different point in the presentation, or perhaps of a new content item).

As is known, the connection startup penalty can be avoided by keeping the connection open and reusing the same connection for subsequent requests and, as is also known, the connection can be kept fully utilized if multiple requests are issued at the same time on the same connection (a technique known as "pipelining" in the context of HTTP). However, a disadvantage of issuing multiple requests at the same time, or more generally of issuing requests in such a way that multiple requests are issued before previous requests on a connection have completed, is that the connection is then committed to carrying the responses to those requests. If a change in which requests should be issued becomes desirable, the connection may have to be closed in order to cancel already-issued requests that are no longer wanted.

The probability that an issued request needs to be cancelled may depend in part on the length of the time interval between issuing the request and the playout time of the requested block: when this interval is long, the probability that the issued request needs to be cancelled is also high, because the available bandwidth is likely to change during the interval.

As is known, some file download protocols have the property that a single underlying transport-layer connection can advantageously be used for multiple download requests. For example, HTTP has this property, since reusing a single connection for multiple requests avoids the "connection startup penalty" described above for every request except the first. However, a disadvantage of this approach is that the connection is committed to transporting the data requested in each issued request. Therefore, if one or more requests need to be cancelled, either the connection is closed, incurring the connection startup penalty when a replacement connection is established, or the client waits to receive data that is no longer needed, delaying the reception of subsequent data.

An embodiment is now described that retains the advantages of connection reuse without incurring this disadvantage, and that additionally increases how often connections can be reused.

The embodiments of block streaming systems described herein are configured to reuse a connection for multiple requests without committing the connection, at the outset, to a particular set of requests. Essentially, a new request is issued on an existing connection when the requests already issued on that connection have not yet completed but are close to completion. One reason for not waiting until the existing requests complete is that, once the previous requests complete, the connection speed may degrade: the underlying TCP session may go into an idle state, or the TCP cwnd variable may be substantially reduced, substantially reducing the initial download speed of a new request issued on that connection. One reason for waiting until close to completion before issuing an additional request is that, if a new request is issued long before the previous requests complete, the new request may not even begin for a substantial period of time, and during that period the decision to make the new request may cease to be valid, for example because of a decision to switch representations. Thus, a client implementing this technique issues a new request on a connection as late as possible without slowing down the download capability of the connection.

The method comprises monitoring the number of bytes received on a connection in response to the latest request issued on that connection and applying a test to this number. This can be done by configuring the receiver (or the transmitter, if applicable) to monitor and test.

If the test passes, a further request may be issued on the connection. One example of a suitable test is whether the number of bytes received is greater than a fixed fraction of the size of the data requested. For example, this fraction could be 80%. Another example of a suitable test is based on the following calculation, as illustrated in FIG. 17. In the calculation, let R be an estimate of the data rate of the connection, T be an estimate of the round-trip time ("RTT") and X be a numeric factor, which may be a constant set to a value between, for example, 0.5 and 2; the estimates of R and T are updated on a regular basis (updated in step 1410). Let S be the size of the data requested in the last request and B be the number of bytes of the requested data received so far (calculated in step 1420).

A suitable test is to have the receiver (or the transmitter, if applicable) execute a routine to evaluate the inequality (S - B) < X·R·T (tested in step 1430) and, if the answer is "Yes", take an action. For example, a test may be made to see whether another request is ready to be issued on the connection (tested in step 1440); if "Yes", that request is issued on the connection (step 1450), and if "No", the process returns to step 1410 to continue updating and testing. If the result of the test in step 1430 is "No", the process likewise returns to step 1410 to continue updating and testing.
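Both example tests can be written as small predicates. The sketch below is illustrative only: the function names are hypothetical, and it assumes that S and B are in bytes, R is in bytes per second and T is in seconds, so that both sides of the inequality are in the same unit.

```python
def passes_fraction_test(requested_size, received, fraction=0.8):
    """First example test: more than a fixed fraction of the data has arrived."""
    return received > fraction * requested_size


def passes_rtt_test(requested_size, received, rate, rtt, x=1.0):
    """Step 1430: evaluate (S - B) < X * R * T.

    requested_size is S, received is B, rate is the estimate R,
    rtt is the estimate T and x is the factor X (for example 0.5 to 2).
    """
    remaining = requested_size - received
    return remaining < x * rate * rtt
```

For example, with S = 1,000,000 bytes, R = 500,000 bytes per second, T = 0.1 seconds and X = 1, the right-hand side is 50,000 bytes, so the test first passes once fewer than 50,000 bytes of the response remain outstanding.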

The inequality test in step 1430 (performed, for example, by appropriately programmed elements) causes each subsequent request to be issued when the amount of data remaining to be received equals X times the amount of data that can be received at the current estimated reception rate within one RTT. Many methods for estimating the data rate R in step 1410 are known in the art. For example, the data rate may be estimated as Dt/t, where Dt is the number of bits received in the preceding t seconds and t may be, for example, 1 second, 0.5 seconds or some other interval. Another method is an exponentially weighted average, or first-order infinite impulse response (IIR) filter, of the incoming data rate. Many methods for estimating the RTT, T, in step 1410 are also known in the art.
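One way to realize the exponentially weighted average is a first-order IIR filter over per-interval rate samples of the form Dt/t. The class below is a sketch under assumptions: the class name, the smoothing weight of 0.25 and the choice to seed the filter with the first sample are all illustrative and are not specified in the description.

```python
class EwmaRateEstimator:
    """First-order IIR (exponentially weighted) estimate of the data rate R."""

    def __init__(self, weight=0.25):
        self.weight = weight  # weight given to the newest sample (assumed value)
        self.rate = None      # current estimate; None until the first sample

    def update(self, amount_received, interval_seconds):
        """Fold in one sample Dt/t and return the updated estimate."""
        sample = amount_received / interval_seconds
        if self.rate is None:
            self.rate = sample  # seed with the first observation
        else:
            self.rate = (1 - self.weight) * self.rate + self.weight * sample
        return self.rate
```

The unit of the estimate follows the unit of `amount_received`; whichever unit is used, it should match the unit of S and B when the estimate is fed into the step 1430 inequality.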

The test in step 1430 can also be applied to the aggregate of all active connections on an interface, as explained in more detail below.

The method further comprises constructing a list of candidate requests, associating each candidate request with a set of suitable servers to which the request can be made, and ordering the list of candidate requests by priority. Some entries in the list of candidate requests may have the same priority. The servers in the list of suitable servers associated with each candidate request are identified by hostnames. Each hostname corresponds to a set of Internet Protocol addresses, which can be obtained from the Domain Name System as is well known. Each possible request on the list of candidate requests is therefore associated with a set of Internet Protocol addresses, specifically the union of the sets of Internet Protocol addresses associated with the hostnames of the servers associated with the candidate request. Whenever the test described in step 1430 is met for a connection, and no new request has yet been issued on that connection, the highest-priority request on the list of candidate requests that is associated with the Internet Protocol address of the destination of the connection is chosen, and this request is issued on the connection. The request is also removed from the list of candidate requests.

Candidate requests may be removed (cancelled) from the list of candidate requests, new requests may be added to the candidate list with a priority higher than that of requests already on the list, and existing requests on the candidate list may have their priority changed. Because both the set of requests on the candidate list and their priorities are dynamic, which request is issued next can vary depending on when a test of the type described in step 1430 is satisfied.

For example, it is possible that, if the answer to the test described in step 1430 is "Yes" at some time t, the next request issued is a request A, whereas if the answer to the test described in step 1430 is not "Yes" until some time t' > t, the next request issued is instead a request B. This could be because request A was removed from the list of candidate requests between times t and t', because request B was added to the list of candidate requests between times t and t' with a higher priority than request A, or because request B was on the candidate list at time t with a lower priority than request A and, between times t and t', the priority of request B was raised above that of request A.

FIG. 18 shows an example of a list of requests on the candidate list of requests. In this example there are three connections and six requests on the candidate list, labeled A, B, C, D, E and F. Each request on the candidate list can be issued on a subset of the connections, as indicated; for example, request A can be issued on connection 1, whereas request F can be issued on connection 2 or connection 3. The priority of each request is also labeled in FIG. 18, with a lower priority value indicating a higher priority. Thus, requests A and B, with priority 0, are the highest-priority requests, whereas request F, with a priority value of 3, has the lowest priority among the requests on the candidate list.

If, at this point in time t, connection 1 passes the test described in step 1430, either request A or request B is issued on connection 1. If instead connection 3 passes the test described in step 1430 at this time t, request D is issued on connection 3, since request D is the highest-priority request that can be issued on connection 3.
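The selection at time t can be sketched as follows. FIG. 18 is not reproduced here, so the table in `example_candidates` is a hypothetical reconstruction: only the priorities of A, B and F, the statement that F may use connection 2 or 3, and the outcomes for connections 1 and 3 come from the text, and every other priority and connection set is an assumption chosen to be consistent with them. The names `Candidate` and `pick_request` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    priority: int           # smaller value means higher priority
    connections: frozenset  # connections on which the request may be issued


def example_candidates():
    # Hypothetical stand-in for FIG. 18; see the caveats in the lead-in.
    return [
        Candidate("A", 0, frozenset({1})),
        Candidate("B", 0, frozenset({1, 2})),
        Candidate("C", 1, frozenset({1, 2})),
        Candidate("D", 1, frozenset({3})),
        Candidate("E", 2, frozenset({2, 3})),
        Candidate("F", 3, frozenset({2, 3})),
    ]


def pick_request(candidates, connection):
    """Choose, and remove from the list, the highest-priority request that
    may be issued on the connection that just passed the step 1430 test."""
    eligible = [c for c in candidates if connection in c.connections]
    if not eligible:
        return None
    best = min(eligible, key=lambda c: c.priority)  # ties: first in list order
    candidates.remove(best)
    return best
```

Ties between requests of equal priority, such as A and B on connection 1, are broken here by list order; the text leaves that choice open at this point.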

Suppose that, for all connections, the answer to the test described in step 1430 is "No" from time t until some later time t', and that between times t and t' request A changes its priority from 0 to 5, request B is removed from the candidate list, and a new request G with priority 0 is added to the candidate list. Then, at time t', the new candidate list might be as shown in FIG. 19.

If connection 1 passes the test described in step 1430 at time t', request C, with priority 4, is issued on connection 1, since at that point it is the highest-priority request on the candidate list that can be issued on connection 1.

Suppose instead, in this same situation, that request A had been issued on connection 1 at time t (request A was one of the two highest-priority choices for connection 1 at time t, as shown in FIG. 18). Since the answer to the test described in step 1430 is "No" for all connections from time t until the later time t', connection 1 is still delivering data for requests issued before time t until at least time t', and so request A would not have started until at least time t'. Issuing request C at time t' is a better decision than issuing request A at time t would have been, because request C starts after t' at the same time that request A would have started, and by that time request C has a higher priority than request A.

As another alternative, if a test of the type described in step 1430 is applied to the aggregate of the active connections, a connection may be chosen whose destination Internet Protocol address is associated with the first request on the list of candidate requests, or with another request having the same priority as that first request.

A number of methods are possible for constructing the list of candidate requests. For example, the candidate list could contain n requests representing requests for the next n portions of data of the current representation of the presentation in time-sequence order, where the request for the earliest portion of data has the highest priority and the request for the latest portion of data has the lowest priority. In some cases n may be one. The value of n may depend on the buffer size B_current, on the variable State, or on another measure of the state of the client's buffer occupancy. For example, a number of threshold values may be set for B_current, with a value associated with each threshold, and the value of n is then taken to be the value associated with the highest threshold that is less than B_current.

The embodiments described above ensure flexible allocation of requests to connections, and ensure that preference is given to reusing an existing connection even if the highest-priority request is not suitable for that connection (because the destination IP address of the connection is not one that is allocated to any of the hostnames associated with the request). The dependency of n on B_current, State or another measure of the client's buffer occupancy ensures that such "out of priority order" requests are not issued when the client is in urgent need of issuing and completing the request associated with the next portion of data to be played out in time sequence.

These methods can advantageously be combined with cooperative HTTP and FEC.

Consistent server selection
As is well known, files to be downloaded using a file download protocol are commonly identified by an identifier comprising a hostname and a filename. For example, this is the case for the HTTP protocol, in which case the identifier is a Uniform Resource Identifier (URI). A hostname may correspond to multiple hosts, identified by Internet Protocol addresses. This is, for example, a common method of spreading the load of requests from multiple clients across multiple physical machines. In particular, this approach is commonly taken by content delivery networks (CDNs). In this case, a request issued on a connection to any of the physical hosts is expected to succeed. A number of methods are known by which a client may select from among the Internet Protocol addresses associated with a hostname. For example, these addresses are typically provided to the client via the Domain Name System and are provided in priority order. The client may then choose the highest-priority (first) Internet Protocol address. In general, however, there is no coordination between clients as to how this selection is made, with the result that different clients may request the same file from different servers. This may result in the same file being stored in the caches of multiple nearby servers, which lowers the efficiency of the cache infrastructure.

This can be handled by a system that advantageously increases the probability that two clients requesting the same block will request that block from the same server. The novel method described here comprises selecting from among the available Internet Protocol addresses in a manner determined by the identifier of the file to be requested, and in such a way that different clients presented with the same or similar choices of Internet Protocol addresses and file identifiers will make the same selection.

A first embodiment of the method is described with reference to FIG. 20. The client first obtains a set of Internet Protocol addresses IP_1, IP_2, ..., IP_n, as shown in step 1710. If there is a file for which requests are to be issued, as decided in step 1720, the client determines which Internet Protocol address to use for requests for that file, as determined in steps 1730 to 1770. Given the set of Internet Protocol addresses and the identifier of the file to be requested, the method comprises ordering the Internet Protocol addresses in a manner determined by the file identifier. For example, for each Internet Protocol address, a byte string is constructed comprising the concatenation of the Internet Protocol address and the file identifier, as shown in step 1730. A hash function is applied to this byte string, as shown in step 1740, and the resulting hash values are arranged according to a fixed ordering, for example increasing numerical order, as shown in step 1750, which induces an ordering on the Internet Protocol addresses. The same hash function can be used by all clients, which guarantees that the hash function produces the same result for a given input at every client. The hash function might be statically configured into all clients in a set of clients; or all clients in the set might obtain a partial or full description of the hash function when they obtain the list of Internet Protocol addresses; or all clients in the set might obtain a partial or full description of the hash function when they obtain the file identifier; or the hash function might be determined by other means. The Internet Protocol address that is first in this ordering is chosen, and this address is then used to establish a connection and issue requests for all or portions of the file, as shown in steps 1760 and 1770.
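Steps 1730 to 1760 can be sketched in a few lines of Python. The description does not name a hash function, so the use of SHA-256 here is an assumption, as are the function names and the textual encoding of the addresses; any function shared by all clients would serve.

```python
import hashlib


def order_addresses(ip_addresses, file_identifier):
    """Order IP addresses in a manner determined by the file identifier."""
    def hash_key(ip):
        # Step 1730: byte string = IP address concatenated with file identifier.
        data = (ip + file_identifier).encode("utf-8")
        # Step 1740: apply the shared hash function (SHA-256 is an assumption).
        return hashlib.sha256(data).digest()
    # Step 1750: sort by hash value; comparing digests bytewise is the same as
    # comparing them as big-endian integers in increasing numerical order.
    return sorted(ip_addresses, key=hash_key)


def choose_address(ip_addresses, file_identifier):
    """Step 1760: pick the address that is first in the induced ordering."""
    return order_addresses(ip_addresses, file_identifier)[0]
```

Because the ordering depends only on the address set and the file identifier, two clients that receive the same addresses in a different DNS order still choose the same server for a given file, while different files are spread across the servers.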

The method above may be applied when a new connection is established to request a file. It may also be applied when a number of established connections are available and one of them may be chosen to issue a new request.

Furthermore, when an established connection is available and a request may be chosen from among a set of candidate requests of equal priority, an ordering on the candidate requests is induced, for example by the same hash-value method described above, and the candidate request that appears first in this ordering is chosen. The methods may be combined to select both a connection and a candidate request from among a set of connections and requests of equal priority, again by computing a hash for each combination of connection and request, ordering these hash values according to a fixed ordering, and choosing the combination that occurs first in the ordering induced on the set of combinations of requests and connections.
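A minimal sketch of the combined selection follows. It assumes that connections are identified by their address strings and requests by their file identifiers, and it uses SHA-256 with a "|" separator; all of these, and the name pick_connection_and_request, are illustrative assumptions.

```python
import hashlib
from itertools import product

def pick_connection_and_request(connections, requests):
    """Choose the (connection, request) pair whose hash comes first."""
    def hash_key(pair):
        connection, request = pair
        data = (connection + "|" + request).encode("utf-8")
        return hashlib.sha256(data).digest()   # digests compare lexicographically
    # Taking the minimum is equivalent to sorting all pairs and keeping the first.
    return min(product(connections, requests), key=hash_key)
```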

This method has an advantage for the following reason. A typical approach taken by a block serving infrastructure such as that shown in FIG. 1 (BSI 101) or FIG. 2 (BSI 101), and in particular an approach commonly taken by CDNs, is to provide multiple caching proxy servers that receive client requests. A caching proxy server may not hold the file requested in a given request, in which case such a server typically forwards the request to another server, receives a response from that server that typically includes the requested file, and forwards the response to the client. The caching proxy server may also store (cache) the requested file so that it can respond immediately to subsequent requests for the file. The common approach described above has the property that the set of files stored on a given caching proxy server is largely determined by the set of requests that the caching proxy server has received.

The method described above has the following advantage. If all clients in a set of clients are provided with the same list of Internet Protocol addresses, then these clients will use the same Internet Protocol address for all requests issued for the same file. If there are two different lists of Internet Protocol addresses and each client is provided with one of these two lists, then the clients will use at most two different Internet Protocol addresses for all requests issued for the same file. In general, if the lists of Internet Protocol addresses provided to clients are similar, then the clients will use a small set of the provided Internet Protocol addresses for all requests issued for the same file. Since clients in proximity tend to be provided with similar lists of Internet Protocol addresses, it is likely that such clients issue requests for a file to only a small portion of the caching proxy servers available to them. Thus, only a small fraction of the caching proxy servers will cache the file, which advantageously minimizes the amount of caching resources used to cache the file.

Preferably, the hash function has the property that a very small fraction of different inputs are mapped to the same output, that different inputs are mapped to essentially random outputs, and that, for a given set of Internet Protocol addresses, the proportion of files for which a given one of the Internet Protocol addresses is first in the sorted list produced by step 1750 is approximately the same for all Internet Protocol addresses in the list. On the other hand, it is important that the hash function is deterministic, in the sense that for a given input the output of the hash function is the same for all clients.

Another advantage of the method described above is as follows. Suppose all clients in a set of clients are provided with the same list of Internet Protocol addresses. Because of the properties of the hash function just described, requests for different files from these clients are likely to be spread evenly across the set of Internet Protocol addresses, which in turn means that the requests are spread evenly across the caching proxy servers. Thus, the caching resources used for storing these files are spread evenly across the caching proxy servers, and the requests for files are spread evenly across the caching proxy servers. The method therefore provides both storage balancing and load balancing across the caching infrastructure.

A number of variations on the approach described above are known to those of skill in the art, and in many cases these variations retain the property that the set of files stored on a given proxy is determined at least in part by the set of requests the caching proxy server has received. In the common case in which a given host name resolves to multiple physical caching proxy servers, it is common that all of these servers will eventually store a copy of any given file that is frequently requested. Such duplication may be undesirable, since storage resources on the caching proxy servers are limited and, as a result, files may occasionally be removed (purged) from the cache. The novel method described here ensures that requests for a given file are directed to caching proxy servers in such a way that this duplication is reduced, thereby reducing the need to remove files from the cache and increasing the likelihood that any given file is present in (i.e., has not been purged from) the proxy cache.

When a file is present in the proxy cache, the response sent to the client is faster. This reduces the probability that the requested file arrives late, which could cause a pause in media playout and therefore a bad user experience. Additionally, when a file is not present in the proxy cache, the request may be sent to another server, causing additional load on both the serving infrastructure and the network connections between servers. In many cases the server to which the request is sent may be at a distant location, and the transmission of the file from this server back to the caching proxy server may incur transmission costs. The novel method described here therefore results in a reduction in these transmission costs.

Probabilistic Whole File Request
A particular concern when the HTTP protocol is used with Range requests is the behavior of cache servers that are commonly used to provide scalability in the serving infrastructure. While it may be common for HTTP cache servers to support the HTTP Range header, the exact behavior of different HTTP cache servers varies by implementation. Most cache server implementations serve Range requests from cache when the file is available in cache. A common implementation of HTTP cache servers always forwards downstream HTTP requests containing a Range header to an upstream node (a cache server or the origin server) unless the cache server has a copy of the file. In some implementations the upstream response to the Range request is the entire file; this entire file is cached, and the response to the downstream Range request is extracted from this file and sent. However, in at least one implementation the upstream response to the Range request is just the data bytes in the Range request itself, and these data bytes are not cached but are instead just sent as the response to the downstream Range request. As a result, the use of Range headers by clients may have the consequence that the file itself is never brought into caches, and the desirable scalability properties of the network are lost.

In the foregoing, the operation of caching proxy servers was described, as was a method of requesting blocks from a file that is an aggregation of multiple blocks. For example, this can be achieved by the use of the HTTP Range request header. Such requests are called “partial requests” in the following. A further embodiment is now described which has an advantage when the block serving infrastructure 101 does not provide complete support for the HTTP Range header. Commonly, servers within a block serving infrastructure, for example a content delivery network, support partial requests but may not store the response to partial requests in local storage (cache). Such a server may fulfill a partial request by forwarding the request to another server, unless the entire file is stored in local storage, in which case the response may be sent without forwarding the request to another server.

A block-request streaming system that makes use of the novel enhancement of block aggregation described above may perform poorly if the block serving infrastructure exhibits this behavior, since all requests, being partial requests, will be forwarded to another server and no requests will be served by the caching proxy servers, which defeats the purpose of providing the caching proxy servers in the first place. During the block-request streaming process as described above, a client may at some point request a block that is at the beginning of a file.

According to the novel method described here, whenever a certain condition is met, such requests may be converted from requests for the first block in a file to requests for the entire file. When a request for the whole file is received by a caching proxy server, the proxy server typically stores the response. Therefore, the use of these requests causes the file to be brought into the cache of the local caching proxy servers, such that subsequent requests, whether for the full file or partial requests, may be served directly by the caching proxy server. The condition may be such that, among a set of requests associated with a given file, for example the set of requests generated by a set of clients viewing the content item in question, the condition will be met for at least a provided fraction of these requests.

An example of a suitable condition is that a randomly chosen number is above a provided threshold. This threshold may be set such that the conversion of a single block request into a whole file request occurs on average for a provided fraction of the requests, for example one time in ten (in which case the random number may be chosen from the interval [0,1] and the threshold may be 0.9). Another example of a suitable condition is that a hash function calculated over some information associated with the block and some information associated with the client takes one of a provided set of values. This method has the advantage that a frequently requested file will be brought into the cache of a local proxy server, while the operation of the block-request streaming system is not altered significantly from the standard operation in which each request is for a single block. In many cases, where the conversion of the request from a single block request to a whole file request occurs, the client procedures would otherwise go on to request the other blocks within the file. If this is the case, then such requests may be suppressed, because the blocks in question will be received in any case as a result of the whole file request.
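The two example conditions can be sketched as follows. The function names, the use of SHA-256, the modulus of 10, and the "|" separator are assumptions chosen for illustration; only the 0.9 threshold and the form of the HTTP Range header come from the description and from HTTP itself.

```python
import hashlib
import random

def convert_by_random(threshold=0.9, rng=random):
    """True for roughly (1 - threshold) of calls; rng.random() lies in [0, 1)."""
    return rng.random() > threshold

def convert_by_hash(block_info, client_info, accepted_values, modulus=10):
    """True when a hash over block and client information falls in a provided set."""
    data = f"{block_info}|{client_info}".encode("utf-8")
    value = int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus
    return value in accepted_values

def request_headers(first_byte, last_byte, whole_file):
    """Omit the Range header when the block request is converted to a whole file request."""
    if whole_file:
        return {}
    return {"Range": f"bytes={first_byte}-{last_byte}"}
```

With the default threshold, about one first-block request in ten is sent without a Range header, which gives a caching proxy the chance to store the complete file.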

URL Construction and Segment List Generation and Seeking
Segment list generation deals with the issue of how a client may generate a segment list from the MPD at a specific client-local time NOW for a specific representation that starts at some start time starttime, expressed either relative to the start of the media presentation for on-demand cases or in wall-clock time. A segment list may comprise a locator, for example a URL to optional initial representation metadata, as well as a list of media segments. Each media segment may have been assigned a starttime, a duration, and a locator. The starttime typically expresses an approximation of the media time of the media contained in a segment, but not necessarily a sample-accurate time. The starttime is used by the HTTP streaming client to issue the download request at the appropriate time. The generation of the segment list, including the start time of each segment, may be done in different ways. The URLs may be provided as a playlist, or a URL construction rule may advantageously be used for a compact representation of the segment list.

A segment list based on URL construction may, for example, be carried out if the MPD signals it by a specific attribute or element such as FileDynamicInfo or an equivalent signal. A generic way to create a segment list from a URL construction is provided below in the “URL Construction Overview” section. A playlist-based construction may, for example, be signaled by a different signal. Seeking in the segment list and getting to the accurate media time is also advantageously implemented in this context.

URL Construction Overview
As previously described, in one embodiment of the present invention, metadata may be provided that contains URL construction rules which allow client devices to construct the file identifiers for blocks of the presentation. A further novel enhancement to the block-request streaming system is now described, which provides for changes to the URL construction rules, changes to the number of available encodings, and changes to metadata associated with the available encodings, such as bit rate, aspect ratio, resolution, audio or video codec or codec parameters, or other parameters.

In this novel enhancement, additional data may be provided, associated with each element of the metadata file, that indicates a time interval within the overall presentation. Within this time interval the element may be considered valid, and outside the time interval the element may be ignored. Furthermore, the syntax of the metadata may be enhanced such that elements previously allowed to appear only once, or at most once, may appear multiple times. An additional constraint may be applied in this case that provides that, for such elements, the specified time intervals must be disjoint. At any given time instant, considering only the elements whose time interval contains the given time instant results in a metadata file that is consistent with the original metadata syntax. Such time intervals are called validity intervals. This method therefore provides for signaling, within a single metadata file, changes of the kind described above. Advantageously, such a method can be used to provide a media presentation that supports changes of the kind described at specified points within the presentation.
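A minimal sketch of validity intervals follows. The dictionary layout with "start" and "end" keys, the half-open interval convention, and the function names are assumptions made for illustration; the description does not prescribe a concrete metadata syntax.

```python
def elements_valid_at(elements, t):
    """Keep only the elements whose validity interval [start, end) contains t."""
    return [e for e in elements if e["start"] <= t < e["end"]]

def intervals_are_disjoint(elements):
    """Check the constraint placed on repeated elements of the same kind."""
    spans = sorted((e["start"], e["end"]) for e in elements)
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))

# Two hypothetical URL construction rules, each valid for ten minutes.
rules = [
    {"start": 0, "end": 600, "value": "rule-a"},
    {"start": 600, "end": 1200, "value": "rule-b"},
]
```

Filtering at a given time instant leaves at most one rule of each kind, so the filtered metadata is again consistent with a syntax in which that element appears at most once.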

URL Constructor
As described herein, a common feature of block-request streaming systems is the need to provide the client with “metadata” that identifies the available media encodings and provides the information the client needs to request the blocks from those encodings. For example, in the case of HTTP, this information might comprise URLs for the files containing the media blocks. A playlist file may be provided that lists the URLs of the blocks for a given encoding. Multiple playlist files are provided, one for each encoding, together with a master “playlist of playlists” that lists the playlists corresponding to the different encodings. A disadvantage of this system is that the metadata can become quite large and therefore takes some time to be requested when the client begins the stream. A further disadvantage of this system is evident in the case of live content, in which the files corresponding to the media data blocks are generated “on the fly” from a media stream that is being captured in real time (live), for example a live sports event or news program. In this case, the playlist files may be updated each time a new block is available (for example, every few seconds). Client devices may repeatedly fetch the playlist file to determine whether new blocks are available and to obtain their URLs. This may place a significant load on the serving infrastructure and, in particular, means that metadata files cannot be cached for longer than the update interval, which is equal to the block size, commonly of the order of a few seconds.

One important aspect of a block-request streaming system is the method used to inform clients of the file identifiers, for example URLs, that should be used, together with the file download protocol, to request blocks. One example is a method in which, for each representation of a presentation, a playlist file is provided that lists the URLs of the files containing the blocks of media data. A disadvantage of this method is that at least some of the playlist file itself needs to be downloaded before playout can begin, which increases the channel zapping time and therefore causes a poor user experience. For a long media presentation with several or many representations, the list of file URLs may be large and hence the playlist file may be large, further increasing the channel zapping time.

Another disadvantage of this method occurs in the case of live content. In this case, the complete list of URLs is not made available in advance; the playlist file is periodically updated as new blocks become available, and clients periodically request the playlist file in order to receive the updated version. Because this file is frequently updated, it cannot be stored for long within the caching proxy servers. This means that very many of the requests for this file will be forwarded to other servers and eventually to the server that generates the file. In the case of a popular media presentation this may result in a high load on this server and the network, which may in turn result in slow response times and therefore long channel zapping times and a poor user experience. In the worst case, the server becomes overloaded, and this results in some users being unable to view the presentation.

It is desirable in the design of a block-request streaming system to avoid placing restrictions on the form of the file identifiers that may be used. This is because a number of considerations may motivate the use of identifiers of a particular form. For example, in the case that the block serving infrastructure is a content delivery network, there may be file naming or storage conventions related to a desire to distribute storage or serving load across the network, or other requirements that lead to particular forms of file identifier that cannot be predicted at system design time.

A further embodiment is now described that mitigates the above-mentioned disadvantages while retaining flexibility to choose appropriate file identification conventions. In this method, metadata may be provided for each representation of the media presentation comprising a file identifier construction rule. The file identifier construction rule may, for example, comprise a text string. In order to determine the file identifier for a given block of the presentation, a method of interpretation of the file identifier construction rule may be provided, this method comprising the determination of input parameters and the evaluation of the file identifier construction rule together with the input parameters. The input parameters may, for example, include an index of the file to be identified, where the first file has index zero, the second has index one, the third has index two, and so on. For example, in the case that every file spans the same time duration (or approximately the same time duration), the index of the file associated with any given time within the presentation can easily be determined. Alternatively, the time within the presentation spanned by each file may be provided within the presentation or version metadata.
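For the equal-duration case, the index computation reduces to a floor division. The sketch below assumes that times and durations are given in seconds and that the presentation starts at time zero; the function name file_index is illustrative.

```python
def file_index(presentation_time, file_duration):
    """Zero-based index of the file covering presentation_time (both in seconds)."""
    if presentation_time < 0 or file_duration <= 0:
        raise ValueError("time must be non-negative and duration positive")
    return int(presentation_time // file_duration)
```

For example, with 10-second files, a seek to 25 seconds into the presentation falls in the third file, which has index 2.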

In one embodiment, the file identifier construction rule may comprise a text string that may contain certain special identifiers corresponding to the input parameters. The method of evaluation of the file identifier construction rule comprises determining the positions of the special identifiers within the text string and replacing each such special identifier with a string representation of the value of the corresponding input parameter.
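A minimal sketch of this substitution follows. The $name$ placeholder syntax, the example URL, and the function name evaluate_rule are assumptions made for illustration; the description does not specify how the identifiers are written.

```python
import re

def evaluate_rule(rule, params):
    """Replace each $name$ identifier in rule with str(params[name])."""
    def replace(match):
        return str(params[match.group(1)])
    return re.sub(r"\$(\w+)\$", replace, rule)

# Hypothetical construction rule for one representation.
url = evaluate_rule("http://example.com/$rep$/seg-$index$.mp4",
                    {"rep": "hd", "index": 7})
```

A single short rule of this kind stands in for an arbitrarily long list of per-block URLs, which is what allows the metadata to stay small and cacheable.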

In another embodiment, the file identifier construction rule may comprise a text string conforming to an expression language. An expression language comprises a definition of a syntax to which expressions in the language may conform and a set of rules for evaluating a string conforming to that syntax.

A specific example will now be described with reference to FIG. 21 et seq. An example of a syntax definition for a suitable expression language, defined in Augmented Backus-Naur Form, is shown in FIG. 21. An example of rules for evaluating a string conforming to the <expression> production in FIG. 21 comprises recursively transforming the string conforming to the <expression> production into a string conforming to the <literal> production, as follows.

An <expression> conforming to the <literal> production is unchanged.

An <expression> conforming to the <variable> production is replaced with the value of the variable identified by the <token> string of the <variable> production.

An <expression> conforming to the <function> production is evaluated by evaluating each of its arguments according to these rules and applying a transformation to these arguments that depends on the <token> element of the <function> production, as described below.

An <expression> conforming to the last alternative of the <expression> production is evaluated by evaluating the two <expression> elements and applying an operation to these arguments that depends on the <operator> element of the last alternative of the <expression> production, as described below.

In the method described above, it is assumed that evaluation takes place in a context in which a plurality of variables may be defined. A variable is a (name, value) pair, where “name” is a string conforming to the <token> production and “value” is a string conforming to the <literal> production. Some variables may be defined outside the evaluation process before evaluation begins. Other variables may be defined within the evaluation process itself. All variables are “global” in the sense that only one variable exists with each possible “name”.
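The recursive evaluation can be sketched on an already parsed expression. Because the grammar of FIG. 21 is not reproduced here, the sketch assumes a hypothetical tuple-based tree in place of a parser; the node tags, the "concat" function, and the single "+" operator are illustrative assumptions only.

```python
def evaluate(expr, variables, functions, operators):
    """Recursively reduce a parsed expression to a literal string."""
    kind = expr[0]
    if kind == "literal":                  # a literal is left unchanged
        return expr[1]
    if kind == "variable":                 # a variable is replaced by its value
        return variables[expr[1]]
    if kind == "function":                 # evaluate the arguments, then transform
        _, name, args = expr
        values = [evaluate(a, variables, functions, operators) for a in args]
        return functions[name](*values)
    if kind == "operator":                 # evaluate both sides, then apply
        _, symbol, left, right = expr
        lhs = evaluate(left, variables, functions, operators)
        rhs = evaluate(right, variables, functions, operators)
        return operators[symbol](lhs, rhs)
    raise ValueError(f"unknown expression kind: {kind!r}")

# Global (name, value) pairs defined before evaluation begins.
variables = {"index": "7"}
functions = {"concat": lambda a, b: a + b}            # hypothetical function
operators = {"+": lambda a, b: str(int(a) + int(b))}  # numbers kept as strings

tree = ("function", "concat",
        [("literal", "seg-"),
         ("operator", "+", ("variable", "index"), ("literal", "1"))])
result = evaluate(tree, variables, functions, operators)
```

Keeping every intermediate value as a string mirrors the requirement that evaluation always yields something conforming to the <literal> production.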

An example of a function is the “printf” function. This function accepts one or more arguments. The first argument may conform to the <string> production (hereinafter a “string”). The printf function evaluates to a transformed version of its first argument. The transformation applied is the same as that of the “printf” function of the C standard library, with the additional arguments included in the <function> production supplying the additional arguments expected by the C standard library printf function.
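A rough sketch of such a function follows. It relies on Python's %-formatting, which matches C printf for common conversions such as %d, %05d, %s, and %x but is not identical to it in every case; the name printf_function is an illustrative assumption.

```python
def printf_function(format_string, *args):
    """Approximate C printf formatting using Python's % operator."""
    return format_string % args

# A zero-padded segment name built from an index.
name = printf_function("seg-%05d.mp4", 42)
```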

Another example of a function is the "hash" function. This function accepts two arguments: the first may be a string, and the second may conform to the <number> production (hereinafter a "number"). The "hash" function applies a hash algorithm to the first argument and returns a result that is a non-negative integer less than the second argument. An example of a suitable hash function is given by the C function shown in FIG. 22, whose arguments are the input string (excluding the enclosing quotation marks) and the numeric input value. Other examples of hash functions are well known to those skilled in the art.

Another example of a function is the "Subst" function, which takes one, two, or three string arguments. If one argument is supplied, the result of the "Subst" function is the first argument. If two arguments are supplied, the result is computed by deleting every occurrence of the second argument (excluding the enclosing quotation marks) within the first argument and returning the first argument so modified. If three arguments are supplied, the result is computed by replacing every occurrence of the second argument (excluding the enclosing quotation marks) within the first argument with the third argument (excluding the enclosing quotation marks) and returning the first argument so modified.
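
The three argument cases of "Subst" can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the name subst is hypothetical, the arguments are assumed to be string contents with the enclosing quotation marks already removed, and the handling of an empty second argument is an assumption, since the text does not address it.

```python
def subst(first, second=None, third=None):
    """Sketch of "Subst" operating on already-unquoted string contents."""
    if second is None:
        # One argument: the result is the first argument unchanged.
        return first
    if second == "":
        # Assumption: an empty search string leaves the input unchanged.
        return first
    # Two arguments: delete every occurrence of `second`.
    # Three arguments: replace every occurrence of `second` with `third`.
    replacement = "" if third is None else third
    return first.replace(second, replacement)
```

For instance, subst("a-b-c", "-") deletes the hyphens, while subst("a-b-c", "-", "_") replaces each hyphen with an underscore.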

Some examples of operators are the addition, subtraction, division, multiplication, and modulus operators, identified by the <operator> productions "+", "-", "/", "*", and "%" respectively. These operators require that the <expression> productions on either side of the <operator> production evaluate to numbers. Evaluating the operator consists of applying the appropriate arithmetic operation (addition, subtraction, division, multiplication, or modulus, respectively) to these two numbers in the usual way and returning the result in a form that conforms to the <number> production.

Another example of an operator is the assignment operator, identified by the <operator> production "=". This operator requires that the left-hand argument evaluate to a string whose content conforms to the <token> production. The content of a string is defined as the character string within the enclosing quotation marks. The assignment operator causes the variable whose name is the <token> equal to the content of the left-hand argument to be assigned a value equal to the result of evaluating the right-hand argument. This value is also the result of evaluating the operator expression.

Another example of an operator is the sequence operator, identified by the <operator> production ";". The result of evaluating this operator is the right-hand argument. Note that, as with all operators, both arguments are evaluated, and the left-hand argument is evaluated first.

In one embodiment of the invention, the identifier of a file may be obtained by evaluating a file identifier construction rule according to the rules above, with a specific set of input variables that identify the required file. An example of an input variable is a variable with the name "index" and a value equal to the numeric index of the file within the presentation. Another example of an input variable is a variable with the name "bitrate" and a value equal to the average bit rate of the requested version of the presentation.

FIG. 23 shows some examples of file identifier construction rules, where the input variables are "id", which gives the identifier of the desired representation of the presentation, and "seq", which gives the sequence number of the file.

As will be apparent to those skilled in the art upon reading this disclosure, numerous variations of the above method are possible. For example, not all of the functions and operators described above need be provided, or additional functions or operators may be provided.

URL Construction Rules and Timing

This section provides basic URI construction rules for assigning a file or segment URI, as well as a start time for each segment, within a representation and the media presentation.

This section assumes that the media presentation description is available at the client.

Assume that an HTTP streaming client is playing media that is downloaded within a media presentation. The actual presentation time of the HTTP client may be defined in terms of where the presentation time is relative to the start of the presentation. At initialization, the presentation time t=0 may be assumed.

At any point t, the HTTP client may download any data with a playback time tP (also relative to the start of the presentation) that is at most MaximumClientPreBufferTime ahead of the actual presentation time t, as well as any data required because of a user interaction such as a seek or fast-forward. In some embodiments, MaximumClientPreBufferTime may not even be specified, in the sense that the client can download data ahead of the current playback time tP without restriction.

The HTTP client may avoid downloading unnecessary data; for example, segments from representations that are not expected to be played typically need not be downloaded.

The basic process in providing the streaming service may be the downloading of data through the generation of appropriate requests to download entire files/segments or subsets of files/segments, for example by using HTTP GET requests or HTTP partial GET requests. This description addresses how to access the data for a specific playback time tP, but in general the client may download data for a larger range of playback times to avoid inefficient requests. The HTTP client may minimize the number and frequency of HTTP requests in providing the streaming service.

To access media data at playback time tP, or at least close to playback time tP, in a specific representation, the client determines the URL of the file that contains this playback time and, in addition, determines the byte range within the file needed to access this playback time.

The Media Presentation Description may assign a representation id, r, to each representation, for example through use of the RepresentationID attribute. In other words, the content of the MPD, when written by the ingestion system or when read by the client, is interpreted such that an assignment exists. To download data for a specific playback time tP of a specific representation with id r, the client may construct an appropriate URI for a file.

The Media Presentation Description may assign the following attributes to each file or segment of each representation r:

(a) a sequence number i of the file within representation r, with i=1,2,…,Nr;
(b) the relative start time of the file with representation id r and file index i, relative to the presentation time, defined as ts(r,i);
(c) the file URI of the file/segment with representation id r and file index i, denoted FileURI(r,i).

In one embodiment, the start time of the file and the file URIs may be provided explicitly for a representation. In another embodiment, a list of file URIs may be provided explicitly, where each file URI is inherently assigned the index i according to its position in the list, and the start time of segment i is derived as the sum of the durations of segments 1 through i-1. The duration of each segment may be provided according to any of the rules discussed above. For example, anyone skilled in basic mathematics may use other methods to derive a start time easily from a single element or attribute and the position/index of the file URI in the representation.
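
The second embodiment above, in which the start time of segment i is the sum of the durations of segments 1 through i-1, can be sketched as a running sum. The function name start_times and the sample durations are hypothetical and serve only as illustration.

```python
def start_times(durations):
    """Return the start time of each segment given per-segment durations.

    durations[k] is the duration of segment k+1 (the text indexes segments
    from 1), so the result's entry k is the start time of segment k+1.
    """
    starts = []
    elapsed = 0
    for duration in durations:
        starts.append(elapsed)   # sum of durations of all earlier segments
        elapsed += duration
    return starts
```

With durations of 10, 10, and 8 seconds, the segments would start at 0, 10, and 20 seconds respectively.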

If dynamic URI construction rules are provided in the MPD, the start time of each file and the URI of each file may be constructed dynamically using the construction rule, the index of the requested file, and possibly some additional parameters provided in the media presentation description. The information may be provided, for example, in MPD attributes and elements such as FileURIPattern and FileInfoDynamic. The FileURIPattern provides information on how to construct the URIs based on the file index sequence number i and the representation ID r. The FileURIFormat is constructed as:

FileURIFormat = sprintf("%s%s%s%s%s.%s", BaseURI, BaseFileName,
                        RepresentationIDFormat, SeparatorFormat,
                        FileSequenceIDFormat, FileExtension);

and FileURI(r,i) is constructed as:

FileURI(r,i) = sprintf(FileURIFormat, r, i);
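
The two-stage construction can be tried out with Python's printf-style % operator, which follows the same conversion rules for %s and %d. This is only a sketch: the concrete values ("http://example.com/", "video", "%d", "_", "%05d", "3gp") are made-up assumptions, not values taken from the MPD, and the function names are hypothetical.

```python
def file_uri_format(base_uri, base_file_name, representation_id_format,
                    separator_format, file_sequence_id_format, file_extension):
    """Stage 1: assemble the format string for one presentation."""
    return "%s%s%s%s%s.%s" % (base_uri, base_file_name,
                              representation_id_format, separator_format,
                              file_sequence_id_format, file_extension)


def file_uri(uri_format, r, i):
    """Stage 2: fill in representation id r and file sequence number i."""
    return uri_format % (r, i)


# Hypothetical parameter values, for illustration only.
fmt = file_uri_format("http://example.com/", "video", "%d", "_", "%05d", "3gp")
print(fmt)                  # http://example.com/video%d_%05d.3gp
print(file_uri(fmt, 2, 7))  # http://example.com/video2_00007.3gp
```

The conversion specifiers embedded in RepresentationIDFormat and FileSequenceIDFormat pass through the first stage untouched because they are inserted as ordinary strings. In this sketch a literal percent sign in BaseURI or BaseFileName would have to be written as "%%" to survive the second stage.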

The relative start time ts(r,i) of each file/segment may be derived from some attribute contained in the MPD that describes the duration of the segments in this representation, for example the FileInfoDynamic attribute. The MPD may also contain a sequence of FileInfoDynamic attributes that is global for all representations in the media presentation, or at least for all representations in a period, in the same way as specified above. If media data for a specific playback time tP in representation r is requested, the corresponding index i(r,tP) may be derived such that the playback time tP lies in the interval between the start times ts(r,i(r,tP)) and ts(r,i(r,tP)+1). Access to a segment may be further restricted by the cases above, for example when the segment is not accessible.

How the exact playback time tP is accessed once the index and URI of the corresponding segment have been obtained depends on the actual segment format. In this example, assume without loss of generality that the media segments have a local timeline that starts at 0. To access and present the data at playback time tP, the client may download the data corresponding to the local time from the file/segment that can be accessed through the URI FileURI(r,i) with i=i(r,tP).

In general, the client may download the entire file and then access playback time tP. However, not all of the information in a 3GP file necessarily needs to be downloaded, because the 3GP file provides structures that map local timing to byte ranges. Therefore, as long as sufficient random access information is available, the specific byte ranges needed to access playback time tP may be sufficient to play the media. Sufficient information on the structure, the byte-range mapping, and the local timing of the media segment may also be provided in the initial part of the segment, for example using a segment index. By having access to the initial, e.g., 1200 bytes of the segment, the client may have sufficient information to directly access the byte range required for playback time tP.
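
A request for only the initial part of a segment is typically expressed with the HTTP Range header, whose byte positions are zero-based and inclusive. The helper below is a hypothetical sketch that only builds the header; it sends nothing, and the 1200-byte figure is simply the example size mentioned above.

```python
def range_header(first_byte, last_byte):
    """Build an HTTP Range header for an inclusive, zero-based byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return {"Range": "bytes=%d-%d" % (first_byte, last_byte)}


# The first 1200 bytes of a segment are bytes 0 through 1199.
initial_part = range_header(0, 1199)
```

A server that honors the request answers with status 206 (Partial Content); a server that ignores the Range header may instead return the whole file with status 200, so a client should be prepared for both.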

In a further example, assume that a segment index, possibly specified as the "tidx" box described below, may be used to identify the byte offsets of the required fragment or fragments. Partial GET requests may be formed for the required fragment or fragments. Other alternatives exist; for example, the client may issue a standard request for the file and cancel it once the first "tidx" box has been received.

Seeking

The client may attempt to seek to a specific presentation time tp in a representation. Based on the MPD, the client has access to the media segment start time and media segment URL of each segment in the representation. The client may obtain the segment index, segment_index, of the segment most likely to contain media samples for presentation time tp as the largest segment index i for which the start time tS(r,i) is less than or equal to the presentation time tp, that is, segment_index = max{i | tS(r,i) <= tp}. The segment URL is obtained as FileURI(r,i).
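
When the start times are held in an ascending list, the rule segment_index = max{i | tS(r,i) <= tp} reduces to a binary search. The sketch below uses Python's standard bisect module; the function name and the sample start times are hypothetical.

```python
from bisect import bisect_right


def segment_index_for(start_times, tp):
    """Return the 1-based index of the last segment starting at or before tp.

    start_times[k] holds tS(r, k+1) and must be sorted in ascending order.
    """
    if not start_times or tp < start_times[0]:
        raise ValueError("presentation time precedes the first segment")
    # bisect_right counts the entries <= tp, which equals the 1-based
    # index of the largest i with tS(r, i) <= tp.
    return bisect_right(start_times, tp)
```

With start times 0, 10, and 20, a seek to tp=9.9 selects segment 1 and a seek to tp=10 selects segment 2.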

Note that the timing information in the MPD may be approximate, owing to issues related to the placement of random access points, the alignment of media tracks, and media timing drift. As a result, the segment identified by the procedure above may begin at a time slightly after tp, and the media data for presentation time tp may be in the previous media segment. In the case of seeking, either the seek time may be updated to equal the first sample time of the retrieved file, or the preceding file may be retrieved instead. Note, however, that during continuous playback, including cases where there is a switch between alternative representations/versions, the media data for the time between tp and the start of the retrieved segment is nonetheless available.

For accurate seeking to presentation time tp, the HTTP streaming client needs to access a random access point (RAP). To determine the random access point in a media segment in the case of 3GPP Adaptive HTTP Streaming, the client may, for example, use the information in the "tidx" or "sidx" box, if present, to locate the random access points and the corresponding presentation times in the media presentation. Where a segment is a 3GPP movie fragment, the client may also use information within the "moof" and "mdat" boxes, for example, to locate RAPs and obtain the necessary presentation time from the information in the movie fragment and the segment start time derived from the MPD. If no RAP with a presentation time before the requested presentation time tp is available, the client may either access the previous segment or simply use the first random access point as the seek result. When media segments start with a RAP, these procedures are simple.

Note also that not all of the information in a media segment necessarily needs to be downloaded to access presentation time tp. The client may, for example, first request the "tidx" or "sidx" box from the beginning of the media segment using byte range requests. By use of the "tidx" or "sidx" box, segment timing can be mapped to byte ranges of the segment. By continuously using partial HTTP requests, only the relevant parts of the media segment need be accessed, for an improved user experience and low start-up delay.

Segment List Generation

As described herein, it should be apparent how to implement a straightforward HTTP streaming client that uses the information provided by the MPD to create a list of segments for a representation that has a signaled approximate segment duration dur. In some embodiments, the client may assign the media segments within a representation consecutive indices i=1,2,3,…; that is, the first media segment is assigned index i=1, the second media segment is assigned index i=2, and so on. The list of media segments with segment indices i is then assigned startTime[i], and URL[i] is generated, for example, as follows. First, the index i is set to 1. The start time of the first media segment is obtained as 0, so startTime[1]=0. The URL of media segment i, URL[i], is obtained as FileURI(r,i). The process is continued for all described media segments with index i: the startTime[i] of media segment i is obtained as (i-1)*dur, and URL[i] is obtained as FileURI(r,i).
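
The list-generation loop can be sketched as follows. The function segment_list, its count parameter (the number of described segments), and the example URI pattern are assumptions made for illustration; the FileURI(r,i) construction is passed in as a callback rather than fixed.

```python
def segment_list(r, count, dur, file_uri):
    """Build (index, startTime, URL) entries for segments 1..count.

    r        -- representation id
    count    -- number of described media segments (assumed to be known)
    dur      -- signaled approximate segment duration
    file_uri -- callable implementing FileURI(r, i)
    """
    segments = []
    for i in range(1, count + 1):
        start_time = (i - 1) * dur      # startTime[1] is therefore 0
        segments.append((i, start_time, file_uri(r, i)))
    return segments


# Hypothetical URI pattern, used only to exercise the sketch.
example = segment_list(1, 3, 10, lambda r, i: "rep%d_seg%d.3gp" % (r, i))
```

Because dur is only approximate, the computed start times inherit that approximation, which is why a seeking client may need to fall back to the preceding segment.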

Simultaneous HTTP/TCP Requests

One concern in a block-request streaming system is the desire to always request the highest-quality blocks that can be completely received in time for playback. However, the data arrival rate may not be known in advance, so it may happen that a requested block does not arrive in time to be played. This forces media playback to pause, which results in a poor user experience. This problem can be mitigated by client algorithms that take a conservative approach to selecting the blocks to request, requesting lower-quality (and therefore smaller) blocks that are more likely to be received in time even if the data arrival rate falls while the block is being received. However, this conservative approach has the disadvantage of possibly delivering lower-quality playback to the user or destination device, which is also a poor user experience. The problem may be magnified when multiple HTTP connections are used at the same time to download different blocks, as described below, because the available network resources are shared across the connections and are thus being used simultaneously for blocks with different playback times.

It may be advantageous for the client to issue requests for multiple blocks concurrently, where in this context "concurrently" means that the responses to the requests occur in overlapping time intervals; the requests themselves need not be made at precisely, or even approximately, the same time. In the case of the HTTP protocol, this approach may improve utilization of the available bandwidth because of the behavior of the TCP protocol (as is well known). This can be especially important for improving content zapping time, because when new content is first requested, the corresponding HTTP/TCP connections over which data for the blocks is requested may be slow to start, and so using several HTTP/TCP connections at this point can dramatically speed up the delivery of the first blocks. However, requesting different blocks or fragments over different HTTP/TCP connections can also lead to degraded performance. The requests for the blocks to be played first compete with the requests for subsequent blocks; competing HTTP/TCP downloads vary greatly in their delivery speed, so the completion time of a request can be highly variable; and it is generally not possible to control which HTTP/TCP downloads complete quickly and which complete more slowly. It is therefore likely that at least some of the HTTP/TCP downloads of the first few blocks will be the last to complete, leading to long and variable channel zapping times.

Suppose that each block or fragment of a segment is downloaded over a separate HTTP/TCP connection, that the number of parallel connections is n, that the playback duration of each block is t seconds, and that the streaming rate of the content associated with the segment is S. When the client first begins to stream the content, requests may be issued for the first n blocks, representing n*t seconds of media data.

As is known to those skilled in the art, there is large variation in the data rate of TCP connections. To simplify this discussion, however, suppose that ideally all connections proceed in parallel, so that the first block is completely received at about the same time as the other n-1 blocks requested. To simplify the discussion further, assume that the aggregate bandwidth utilized by the n download connections is fixed at a value B for the entire duration of the download, and that the streaming rate S is constant over the entire representation. Suppose further that the media data structure is such that a block can be played only once the entire block is available at the client; that is, playback of a block can start only after the entire block has been received, for example because of the structure of the underlying video encoding, or because encryption is employed to encrypt each fragment or block separately, so that the entire fragment or block must be received before it can be decrypted. Thus, to simplify the discussion below, assume that an entire block needs to be received before any of the block can be played. Then the time required before the first block has arrived and can be played is approximately n*t*S/B.

Since it is desirable to minimize content zapping time, it is desirable to minimize n*t*S/B. The value of t may be determined by factors such as the underlying video encoding structure and how the ingestion methods are utilized, so t can be reasonably small, although very small values of t lead to an overly complicated segment map and may be incompatible with efficient video encoding and decoding, if used. The value of n may also affect the value of B: B may be larger for a larger number n of connections, so reducing the number of connections n has the negative side effect of potentially reducing the amount of available bandwidth B that is utilized, and may therefore not be effective in achieving the goal of reducing content zapping time. The value of S depends on which representation is chosen for download and playback, and ideally S should be as close to B as possible in order to maximize the playback quality of the media for the given network conditions. Thus, to simplify this discussion, assume that S is approximately equal to B. Then the channel zapping time is proportional to n*t. Utilizing more connections to download different fragments can therefore degrade the channel zapping time if the aggregate bandwidth utilized by the connections grows sub-linearly with the number of connections, as is usually the case.

As an example, suppose that t=1 second, and that B=500 Kbps with n=1, B=700 Kbps with n=2, and B=800 Kbps with n=3. Suppose that the representation with S=700 Kbps is chosen. Then with n=1 the download time for the first block is 1*700/500=1.4 seconds, with n=2 it is 2*700/700=2 seconds, and with n=3 it is 3*700/800=2.625 seconds. Furthermore, as the number of connections increases, the variability in the individual download speeds of the connections is likely to increase (although even with one connection there is likely to be some significant variability). Thus, in this example, the channel zapping time and the variability in the channel zapping time increase as the number of connections increases. Intuitively, the blocks being delivered have different priorities: the first block has the earliest delivery deadline, the second block has the second-earliest deadline, and so on. The download connections over which the blocks are being delivered, however, compete for network resources during delivery, and so the blocks with the earliest deadlines are delayed more as more competing blocks are requested. On the other hand, even in this case, ultimately using more than one download connection allows a sustainably higher streaming rate to be supported; for example, with three connections a streaming rate of up to 800 Kbps can be supported in this example, whereas with one connection only a stream of 500 Kbps can be supported.
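
The arithmetic of this example follows directly from the approximation n*t*S/B and can be checked with a few lines of Python. The function name first_block_time is hypothetical; the bandwidth figures are those assumed in the example.

```python
def first_block_time(n, t, s, b):
    """Approximate time until the first block is fully received.

    n -- number of parallel connections (one block per connection)
    t -- playback duration of each block, in seconds
    s -- streaming rate of the chosen representation, in Kbps
    b -- aggregate bandwidth used by the n connections, in Kbps
    """
    return n * t * s / b


# Aggregate bandwidth (Kbps) assumed in the example for n = 1, 2, 3.
bandwidth_kbps = {1: 500, 2: 700, 3: 800}
times = {n: first_block_time(n, 1, 700, b) for n, b in bandwidth_kbps.items()}
```

Under these assumptions the first block is ready after 1.4, 2.0, and 2.625 seconds for one, two, and three connections, so the start-up delay grows even though the aggregate bandwidth increases.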

In practice, as noted above, the data rate of a connection may be highly variable both within the same connection over time and between connections. As a result, the n requested blocks generally do not complete at the same time, and it can commonly be the case that one block completes in half the time of another block. This effect results in unpredictable behavior: in some cases the first block may complete much sooner than other blocks, and in other cases the first block may complete much later than other blocks, so that the beginning of playout may occur relatively quickly in some cases and slowly in others. This unpredictable behavior may be frustrating for the user and may therefore be considered a poor user experience.

What is needed, therefore, are methods in which multiple TCP connections can be utilized to improve the channel zapping time and the variability in channel zapping time, while at the same time supporting the highest possible quality streaming rate. What is also needed are methods that allow the share of available bandwidth allocated to each block to be adjusted as the playout time of the block approaches, so that, if necessary, a greater share of the available bandwidth can be allocated to the block with the nearest playout time.

Cooperative HTTP/TCP Requests
Methods for using concurrent HTTP/TCP requests in a cooperative fashion are now described. A receiver may employ multiple concurrent cooperative HTTP/TCP requests, for example using a plurality of HTTP byte-range requests, wherein each such request is for a portion of a fragment in a source segment, for all of a fragment of a source segment, for a portion of a repair fragment of a repair segment, or for all of a repair fragment of a repair segment.

The advantages of cooperative HTTP/TCP requests together with the use of FEC repair data may be especially important for providing consistently fast channel zapping times. For example, at channel zapping time it is likely that the TCP connections have either just been started or have been idle for some period of time, in which case the congestion window, cwnd, is at its minimal value for those connections. The delivery speed of these TCP connections therefore takes several round-trip times (RTTs) to ramp up, and during this ramp-up time there is high variability in the delivery speeds over the different TCP connections.

An overview of the No-FEC method is now described. This is a cooperative HTTP/TCP request method in which only media data of source blocks is requested using multiple concurrent HTTP/TCP connections, i.e., no FEC repair data is requested. With the No-FEC method, portions of the same fragment are requested over different connections, for example using HTTP byte-range requests for portions of the fragment, so that each HTTP byte-range request is for a portion of the byte range indicated in the segment map for the fragment. An individual HTTP/TCP request may need several RTTs to ramp up its delivery speed to fully utilize the available bandwidth, so there is a relatively long period during which the delivery speed is less than the available bandwidth. Thus, if a single HTTP/TCP connection is used to download, for example, the first fragment of content to be played out, the channel zapping time may be large. Using the No-FEC method, downloading different portions of the same fragment over different HTTP/TCP connections can significantly reduce the channel zapping time.
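The ramp-up effect can be illustrated with a deliberately idealized sketch. It assumes a textbook slow-start model in which each connection starts with a window of two packets that doubles every RTT, with no packet loss, no bandwidth cap, and no connection-setup cost; these are simplifying assumptions and not measurements from the described system, and the function names are hypothetical.

```python
import math


def rtts_to_deliver(packets, initial_cwnd=2):
    """Round trips needed to deliver `packets` on one connection whose window
    starts at initial_cwnd and doubles every RTT (idealized slow start)."""
    delivered, cwnd, rtts = 0, initial_cwnd, 0
    while delivered < packets:
        delivered += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


def rtts_for_fragment(total_packets, connections, initial_cwnd=2):
    """Round trips to fetch a fragment split evenly over parallel connections."""
    per_connection = math.ceil(total_packets / connections)
    return rtts_to_deliver(per_connection, initial_cwnd)


# A hypothetical 120-packet fragment fetched over 1 and over 4 connections.
single = rtts_for_fragment(120, 1)
parallel = rtts_for_fragment(120, 4)
print(single, parallel)
```

Under this model the single connection needs six round trips and four connections need four, since each parallel connection only has to ramp up far enough to carry a quarter of the fragment. Real connections share a bottleneck, so the gain in practice depends on network conditions.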

An overview of the FEC method is now described. This is a cooperative HTTP/TCP request method in which media data of a source segment and FEC repair data generated from the media data are requested using multiple concurrent HTTP/TCP connections. With the FEC method, portions of the same fragment, and FEC repair data generated from that fragment, are requested over different connections using HTTP byte-range requests for portions of the fragment, so that, for example, each HTTP byte-range request is for a portion of the byte range indicated in the segment map for the fragment. An individual HTTP/TCP request may need several RTTs to ramp up its delivery speed to fully utilize the available bandwidth, so there is a relatively long period during which the delivery speed is less than the available bandwidth. Thus, if a single HTTP/TCP connection is used to download, for example, the first fragment of content to be played out, the channel zapping time may be large. Using the FEC method has the same advantages as the No-FEC method, with the additional advantage that not all of the requested data needs to arrive before the fragment can be recovered, which further reduces the channel zapping time and the variability in the channel zapping time. By making requests over different TCP connections, and by over-requesting through additionally requesting FEC repair data on at least one of the connections, the time it takes to deliver enough data to recover, for example, the first requested fragment that enables media playout to start can be greatly reduced and made much more consistent than if cooperative TCP connections and FEC repair data were not used.

FIGS. 24(a)-(e) show an example of the delivery rate fluctuations of five TCP connections running over the same link to the same client from the same HTTP web server of an emulated evolution data optimized (EVDO) network. In FIGS. 24(a)-(e), the X axis shows time in seconds, and the Y axis shows, for each connection, the rate at which bits are received at the client over that one of the five TCP connections, measured over intervals of one second. In this particular emulation there were 12 TCP connections in total running over the link, so the network was relatively loaded during the time shown, which may be typical when more than one client is streaming within the same cell of a mobile network. Note that although the delivery rates are somewhat correlated over time, there are wide differences in the delivery rates of the five connections at many points in time.

FIG. 25 shows a possible request structure for a fragment that is 250,000 bits in size (approximately 31.25 kilobytes), in which four HTTP byte-range requests are made in parallel for different parts of the fragment: the first HTTP connection requests the first 50,000 bits, the second HTTP connection requests the next 50,000 bits, the third HTTP connection requests the next 50,000 bits, and the fourth HTTP connection requests the next 50,000 bits. If FEC is not used, i.e., in the No-FEC method, these are the only four requests for the fragment in this example. If FEC is used, i.e., in the FEC method, then in this example there is one additional HTTP connection that requests an additional 50,000 bits of FEC repair data of a repair segment generated from the fragment.
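A request structure of this kind can be sketched as a function that splits a byte range into contiguous HTTP Range header values. The sketch assumes the four source requests together cover a hypothetical range of 25,000 bytes starting at offset 0, so that each request is 6,250 bytes (50,000 bits); the offsets and the function name are illustrative only. The `bytes=first-last` form with an inclusive last byte is the standard HTTP Range syntax.

```python
def split_byte_range(start, end, parts):
    """Split the half-open byte range [start, end) into `parts` contiguous
    pieces and return them as HTTP Range header values (inclusive ends)."""
    size = end - start
    bounds = [start + (size * k) // parts for k in range(parts + 1)]
    return [
        f"bytes={lo}-{hi - 1}"
        for lo, hi in zip(bounds, bounds[1:])
        if hi > lo  # skip empty pieces when parts exceeds the range size
    ]


# Four parallel source requests of 6,250 bytes (50,000 bits) each.
source_ranges = split_byte_range(0, 25_000, 4)
for header in source_ranges:
    print("Range:", header)
```

In the FEC method a fifth request would carry a similar header, addressed to the repair segment rather than the source segment.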

FIG. 26 is a blow-up of the first couple of seconds of the five TCP connections shown in FIGS. 24(a)-(e). In FIG. 26, the X axis shows time at intervals of 100 milliseconds, and the Y axis shows the rate at which bits are received at the client over each of the five TCP connections, measured over 100-millisecond intervals. One line shows the aggregate amount of bits received at the client for the fragment from the first four HTTP connections (excluding the HTTP connection over which FEC data is requested), i.e., what arrives using the No-FEC method. Another line shows the aggregate amount of bits received at the client for the fragment from all five HTTP connections (including the HTTP connection over which FEC data is requested), i.e., what arrives using the FEC method. For the FEC method, it is assumed that the fragment can be FEC decoded from reception of any 200,000 bits of the 250,000 requested bits, which can be realized if, for example, a Reed-Solomon FEC code is used, and which can be essentially realized if, for example, the RaptorQ code described in Luby IV is used. For the FEC method in this example, enough data is received to recover the fragment using FEC decoding after 1 second, allowing a channel zapping time of 1 second (assuming that the data for subsequent fragments can be requested and received before the first fragment is fully played out). For the No-FEC method in this example, all the data for the four requests has to be received before the fragment can be recovered, which occurs after 1.7 seconds, leading to a channel zapping time of 1.7 seconds. Thus, in the example shown in FIG. 26, the No-FEC method is 70% worse in terms of channel zapping time than the FEC method. One of the reasons for the advantage shown by the FEC method in this example is that, for the FEC method, reception of any 80% of the requested data allows recovery of the fragment, whereas for the No-FEC method, reception of 100% of the requested data is required. The No-FEC method therefore has to wait for the slowest TCP connection to finish delivery, and because of natural variations in the TCP delivery rate, the delivery speed of the slowest TCP connection is apt to vary widely compared with that of an average TCP connection. With the FEC method in this example, one slow TCP connection does not determine when the fragment is recoverable. Instead, for the FEC method, the delivery of enough data depends much more on the average TCP delivery rate than on the worst-case TCP delivery rate.
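The difference between waiting for every source request and waiting for any sufficient amount of data can be sketched as follows. The per-connection rates below are invented constants, chosen only so that the outcome lands on the 1 second and 1.7 second figures quoted above; they are not read from FIG. 26, where the rates fluctuate. The sketch also assumes an ideal code for which any 200,000 of the 250,000 requested bits suffice, as the text assumes.

```python
def recovery_times_ms(rates_bps, request_bits, source_count, needed_bits, step_ms=100):
    """Return (fec_ms, no_fec_ms): first times at which the fragment is
    recoverable with and without repair data.

    rates_bps    -- constant delivery rate per connection (hypothetical values)
    request_bits -- bits requested on each connection
    source_count -- the first source_count connections carry source data
    needed_bits  -- bits (from any connections) sufficient for FEC decoding
    """
    fec_ms = no_fec_ms = None
    t = 0
    while fec_ms is None or no_fec_ms is None:
        t += step_ms
        got = [min(request_bits, rate * t // 1000) for rate in rates_bps]
        if fec_ms is None and sum(got) >= needed_bits:
            fec_ms = t
        if no_fec_ms is None and all(g >= request_bits for g in got[:source_count]):
            no_fec_ms = t
    return fec_ms, no_fec_ms


# Four source connections followed by one repair connection (invented rates).
RATES_BPS = [60_000, 50_000, 35_000, 30_000, 45_000]
fec_ms, no_fec_ms = recovery_times_ms(RATES_BPS, 50_000, 4, 200_000)
print(fec_ms, no_fec_ms)
```

With these rates the No-FEC time is set entirely by the slowest source connection (30,000 bits per second), while the FEC time depends on the sum over all five connections.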

There are many variations of the No-FEC and FEC methods described above. For example, cooperative HTTP/TCP requests may be used only for the first few fragments after a channel zap has occurred, after which only a single HTTP/TCP request is used to download further fragments, multiple fragments, or entire segments. As another example, the number of cooperative HTTP/TCP connections used can be a function of both the urgency of the fragments being requested, i.e., how imminent the playout time of these fragments is, and the current network conditions.

In some variations, a plurality of HTTP connections may be used to request repair data from repair segments. In other variations, different amounts of data may be requested on different HTTP connections, for example depending on the current size of the media buffer and the rate of data reception at the client. In another variation, the source representations are not independent of one another but instead represent a layered media coding, in which, for example, an enhanced source representation may depend on a base source representation. In this case, there may be one repair representation corresponding to the base source representation and another repair representation corresponding to the combination of the base and enhancement source representations.

Additional overall elements add to the advantages that can be realized by the methods disclosed above. For example, the number of HTTP connections used may vary depending on the current amount of media in the media buffer and/or the rate of reception into the media buffer. Cooperative HTTP requests using FEC, i.e., the FEC method described above and variants of that method, may be used aggressively when the media buffer is relatively empty. For example, more cooperative HTTP requests are made in parallel for different parts of the first fragment, requesting all of the source fragment and a relatively large fraction of the repair data from the corresponding repair fragment. Then, as the media buffer grows, the client transitions to a reduced number of concurrent HTTP requests, requesting larger portions of the media data per request and a smaller fraction of repair data, for example transitioning to one, two, or three concurrent HTTP requests, transitioning to making requests for full fragments or multiple consecutive fragments per request, and transitioning to requesting no repair data.

As another example, the amount of FEC repair data may vary as a function of the media buffer size: when the media buffer is small, more FEC repair data may be requested; as the media buffer grows, the amount of FEC repair data requested may diminish; and at some point, when the media buffer is sufficiently large, no FEC repair data may be requested, only data from the source segments of the source representations. The benefit of such enhanced techniques is that they may allow faster and more consistent channel zapping times and more resilience against potential media stutters or stalls, while at the same time minimizing the amount of additional bandwidth used beyond what would be consumed by delivering only the media in the source segments, by reducing both request message traffic and FEC repair data, and while also enabling support of the highest media rates possible for the given network conditions.
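One way such a buffer-dependent policy could look is sketched below. The thresholds, the maximum repair fraction, the linear interpolation, and the connection counts are all hypothetical choices made for illustration; the text only states that repair data and concurrency may decrease as the media buffer grows.

```python
def repair_fraction(buffer_seconds, low=2.0, high=10.0, max_fraction=0.25):
    """Fraction of repair data to request relative to source data.

    Hypothetical policy: max_fraction at or below `low` seconds of buffered
    media, zero at or above `high`, linear in between.
    """
    if buffer_seconds <= low:
        return max_fraction
    if buffer_seconds >= high:
        return 0.0
    return max_fraction * (high - buffer_seconds) / (high - low)


def concurrent_requests(buffer_seconds, low=2.0, high=10.0):
    """Hypothetical number of parallel HTTP requests for the same policy."""
    if buffer_seconds <= low:
        return 5
    if buffer_seconds >= high:
        return 1
    return 3


for level in (0.0, 6.0, 12.0):
    print(level, repair_fraction(level), concurrent_requests(level))
```

A client following this sketch would request aggressively right after a channel zap and settle into single, repair-free requests once the buffer is comfortably full.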

Additional Enhancements When Using Concurrent HTTP Connections
An HTTP/TCP request may be abandoned if a suitable condition is met, and another HTTP/TCP request may be made to download data that can replace the data requested in the abandoned request. The second HTTP/TCP request may request exactly the same data as the original request, e.g., source data; or overlapping data, e.g., some of the same source data together with repair data that was not requested in the first request; or completely disjoint data, e.g., repair data that was not requested in the first request. An example of a suitable condition is that a request fails because no response is received from the block server infrastructure (BSI) within a provided time, because a transport connection to the BSI cannot be established, because an explicit failure message is received from the server, or because of another failure condition.

Another example of a suitable condition is that reception of the data is proceeding unusually slowly, according to a comparison of a measure of the connection speed (the data arrival rate in response to the request in question) with the expected connection speed, or with an estimate of the connection speed required to receive the response before the playout time of the media data contained therein or before another time that depends on that time.
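A slow-reception check of this kind can be sketched as a comparison between the measured arrival rate and the rate still required to finish before the deadline. The function name, the parameters, and the `slack` factor are assumptions made for illustration; the text does not prescribe a particular formula.

```python
def should_abandon(bytes_received, bytes_requested, elapsed_s, time_to_deadline_s, slack=1.0):
    """Return True if the request looks too slow to finish before its deadline.

    Compares the measured arrival rate with the rate needed to receive the
    remaining bytes in the remaining time. `slack` > 1 abandons earlier.
    """
    remaining = bytes_requested - bytes_received
    if remaining <= 0:
        return False  # already complete
    if time_to_deadline_s <= 0:
        return True  # deadline passed with data still missing
    if elapsed_s <= 0:
        return False  # no measurement yet
    measured_rate = bytes_received / elapsed_s
    required_rate = remaining / time_to_deadline_s
    return measured_rate < slack * required_rate


# 10,000 of 50,000 bytes after 1 s with 2 s left: needs 20,000 B/s, has 10,000 B/s.
print(should_abandon(10_000, 50_000, 1.0, 2.0))
# 40,000 of 50,000 bytes after 1 s with 2 s left: comfortably on track.
print(should_abandon(40_000, 50_000, 1.0, 2.0))
```

A replacement request issued after abandonment could then ask for the same data, overlapping data, or disjoint repair data, as described above.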

This approach has an advantage in cases where the BSI sometimes exhibits failures or poor performance. In such cases, the approach above increases the probability that the client can continue reliable playout of the media data despite failures or poor performance within the BSI. Note that in some cases there may be an advantage to designing the BSI in such a way that it does exhibit such failures or poor performance on occasion; for example, such a design may have a lower cost than an alternative design that does not exhibit such failures or poor performance, or exhibits them less often. In this case, the methods described herein have the further advantage that they permit the use of such a lower-cost design for the BSI without a consequent degradation of the user experience.

In another embodiment, the number of requests issued for data corresponding to a given block may depend on whether a suitable condition with respect to the block is met. If the condition is not met, the client may be restricted from making further requests for the block if the successful completion of all currently incomplete data requests for the block would allow recovery of the block with high probability. If the condition is met, a larger number of requests for the block may be issued, i.e., the constraint above does not apply. An example of a suitable condition is that the time until the scheduled playout time of the block, or another time that depends on that time, falls below a provided threshold. This method is advantageous because additional requests for data for a block are issued when reception of the block becomes more urgent because the playout time of the media data comprising the block is close. For common transport protocols such as HTTP/TCP, these additional requests have the effect of increasing the share of the available bandwidth dedicated to data that contributes to reception of the block in question. This reduces the time required to receive enough data to complete recovery of the block, and therefore reduces the probability that the block cannot be recovered before the scheduled playout time of the media data comprising the block. As described above, if the block cannot be recovered before the scheduled playout time of the media data comprising the block, playout may pause, resulting in a poor user experience; the method described here therefore advantageously reduces the probability of this poor user experience.
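The request-limiting rule of this embodiment can be sketched as a small predicate. The sketch replaces "would allow recovery with high probability" with a simple byte count, and the threshold value and all names are hypothetical.

```python
def may_issue_more_requests(
    time_to_playout_s,
    received_bytes,
    outstanding_bytes,
    bytes_needed,
    urgency_threshold_s=2.0,
):
    """Decide whether further requests for a block may be issued.

    If playout is imminent (the condition is met), the constraint is lifted.
    Otherwise further requests are allowed only while the data already
    received plus the data still outstanding would not suffice to recover
    the block (a deterministic stand-in for the probabilistic condition).
    """
    if time_to_playout_s < urgency_threshold_s:
        return True
    return received_bytes + outstanding_bytes < bytes_needed


# Urgent block: extra requests allowed even though enough data is outstanding.
print(may_issue_more_requests(1.0, 0, 50_000, 40_000))
# Non-urgent block with enough outstanding data: no further requests.
print(may_issue_more_requests(10.0, 0, 50_000, 40_000))
```

Issuing extra requests for the urgent block gives it a larger share of the available bandwidth on transports such as HTTP/TCP, which is the effect the paragraph above relies on.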

It should be understood that, throughout this specification, references to the scheduled playout time of a block refer to the time at which the encoded media data comprising the block may first be available at the client in order to achieve playout of the presentation without pausing. As will be clear to those of skill in the art of media presentation systems, this time is in practice slightly before the actual time at which the media comprising the block appears at the physical transducers used for playout (screen, speaker, etc.), because several transformation functions may need to be applied to the media data comprising the block to effect actual playout of that block, and these functions may take a certain amount of time to complete. For example, media data is generally transported in compressed form, and a decompression transformation may be applied.

Methods for Generating File Structures Supporting Cooperative HTTP/FEC Methods
An embodiment for generating a file structure that may be used advantageously by a client employing cooperative HTTP/FEC methods is now described. In this embodiment, for each source segment there is a corresponding repair segment generated as follows. The parameter R indicates how much FEC repair data is generated, on average, for the source data in the source segments. For example, R = 0.33 indicates that if a source segment contains 1,000 kilobytes of data, then the corresponding repair segment contains approximately 330 kilobytes of repair data. The parameter S indicates the symbol size, in bytes, used for FEC encoding and decoding. For example, S = 64 indicates that the source data and the repair data comprise symbols of 64 bytes each for the purposes of FEC encoding and decoding.

The repair segment can be generated for a source segment as follows. Each fragment of the source segment is considered a source block for FEC encoding purposes, and thus each fragment is treated as a sequence of source symbols of a source block from which repair symbols are generated. The total number of repair symbols generated for the first i fragments is calculated as TNRS(i) = ceiling(R*B(i)/S), where ceiling(x) is the function that outputs the smallest integer with a value that is at least x. Thus, the number of repair symbols generated for fragment i is NRS(i) = TNRS(i) - TNRS(i-1).
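The two formulas can be sketched directly. The fragment end offsets used below are hypothetical, and the sketch assumes B(0) = 0, i.e., that the first fragment starts at the beginning of the source segment. Exact rational arithmetic is used so that the ceiling is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def tnrs(R, S, B_i):
    """TNRS(i) = ceiling(R * B(i) / S): repair symbols for the first i fragments."""
    return ceil(Fraction(R) * B_i / S)


def nrs_per_fragment(R, S, offsets):
    """NRS(i) = TNRS(i) - TNRS(i-1) for i = 1..len(offsets)-1.

    offsets[0] is B(0) (assumed to be 0 here) and offsets[i] is B(i), the
    end byte offset of fragment i within the source segment.
    """
    totals = [tnrs(R, S, b) for b in offsets]
    return [cur - prev for prev, cur in zip(totals, totals[1:])]


# Hypothetical segment with three fragments ending at bytes 1000, 2500, and 2600.
OFFSETS = [0, 1000, 2500, 2600]
counts = nrs_per_fragment(Fraction(1, 2), 64, OFFSETS)
print(counts)
```

Because the per-fragment counts are differences of running totals, they always sum to the total for the whole segment, regardless of how the fragment boundaries fall relative to the symbol size.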

The repair segment comprises the concatenation of the repair symbols for the fragments, where the order of the repair symbols within the repair segment is the order of the fragments from which they are generated, and within a fragment the repair symbols are in order of their encoding symbol identifier (ESI). The repair segment structure corresponding to a source segment structure is shown in FIG. 27, including a repair segment generator 2700.

Note that by defining the number of repair symbols for a fragment as described above, the total number of repair symbols for all previous fragments, and thus the byte index into the repair segment, depends only on R, S, B(i-1), and B(i), and does not depend on any of the previous or subsequent structure of the fragments within the source segment. This is advantageous because it allows a client to quickly compute the position of the start of a repair block within the repair segment, and also to quickly compute the number of repair symbols within that repair block, using only local information about the structure of the corresponding fragment of the source segment from which the repair block is generated. Thus, if a client decides to start downloading and playing out a fragment from the middle of a source segment, it can also quickly locate and access the corresponding repair block within the corresponding repair segment.

The number of source symbols in the source block corresponding to fragment i is calculated as NSS(i) = ceiling((B(i)-B(i-1))/S). If B(i)-B(i-1) is not a multiple of S, the last source symbol is padded with zero bytes for the purposes of FEC encoding and decoding, i.e., the last source symbol is padded so that it is S bytes in size for FEC encoding and decoding, but these zero padding bytes are not stored as part of the source segment. In this embodiment, the ESIs of the source symbols are 0, 1, …, NSS(i)-1 and the ESIs of the repair symbols are NSS(i), …, NSS(i)+NRS(i)-1.
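The symbol partitioning and ESI assignment can be sketched as follows. The function names are illustrative, and the fragment contents in the usage example are dummy bytes.

```python
def split_into_source_symbols(fragment, S):
    """Split a fragment into NSS = ceiling(len(fragment) / S) symbols of S bytes.

    The last symbol is zero-padded to S bytes for FEC purposes only; the
    padding is not part of the stored source segment.
    """
    nss = -(-len(fragment) // S)  # integer ceiling division
    padded = fragment + bytes(nss * S - len(fragment))
    return [padded[k * S:(k + 1) * S] for k in range(nss)]


def esi_ranges(nss, nrs):
    """ESIs 0..NSS-1 for source symbols and NSS..NSS+NRS-1 for repair symbols."""
    return range(0, nss), range(nss, nss + nrs)


# A 360-byte fragment with S = 64 gives 6 symbols, the last with 24 padding bytes.
fragment = bytes([1]) * 360
symbols = split_into_source_symbols(fragment, 64)
source_esis, repair_esis = esi_ranges(len(symbols), 2)
print(len(symbols), list(source_esis), list(repair_esis))
```

The 360-byte length matches the size of fragment 2 in the worked example of FIG. 28 (6,770 - 6,410 bytes).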

In this embodiment, the URL for a repair segment can be generated from the URL for the corresponding source segment by simply appending, for example, the suffix ".repair" to the URL of the source segment.

As described herein, the repair indexing information and the FEC information for a repair segment are implicitly defined by the indexing information for the corresponding source segment and by the values of R and S. The time offsets and the fragment structure comprising the repair segment are determined by the time offsets and structure of the corresponding source segment. The byte offset to the end of the repair symbols in the repair segment corresponding to fragment i can be calculated as RB(i) = S*ceiling(R*B(i)/S). The number of bytes in the repair segment corresponding to fragment i is then RB(i)-RB(i-1), and thus the number of repair symbols corresponding to fragment i is calculated as NRS(i) = (RB(i)-RB(i-1))/S. The number of source symbols corresponding to fragment i can be calculated as NSS(i) = ceiling((B(i)-B(i-1))/S). Thus, in this embodiment, the repair indexing information for a repair block within a repair segment, and the corresponding FEC information, can be implicitly derived from R, S, and the indexing information for the corresponding fragment of the corresponding source segment.

As an example, consider FIG. 28, which shows a fragment 2 that starts at byte offset B(1) = 6,410 and ends at byte offset B(2) = 6,770. In this example, the symbol size is S = 64 bytes, and the vertical dotted lines show the byte offsets within the source segment that correspond to multiples of S. The overall repair segment size as a fraction of the source segment size is set to R = 0.5 in this example. The number of source symbols in the source block for fragment 2 is calculated as NSS(2) = ceiling((6,770-6,410)/64) = ceiling(5.625) = 6, and these six source symbols have ESIs 0, …, 5, respectively. The first source symbol is the first 64 bytes of fragment 2, starting at byte index 6,410 within the source segment; the second source symbol is the next 64 bytes of fragment 2, starting at byte index 6,474 within the source segment; and so on. The end byte offset of the repair block corresponding to fragment 2 is calculated as RB(2) = 64*ceiling(0.5*6,770/64) = 64*ceiling(52.89…) = 64*53 = 3,392, and the start byte offset of the repair block corresponding to fragment 2 is calculated as RB(1) = 64*ceiling(0.5*6,410/64) = 64*ceiling(50.07…) = 64*51 = 3,264. Thus, in this example there are two repair symbols in the repair block corresponding to fragment 2, with ESIs 6 and 7, respectively, starting at byte offset 3,264 and ending at byte offset 3,392 within the repair segment.
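The arithmetic of the FIG. 28 example can be checked with a short Python sketch. The function names below are illustrative and do not appear in the specification; exact rational arithmetic (`Fraction`) is an assumption made here so that the ceiling operations are not affected by floating-point rounding.

```python
from fractions import Fraction
from math import ceil

def repair_end_offset(b_end, r, s):
    """RB(i) = S * ceiling(R * B(i) / S), with B(i) given as b_end."""
    return s * ceil(Fraction(r) * b_end / s)

def num_source_symbols(b_start, b_end, s):
    """NSS(i) = ceiling((B(i) - B(i-1)) / S)."""
    return ceil(Fraction(b_end - b_start, s))

def num_repair_symbols(b_start, b_end, r, s):
    """NRS(i) = (RB(i) - RB(i-1)) / S; the difference is always a multiple of S."""
    return (repair_end_offset(b_end, r, s) - repair_end_offset(b_start, r, s)) // s

# Values from the FIG. 28 example: S = 64, R = 0.5, B(1) = 6,410, B(2) = 6,770.
S, R = 64, Fraction(1, 2)
B1, B2 = 6410, 6770
print(num_source_symbols(B1, B2, S))                             # 6
print(repair_end_offset(B1, R, S), repair_end_offset(B2, R, S))  # 3264 3392
print(num_repair_symbols(B1, B2, R, S))                          # 2
```

Because only B(i-1), B(i), R, and S enter the computation, a client that holds the source segment index can locate the repair block without any separate repair index.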

Note that, in the example shown in FIG. 28, even though R = 0.5 and there are six source symbols corresponding to fragment 2, the number of repair symbols is not 3, as one might expect if the number of source symbols were simply used to calculate the number of repair symbols, but is instead calculated to be 2 according to the methods described herein. In contrast to simply using the number of source symbols of a fragment to determine the number of repair symbols, the embodiments described above make it possible to calculate the positioning of the repair block within the repair segment solely from the index information associated with the corresponding source block of the corresponding source segment. Furthermore, as the number K of source symbols in a source block grows, the number of repair symbols KR of the corresponding repair block is closely approximated by K*R, since in general KR is at most ceil(K*R) and at least floor((K-1)*R), where floor(x) is the largest integer that is at most x.
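The stated bounds, floor((K-1)*R) <= KR <= ceil(K*R), can be spot-checked numerically. The following is a minimal Python sketch under the assumption that fragment boundaries are integer byte offsets with b_start < b_end; the helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def symbol_counts(b_start, b_end, r, s):
    """Return (K, KR) for a fragment occupying source bytes b_start..b_end."""
    r = Fraction(r)
    k = ceil(Fraction(b_end - b_start, s))                # source symbols, NSS
    kr = ceil(r * b_end / s) - ceil(r * b_start / s)      # repair symbols, NRS
    return k, kr

def within_bounds(b_start, b_end, r, s):
    """Check floor((K-1)*R) <= KR <= ceil(K*R) for one fragment."""
    k, kr = symbol_counts(b_start, b_end, r, s)
    r = Fraction(r)
    return floor((k - 1) * r) <= kr <= ceil(k * r)

# FIG. 28 values: K = 6 and KR = 2, which lies between floor(2.5) = 2 and ceil(3) = 3.
print(symbol_counts(6410, 6770, Fraction(1, 2), 64))   # (6, 2)
print(within_bounds(6410, 6770, Fraction(1, 2), 64))   # True
```

The FIG. 28 case sits at the lower bound, which is why it yields 2 repair symbols rather than the 3 that K*R alone would suggest.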

As one skilled in the art will recognize, there are many variations of the above embodiments for generating a file structure that can be used advantageously by a client employing cooperative HTTP/FEC methods. As an example of an alternative embodiment, an original segment for a representation may be partitioned into N > 1 parallel segments, where for i = 1, …, N a specified fraction F_i of the original segment is contained in the i-th parallel segment, and where the sum of F_i over i = 1, …, N is equal to 1. In this embodiment, there may be one master segment map that is used to derive the segment maps for all of the parallel segments, similar to how the repair segment map is derived from the source segment map in the embodiment described above. For example, the master segment map may indicate the fragment structure as it would be if all of the source media data were not partitioned into parallel segments but were instead contained in the one original segment. The segment map for the i-th parallel segment can then be derived from the master segment map by calculating that, if the amount of media data in a first prefix of fragments of the original segment is L bytes, then the total number of bytes of this prefix in aggregate among the first i parallel segments is ceil(L*G_i), where G_i is the sum of F_j over j = 1, …, i. As another example of an alternative embodiment, a segment may consist of the combination of the original source media data for each fragment followed immediately by the repair data for that fragment, resulting in a segment that contains a mixture of source media data and repair data generated from that source media data using an FEC code. As another example of an alternative embodiment, a segment that contains a mixture of source media data and repair data may be partitioned into multiple parallel segments, each containing a mixture of source media data and repair data.
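The parallel-segment derivation can be illustrated with a small Python sketch. It assumes the fractions F_i are given as exact rationals summing to 1, and it takes the number of prefix bytes in the i-th parallel segment to be the difference of consecutive cumulative totals ceil(L*G_i); the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil

def parallel_prefix_sizes(prefix_len, fractions):
    """Split an L-byte prefix of the original segment across N parallel segments.

    The cumulative total over the first i parallel segments is ceil(L * G_i),
    where G_i = F_1 + ... + F_i, so the i-th segment holds the difference
    between consecutive cumulative totals.
    """
    fractions = [Fraction(f) for f in fractions]
    if sum(fractions) != 1:
        raise ValueError("the fractions F_i must sum to 1")
    sizes, cumulative, previous = [], Fraction(0), 0
    for f in fractions:
        cumulative += f                        # G_i
        total = ceil(prefix_len * cumulative)  # ceil(L * G_i)
        sizes.append(total - previous)
        previous = total
    return sizes

# A 1,000-byte prefix split with F = (1/2, 1/3, 1/6).
print(parallel_prefix_sizes(1000, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]))
# [500, 334, 166]
```

Since G_N = 1, the per-segment sizes always add up to L, so no byte of the prefix is lost or duplicated by the rounding.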

Methods for Handling Low Latency Streaming
In some deployment scenarios, low latency streaming for live services may be desirable. For example, in the case of local, on-site distribution of an event, such as a sports event or a concert, it is desirable for the delay between the live action and the presentation of the live service on the client to be as short as possible. For example, a delay of at most one second may be desirable.

As described above, it may be advantageous to arrange for each file storing a segment of a media presentation to start at a random access point (RAP). Some profiles, specifically the live profile of the ISO base media file format, require each media segment to start with a RAP.

However, in environments where delivery with low end-to-end latency is required, the duration of each segment must be short in order to minimize the delay between the live action and the presentation of the live event on the client. It is desirable to avoid inserting a RAP in every segment to be used for low latency streaming. For example, a RAP in video is typically realized by an IDR frame. Coding efficiency can be improved by avoiding the use of an IDR frame in each of the short segments that are desirable for low latency streaming.

According to an embodiment, a live-profile-compliant representation and a low latency representation of a media presentation are generated. The live-profile-compliant representation has a relatively long media segment duration. Each media segment of the live-profile-compliant representation has a RAP at the beginning of the media segment. The low latency representation has relatively short segments (which may be referred to as "media fragments") that may not contain a RAP. A client that supports low latency streaming can receive the media fragments generated for the low latency representation of the media presentation, while a client that does not support low latency streaming can receive the media segments generated for the live-profile-compliant representation of the media presentation.

FIG. 30 shows the relationship between media segments and media fragments for low latency streaming. Media segment 3002, generated for live profile streaming, includes RAP 3004 at the beginning of the media data ("mdat"). In contrast, of the media fragments 3004, 3006, and 3008 generated for low latency streaming, only media fragment 3004 contains a RAP.

Media fragments are generated on the fly and are available for download by clients via HTTP. The media fragments can be accumulated into media segments that conform to the live profile of the ISO base media file format, without requiring any modification to the media fragments. For example, media fragments may be concatenated into a media segment.

Both the media segments and the media fragments can be created using the same encoding process. In this way, media can be encoded efficiently for consumption both by clients operating in environments that require low end-to-end latency and by clients that use a protocol requiring a RAP in each segment.

In some embodiments, a segment index (SIDX) is generated for each media fragment. The SIDX may include the presentation time range within the media segment and the corresponding byte range of the media segment occupied by the media fragment. In some embodiments, the SIDX indicates whether a RAP is present in the fragment. In FIG. 30, the contains_RAP field of the SIDX box of media fragment 3004 is set to 1, indicating that media fragment 3004 contains a RAP. The contains_RAP fields of the SIDX boxes of media fragments 3006 and 3008 are set to 0, indicating that media fragments 3006 and 3008 do not contain a RAP. The SIDX may further indicate the presentation time of the first RAP in the fragment.
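The per-fragment index information described here can be modeled with a simple data structure. The Python sketch below is not the binary layout of an ISO base media file format SIDX box; the class and field names are hypothetical, and byte ranges are assumed to be half-open offsets (start, end) within the concatenated media segment.

```python
from dataclasses import dataclass

@dataclass
class MediaFragment:
    data: bytes          # encoded fragment payload
    start_time: float    # presentation start time within the segment, in seconds
    duration: float      # presentation duration, in seconds
    contains_rap: bool   # True if the fragment contains a random access point

@dataclass
class IndexEntry:
    time_range: tuple    # (start, end) presentation time within the media segment
    byte_range: tuple    # (start, end) byte offsets within the media segment
    contains_rap: bool   # mirrors the contains_RAP flag described in the text

def build_index(fragments):
    """Build one index entry per fragment, in concatenation order."""
    entries, offset = [], 0
    for frag in fragments:
        end = offset + len(frag.data)
        entries.append(IndexEntry(
            (frag.start_time, frag.start_time + frag.duration),
            (offset, end),
            frag.contains_rap,
        ))
        offset = end
    return entries

# Three fragments as in FIG. 30: only the first contains a RAP.
fragments = [
    MediaFragment(b"x" * 100, 0.0, 0.5, True),
    MediaFragment(b"y" * 80, 0.5, 0.5, False),
    MediaFragment(b"z" * 60, 1.0, 0.5, False),
]
for entry in build_index(fragments):
    print(entry)
```

A client joining mid-segment could scan such entries for the first one with `contains_rap` set and request only the byte range from that point onward.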

According to an embodiment, a media server can generate fragments for low latency streaming and push the fragments to a cache. The cache can concatenate the fragments to generate a live-profile-compliant media segment. After the media segment is generated, the cache can purge the media fragments that were concatenated to generate the media segment.
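A minimal Python sketch of this cache behavior follows. The text does not say how the cache decides that a media segment is complete; the sketch assumes a fixed number of fragments per segment, and the class and attribute names are hypothetical.

```python
class FragmentCache:
    """Collects pushed media fragments and concatenates them into media segments.

    Assumption (not from the specification): a segment is complete once a
    fixed number of fragments has arrived, in order.
    """

    def __init__(self, fragments_per_segment):
        self.fragments_per_segment = fragments_per_segment
        self.fragments = []   # pending fragments, in arrival order
        self.segments = []    # completed media segments

    def push(self, fragment: bytes):
        """Store a fragment; build a segment and purge once enough have arrived."""
        self.fragments.append(fragment)
        if len(self.fragments) == self.fragments_per_segment:
            # Concatenate without modifying the fragments themselves.
            self.segments.append(b"".join(self.fragments))
            # Purge the fragments that went into the finished segment.
            self.fragments.clear()

cache = FragmentCache(fragments_per_segment=3)
for fragment in (b"a", b"b", b"c"):
    cache.push(fragment)
print(cache.segments)   # [b'abc']
print(cache.fragments)  # []
```

Purging immediately after concatenation is one possible policy; a deployment that still serves low latency clients near the live edge might retain recent fragments longer.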

A single media presentation description (MPD) may store information about a first representation having the live-profile-compliant media segments of a media presentation and a second representation having the media fragments of a low latency stream. Time-shifted viewing may be realized using the media segments for time-shift buffering and the media fragments for handling viewing at the near-live edge of the stream. A client can switch between these representations, for example by starting in the time-shift buffer and moving closer to the live edge by skipping sections of the media presentation. Each representation in the MPD may be assigned an attribute to indicate the array of representations available for a single media presentation.

In an MPD that stores information about a first representation having media segments and a second representation having media fragments, it may be advantageous to provide information indicating which media fragments of the second representation start with a RAP. For example, the MPD may include an attribute indicating the frequency of occurrence of RAPs among the media fragments. In one embodiment, the MPD includes an attribute that indicates the frequency in terms of a number of fragments (i.e., one media fragment in every x media fragments contains a RAP). In another embodiment, the attribute indicates the frequency in terms of the temporal distance between adjacent RAPs.
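To illustrate how a client might use a fragment-count frequency, the Python sketch below assumes that RAPs recur regularly, that one in every x fragments contains a RAP, and that fragment 0 contains the first RAP. The function and parameter names are hypothetical and do not correspond to any defined MPD attribute.

```python
def rap_fragment_indices(num_fragments, fragments_per_rap, first_rap_index=0):
    """Indices of fragments containing a RAP when one in every x fragments has one."""
    return list(range(first_rap_index, num_fragments, fragments_per_rap))

def next_rap_index(current_index, fragments_per_rap):
    """Smallest RAP-bearing fragment index >= current_index (first RAP at index 0)."""
    # Ceiling division using integers only.
    return -(-current_index // fragments_per_rap) * fragments_per_rap

print(rap_fragment_indices(7, 3))  # [0, 3, 6]
print(next_rap_index(4, 3))        # 6
```

With such information a client tuning in at fragment 4 can tell, without fetching any fragment index, that decoding can begin at fragment 6. The sketch covers only the regularly spaced case.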

Alternatively, the information about the media fragments may be stored in a first MPD and the information about the media segments may be stored in a second MPD.

In some embodiments, the MPD may signal specific parameters applicable to a particular representation, such as the maximum duration of the media segments or media fragments of the presentation.

Further embodiments can be envisioned by one of ordinary skill in the art after reading this disclosure. In other embodiments, combinations or sub-combinations of the inventions disclosed above can be advantageously made. The example arrangements of components are shown for purposes of illustration, and it should be understood that combinations, additions, rearrangements, and the like are contemplated in alternative embodiments of the present invention. Thus, while the invention has been described with respect to exemplary embodiments, one skilled in the art will recognize that numerous modifications are possible.

For example, the processes described herein may be implemented using hardware components, software components, and/or any combination thereof. In some cases, the software components may be provided on tangible, non-transitory media for execution on hardware that is provided with the media or is separate from the media. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made without departing from the broader spirit and scope of the invention as set forth in the claims, and that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

100 Block streaming system
101 Block serving infrastructure
102 Content
103 Content preparation (media ingestion system)
104 HTTP streaming server
106 HTTP cache
108 HTTP streaming client
110 Content storage
112 Request
114 Request
122 Network
123 Block selector
124 Block requester
125 Block buffer
126 Buffer monitor
127 Media decoder
128 Media transducer
300 Bus
302 Ingestion system
304 Memory
306 Disk storage
308 Video display
310 Alphanumeric input device
312 Network interface
400 Bus
402 Client processor
404 Memory
406 Disk storage
408 Video display
410 Alphanumeric input device
412 Network interface
500 MPD
501 Period record
502 Representation record
503 Segment information
504 Initialization segment
505 Media segment
510 Source segment
512 Repair segment
700 Simple index
712 Hierarchical index
900 Metadata table
902 HTTP streaming client
904 Block
906 HTTP streaming server
1000 Video stream
1200 Transmitter
1202 Metadata
1204 Scalable layer 1
1206 Scalable layer 2
1208 Scalable layer 3 (not completely received)
1210 Receiver
1212 Media presentation
3002 Media segment
3004 Media fragment
3006 Media fragment
3008 Media fragment

Claims

1. A method for structuring content data to be served using a media server, the method comprising:
obtaining content to be served;
generating a plurality of media segments that represent the content and are encoded according to an encoding protocol, each media segment including one or more frames of a media presentation encoded into that media segment, wherein a random access point is available in each media segment;
generating a plurality of media fragments encoded according to the encoding protocol, wherein the media segment includes the plurality of media fragments, at least some of the plurality of media fragments include a random access point and at least some do not include a random access point, and a random access point comprises a position within the segment at which a decoder can decode the fragments following the random access point independently of the fragments preceding the random access point; and
generating a segment index for the media fragments,
wherein the segment index includes a presentation time range of each media fragment within the media segment, a corresponding byte range within the media segment occupied by each media fragment, and a random-access-point-present indicator that indicates whether a random access point is present in each media fragment,
wherein the media segment is a live profile conforming media segment generated by concatenating the plurality of media fragments for low latency streaming, and
wherein the method further includes generating a single media presentation description (MPD) file that stores information about a first representation of the media presentation having the plurality of live profile conforming media segments and a second representation of the media presentation having the plurality of media fragments of a low latency stream.

2. The method of claim 1, further comprising generating the media segment in a cache, wherein, after the media segment is generated in the cache, the plurality of media fragments used to generate the media segment are purged from the cache.

3. The method of claim 1, wherein the MPD file includes an attribute indicating a frequency of occurrence of random access points in the second representation.

4. The method of claim 3, wherein the frequency is a period.

5. The method of claim 3, wherein the frequency is a number of media fragments.

6. The method of claim 1, further comprising determining a position of the plurality of media fragments relative to the random access point, wherein the random access point is located at a variable, non-fixed position in the plurality of media fragments.

7. A media server comprising a processor configured to:
obtain content to be served;
generate a plurality of media segments that represent the content and are encoded according to an encoding protocol, each media segment including one or more frames of a media presentation encoded into that media segment;
generate a plurality of media fragments encoded according to the encoding protocol; and
generate a segment index for the media fragments,
wherein a random access point is available in each media segment,
wherein the media segment includes the plurality of media fragments, at least some of the plurality of media fragments include a random access point and at least some do not include a random access point, and a random access point comprises a position within the segment at which a decoder can decode the fragments following the random access point independently of the fragments preceding the random access point,
wherein the segment index includes a presentation time range of each media fragment within the media segment, a corresponding byte range within the media segment occupied by each media fragment, and a random-access-point-present indicator that indicates whether a random access point is present in each media fragment,
wherein the media segment is a live profile conforming media segment generated by concatenating the plurality of media fragments for low latency streaming, and
wherein the processor is further configured to generate a single media presentation description (MPD) file that stores information about a first representation of the media presentation having the plurality of live profile conforming media segments and a second representation of the media presentation having the plurality of media fragments of a low latency stream.

8. The media server of claim 7, wherein the processor is further configured to generate the media segment in a cache, and wherein, after the media segment is generated in the cache, the plurality of media fragments used to generate the media segment are purged from the cache.

9. The media server of claim 7, wherein the MPD file includes an attribute indicating a frequency of occurrence of random access points in the second representation.

10. The media server of claim 9, wherein the frequency is a period.

11. The media server of claim 10, wherein the frequency is a number of media fragments.

12. The media server of claim 7, wherein the processor is further configured to determine a position of the plurality of media fragments relative to the random access point, and wherein the random access point is located at a variable, non-fixed position in the plurality of media fragments.

13. One or more non-transitory computer-readable storage media storing computer-executable instructions that, when executed, cause one or more computing devices to:
obtain content to be served;
generate a plurality of media segments that represent the content and are encoded according to an encoding protocol, each media segment including one or more frames of a media presentation encoded into that media segment;
generate a plurality of media fragments encoded according to the encoding protocol; and
generate a segment index for the media fragments,
wherein a random access point is available in each media segment,
wherein the media segment includes the plurality of media fragments, at least some of the plurality of media fragments include a random access point and at least some do not include a random access point, and a random access point comprises a position within the segment at which a decoder can decode the fragments following the random access point independently of the fragments preceding the random access point,
wherein the segment index includes a presentation time range of each media fragment within the media segment, a corresponding byte range within the media segment occupied by each media fragment, and a random-access-point-present indicator that indicates whether a random access point is present in each media fragment,
wherein the media segment is a live profile conforming media segment generated by concatenating the plurality of media fragments for low latency streaming, and
wherein the instructions further cause the one or more computing devices to generate a single media presentation description (MPD) file that stores information about a first representation of the media presentation having the plurality of live profile conforming media segments and a second representation of the media presentation having the plurality of media fragments of a low latency stream.

14. The non-transitory computer-readable storage media of claim 13, wherein the computer-executable instructions, when executed, further cause the one or more computing devices to determine a position of the plurality of media fragments relative to the random access point, and wherein the random access point is located at a variable, non-fixed position in the plurality of media fragments.