WO2018079293A1 - Information processing device and method

Info

Publication number: WO2018079293A1
Application number: PCT/JP2017/037116
Authority: WO
Inventors: 金井　健一; 俊也浜田; 充勝股
Original assignee: ソニー株式会社
Priority date: 2016-10-27
Filing date: 2017-10-13
Publication date: 2018-05-03

Abstract

The present invention relates to an information processing device and a method capable of suppressing an increase in the load on a playback terminal. Prior to delivering content data for delivery, the content data is edited and post-edited content data is created, and the created post-edited content data is added to the content data for delivery. In addition, prior to delivering content data for delivery, management information is created that includes information for managing the playback of the content data and information for managing the playback of the post-edited content data created by editing the content data. The present invention is applicable to, for example, an information processing device, a file generation device, a delivery server, or a playback terminal.

Description

INFORMATION PROCESSING APPARATUS AND METHOD

The present disclosure relates to an information processing apparatus and method, and more particularly to an information processing apparatus and method capable of suppressing an increase in load on a playback terminal.

Conventionally, MPEG-DASH (Moving Picture Experts Group phase-Dynamic Adaptive Streaming over HTTP) has been developed to stream video and music data over the Internet (see, for example, Non-Patent Document 1).

In the case of MPEG-DASH, for example, a plurality of pieces of audio data having the same content but different bit rates are prepared on the distribution side, and the bit rate can be changed by switching the audio data to be distributed according to the bandwidth of the transmission path. Further, for example, the language can be made selectable by preparing, on the distribution side, a plurality of pieces of audio data of the same content that differ only in language, and switching the audio data to be distributed according to a user instruction or the like.

However, when audio data is switched in these ways, a mismatch in codec (encoding/decoding method) or in parameters occurs between the data before and after the switch, so the waveform of the audio data may not connect smoothly across the switch, and noise may occur as a result.

Therefore, it has been considered, for example, to edit the audio data of this switching portion in the playback terminal so as to suppress the generation of noise.

However, DSD (Direct Stream Digital), for example, is known as a high-quality coding method for music. Since partial digital editing of DSD data is difficult, the data must be converted to PCM or the like before editing, which may increase the load compared with editing PCM data. That is, the load on the playback terminal may increase.

The present disclosure has been made in view of such a situation, and makes it possible to suppress an increase in load on a playback terminal.

The information processing apparatus according to one aspect of the present technology is an information processing apparatus including a generation unit that, before content data for distribution is distributed, edits the content data to generate edited content data and adds the generated edited content data to the content data for distribution.

The generation unit may edit the content data in units of segments to generate the edited content data.

The generation unit may edit a partial section of the content data in units of segments.

The generation unit may edit a partial section of a predetermined length from the start of the content data in units of segments, a partial section of a predetermined length up to the end of the content data in units of segments, or both.

The generation unit may convert the edited content data of the edited segment into a file separate from the content data before editing and add the file to the content data for distribution.

The generation unit may convert the edited content data of the partial section of the edited segment into a file separate from the content data before editing and add the file to the content data for distribution.

The generation unit may convert the edited content data of the partial section of the edited segment, together with the content data before editing, into one file and add the file to the content data for distribution.

The content data may be audio data.

The generation unit may perform fade-in editing on a partial section of a predetermined length from the start of the audio data in units of segments, fade-out editing on a partial section of a predetermined length up to the end of the audio data in units of segments, or both.

The information processing method according to one aspect of the present technology is an information processing method in which, before content data for distribution is distributed, the content data is edited to generate edited content data, and the generated edited content data is added to the content data for distribution.

An information processing apparatus according to another aspect of the present technology is an information processing apparatus including a management information generation unit that, before content data for distribution is distributed, generates management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.

The management information may be an MPD (Media Presentation Description), and the management information generation unit may generate the MPD such that the information for managing reproduction of the edited content data forms an Adaptation Set separate from that of the information for managing reproduction of the content data before editing.

The management information may be an MPD (Media Presentation Description), and the management information generation unit may generate the MPD such that the information for managing reproduction of the edited content data, which includes information indicating the length of the edited content data, forms an Adaptation Set different from that of the information for managing reproduction of the content data before editing.

The management information may be an MPD (Media Presentation Description), and the management information generation unit may generate the MPD such that the information for managing reproduction of the edited content data and the information for managing reproduction of the content data before editing form the same Adaptation Set.

An information processing method according to another aspect of the present technology is an information processing method of generating, before content data for distribution is distributed, management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.

An information processing apparatus according to still another aspect of the present technology is an information processing apparatus including: an acquisition unit that acquires content data according to a predetermined parameter; and a control unit that controls the acquisition unit so that, when the content data to be acquired is switched according to a change in the parameter, the acquisition unit acquires edited content data obtained by performing predetermined editing on the content data before switching and edited content data obtained by performing predetermined editing on the content data after switching.

The acquisition unit may acquire the content data in units of segments, and when switching the content data to be acquired according to a change in the parameter, the control unit may cause the acquisition unit to acquire edited content data in units of segments obtained by performing predetermined editing on the content data in units of segments before switching, and edited content data in units of segments obtained by performing predetermined editing on the content data in units of segments after switching.

When switching the content data to be acquired according to a change in the parameter, the control unit may cause the acquisition unit to acquire edited content data in units of segments in which predetermined editing has been performed on a partial section of a predetermined length up to the end of the content data in units of segments before switching, and edited content data in units of segments in which predetermined editing has been performed on a partial section of a predetermined length from the start of the content data in units of segments after switching.

The content data may be audio data, and when switching the content data to be acquired according to a change in the parameter, the control unit may cause the acquisition unit to acquire edited content data in units of segments in which fade-out editing has been performed on a partial section of a predetermined length up to the end of the content data in units of segments before switching, and edited content data in units of segments in which fade-in editing has been performed on a partial section of a predetermined length from the start of the content data in units of segments after switching.

An information processing method according to still another aspect of the present technology is an information processing method in which content data according to a predetermined parameter is acquired, and when the content data to be acquired is switched according to a change in the parameter, edited content data obtained by performing predetermined editing on the content data before switching and edited content data obtained by performing predetermined editing on the content data after switching are acquired.

In the information processing apparatus and method according to one aspect of the present technology, before content data for distribution is distributed, the content data is edited to generate edited content data, and the generated edited content data is added to the content data for distribution.

In the information processing apparatus and method according to another aspect of the present technology, before content data for distribution is distributed, management information is generated that includes information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.

In the information processing apparatus and method according to still another aspect of the present technology, content data according to a predetermined parameter is acquired, and when the content data to be acquired is switched according to a change in the parameter, edited content data obtained by performing predetermined editing on the content data before switching and edited content data obtained by performing predetermined editing on the content data after switching are acquired.

According to the present disclosure, information can be processed. In particular, an increase in load on the playback terminal can be suppressed.

A diagram illustrating an example of data transmission using MPEG-DASH.
A diagram showing a configuration example of an MPD.
A diagram illustrating the temporal division of content.
A diagram showing an example of the hierarchical structure below Period in an MPD.
A diagram illustrating a configuration example of an MPD file on the time axis.
A diagram showing an example of DASH video adaptation.
A diagram showing an example of DASH audio adaptation.
A diagram showing an example of how a discontinuity occurs in DASH audio data.
A diagram showing an example of how a data discontinuity occurs between songs.
A diagram showing an example of language selection in DASH.
A diagram showing an example of how a data discontinuity occurs at the time of language switching.
A diagram showing an example of avoiding noise generation by using fade-out and fade-in.
A diagram illustrating the DSD method.
A diagram illustrating an example of delta-sigma modulation.
A diagram illustrating an example of a size comparison by codec.
A diagram illustrating an example of a system that has no DSD-dedicated device or circuit.
A diagram illustrating an example of a system that has a DSD-dedicated device or circuit.
A diagram illustrating an example of the editing processing of PCM and DSD.
A diagram showing an example of switching of audio data.
A block diagram showing an example of fade processing performed on the server side.
A diagram illustrating an example of segment arrangement on a server.
A diagram illustrating a description example of an MPD.
A diagram illustrating an example of a distribution system.
A block diagram showing a main configuration example of a file generation device.
A flowchart illustrating an example of the flow of audio file generation processing.
A block diagram showing a main configuration example of a playback terminal.
A flowchart illustrating an example of the flow of reproduction processing.
A flowchart illustrating an example of the flow of bandwidth selection processing.
A diagram illustrating an example of segment arrangement on a server.
A diagram illustrating a description example of an MPD.
A block diagram showing a main configuration example of a file generation device.
A flowchart illustrating an example of the flow of audio file generation processing.
A block diagram showing a main configuration example of a playback terminal.
A flowchart illustrating an example of the flow of bandwidth selection processing.
A diagram illustrating an example of segment arrangement on a server.
A diagram illustrating the data arrangement in mdat and the reference positions of tracks.
A diagram illustrating a description example of an MPD.
A diagram illustrating a configuration example of an MPD.
A diagram illustrating the relationship between the level of a sub presentation and a track.
A diagram illustrating an example of tracks and samples before editing.
A diagram illustrating an example of the relationship between tracks and samples at the time of fade-in.
A diagram illustrating an example of the relationship between tracks and samples at the time of fade-out.
A block diagram showing a main configuration example of a file generation device.
A flowchart illustrating an example of the flow of audio file generation processing.
A block diagram showing a main configuration example of a playback terminal.
A flowchart illustrating an example of the flow of bandwidth selection processing.
A diagram illustrating an example of lossless compression.
A diagram illustrating an example of switching of content data.
A diagram showing an example of a DSD silence byte.
A diagram showing an example of matching DSD silence bytes.
A diagram showing an example of mismatched DSD silence bytes.
A diagram illustrating an example of DSD silence byte signaling.
A block diagram showing a main configuration example of a computer.

Hereinafter, modes for carrying out the present disclosure (hereinafter referred to as embodiments) will be described. The description will be made in the following order.
0. Overview
1. First Embodiment (Distribution System)
2. Second Embodiment (Distribution System)
3. Third Embodiment (Distribution System)
4. Fourth Embodiment (Distribution System)
5. Fifth Embodiment (Distribution System)
6. Other

<0. Overview>
<Distribution of video and audio>
In recent years, streaming delivery via the Internet is expected as a means for delivering video and music to consumers. However, the Internet as a transmission means is unstable in transmission compared to broadcast and optical disks. First, the maximum rate of the transmission band changes greatly depending on the user's environment. Furthermore, even for the same user, a constant transmission band is not always secured, and it fluctuates with the passage of time. The fluctuation of the transmission bandwidth also means that the response time to the request from the client is not constant.

As a standard for such transmission over the Internet, MPEG-DASH (Moving Picture Experts Group-Dynamic Adaptive Streaming over HTTP) has been developed. This is a pull-type model in which a plurality of files with different data sizes are placed on the server side, and the client refers to the MPD (Media Presentation Description) to select the most suitable file. Because it uses HTTP rather than a special protocol, a general HTTP (HyperText Transfer Protocol) server can be used. As the file format, not only MPEG-TS (Moving Picture Experts Group-Transport Stream) but also ISOBMFF (International Organization for Standardization Base Media File Format) files are used.

<MPEG-DASH>
An example of data transmission using MPEG-DASH will be described with reference to FIG. 1. In the information processing system 1 of FIG. 1, the file generation device 2 generates video data and audio data as moving image content, encodes the data, and converts it into a file format for transmission. For example, the file generation device 2 converts these data into files (segments) in units of about 10 seconds. The file generation device 2 uploads the generated segment files to the Web server 3. The file generation device 2 also generates an MPD file (management file) for managing the moving image content and uploads it to the Web server 3.

The Web server 3, acting as a DASH server, distributes the files of the moving image content generated by the file generation device 2 live to the reproduction terminal 5 via the Internet 4 by a method conforming to MPEG-DASH. For example, the Web server 3 stores the segment files and the MPD file uploaded from the file generation device 2. Further, in response to a request from the reproduction terminal 5, the Web server 3 transmits the stored segment files and MPD file to the reproduction terminal 5.

The reproduction terminal 5 (reproduction apparatus) executes streaming data control software (hereinafter also referred to as control software) 6, video reproduction software 7, client software for HTTP access (hereinafter referred to as access software) 8, and the like.

The control software 6 is software that controls the data to be streamed from the Web server 3. For example, the control software 6 acquires the MPD file from the Web server 3. The control software 6 also instructs the access software 8 to issue a transmission request for the segment file to be reproduced, based on, for example, reproduction time information indicating the reproduction time specified by the MPD file or by the video reproduction software 7, and on the network bandwidth of the Internet 4.

The video reproduction software 7 is software that reproduces the encoded stream acquired from the Web server 3 via the Internet 4. For example, the video reproduction software 7 specifies reproduction time information to the control software 6. In addition, when the video reproduction software 7 receives a notification of reception start from the access software 8, it decodes the encoded stream supplied from the access software 8. The video reproduction software 7 outputs the video data and audio data obtained as a result of decoding.

The access software 8 is software that controls communication with the Web server 3 using HTTP. For example, the access software 8 supplies the video reproduction software 7 with a notification of reception start. Further, the access software 8 transmits a transmission request for the encoded stream of the segment file to be reproduced to the Web server 3 in response to an instruction from the control software 6. Furthermore, the access software 8 receives the segment file, at a bit rate suited to the communication environment and the like, that is transmitted from the Web server 3 in response to the transmission request. Then, the access software 8 extracts the encoded stream from the received file and supplies it to the video reproduction software 7.

<MPD>
Next, the MPD will be described. The MPD has, for example, a configuration as shown in FIG. 2. In analyzing (parsing) the MPD, the client (the reproduction terminal 5 in the example of FIG. 1) selects the optimum one from the attributes of the Representations included in a Period of the MPD (Media Presentation in FIG. 2).

The client reads the first Segment of the selected Representation to obtain and process the Initialization Segment. Subsequently, the client acquires and reproduces the subsequent Segments.

The relationship between Period, Representation, and Segment in the MPD is as follows. One piece of media content can be managed for each Period, which is a data unit in the time direction, and each Period can be managed for each Segment, which is also a data unit in the time direction. In addition, for each Period, a plurality of Representations with different attributes such as bit rate can be configured.

Therefore, the file of this MPD (also referred to as an MPD file) has a hierarchical structure below Period as shown in FIG. 4. Further, when the structure of this MPD is arranged on the time axis, it is as shown in the example of FIG. 5. As is clear from the example of FIG. 5, a plurality of Representations exist for the same Segment. By adaptively selecting one of these, the client can acquire and reproduce appropriate stream data according to the communication environment, its own decoding capability, and the like.
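The adaptive selection described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names, the `bandwidth_bps` field, the example identifiers, and the "highest bit rate that fits" rule are hypothetical and are not taken from this disclosure or from the MPEG-DASH schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    rep_id: str          # hypothetical identifier
    bandwidth_bps: int   # bit rate needed to stream this Representation


@dataclass
class AdaptationSet:
    content_type: str
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    start_s: float
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)


def select_representation(aset: AdaptationSet, available_bps: int) -> Representation:
    """Pick the highest-bandwidth Representation that fits the measured bandwidth.

    Falls back to the lowest-bandwidth Representation when nothing fits.
    """
    if not aset.representations:
        raise ValueError("adaptation set has no representations")
    ordered = sorted(aset.representations, key=lambda r: r.bandwidth_bps)
    fitting = [r for r in ordered if r.bandwidth_bps <= available_bps]
    return fitting[-1] if fitting else ordered[0]


# Assumed example data: three audio Representations of the same content.
audio = AdaptationSet("audio", [
    Representation("dsd-5.6MHz", 11_200_000),
    Representation("dsd-2.8MHz", 5_600_000),
    Representation("lossy-256k", 256_000),
])
period = Period(0.0, [audio])
```

A client following this sketch would call `select_representation` again whenever its bandwidth estimate changes, which is the point at which the switching boundary discussed in this document arises.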

<Switching of content>
Generally, in MPEG-DASH, as in the example shown in FIG. 6, the distribution server (Web server) prepares a group of video data with different bit rates for one piece of video content, and the playback terminal requests the video data of the optimal bit rate according to the state of the transmission path, whereby adaptive streaming delivery is realized. On the other hand, audio is often lossy-compressed to a size sufficiently smaller than that of the moving image, and in many cases only one piece of audio data is prepared for one piece of audio content.

However, in the case of an MPEG-DASH audio distribution service centered on high-quality audio, one conceivable method is to prepare a group of audio data with different bit rates, as with the video data group, by using multiple codecs (uncompressed, lossless compression, lossy compression, and so on) and by varying the sampling frequency, compression ratio, and the like, and to realize adaptive streaming delivery by having the playback terminal request audio data according to the situation.

When switching to audio data of a different bit rate occurs during playback, as shown in FIG. 8, the playback terminal may continuously play back, across the switching boundary, data whose codecs do not match or whose codecs match but whose parameters do not. Although the content of the data is identical, the waveforms may not match exactly because of the encoding/decoding process and the like, so the audio waveform may not connect smoothly, which may cause noise.

For example, as in the example shown in FIG. 9, when the preceding song ends at a low volume and the following song starts at a relatively high volume, the waveform is discontinuous across the connection boundary between the two songs, and noise may occur.

Also, for content that supports multiple languages, as in the example shown in FIG. 10, the distribution server prepares groups of audio data with different contents. If the language is selected dynamically during reproduction, data with different contents is reproduced continuously, which may also cause noise.

For example, as in the example shown in FIG. 11, when the language to be reproduced is switched from Japanese to English, the reproduction terminal continuously reproduces the Japanese audio data before the switch and the English audio data after the switch. Because the content differs at the connection boundary, noise may occur.

One method for avoiding the noise caused by these data discontinuities is to connect the data before and after the boundary by applying fade-out editing and fade-in editing on the playback terminal. For example, as shown in FIG. 12, the preceding data is subjected to fade-out editing (the volume is gradually and smoothly reduced) so that it ends in silence. Next, the following data is subjected to fade-in editing (the volume is smoothly increased from silence) so that it returns to the original volume. Finally, the two pieces of data thus created are connected and continuously reproduced. By making the connection boundary silent in this manner, the generation of noise can be suppressed.
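As a minimal sketch of the fade-out, fade-in, and connect steps just described, the following operates on PCM-like sample lists. The linear gain ramp, the function names, and the sample representation are assumptions for illustration; the disclosure does not specify the shape of the fade curve.

```python
from typing import List


def fade_out(samples: List[float], fade_len: int) -> List[float]:
    """Linearly ramp the last fade_len samples from full volume down to silence."""
    n = min(fade_len, len(samples))
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        gain = (n - 1 - i) / (n - 1) if n > 1 else 0.0
        out[start + i] *= gain
    return out


def fade_in(samples: List[float], fade_len: int) -> List[float]:
    """Linearly ramp the first fade_len samples from silence up to full volume."""
    n = min(fade_len, len(samples))
    out = list(samples)
    for i in range(n):
        gain = i / (n - 1) if n > 1 else 0.0
        out[i] *= gain
    return out


def connect_silently(preceding: List[float], following: List[float],
                     fade_len: int) -> List[float]:
    """Fade out the preceding data, fade in the following data, then join them."""
    return fade_out(preceding, fade_len) + fade_in(following, fade_len)


# Two mismatched waveforms that would otherwise jump from +1.0 to -1.0.
joined = connect_silently([1.0] * 8, [-1.0] * 8, fade_len=4)
```

Both samples adjacent to the boundary are scaled to zero, so the jump between the two waveforms no longer appears at the connection point.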

<DSD>
By the way, as the quality of video and music data increases, higher-quality data is also required for distribution. For example, DSD (Direct Stream Digital) is known as a high-quality modulation method for audio signals (FIG. 13). As shown in FIG. 13, in the case of PCM (Pulse Code Modulation), the signal value at each sampling time of the audio analog signal is converted to digital data of a fixed number of bits, whereas in the case of DSD, the audio analog signal is ΔΣ-modulated and converted to 1-bit digital data. That is, the information of the audio analog signal is represented along the time axis by the density of the change points between "1" and "0". Therefore, high-resolution recording and reproduction independent of the number of bits can be realized.
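To make the idea of a 1-bit density representation concrete, the following is a minimal sketch of a first-order ΔΣ modulator. It is an assumption-laden illustration, not the modulator used for DSD: practical encoders are generally more elaborate, and the function names here are hypothetical. The `integrator` variable carries the history of all earlier samples, which is the property that makes later output depend on earlier output.

```python
from typing import List


def delta_sigma_1bit(signal: List[float]) -> List[int]:
    """First-order delta-sigma modulation of samples in [-1.0, 1.0] into bits."""
    bits = []
    integrator = 0.0   # accumulated error; depends on every earlier sample
    feedback = 0.0     # previous output mapped to +1.0 / -1.0
    for x in signal:
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def density_of_ones(bits: List[int]) -> float:
    """Fraction of 1 bits, which tracks the input level as (1 + level) / 2."""
    return sum(bits) / len(bits)


loud = delta_sigma_1bit([0.5] * 1000)    # constant positive level
quiet = delta_sigma_1bit([-0.5] * 1000)  # constant negative level
```

In this sketch a higher input level yields a denser run of 1 bits, and no single bit has a meaning on its own, unlike a PCM sample.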

However, as a characteristic of ΔΣ modulation, the result of past A/D conversion output affects subsequent output, so partial digital editing (increasing or decreasing the volume, partially deleting or joining in time, and the like) is difficult. Further, as shown in the table of FIG. 15, for example, the data size is large because the changes are recorded at a high sampling frequency. For example, DSD uses sampling frequencies as high as 2.8 MHz, 5.6 MHz, and 11.2 MHz, so the bit rate with two channels becomes 5.6 Mbps, 11.2 Mbps, and 22.4 Mbps, respectively.
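The bit rates quoted above follow from multiplying the sampling frequency by the bits per sample and the number of channels. The short sketch below reproduces that arithmetic using the rounded frequencies given in the text; the function names are hypothetical, and the 16-bit 44.1 kHz PCM figure is included only for comparison.

```python
def dsd_bit_rate_bps(sampling_hz: int, channels: int = 2) -> int:
    """DSD carries 1 bit per sample per channel."""
    return sampling_hz * 1 * channels


def pcm_bit_rate_bps(sampling_hz: int, bits_per_sample: int, channels: int = 2) -> int:
    """Uncompressed PCM carries bits_per_sample bits per sample per channel."""
    return sampling_hz * bits_per_sample * channels


# Rounded DSD sampling frequencies from the text, two channels each.
dsd_rates = [dsd_bit_rate_bps(f) for f in (2_800_000, 5_600_000, 11_200_000)]

# CD-quality PCM for comparison: 44.1 kHz, 16 bit, two channels.
cd_rate = pcm_bit_rate_bps(44_100, 16)
```

With these inputs, even the lowest DSD rate is roughly four times the CD-quality PCM rate, which is why the choice of Representation matters for bandwidth.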

<Play>
In addition, as shown in FIG. 16, a system without devices or circuits compatible with DSD plays back DSD data after converting it to a format that many systems support, such as 16-bit 44.1 kHz PCM (sound quality equivalent to a conventional CD) or the slightly higher-quality 16-bit 48 kHz PCM, and therefore cannot reproduce the high sound quality that DSD originally has.

Therefore, for example, a corresponding system is configured by externally connecting a device that processes DSD directly, as shown in FIG. 17, or by providing such a circuit internally. Such devices and circuits often support PCM as well, and even when they do not, the system often contains other devices or circuits that can process PCM, so in many cases the system is compatible with both DSD and PCM. However, even with devices and circuits compatible with both DSD and PCM, switching between DSD and PCM during playback, or changing a parameter such as the sampling frequency, takes time to change the settings and may cause noise or sound interruption. Changing the settings during playback is therefore not realistic.

<Fade out edit of DSD, fade in edit>
As described above, partial digital editing of DSD is difficult, and sound quality may be lost if DSD is converted to low-quality PCM. Even in a system compatible with both DSD and PCM, it is desirable to avoid changing the codec during reproduction as far as possible. On the other hand, an audio format such as PCM, in which there is no correlation between preceding and succeeding data, is relatively easy to edit. Therefore, as a method of partially editing DSD, as shown in FIG. 18, for example, the DSD data is first converted to PCM that does not impair the sound quality of DSD (for example, 176.4 kHz, 32 bit), editing such as a volume change is performed, and the result is converted back to DSD. Thus, editing DSD involves more processing steps than editing PCM, and the processing load may be high.

<Noise generation and increase of load on the terminal>
As described above, in an MPEG-DASH audio distribution service centered on high-quality audio, groups of audio data of the same content at multiple bit rates (or, for content supporting multiple languages and the like, groups of audio data with multiple contents) are placed on the distribution server. At the connection boundary between pieces of data downloaded in response to bandwidth fluctuation of the communication path or a request from the playback terminal, the audio waveform may not connect smoothly, which may cause noise.

Since such noise is a fatal problem for a distribution service centered on high-quality audio, each playback terminal receiving the distribution needs, as shown in FIG. 19 for example, to perform fade-out editing, fade-in editing, and silent connection before and after the connection boundary for a variety of audio data, including codecs that are difficult to edit and impose a high processing load. This may increase the load.

<Edit by server>
Therefore, the server performs the processing that each playback terminal would otherwise perform at playback time, so that when the playback terminal requests audio data with a different bit rate or different content during audio playback, it can download audio data that has already been faded out or faded in. That is, before content data for distribution is distributed, the content data is edited to generate edited content data (also referred to as editing data), and the generated edited content data is added to the content data for distribution.

By doing this, the playback terminal can acquire the edited content data from the server, and therefore, there is no need to edit the content data acquired from the server. That is, an increase in load on the playback terminal can be suppressed.

At this time, the content data may be edited in units of segments to generate the edited content data. In the editing, a partial section of the content data in units of segments may be edited. For example, a partial section of a predetermined length from the start of the content data in units of segments, a partial section of a predetermined length up to the end of the content data in units of segments, or both may be edited. By doing this, it is possible to suppress the generation of noise at the connection boundary between the segments at which the content data is switched.

To describe a more specific example, as shown in A of FIG. 20, assuming that data of two bit rates (original adaptation set) of bit rate # 1 and bit rate # 2 are prepared in the server, For each of them, a fade-in adaptation set in which the data of each segment is subjected to fade-in processing, and a fade-out adaptation set in which the data of each segment is subjected to fade-out processing are generated.

When switching the content data to be reproduced from bit rate # 1 to bit rate # 2 as shown in B of FIG. 20, the timing of the arrow 111 during reproduction of the segment 101 of the original adaptation set of bit rate # 1 by the reproduction terminal. When a band change is detected at, the playback terminal switches the segment for which transmission is next requested to the segment 102 of the fade-out adaptation set of bit rate # 1 at arrow 112. Then, the playback terminal switches the segment for which transmission is requested next to the segment 103 of the fade-in adaptation set of bit rate # 2. Then, the playback terminal further switches the segment requested to be transmitted next to the segment 104 of the original adaptation set of bit rate # 2.

By doing this, the segments before and after are connected silently at the timing indicated by the arrow 113 at which the bit rate switches from bit rate # 1 to bit rate # 2. Therefore, the generation of noise at the time of switching of content data can be suppressed. Then, as described above, since the server performs the fade-in editing process and the fade-out editing process, it is possible for each reproduction terminal to suppress the generation of noise without editing the connection boundary of the audio data. That is, an increase in load on the playback terminal can be suppressed.
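The request order described above (original, then fade-out at the old bit rate, then fade-in and original at the new bit rate) can be sketched as follows. This is a minimal illustration only: the function name, the string labels for the adaptation set kinds, and the integer bit rate identifiers are assumptions and do not appear in the specification.

```python
from typing import List, Tuple

# Labels for the three kinds of adaptation set (illustrative names only).
ORIGINAL, FADE_OUT, FADE_IN = "original", "fade_out", "fade_in"


def plan_requests(num_segments: int, old_rate: int, new_rate: int,
                  switch_at: int) -> List[Tuple[int, int, str]]:
    """Return (segment index, bit rate id, adaptation set kind) per request.

    switch_at is the index of the last segment played at the old bit rate,
    which is requested from the fade-out adaptation set.
    """
    plan = []
    for i in range(num_segments):
        if i < switch_at:
            plan.append((i, old_rate, ORIGINAL))
        elif i == switch_at:
            plan.append((i, old_rate, FADE_OUT))   # ends in silence
        elif i == switch_at + 1:
            plan.append((i, new_rate, FADE_IN))    # starts from silence
        else:
            plan.append((i, new_rate, ORIGINAL))
    return plan
```

With four segments and a switch at index 1, the plan corresponds to the segments 101 to 104 in B of FIG. 20.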

Also, before the content data for distribution is distributed, management information is generated that includes information for managing reproduction of the content data and information for managing reproduction of the edited content data obtained by editing the content data. By doing this, the playback terminal can play back the edited content data based on the management information. That is, an increase in load on the playback terminal can be suppressed.

Note that the audio editing process (generation of editing data and management information) performed on the server side may be performed in advance before the request from the terminal, or may be performed at the time of the request.

<1. First embodiment>
<Overview>
The configuration of data after editing is arbitrary, but for example, as shown in FIG. 21, the entire segment including the editing portion may be additionally created as a fade in segment and a fade out segment, respectively. That is, the edited content data of the edited segment may be converted into a file separate from the content data before editing and added to the distribution content data.

In FIG. 21, the server generates the fade-in adaptation set 122-1 and the fade-out adaptation set 123-1 from the original adaptation set 121-1 of bit rate # 1.

That is, for example, the server applies fade-in editing to a partial section of a predetermined length from the beginning (start) of the segment 124-1-1 of the original adaptation set 121-1 to generate the segment 125-1-1, applies fade-in editing to a partial section of a predetermined length from the beginning (start) of the segment 124-1-2 to generate the segment 125-1-2, and stores these entire segments in the fade-in adaptation set 122-1. Also, for example, the server applies fade-out editing to a partial section of a predetermined length up to the end of the segment 124-1-1 of the original adaptation set 121-1 to generate the segment 126-1-1, applies fade-out editing to a partial section of a predetermined length up to the end of the segment 124-1-2 to generate the segment 126-1-2, and stores these entire segments in the fade-out adaptation set 123-1.

Hereinafter, when it is not necessary to distinguish each segment of the original adaptation set 121-1 from each other, it will be referred to as a segment 124-1. Also, when it is not necessary to distinguish each segment of the fade-in adaptation set 122-1, they are referred to as a segment 125-1. Also, when it is not necessary to distinguish the segments of fade-out adaptation set 123-1, they are referred to as segment 126-1.

Similarly, when there are other segments 124-1 in the original adaptation set 121-1, the server generates, from each segment 124-1, a segment 125-1 of the fade-in adaptation set 122-1 and a segment 126-1 of the fade-out adaptation set 123-1.

Also, the server generates a fade-in adaptation set 122-2 and a fade-out adaptation set 123-2 from the original adaptation set 121-2 of bit rate # 2.

That is, for example, the server applies fade-in editing to a partial section of a predetermined length from the beginning (start) of the segment 124-2-1 of the original adaptation set 121-2 to generate the segment 125-2-1, applies fade-in editing to a partial section of a predetermined length from the beginning (start) of the segment 124-2-2 to generate the segment 125-2-2, and stores these entire segments in the fade-in adaptation set 122-2. Also, for example, the server applies fade-out editing to a partial section of a predetermined length up to the end of the segment 124-2-1 of the original adaptation set 121-2 to generate the segment 126-2-1, applies fade-out editing to a partial section of a predetermined length up to the end of the segment 124-2-2 to generate the segment 126-2-2, and stores these entire segments in the fade-out adaptation set 123-2.

In the following, when it is not necessary to distinguish each segment of the original adaptation set 121-2 from each other, it is referred to as a segment 124-2. Also, when it is not necessary to distinguish each segment of the fade-in adaptation set 122-2, they are referred to as a segment 125-2. Also, when it is not necessary to distinguish the segments of fade-out adaptation set 123-2, they are referred to as segment 126-2.

Similarly, when there are other segments 124-2 in the original adaptation set 121-2, the server generates, from each segment 124-2, a segment 125-2 of the fade-in adaptation set 122-2 and a segment 126-2 of the fade-out adaptation set 123-2.

In the following, when it is not necessary to distinguish the original adaptation sets of the respective bit rates from each other, they are referred to as an original adaptation set 121. Also, when it is not necessary to distinguish the fade-in adaptation sets of the respective bit rates from each other, they are referred to as a fade-in adaptation set 122. In addition, when it is not necessary to distinguish the fade-out adaptation sets of the respective bit rates from each other, they are referred to as a fade-out adaptation set 123.

Also, when it is not necessary to distinguish each segment of each original adaptation set 121 from each other, it is referred to as a segment 124. Also, when it is not necessary to distinguish each segment of each fade-in adaptation set 122 from each other, it is referred to as a segment 125. Also, when it is not necessary to distinguish each segment of each fade-out adaptation set 123, it is referred to as a segment 126.

Similarly, if there is an original adaptation set 121 of another bit rate, (the respective segments 125 of) the corresponding fade-in adaptation set 122 and (the respective segments 126 of) the corresponding fade-out adaptation set 123 are generated.
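The editing of a partial section at the start or end of a segment can be illustrated on PCM-like sample values as follows. This is a sketch under stated assumptions: the linear gain ramp, the function names, and the representation of a segment as a list of floats are all illustrative choices, since the specification does not fix the shape of the fade.

```python
from typing import List


def fade_in(segment: List[float], fade_len: int) -> List[float]:
    """Ramp the first fade_len samples up from silence; keep the rest as is."""
    n = min(fade_len, len(segment))
    out = list(segment)
    for i in range(n):
        out[i] = segment[i] * (i / n)  # gain 0, 1/n, ..., (n-1)/n
    return out


def fade_out(segment: List[float], fade_len: int) -> List[float]:
    """Ramp the last fade_len samples down to silence; keep the rest as is."""
    n = min(fade_len, len(segment))
    out = list(segment)
    start = len(segment) - n
    for i in range(n):
        out[start + i] = segment[start + i] * ((n - 1 - i) / n)
    return out
```

Because the faded-out segment ends at zero and the faded-in segment starts at zero, the two can be joined without a discontinuity, while the unedited part of each segment stays identical to the original.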

Also, in response to such editing, the server may generate an MPD in which the information for managing reproduction of the edited content data is described as an adaptation set different from the information for managing reproduction of the content data before editing. For example, the server may generate an MPD as shown in FIG. 22 corresponding to the editing described with reference to FIG. In FIG. 22, a portion surrounded by a dotted line 131 is a description regarding the original adaptation set 121. Note that, under the MPEG-DASH standard, a playback terminal that cannot interpret the schemeIdUri of an EssentialProperty element must skip the parent element of that property (here, the AdaptationSet). The portions enclosed by the dotted line 132 and the dotted line 133, described below, contain such EssentialProperty elements, so a conventional playback terminal not compatible with the present technology skips them and processes only this original adaptation set 121.

Further, a portion surrounded by a dotted line 132 is a description related to the fade-in adaptation set 122. This adaptation set is, for example, an adaptation set selected immediately after the content switching boundary caused by a band change. This adaptation set stores segments edited by the server so that the volume at the start of playback gradually increases from silence. The schemeIdUri of the EssentialProperty element in the second line from the top within the dotted line 132 indicates that this adaptation set is a fade-in adaptation set.

A portion surrounded by a dotted line 133 is a description related to the fade-out adaptation set 123. This adaptation set is, for example, an adaptation set selected immediately before the content switching boundary caused by a band change. This adaptation set stores segments edited by the server so that the volume gradually decreases and reaches silence at the end. The schemeIdUri of the EssentialProperty element in the second line from the top within the dotted line 133 indicates that this adaptation set is a fade-out adaptation set.

By generating such management information (MPD) on the server side, the reproduction terminal side can perform reproduction using the edited content data according to the management information. Therefore, an increase in load on the playback terminal can be suppressed.
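The way an EssentialProperty descriptor separates the fade adaptation sets from the original one can be sketched as follows. This is a simplified illustration, not the MPD of FIG. 22: the two scheme URIs are hypothetical placeholders, the MPD is reduced to bare elements without namespaces or segment information, and the function names are assumptions. The scheme identifier attribute is written as schemeIdUri, as in MPEG-DASH.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical scheme URIs; the actual values are not given in the text.
FADE_IN_URI = "urn:example:dash:fade-in"
FADE_OUT_URI = "urn:example:dash:fade-out"


def build_mpd() -> ET.Element:
    """Build a skeletal MPD with original, fade-in and fade-out sets."""
    mpd = ET.Element("MPD")
    period = ET.SubElement(mpd, "Period")
    for set_id, uri in (("1", None), ("2", FADE_IN_URI), ("3", FADE_OUT_URI)):
        aset = ET.SubElement(period, "AdaptationSet", id=set_id)
        if uri is not None:
            ET.SubElement(aset, "EssentialProperty", schemeIdUri=uri)
    return mpd


def usable_sets(mpd: ET.Element, known_uris: Set[str]) -> List[str]:
    """Return ids of adaptation sets whose essential properties are all known.

    A set carrying an EssentialProperty with an unknown scheme is skipped.
    """
    usable = []
    for aset in mpd.iter("AdaptationSet"):
        props = aset.findall("EssentialProperty")
        if all(p.get("schemeIdUri") in known_uris for p in props):
            usable.append(aset.get("id"))
    return usable
```

A terminal that knows neither scheme sees only the original adaptation set, whereas a terminal that knows both can also select the fade-in and fade-out sets.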

<Distribution system>
Next, a system to which the present technology as described above is applied will be described. FIG. 23 is a block diagram illustrating an example of a configuration of a distribution system which is an aspect of an information processing system to which the present technology is applied. The distribution system 300 shown in FIG. 23 is a system for distributing data (content) such as images and sounds. In the distribution system 300, the file generation apparatus 301, the distribution server 302, and the reproduction terminal 303 are communicably connected to each other via the network 304.

The file generation device 301 is an aspect of an information processing device to which the present technology is applied, and is a device that performs processing relating to generation of an MP4 file storing audio data and an MPD (also referred to as an MPD file). For example, the file generation device 301 generates audio data, generates an MP4 file storing the generated audio data and an MPD file managing the MP4 file, and supplies the generated files to the distribution server 302.

The distribution server 302 is an aspect of an information processing apparatus to which the present technology is applied, and is a server that performs processing related to a content data distribution service using MPEG-DASH (that is, a distribution service of MP4 files using MPD files). For example, the distribution server 302 acquires and manages the MPD file and the MP4 file supplied from the file generation device 301, and provides a distribution service using MPEG-DASH. For example, the distribution server 302 provides the MPD file to the reproduction terminal 303 in response to a request from the reproduction terminal 303. Further, the distribution server 302 supplies the requested MP4 file to the reproduction terminal 303 in response to a request that the reproduction terminal 303 makes based on the MPD file.

The reproduction terminal 303 is an aspect of an information processing apparatus to which the present technology is applied, and is an apparatus that performs processing related to reproduction of audio data. For example, the reproduction terminal 303 requests the distribution server 302 to distribute the MP4 file in accordance with the MPEG-DASH, and acquires the supplied MP4 file in response to the request. More specifically, the reproduction terminal 303 acquires an MPD file from the distribution server 302, and acquires, from the distribution server 302, an MP4 file storing desired content data according to the information of the MPD file. The playback terminal 303 decodes the acquired MP4 file and plays back audio data.

The network 304 is an arbitrary communication network; it may be a wired communication network, a wireless communication network, or a combination of both. Also, the network 304 may be configured by one communication network or by a plurality of communication networks. For example, the network 304 may include a communication network or communication path of any communication standard, such as the Internet, a public telephone network, a wide area communication network for wireless mobile devices such as so-called 3G and 4G networks, a WAN (Wide Area Network), a LAN (Local Area Network), a wireless communication network conforming to the Bluetooth (registered trademark) standard, a communication path for near field communication such as NFC (Near Field Communication), a communication path for infrared communication, or a wired communication network conforming to a standard such as HDMI (High-Definition Multimedia Interface) or USB (Universal Serial Bus).

The file generation apparatus 301, the distribution server 302, and the reproduction terminal 303 are communicably connected to the network 304, and can exchange information with each other via the network 304. The file generation apparatus 301, the distribution server 302, and the reproduction terminal 303 may be connected to the network 304 by wired communication, by wireless communication, or by both.

In FIG. 23, one file generation device 301, one distribution server 302, and one reproduction terminal 303 are shown as the configuration of the distribution system 300, but these numbers are arbitrary and do not have to be identical to each other. For example, in the distribution system 300, each of the file generation device 301, the distribution server 302, and the reproduction terminal 303 may be singular or plural.

<File generation device>
FIG. 24 is a block diagram showing a main configuration example of the file generation device 301. As illustrated in FIG. 24, the file generation device 301 includes an acquisition unit 311. The acquisition unit 311 acquires an audio analog signal or audio data of a predetermined format. The acquisition unit 311 supplies the acquired audio analog signal or audio data to the encoding unit 312.

The file generation device 301 further includes an encoding unit 312-1, a segment file generation unit 313-1, a fade-out editing unit 314-1, a segment file generation unit 315-1, a fade-in editing unit 316-1, and a segment file generation unit 317-1. The fade-out editing unit 314-1 to the segment file generation unit 317-1 are collectively referred to as an editing data generation unit 321-1. These configurations relate to bit rate # 1 (FIG. 21).

The encoding unit 312-1 generates DSD data of bit rate # 1 using the audio analog signal or audio digital data supplied from the acquisition unit 311. For example, when an audio analog signal is supplied, the encoding unit 312-1 performs A / D conversion and encoding, and generates DSD data of bit rate # 1. Also, for example, when audio data in a format other than DSD, such as PCM, is supplied, the encoding unit 312-1 performs format conversion and the like to generate DSD data of bit rate # 1. The encoding unit 312-1 supplies the generated DSD data to the segment file generation unit 313-1. The encoding unit 312-1 also supplies the generated DSD data to the fade-out editing unit 314-1 and the fade-in editing unit 316-1. When DSD data is supplied from the acquisition unit 311, the encoding unit 312-1 supplies that DSD data to the segment file generation unit 313-1, the fade-out editing unit 314-1, and the fade-in editing unit 316-1. The DSD data supplied from the encoding unit 312-1 is data in units of segments.

The segment file generation unit 313-1 converts the DSD data supplied from the encoding unit 312-1 into MP4 files (also referred to as segment files) for each segment. That is, this segment file is a file of the original adaptation set 121-1 of bit rate # 1. The segment file generation unit 313-1 supplies the generated segment file to the upload unit 319.

The fade-out editing unit 314-1 performs fade-out editing as described above on the supplied DSD data in units of segments. However, with DSD data, it is difficult to perform partial digital editing (increase / decrease in sound volume, partial deletion / combination in time, etc.) because past data affects subsequent data. Therefore, the fade-out editing unit 314-1 converts the supplied DSD data into PCM data of a format that does not impair the sound quality (for example, 176.4 kHz, 32 bits), applies fade-out editing to the obtained PCM data, and converts the edited PCM data back into DSD data. The fade-out editing unit 314-1 supplies the DSD data after editing to the segment file generation unit 315-1. This DSD data is data in units of segments, and includes not only the fade-out edited portion at the end of the segment but also the preceding portion that is common to the original data.

The segment file generation unit 315-1 converts the DSD data supplied from the fade-out editing unit 314-1 into MP4 files for each segment. That is, this segment file is a file of fade-out adaptation set 123-1 of bit rate # 1. The segment file generation unit 315-1 supplies the generated segment file to the upload unit 319.

The fade-in editing unit 316-1 performs fade-in editing as described above on the supplied DSD data in units of segments. However, with DSD data, it is difficult to perform partial digital editing (increase / decrease in sound volume, partial deletion / combination in time, etc.) because past data affects subsequent data. Therefore, the fade-in editing unit 316-1 converts the supplied DSD data into PCM data of a format that does not impair the sound quality (for example, 176.4 kHz, 32 bits), applies fade-in editing to the obtained PCM data, and converts the edited PCM data back into DSD data. The fade-in editing unit 316-1 supplies the DSD data after editing to the segment file generation unit 317-1. This DSD data is data in units of segments, and includes not only the fade-in edited portion at the beginning of the segment but also the subsequent portion that is common to the original data.

The segment file generation unit 317-1 converts the DSD data supplied from the fade-in editing unit 316-1 into MP4 files for each segment. That is, this segment file is a file of the fade-in adaptation set 122-1 of bit rate # 1. The segment file generation unit 317-1 supplies the generated segment file to the upload unit 319.

As described above, the editing data generation unit 321-1 is a processing unit that applies predetermined editing such as fade-in editing and fade-out editing to the original data, and generates segment files (bit rate # 1) of the edited data.

In addition, the file generation device 301 includes an encoding unit 312-2, a segment file generation unit 313-2, a fade-out editing unit 314-2, a segment file generation unit 315-2, a fade-in editing unit 316-2, and a segment file generation unit 317-2. The fade-out editing unit 314-2 to the segment file generation unit 317-2 are collectively referred to as an editing data generation unit 321-2. These configurations relate to bit rate # 2 (FIG. 21).

The encoding unit 312-2 performs processing similar to that of the encoding unit 312-1, and generates DSD data of bit rate # 2. The encoding unit 312-2 supplies the generated segment unit DSD data to the segment file generation unit 313-2, the fade-out editing unit 314-2, and the fade-in editing unit 316-2.

The segment file generation unit 313-2 performs processing similar to that of the segment file generation unit 313-1, generates a segment file of the original adaptation set 121-2 of bit rate # 2, and supplies it to the upload unit 319.

The fade-out editing unit 314-2 performs the same processing as the fade-out editing unit 314-1, and performs the above-described fade-out editing on the supplied DSD data in units of segments. The fade-out editing unit 314-2 supplies the segment file generating unit 315-2 with the DSD data in segment units after editing.

The segment file generation unit 315-2 performs the same processing as the segment file generation unit 315-1, and converts the supplied DSD data into MP4 files for each segment. That is, the segment file generation unit 315-2 generates a segment file of the fade-out adaptation set 123-2 of bit rate # 2, and supplies it to the upload unit 319.

The fade-in editing unit 316-2 performs the same processing as the fade-in editing unit 316-1, and performs the above-described fade-in editing on the supplied DSD data in units of segments. The fade-in editing unit 316-2 supplies the segment file generation unit 317-2 with the DSD data in segment units after editing.

The segment file generation unit 317-2 performs the same processing as the segment file generation unit 317-1 and converts the supplied DSD data into MP4 files for each segment. That is, the segment file generation unit 317-2 generates a segment file of the fade-in adaptation set 122-2 of bit rate # 2, and supplies it to the upload unit 319.

As described above, the editing data generation unit 321-2 is a processing unit that applies predetermined editing such as fade-in editing and fade-out editing to the original data, and generates segment files (bit rate # 2) of the edited data.

In the following, when it is not necessary to distinguish the encoding units from each other, they are referred to as an encoding unit 312. Also, when it is not necessary to distinguish the segment file generation units that generate the segment files of the original adaptation set 121 from each other, they are referred to as a segment file generation unit 313. Also, when it is not necessary to distinguish the fade-out editing units from each other, they are referred to as a fade-out editing unit 314. Also, when it is not necessary to distinguish the segment file generation units that generate the segment files of the fade-out adaptation set 123 from each other, they are referred to as a segment file generation unit 315. Also, when it is not necessary to distinguish the fade-in editing units from each other, they are referred to as a fade-in editing unit 316. Also, when it is not necessary to distinguish the segment file generation units that generate the segment files of the fade-in adaptation set 122 from each other, they are referred to as a segment file generation unit 317. Further, when it is not necessary to distinguish the editing data generation units from each other, they are referred to as an editing data generation unit 321.

As shown in FIG. 24, one set of the configuration from the encoding unit 312 to the segment file generation unit 317 is provided for each bit rate to be prepared. The number of bit rates prepared as data for distribution is arbitrary. For example, when data of three bit rates is prepared as data for distribution, the file generation device 301 has three sets of this configuration. Also, for example, when data of only one bit rate is prepared as data for distribution, the configuration from the encoding unit 312-2 to the segment file generation unit 317-2 can be omitted.

Each editing data generation unit 321 may operate at all times (generating segment files for all segments), or may operate only when the reproduction terminal 303 requests bit rate switching (generating segment files at that time).

Furthermore, the file generation device 301 includes an MPD generation unit 318 and an upload unit 319. The MPD generation unit 318 generates an MPD, which is information for managing reproduction of the segment file generated in the above processing unit. The MPD generation unit 318 supplies the generated MPD to the upload unit 319. The upload unit 319 uploads the supplied segment file and MPD to the distribution server 302.

<Flow of audio file generation process>
Next, an example of the flow of the audio file generation process performed by the file generation device 301 will be described with reference to the flowchart of FIG.

When the audio file generation process is started, the MPD generation unit 318 generates an MPD file in step S101. In step S102, the upload unit 319 uploads the MPD file generated in step S101 to the distribution server 302.

In step S103, the acquisition unit 311 acquires audio content such as an audio analog signal and audio data, and the encoding unit 312 encodes the audio content at a plurality of bit rates to generate DSD data (segments). In step S104, the fade-in editing unit 316 performs fade-in processing for each bit rate. In step S105, the fade-out editing unit 314 performs fade-out processing for each bit rate.

In step S106, the segment file generation unit 313 generates a segment file of the original adaptation set 121 for each bit rate. The segment file generation unit 315 generates, for each bit rate, a segment file of the fade-out adaptation set 123 containing the entire segment, including the portion common to the original data. The segment file generation unit 317 generates, for each bit rate, a segment file of the fade-in adaptation set 122 containing the entire segment, including the portion common to the original data.

In step S107, the upload unit 319 uploads the segment file of each bit rate generated in step S106 to the distribution server 302.

In step S108, the acquisition unit 311 determines whether to end the audio file generation process. If it is determined that the process is not to end and the next segment is to be processed, the process returns to step S103, and the subsequent processing is repeated. When it is determined in step S108 that the audio file generation process is to end, the audio file generation process ends.
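The loop of steps S101 to S108 can be sketched as follows. This is a minimal illustration under stated assumptions: the encoder and the two fade editors are passed in as plain callables standing in for the encoding unit 312 and the editing units 314 and 316, the upload is modeled as appending to a list, and the file names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Samples = List[float]


def run_generation(
    raw_segments: Sequence[Samples],
    bit_rates: Sequence[int],
    encode: Callable[[Samples, int], Samples],
    fade_in: Callable[[Samples], Samples],
    fade_out: Callable[[Samples], Samples],
) -> List[Tuple[str, object]]:
    """Return the uploaded (file name, payload) pairs in upload order."""
    # S101-S102: the MPD is generated and uploaded first.
    uploaded: List[Tuple[str, object]] = [("stream.mpd", None)]
    # S108: the loop ends when there is no further segment to process.
    for index, raw in enumerate(raw_segments):
        # S103: encode the segment at every bit rate.
        encoded = {rate: encode(raw, rate) for rate in bit_rates}
        # S104 and S105: fade-in and fade-out editing for each bit rate.
        faded_in = {rate: fade_in(encoded[rate]) for rate in bit_rates}
        faded_out = {rate: fade_out(encoded[rate]) for rate in bit_rates}
        # S106-S107: build and upload three segment files per bit rate.
        for rate in bit_rates:
            for kind, data in (
                ("original", encoded[rate]),
                ("fade_in", faded_in[rate]),
                ("fade_out", faded_out[rate]),
            ):
                uploaded.append((f"{kind}_rate{rate}_seg{index}.mp4", data))
    return uploaded
```

Each source segment therefore yields three files per bit rate, which is the storage cost the server pays so that the playback terminal does not have to edit anything.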

By executing each process as described above, it is possible to suppress the occurrence of noise without performing editing processing such as fade-in editing and fade-out editing on the playback terminal, and therefore an increase in load on the playback terminal can be suppressed.

<Playback terminal>
FIG. 26 is a block diagram showing a main configuration example of the reproduction terminal 303. As shown in FIG. 26, the reproduction terminal 303 has an MPD acquisition unit 351, an MPD processing unit 352, a segment file acquisition unit 353, a selection unit 354, a buffer 355, a decoding unit 356, and an output control unit 357.

The MPD acquisition unit 351 performs a process related to acquisition of an MPD file. The MPD processing unit 352 performs processing on the MPD file acquired by the MPD acquisition unit 351.

The segment file acquisition unit 353 performs processing related to acquisition of content data. The selection unit 354 performs processing regarding selection of content data acquired by the segment file acquisition unit 353.

For example, the segment file acquisition unit 353 acquires content data according to a predetermined parameter. When the selection unit 354 controls the segment file acquisition unit 353 to switch the content data to be acquired according to a change in the parameter, the segment file acquisition unit 353 acquires edited content data obtained by applying predetermined editing to the content data before switching, and edited content data obtained by applying predetermined editing to the content data after switching. By doing this, it is possible to suppress the generation of noise at the time of content data switching. At this time, the occurrence of noise can be suppressed without the reproduction terminal 303 performing editing processing such as fade-in editing and fade-out editing on the content data. That is, it is possible to suppress an increase in the load of the reproduction terminal 303.

The segment file acquisition unit 353 may acquire content data in units of segments. In that case, when the selection unit 354 switches the content data to be acquired according to a change in the parameter, the segment file acquisition unit 353 may acquire segment-unit edited content data obtained by applying predetermined editing to the segment-unit content data before switching, and segment-unit edited content data obtained by applying predetermined editing to the segment-unit content data after switching.

Further, when the selection unit 354 switches the content data to be acquired according to a change in the parameter, the segment file acquisition unit 353 may acquire segment-unit edited content data in which predetermined editing has been applied to a partial section of a predetermined length up to the end of the segment-unit content data before switching, and segment-unit edited content data in which predetermined editing has been applied to a partial section of a predetermined length from the start of the segment-unit content data after switching.

Also, the content data may be audio data. In that case, when the selection unit 354 switches the content data to be acquired according to a change in the parameter, the segment file acquisition unit 353 may acquire segment-unit edited content data in which fade-out editing has been applied to a partial section of a predetermined length up to the end of the segment-unit content data before switching, and segment-unit edited content data in which fade-in editing has been applied to a partial section of a predetermined length from the start of the segment-unit content data after switching.

The buffer 355 performs processing regarding storage of segment files and the like acquired by the segment file acquisition unit 353. The decoding unit 356 performs processing related to decoding of the segment files. The output control unit 357 performs processing related to output control of the decoded segment file.

<Flow of reproduction process>
An example of the flow of reproduction processing executed by such a reproduction terminal 303 will be described with reference to the flowchart in FIG.

When the reproduction process is started, in step S151, the selection unit 354 selects the audio stream of the smallest bandwidth. In step S152, the segment file acquisition unit 353 acquires a segment of a predetermined time length of the selected bandwidth. In step S153, the decoding unit 356 starts decoding.

In step S154, the selection unit 354 detects the network band. In step S155, the bandwidth of the audio stream is selected based on the detected network band and the bandwidths of the available audio streams. In step S156, the selection unit 354 detects the actual bit rate of the audio stream. In step S157, the segment file acquisition unit 353 and the selection unit 354 execute a bandwidth selection process.

In step S158, the decoding unit 356 determines whether to end the reproduction. If it is determined that the decoding process is to be continued without ending the reproduction process, the process returns to step S154 to repeat the subsequent processes. If it is determined in step S158 that the reproduction is to be ended, the reproduction processing is ended.
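The selection in step S155 is not spelled out in detail here. One common heuristic, shown purely as an assumption and not as the method of this disclosure, is to pick the highest stream bandwidth that does not exceed the measured network band, falling back to the smallest stream (as in step S151) when none fits. The function name is hypothetical.

```python
from typing import Sequence


def select_bandwidth(available_bps: Sequence[int], network_bps: int) -> int:
    """Pick the largest stream bandwidth that fits the network band.

    Falls back to the smallest available bandwidth when nothing fits.
    This is an assumed heuristic, not taken from the specification.
    """
    candidates = [b for b in available_bps if b <= network_bps]
    return max(candidates) if candidates else min(available_bps)
```

If the value returned differs from the currently selected bandwidth, the bandwidth selection process of step S157 would then perform the actual switch.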

<Flow of bandwidth selection process>
Next, an example of the flow of the bandwidth selection process executed in step S157 of FIG. 27 will be described with reference to the flowchart of FIG. When the bandwidth selection process is started, in step S171, the selection unit 354 determines whether to reselect the bandwidth of the audio stream. If it is determined to select again, the process proceeds to step S172.

In step S172, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until space becomes available. When space becomes available in the buffer 355, in step S173, the segment file acquisition unit 353 acquires an audio fade-out segment of a predetermined time length of the currently selected bandwidth.

In step S174, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until the space is available. When a space is generated in the buffer 355, in step S175, the segment file acquisition unit 353 acquires an audio fade-in segment of a predetermined time length of the bandwidth selected as the change destination.

In step S176, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until the space is available. When a space is generated in the buffer 355, in step S177, the segment file acquisition unit 353 acquires an original segment of a predetermined time length of the bandwidth selected as the change destination.

When the process of step S177 ends, the bandwidth selection process ends, and the process returns to FIG. If it is determined in step S171 that the bandwidth of the audio stream is not selected again, the processing in steps S172 to S177 is omitted, the bandwidth selection processing ends, and the processing returns to FIG.
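Steps S171 to S177 can be sketched as follows. This is a single-threaded illustration under stated assumptions: waiting for buffer space is modeled by calling a caller-supplied function that plays (removes) the oldest buffered segment, fetching is a caller-supplied function, and all names are hypothetical. The capacity is assumed to be at least 1.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Segment = Tuple[int, str]  # (bandwidth, adaptation set kind)


def bandwidth_selection(
    buffer: Deque[Segment],
    capacity: int,
    current_bw: int,
    target_bw: int,
    fetch: Callable[[int, str], Segment],
    consume_one: Callable[[Deque[Segment]], None],
) -> int:
    """Run one bandwidth selection pass and return the selected bandwidth."""
    # S171: nothing to do when the bandwidth is not reselected.
    if target_bw == current_bw:
        return current_bw
    requests = (
        (current_bw, "fade_out"),  # S173: fade-out at the current bandwidth
        (target_bw, "fade_in"),    # S175: fade-in at the new bandwidth
        (target_bw, "original"),   # S177: original at the new bandwidth
    )
    for bw, kind in requests:
        # S172 / S174 / S176: wait until the buffer has free space.
        while len(buffer) >= capacity:
            consume_one(buffer)
        buffer.append(fetch(bw, kind))
    return target_bw
```

In a real terminal the waiting would be driven by the decoder draining the buffer concurrently; the callback here only stands in for that behavior.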

By executing each process as described above, it is possible to suppress the occurrence of noise at the time of content data switching without performing editing processes such as fade-in editing and fade-out editing on content data. That is, it is possible to realize the suppression of the increase in the load of the reproduction terminal 303.

Note that by applying the present technology described in the present embodiment, editing by the file generation apparatus 301, distribution control by the distribution server 302, reproduction control (data acquisition control) by the reproduction terminal 303, and the like can be performed in units of segments. It is therefore possible to suppress an increase in the load of those processes.

<2. Second embodiment>
<Overview>
For example, as shown in FIG. 29, only the fade-in and fade-out portions of the edited data may be additionally created as segments. That is, the post-edit content data of a partial section of the edited segment may be made into a file different from the pre-edit content data and added to the content data for distribution.

In FIG. 29, the server generates the fade-in adaptation set 402-1 and the fade-out adaptation set 403-1 from the original adaptation set 401-1 of bit rate # 1.

That is, for example, the server performs fade-in editing on a partial section of a predetermined length from the beginning of the segment 404-1-1 of the original adaptation set 401-1 to generate the segment 405-1-1, performs fade-in editing on a partial section of a predetermined length from the beginning of the segment 404-1-2 to generate the segment 405-1-2, and stores them in the fade-in adaptation set 402-1. Also, for example, the server performs fade-out editing on a partial section of a predetermined length up to the end of the segment 404-1-1 of the original adaptation set 401-1 to generate the segment 406-1-1, performs fade-out editing on a partial section of a predetermined length up to the end of the segment 404-1-2 to generate the segment 406-1-2, and stores them in the fade-out adaptation set 403-1.

In the following, when it is not necessary to distinguish each segment of the original adaptation set 401-1, it will be referred to as segment 404-1. Also, when it is not necessary to distinguish the segments of the fade-in adaptation set 402-1, they are referred to as segments 405-1. Also, when it is not necessary to distinguish the segments of fade-out adaptation set 403-1 from each other, they are referred to as segment 406-1.

Similarly, if there are other segments 404-1 in the original adaptation set 401-1, a segment 405-1 of the fade-in adaptation set 402-1 and a segment 406-1 of the fade-out adaptation set 403-1 are generated from each segment 404-1.

Also, the server generates a fade-in adaptation set 402-2 and a fade-out adaptation set 403-2 from the original adaptation set 401-2 of bit rate # 2.

That is, for example, the server performs fade-in editing on a partial section of a predetermined length from the beginning of the segment 404-2-1 of the original adaptation set 401-2 to generate the segment 405-2-1, performs fade-in editing on a partial section of a predetermined length from the beginning of the segment 404-2-2 to generate the segment 405-2-2, and stores them in the fade-in adaptation set 402-2. Also, for example, the server performs fade-out editing on a partial section of a predetermined length up to the end of the segment 404-2-1 of the original adaptation set 401-2 to generate the segment 406-2-1, performs fade-out editing on a partial section of a predetermined length up to the end of the segment 404-2-2 to generate the segment 406-2-2, and stores them in the fade-out adaptation set 403-2.

In the following, when it is not necessary to distinguish each segment of the original adaptation set 401-2, it will be referred to as segment 404-2. Also, when it is not necessary to distinguish each segment of the fade-in adaptation set 402-2, they are referred to as a segment 405-2. Also, when it is not necessary to distinguish each segment of fade-out adaptation set 403-2, they are referred to as segment 406-2.

Similarly, when there are other segments 404-2 in the original adaptation set 401-2, a segment 405-2 of the fade-in adaptation set 402-2 and a segment 406-2 of the fade-out adaptation set 403-2 are generated from each segment 404-2.

In the following, when it is not necessary to distinguish the original adaptation sets of each bit rate from each other, they are referred to as an original adaptation set 401. Also, when it is not necessary to distinguish the fade-in adaptation sets of each bit rate from each other, they are referred to as a fade-in adaptation set 402. In addition, when it is not necessary to distinguish the fade-out adaptation sets of each bit rate from each other, they are referred to as a fade-out adaptation set 403.

Also, when it is not necessary to distinguish each segment of each original adaptation set 401 from each other, it is referred to as segment 404. Also, when it is not necessary to distinguish each segment of each fade-in adaptation set 402, it is referred to as segment 405. Also, when it is not necessary to distinguish each segment of each fade-out adaptation set 403, it is referred to as segment 406.

Similarly, if there is an original adaptation set 401 of another bit rate, the corresponding fade-in adaptation set 402 (and its segments 405) and fade-out adaptation set 403 (and its segments 406) are generated in the same manner.
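The fade editing of a partial section can be sketched as below. This is only an illustration under assumptions that are not in the source: the segment is modeled as a list of PCM sample values, the gain follows a linear ramp, and the names `fade_in_edit` and `fade_out_edit` are hypothetical. The text itself specifies only that a section of predetermined length at the beginning or the end of the segment is edited.

```python
def fade_in_edit(samples, n):
    """Return a copy whose first n samples rise linearly from silence."""
    out = list(samples)
    n = min(n, len(out))
    for i in range(n):
        # Gain is 0 at the first sample and approaches 1 at the end of the section.
        out[i] = out[i] * i / n
    return out

def fade_out_edit(samples, n):
    """Return a copy whose last n samples fall linearly to silence."""
    out = list(samples)
    n = min(n, len(out))
    last = len(out) - 1
    for i in range(n):
        # i counts backward from the final sample, which becomes silent.
        out[last - i] = out[last - i] * i / n
    return out

segment = [1.0, 1.0, 1.0, 1.0]
print(fade_in_edit(segment, 2))   # [0.0, 0.5, 1.0, 1.0]
print(fade_out_edit(segment, 2))  # [1.0, 1.0, 0.5, 0.0]
```

Samples outside the edited section are left identical to the original, which is why only the edited section needs to be stored separately in this embodiment.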

Also, in response to such editing, the server may generate an MPD in which the information for managing reproduction of the post-edit content data, including information indicating the length of the post-edit content data, is described as an adaptation set different from the information for managing reproduction of the pre-edit content data. For example, the server may generate an MPD as shown in FIG. 30 in response to the editing described with reference to FIG. In FIG. 30, a portion surrounded by a dotted line 411 is a description related to the original adaptation set 401. Note that the MPEG-DASH standard has a rule that a playback terminal that cannot understand the schemaIdUri of an EssentialProperty element skips the parent element of that EssentialProperty (here, the AdaptationSet). The parts enclosed by the dotted line 412 and the dotted line 413, described below, contain such EssentialProperty elements, so a conventional playback terminal not compatible with the present technology processes only this original adaptation set 401.

In addition, a portion surrounded by a dotted line 412 is a description regarding the fade-in adaptation set 402. This adaptation set is, for example, the adaptation set selected immediately after a content switching boundary caused by a bandwidth change. This adaptation set stores segments of the portion (fade-in portion) edited by the server so that the volume gradually increases from silence at the start of playback. The schemaIdUri of the EssentialProperty element in the second line from the top within the dotted line 412 indicates that this adaptation set is a fade-in adaptation set. The value attribute in the same line may indicate the size of the fade-in partial segment (the size of the data that replaces part of the original segment during reproduction). The example of FIG. 30 indicates that 10 MP4 samples are replaced (value = "10").

A portion surrounded by a dotted line 413 is a description regarding the fade-out adaptation set 403. This adaptation set is, for example, the adaptation set selected immediately before a content switching boundary caused by a bandwidth change. This adaptation set stores segments of the portion (fade-out portion) edited by the server so that the volume gradually decreases and becomes silent. The schemaIdUri of the EssentialProperty element in the second line from the top within the dotted line 413 indicates that this adaptation set is a fade-out adaptation set. The value attribute in the same line may indicate the size of the fade-out partial segment (the size of the data that replaces part of the original segment during reproduction). The example of FIG. 30 indicates that 10 MP4 samples are replaced (value = "10").
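The skipping rule and the value attribute described above can be illustrated with a small parser. This is a sketch under assumptions: the MPD fragment is a reduced stand-in for FIG. 30, the scheme URIs `urn:example:fade-in` and `urn:example:fade-out` and the function `usable_adaptation_sets` are hypothetical, and the attribute is written `schemeIdUri`, as it is spelled in the MPEG-DASH schema.

```python
import xml.etree.ElementTree as ET

NS = {"d": "urn:mpeg:dash:schema:mpd:2011"}

# Reduced, hypothetical stand-in for the MPD of FIG. 30.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>
  <AdaptationSet id="1"/>
  <AdaptationSet id="2">
    <EssentialProperty schemeIdUri="urn:example:fade-in" value="10"/>
  </AdaptationSet>
  <AdaptationSet id="3">
    <EssentialProperty schemeIdUri="urn:example:fade-out" value="10"/>
  </AdaptationSet>
</Period></MPD>"""

def usable_adaptation_sets(mpd_xml, known_schemes):
    """Map adaptation set id -> {scheme: number of MP4 samples to replace}.

    An adaptation set carrying an EssentialProperty whose scheme the
    terminal does not understand is skipped entirely.
    """
    root = ET.fromstring(mpd_xml)
    usable = {}
    for aset in root.iterfind("d:Period/d:AdaptationSet", NS):
        props = aset.findall("d:EssentialProperty", NS)
        if any(p.get("schemeIdUri") not in known_schemes for p in props):
            continue  # parent element is skipped
        usable[aset.get("id")] = {
            p.get("schemeIdUri"): int(p.get("value")) for p in props
        }
    return usable

# A conventional terminal knows no fade schemes and sees only the original set.
print(usable_adaptation_sets(MPD, set()))
```

A terminal that passes both scheme URIs in `known_schemes` would additionally obtain the fade-in and fade-out adaptation sets, each with a replacement length of 10 samples.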

<File generation device>
FIG. 31 is a block diagram showing an example of the main configuration of the file generation apparatus 301 in this case. As shown in FIG. 31, also in this case, the file generation device 301 has basically the same configuration as that of the first embodiment (FIG. 24).

However, the file generation device 301 in this case further includes a removing unit 421-1, a removing unit 422-1, a removing unit 421-2, and a removing unit 422-2. The fade-out editing unit 314-1 to the segment file generating unit 317-1, together with the removing unit 421-1 and the removing unit 422-1, are collectively referred to as an editing data generating unit 431-1. These configurations relate to bit rate #1 (FIG. 29).

The removing unit 421-1 removes, from the DSD data supplied from the fade-out editing unit 314-1, the portion of the processing target segment that has not been subjected to fade-out editing (that is, the portion that overlaps the original data). In other words, the removing unit 421-1 generates, as a segment, the DSD data of the remaining portion, that is, the portion subjected to fade-out editing. At that time, the removing unit 421-1 converts the supplied DSD data into PCM data that does not impair the sound quality (for example, 176.4 kHz, 32 bits), removes the portion not subjected to fade-out editing from the obtained PCM data, and converts the remaining PCM data into DSD data.

The fade-out editing unit 314-1 may supply the edited data as it is to the removing unit 421-1 as the PCM data. In that case, the removing unit 421-1 removes a portion where fade-out editing is not performed from the supplied PCM data, and converts the remaining PCM data into DSD data.

The removing unit 421-1 supplies the DSD data (segment) of the portion subjected to fade-out editing to the segment file generating unit 315-1. The segment file generation unit 315-1 segments the DSD data of the portion subjected to the fade-out editing.

The removing unit 422-1 removes, from the DSD data supplied from the fade-in editing unit 316-1, the portion of the processing target segment that has not been subjected to fade-in editing (that is, the portion that overlaps the original data). In other words, the removing unit 422-1 generates, as a segment, the DSD data of the remaining portion, that is, the portion subjected to fade-in editing. At that time, the removing unit 422-1 converts the supplied DSD data into PCM data that does not impair the sound quality (for example, 176.4 kHz, 32 bits), removes the portion not subjected to fade-in editing from the obtained PCM data, and converts the remaining PCM data into DSD data.

The fade-in editing unit 316-1 may supply the edited data as it is to the removing unit 422-1 as the PCM data. In that case, the removing unit 422-1 removes the portion where the fade-in editing is not performed from the supplied PCM data, and converts the remaining PCM data into DSD data.

The removing unit 422-1 supplies the DSD data (segment) of the portion subjected to fade-in editing to the segment file generating unit 317-1. The segment file generation unit 317-1 segments the DSD data of the portion subjected to the fade-in editing.

As described above, the editing data generation unit 431-1 is a processing unit that subjects the original data to predetermined editing such as fade-in editing and fade-out editing and generates segment files of the data of the edited portions (for bit rate #1).
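The role of the removing units can be sketched as follows. This is an illustration under assumptions: the data is modeled as a list of PCM sample values, the DSD-to-PCM and PCM-to-DSD conversions are not modeled, and the function names are hypothetical. The check against the original reflects the statement that the removed portion overlaps the original data.

```python
def keep_fade_in_part(edited, original, n):
    """Keep only the first n (fade-in edited) samples of an edited segment."""
    # The discarded portion must be identical to the original data.
    if edited[n:] != original[n:]:
        raise ValueError("unedited portion differs from the original")
    return edited[:n]

def keep_fade_out_part(edited, original, n):
    """Keep only the last n (fade-out edited) samples of an edited segment."""
    cut = len(edited) - n
    if edited[:cut] != original[:cut]:
        raise ValueError("unedited portion differs from the original")
    return edited[cut:]

original = [4, 4, 4, 4, 4, 4]
print(keep_fade_in_part([0, 2, 4, 4, 4, 4], original, 2))   # [0, 2]
print(keep_fade_out_part([4, 4, 4, 4, 2, 0], original, 2))  # [2, 0]
```

Only the returned partial data would then be passed on for segment file generation, so the unedited samples are stored once, in the original segment.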

Further, the fade-out editing unit 314-2 to the segment file generating unit 317-2, and the removing unit 421-2 and the removing unit 422-2 are collectively referred to as an editing data generating unit 431-2. These configurations relate to bit rate # 2 (FIG. 29).

The removing unit 421-2 performs the same process as the removing unit 421-1, and removes a portion of the DSD data supplied from the fade-out editing unit 314-2 in which the fade-out editing of the processing target segment is not performed. In other words, the removing unit 421-2 generates, as a segment, DSD data of a portion subjected to fade-out editing. The removing unit 421-2 supplies the DSD data (segment) of the portion subjected to fade-out editing to the segment file generating unit 315-2. The segment file generation unit 315-2 segments the DSD data of the portion subjected to the fade-out editing.

The removing unit 422-2 performs the same processing as that of the removing unit 422-1, and removes, from the DSD data supplied from the fade-in editing unit 316-2, a portion in which the fade-in editing of the processing target segment is not performed. In other words, the removing unit 422-2 generates, as a segment, DSD data of a portion subjected to fade-in editing. The removing unit 422-2 supplies the DSD data (segment) of the portion subjected to fade-in editing to the segment file generating unit 317-2. The segment file generation unit 317-2 segments the DSD data of the portion subjected to the fade-in editing.

As described above, the editing data generation unit 431-2 is a processing unit that subjects the original data to predetermined editing such as fade-in editing and fade-out editing and generates segment files of the data of the edited portions (for bit rate #2).

In the following, when it is not necessary to distinguish the respective removal units that perform processing on data after fade-out editing from each other, they are referred to as a removal unit 421. In addition, when it is not necessary to distinguish the respective removal units that perform processing on data after fade-in editing from each other, they are referred to as a removal unit 422. In addition, when it is not necessary to distinguish and explain the respective edit data generation units, the edit data generation unit 431 is referred to.

As shown in FIG. 31, the configurations of the encoding unit 312 to the segment file generation unit 317, the removing unit 421, and the removing unit 422 are provided in the same number as the bit rates to be prepared. The number of bit rates prepared as data for distribution is arbitrary. Further, each edit data generation unit 431 may operate at all times (generating segment files for all segments), or may operate only when there is a request for bit rate switching from the playback terminal 303 (generating a segment file only then).

<Flow of audio file generation process>
Next, an example of the flow of the audio file generation process executed by the file generation device 301 will be described with reference to the flowchart of FIG. Also in this case, the audio file generation processing is basically performed in the same flow as in the case of the first embodiment (FIG. 25).

That is, each process of step S201 to step S205 of FIG. 32 is performed similarly to each process of step S101 to step S105 of FIG. However, in step S201, as described with reference to FIG. 30, the MPD generation unit 318 can describe information indicating the length of a fade-in edited or fade-out edited segment.

In step S206, the removing unit 421 and the removing unit 422 remove, from the audio content that has been subjected to fade processing (fade-in or fade-out), the portion other than the fade-edited portion. That is, the portion overlapping the original data is removed. When the process of step S206 ends, the process proceeds to step S207.

The processes of steps S207 to S209 are performed in the same manner as the processes of steps S106 to S108 of FIG. Then, if it is determined in step S209 that processing for the next segment is to be performed without ending, the processing returns to step S203, and the subsequent processing is repeated. Also, if it is determined in step S209 that the audio file generation process is to end at the segment, the audio file generation process ends.

<Playback terminal>
FIG. 33 is a block diagram showing a main configuration example of the reproduction terminal 303 in this case. As shown in FIG. 33, the reproduction terminal 303 in this case basically has the same configuration as that of the first embodiment (FIG. 26). However, in this case, the reproduction terminal 303 includes the fade editing partial segment replacing unit 451.

The fade editing partial segment replacing unit 451 performs processing of replacing the fade-in edited or fade-out edited segment with a segment of the original data.

<Flow of bandwidth selection process>
The reproduction process executed by the reproduction terminal 303 in this case is the same as that in the case of the first embodiment (FIG. 27), and thus the description thereof is omitted. An example of the flow of the bandwidth selection process executed in step S157 of FIG. 27 in this case will be described with reference to the flowchart of FIG.

When the bandwidth selection process is started, in step S251, the selection unit 354 determines whether to reselect the bandwidth of the audio stream. If it is determined to select again, the process proceeds to step S252.

In step S252, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until the space is available. When a space is generated in the buffer 355, in step S253, the segment file acquisition unit 353 acquires an audio original segment of a predetermined time length of the currently selected bandwidth. In step S254, the segment file acquisition unit 353 acquires an audio fade-out partial segment of the currently selected bandwidth.

In step S255, the fade editing partial segment replacing unit 451 replaces the MP4 samples at the end of the acquired audio original segment with the MP4 samples of the audio fade-out partial segment. That is, in the portion to be faded out, the original data is replaced with the edited data.

In step S256, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until the space is available. When a space is generated in the buffer 355, in step S257, the segment file acquisition unit 353 acquires an audio original segment of a predetermined time length of the bandwidth selected as the change destination.

In step S258, the segment file acquisition unit 353 acquires the audio fade-in partial segment of the bandwidth selected as the change destination.

In step S259, the fade editing partial segment replacing unit 451 replaces the MP4 samples at the head of the acquired change-destination audio original segment with the MP4 samples of the audio fade-in partial segment. That is, in the portion to be faded in, the original data is replaced with the edited data.

When the process of step S259 ends, the bandwidth selection process ends, and the process returns to FIG. If it is determined in step S251 that the bandwidth of the audio stream is not selected again, the processing in steps S252 to S259 is omitted, the bandwidth selection processing ends, and the processing returns to FIG.
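The replacement performed in steps S255 and S259 can be sketched as follows. This is a minimal illustration, assuming that a segment is modeled as a list of MP4 samples and that the length of the partial segment gives the number of samples to replace (in the MPD this number is signaled by the value attribute); the function names are hypothetical.

```python
def replace_tail(original, fade_out_part):
    """S255: replace the last samples of the original segment with the
    fade-out partial segment."""
    keep = len(original) - len(fade_out_part)
    return original[:keep] + fade_out_part

def replace_head(original, fade_in_part):
    """S259: replace the first samples of the change-destination original
    segment with the fade-in partial segment."""
    return fade_in_part + original[len(fade_in_part):]

current = ["a0", "a1", "a2", "a3"]      # original segment, current bandwidth
destination = ["b0", "b1", "b2", "b3"]  # original segment, change destination
print(replace_tail(current, ["fo2", "fo3"]))      # ['a0', 'a1', 'fo2', 'fo3']
print(replace_head(destination, ["fi0", "fi1"]))  # ['fi0', 'fi1', 'b2', 'b3']
```

The terminal only splices samples; the gain ramp was already applied on the server side, which is consistent with the low replacement load noted for this embodiment.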

Note that by applying the present technology described in the present embodiment, duplication between the original data and the edited data can be suppressed, and therefore an increase in the amount of data can be suppressed. In this case, since the post-edit data is small (the segment size is small and the playback time is short), the load required to replace it is low. Further, as in the case of the first embodiment, the above-mentioned fade editing (fade-in editing or fade-out editing) may be performed only when a request from the reproduction terminal 303 occurs, which can further suppress an increase in the amount of content data for distribution.

<3. Third embodiment>
<Overview>
For example, as shown in FIG. 35, the configuration of the data after editing may be such that the data after editing and the original data before editing may be combined into one data. That is, the content data after editing of a partial section of the edited segment may be filed as one file together with the content data before editing and added to the content data for distribution.

For example, a segment including a normal track, a fade-in track, and a fade-out track is created. In the method described in the first embodiment, three segments are created from three sets of audio data, including the fade-out data and the fade-in data; in this method, these are combined into one segment file. However, by devising the audio data reference structure of the MP4 file, the common part is not redundantly recorded in the segment. One piece of audio data is represented as one adaptation set, and segments with different bit rates are each represented as a representation.

For example, in FIG. 35, for bit rate #1, data generated by performing fade-in editing on section 511A, which is a partial section at the beginning of original data 511-1-1, is fade-in partial data 512-1-1. Also, data generated by performing fade-out editing on section 511C, which is a partial section at the end of the original data 511-1-1, is fade-out partial data 513-1-1. That is, the fade-in partial data 512-1-1 is only a portion corresponding to the section 511A, and does not include the sections 511B and 511C in which the fade-in editing is not performed. Similarly, the fade-out partial data 513-1-1 is only a portion corresponding to the section 511C, and does not include the sections 511A and 511B in which the fade-out editing is not performed. Also, these data (original data 511-1-1, fade-in partial data 512-1-1, fade-out partial data 513-1-1) are put together as one segment 502-1.

Similarly, fade-in partial data 512-1-2 is generated by performing fade-in editing on section 511A of original data 511-1-2, and fade-out editing is performed on section 511C. The generated data is fade-out partial data 513-1-2. Also, these data (original data 511-1-2, fade-in partial data 512-1-2, fade-out partial data 513-1-2) are put together as one segment 502-2.

Similarly, for bit rate #2, data generated by performing fade-in editing on section 511A of original data 511-2-1 is fade-in partial data 512-2-1, and data generated by performing fade-out editing on section 511C is fade-out partial data 513-2-1. Also, these data (original data 511-2-1, fade-in partial data 512-2-1, fade-out partial data 513-2-1) are put together as one segment 503-1.

Similarly, fade-in partial data 512-2-2 is generated by performing fade-in editing on section 511A of original data 511-2-2, and fade-out editing is performed on section 511C. The data thus generated is fade-out partial data 513-2-2. Also, these data (original data 511-2-2, fade-in partial data 512-2-2, fade-out partial data 513-2-2) are put together as one segment 503-2.

That is, in the MPD, all these data are represented as one adaptation set 501. Then, each data of bit rate #1 is represented as a representation 504-1, and each data of bit rate #2 is represented as a representation 504-2. For bit rate #1, the original data is represented as sub-representation 505-1-1, the fade-in partial data is represented as sub-representation 505-2-1, and the fade-out partial data is represented as sub-representation 505-3-1. Also, for bit rate #2, the original data is represented as sub-representation 505-1-2, the fade-in partial data is represented as sub-representation 505-2-2, and the fade-out partial data is represented as sub-representation 505-3-2.

In the following, each original data of bit rate # 1 is referred to as original data 511-1 when it is not necessary to distinguish them from each other. Similarly, when it is not necessary to distinguish and explain each fade-in partial data of bit rate # 1, it is referred to as fade-in partial data 512-1. Similarly, when it is not necessary to distinguish and explain each fade-out partial data of bit rate # 1, it is referred to as fade-out partial data 513-1.

Similarly, when it is not necessary to distinguish and explain each original data of bit rate # 2, it is referred to as original data 511-2. Similarly, when the fade-in partial data of the bit rate # 2 need not be distinguished from one another, they are referred to as fade-in partial data 512-2. Similarly, when it is not necessary to distinguish and explain the fade-out partial data of bit rate # 2, they are referred to as fade-out partial data 513-2.

When it is not necessary to distinguish and explain each original data of each bit rate, it is referred to as original data 511. Similarly, when it is not necessary to distinguish and explain each fade-in partial data of each bit rate, it is referred to as fade-in partial data 512. Similarly, when it is not necessary to distinguish and explain each fade-out partial data of each bit rate, it is referred to as fade-out partial data 513.

Also, when it is not necessary to distinguish and explain the segments of bit rate # 1, they are referred to as segments 502. Similarly, when the segments of bit rate # 2 need not be distinguished from one another, they are referred to as segments 503. In addition, when it is not necessary to distinguish and explain the representations of each bit rate, they are referred to as a representation 504.

Further, when it is not necessary to distinguish and describe the sub-representations in which the information of the original data 511 of each bit rate is stored, it is referred to as a sub-representation 505-1. Similarly, when it is not necessary to distinguish and describe sub-representations in which information of fade-in partial data 512 of each bit rate is stored, it is referred to as sub-representation 505-2. Similarly, when it is not necessary to distinguish and describe sub-representations in which information of fade-out partial data 513 of each bit rate is stored, it is referred to as sub-representation 505-3. Note that when there is no need to distinguish and explain each sub-representation, it is referred to as a sub-representation 505.

In order not to redundantly record the common part, the MP4 sampled audio data is divided into five parts as shown in FIG. That is, there are five pieces of data: fade-in partial data (fi) 512, fade-out partial data (fo) 513, data (org_a) 511A corresponding to the fade-in edit portion of the original data 511, data (org_b) 511B that is not fade-edited, and data (org_c) 511C corresponding to the fade-out edit portion. As described in the second embodiment, the portion to be fade-edited may be a plurality of MP4 samples. These five pieces of data are recorded in mdat, which is a box for storing MP4 samples. By configuring each of the original, fade-out, and fade-in tracks to refer to three of the five pieces of data, redundant recording of data can be suppressed (an increase in the amount of data can be suppressed).

That is, for example, when referring to the original track, data (org_a) 511A, data (org_b) 511B, and data (org_c) 511C are referred to (arrow 531). Also, for example, when referring to a fade-in track, fade-in partial data (fi) 512, data (org_b) 511B, and data (org_c) 511C are referred to (arrow 532, arrow 534). Furthermore, for example, when referring to a fade-out track, data (org_a) 511A, data (org_b) 511B, and fade-out partial data (fo) 513 are referred to (arrow 535, arrow 536).
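The reference structure described above can be sketched as a small lookup table. This is an illustration under assumptions: the mdat box is modeled as a dictionary of byte strings, the byte contents are made up for the example, and `TRACKS`, `resolve`, and `stored_size` are hypothetical names. A real MP4 file expresses these references through the sample tables of each track rather than through names.

```python
# Which three of the five pieces each track refers to, in playback order.
TRACKS = {
    "original": ["org_a", "org_b", "org_c"],
    "fade_in":  ["fi",    "org_b", "org_c"],
    "fade_out": ["org_a", "org_b", "fo"],
}

def resolve(track, mdat):
    """Concatenate the pieces a track refers to into its audio data."""
    return b"".join(mdat[name] for name in TRACKS[track])

def stored_size(mdat):
    """Total bytes recorded when each piece is stored exactly once."""
    return sum(len(data) for data in mdat.values())

# Hypothetical contents: short edited sections, long unedited middle.
mdat = {"fi": b"FI", "fo": b"FO", "org_a": b"AA", "org_b": b"BBBBBB", "org_c": b"CC"}

print(resolve("fade_in", mdat))   # b'FIBBBBBBCC'
print(stored_size(mdat))          # 14
print(sum(len(resolve(t, mdat)) for t in TRACKS))  # 30
```

With these made-up sizes, the three tracks together present 30 bytes of audio while only 14 bytes are recorded, because the unedited piece org_b is shared by all three tracks.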

The MPD may be generated such that the information for managing the reproduction of the content data after editing and the information for managing the reproduction of the content data before editing are described as one adaptation set. For example, the server may generate an MPD as shown in FIG. 36 in response to the editing described with reference to FIG. In FIG. 36, a portion surrounded by a dotted line 541 is a description regarding one adaptation set 501. Further, a portion surrounded by a dotted line 542 is a description related to the representation 504-1, and a portion surrounded by a dotted line 543 is a description related to the representation 504-2. The portion enclosed by a dotted line 544 is a description related to the sub-representation 505-1-1, the portion enclosed by a dotted line 545 is a description related to the sub-representation 505-2-1, and the portion enclosed by a dotted line 546 is a description related to the sub-representation 505-3-1.

That is, since original and fade-in / out audio data are recorded in one MP4 file, one adaptation set is set for one audio data. In addition, representations are prepared for the type of bandwidth. Here, two types of 11,289,600 bps and 5,644,800 bps are shown. As in the other embodiments, the number of types of bandwidth is arbitrary.

In addition, which of the original, fade-in, and fade-out tracks are referred to is expressed using subrepresentation. That is, "level" is associated with the track in the MP4 file. Further, in FIG. 37, it is shown that the Sub-representation of Level 2 is associated with the fade-out track, and the Sub-representation of Level 3 is associated with the fade-in track, using EssentialProperty. In addition, a level 1 subrepresentation without EssentialProperty is associated with the original track. The structure of such an MPD is shown in FIG.

The relation between the level of sub-representation and the MP4 file track is as shown in FIG. Also, reference structures of samples in the mdat from each track are as shown in FIG. 40 to FIG.

<File generation device>
FIG. 43 is a block diagram showing an example of the main configuration of the file generation apparatus 301 in this case. As shown in FIG. 43, also in this case, the file generation device 301 has basically the same configuration as that of the second embodiment (FIG. 31).

However, the file generation device 301 in this case has a segment file generation unit 601-1 instead of the segment file generation unit 313-1, the segment file generation unit 315-1, and the segment file generation unit 317-1. The fade-out editing unit 314-1, the removing unit 421-1, the fade-in editing unit 316-1, and the removing unit 422-1 are collectively referred to as an editing data generation unit 621-1. These configurations relate to bit rate # 1 (FIG. 35).

The removing unit 421-1 supplies the fade-out partial data 513-1 to the segment file generating unit 601-1. The removing unit 422-1 supplies the fade-in partial data 512-1 to the segment file generating unit 601-1.

The segment file generation unit 601-1 generates a segment file by putting together the original data 511-1 supplied from the encoding unit 312-1, the fade-out partial data 513-1 supplied from the removing unit 421-1, and the fade-in partial data 512-1 supplied from the removing unit 422-1.

Also, the file generation device 301 in this case has a segment file generation unit 601-2 instead of the segment file generation unit 313-2, the segment file generation unit 315-2, and the segment file generation unit 317-2. The fade-out editing unit 314-2, the removing unit 421-2, the fade-in editing unit 316-2, and the removing unit 422-2 are collectively referred to as an editing data generation unit 621-2. These configurations relate to bit rate # 2 (FIG. 35).

The removing unit 421-2 supplies the fade-out partial data 513-2 to the segment file generating unit 601-2. The removing unit 422-2 supplies the fade-in partial data 512-2 to the segment file generating unit 601-2.

The segment file generation unit 601-2 generates a segment file by putting together the original data 511-2 supplied from the encoding unit 312-2, the fade-out partial data 513-2 supplied from the removing unit 421-2, and the fade-in partial data 512-2 supplied from the removing unit 422-2.

In the following, when it is not necessary to distinguish the segment file generation units from each other, they are referred to as a segment file generation unit 601. Further, when it is not necessary to distinguish the respective edited data generation units from each other, they are referred to as an edited data generation unit 621.

As shown in FIG. 43, the configurations of the encoding unit 312, the fade-out editing unit 314, the fade-in editing unit 316, the removing unit 421, the removing unit 422, and the segment file generating unit 601 are provided in the same number as the bit rates to be prepared. The number of bit rates prepared as data for distribution is arbitrary. Also, in this case, each editing data generation unit 621 always operates (for all segments).

<Flow of audio file generation process>
Next, an example of the flow of audio file generation processing executed by the file generation device 301 will be described with reference to the flowchart in FIG. Also in this case, the audio file generation processing is basically performed in the same flow as in the case of the second embodiment (FIG. 32).

That is, each process of step S301 to step S306 of FIG. 44 is performed similarly to each process of step S201 to step S206 of FIG. However, in step S301, the MPD generation unit 318 describes all data as one adaptation set as described with reference to FIG.

In step S307, the segment file generation unit 601 generates a segment file by collecting the original data, the fade-out partial data, and the fade-in partial data for each bit rate.
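The collection performed in step S307 can be pictured with the following minimal sketch. It assumes a simplified in-memory model: the `SegmentFile` class, the track names, and the function names are hypothetical and do not represent the MP4 structure actually written by the segment file generation unit 601.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SegmentFile:
    """Hypothetical model of one segment file for one bit rate."""
    bitrate_id: int
    tracks: Dict[str, bytes]


def generate_segment_file(bitrate_id: int, original: bytes,
                          fade_out_part: bytes, fade_in_part: bytes) -> SegmentFile:
    # Put the original data and both fade-edited partial data into one file.
    return SegmentFile(bitrate_id, {
        "original": original,
        "fade_out": fade_out_part,
        "fade_in": fade_in_part,
    })


def generate_all(per_bitrate: Dict[int, Tuple[bytes, bytes, bytes]]) -> List[SegmentFile]:
    # One segment file is generated per prepared bit rate.
    return [generate_segment_file(b, *parts)
            for b, parts in sorted(per_bitrate.items())]


segments = generate_all({
    2: (b"orig-2", b"fo-2", b"fi-2"),
    1: (b"orig-1", b"fo-1", b"fi-1"),
})
```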

The processes of steps S308 and S309 are performed in the same manner as the processes of steps S208 and S209 of FIG. If it is determined in step S309 that the audio file generation process is to end at the segment, the audio file generation process ends.

<Playback terminal>
FIG. 45 is a block diagram showing an example of the main configuration of the playback terminal 303 in this case. As shown in FIG. 45, the reproduction terminal 303 in this case basically has the same configuration as that of the first embodiment (FIG. 26). However, in this case, the reproduction terminal 303 includes the fade editing partial track switching unit 651.

The fade editing partial track switching unit 651 performs processing regarding switching of the track of the fade-in edited or fade-out edited part.

When the bandwidth selection process is started, in step S351, the selection unit 354 determines whether to reselect the bandwidth of the audio stream. If it is determined that the bandwidth is to be reselected, the process proceeds to step S352.

In step S352, if there is no space in the buffer 355, the segment file acquisition unit 353 waits until space becomes available. When the buffer 355 has space, in step S353 the segment file acquisition unit 353 acquires a segment of a predetermined time length of the bandwidth selected as the change destination. In step S354, the fade editing partial track switching unit 651 switches to playback of the fade-out track in the last segment of the currently selected bandwidth.

In step S355, the decoding unit 356 reproduces the fade-in track at the start of reproduction of the bandwidth selected as the change destination.

In step S356, when playback of the fade-in portion of the bandwidth selected as the change destination has finished, the fade editing partial track switching unit 651 switches to playback of the original track.

When the process of step S356 ends, the bandwidth selection process ends, and the process returns to FIG. If it is determined in step S351 that the bandwidth of the audio stream is not selected again, the processing in steps S352 to S356 is omitted, the bandwidth selection processing ends, and the processing returns to FIG.
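The order of steps S351 to S356 can be summarized in the following sketch. It is only an illustration under simplifying assumptions: the function name, the callback parameters, and the string labels for tracks are hypothetical, and real playback would be driven by the decoding unit 356 rather than by a returned list.

```python
from typing import Callable, List


def bandwidth_selection(reselect: bool,
                        wait_for_buffer_space: Callable[[], None],
                        acquire_segment: Callable[[int], None],
                        current_bw: int, new_bw: int) -> List[str]:
    """Return the tracks to play, in order, across a bandwidth switch."""
    plan: List[str] = []
    if not reselect:                          # S351: no reselection, skip S352 to S356
        return plan
    wait_for_buffer_space()                   # S352: wait until the buffer has space
    acquire_segment(new_bw)                   # S353: acquire the change-destination segment
    plan.append(f"bw{current_bw}:fade_out")   # S354: fade-out track of the last current segment
    plan.append(f"bw{new_bw}:fade_in")        # S355: fade-in track at the start of the new bandwidth
    plan.append(f"bw{new_bw}:original")       # S356: back to the original track
    return plan


acquired: List[int] = []
plan = bandwidth_selection(True, lambda: None, acquired.append, 1, 2)
```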

By doing this, it is possible to suppress the occurrence of noise without performing the fade editing process on the reproduction terminal 303. That is, the process by which the reproduction terminal selects the content to download can be simplified while duplication of the data placed on the server is still avoided. The playback terminal always downloads the audio data of the fade-out and fade-in tracks as well, but the load is not high because fade-edited data has a short playback time and a small size. In addition, if the reproduction terminal 303 cannot synchronize playback at the connection boundary with the track switching when changing the bandwidth, the data in the format of the method described in the second embodiment may be sent together in one MP4 file instead of using track switching.

<4. Fourth embodiment>
<Lossless Compression DSD>
Since DSDs are much larger in size than lossy compression codecs, lossless compression DSD codecs exist that perform lossless compression for network transmission, as shown in FIG. By using this, the size can be compressed to about 80% depending on the data content before compression. As shown in the upper part of FIG. 47, data before compression is divided into fixed lengths, and a plurality of divided fixed length data are input to the encoder. As shown in the middle part of FIG. 47, one fixed-length data before compression is compressed by the encoder and has a structure of DSD_lossless_block. At this time, a header (hereinafter referred to as GoB Header) necessary for the expansion process is added to the top of the Block group processed at the same time. The DSD_lossless_blocks output simultaneously with the GoB Header are collectively called DSD_lossless_gob (hereinafter GoB). As shown in the lower part of FIG. 47, information such as the number of blocks and sampling frequency is added before GoB, and a lossless compressed DSD is output as a row of DSD_lossless_payload. When recording in the MP4 segment, it is recorded in units of this DSD_lossless_payload. To expand a block belonging to GoB, the GoB header at the top of the GoB is required.
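The container hierarchy described above can be sketched as follows. This is a minimal illustration and not the actual lossless DSD codec: `zlib` stands in for the real compressor, the GoB Header is modeled as a table of compressed block lengths, and the names `Payload`, `encode`, and `decode` are hypothetical.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Payload:
    """Stand-in for DSD_lossless_payload: metadata followed by one GoB."""
    num_blocks: int
    sampling_frequency: int
    gob_header: bytes   # stand-in GoB Header: compressed length of each block
    blocks: bytes       # concatenated stand-in DSD_lossless_block data


def encode(data: bytes, block_size: int, blocks_per_gob: int, fs: int) -> List[Payload]:
    # Divide the input into fixed-length pieces (the last piece may be shorter).
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    payloads = []
    for i in range(0, len(chunks), blocks_per_gob):
        group = [zlib.compress(c) for c in chunks[i:i + blocks_per_gob]]
        header = struct.pack(f">{len(group)}I", *(len(b) for b in group))
        payloads.append(Payload(len(group), fs, header, b"".join(group)))
    return payloads


def decode(payloads: List[Payload]) -> bytes:
    out = bytearray()
    for p in payloads:
        # Blocks cannot be located or expanded without the header of their GoB.
        lengths = struct.unpack(f">{p.num_blocks}I", p.gob_header)
        pos = 0
        for n in lengths:
            out += zlib.decompress(p.blocks[pos:pos + n])
            pos += n
    return bytes(out)


source = bytes([0x69]) * 100 + bytes(range(50))
payloads = encode(source, block_size=16, blocks_per_gob=4, fs=2822400)
```

In this model, replacing a block without also replacing the header of its group would leave the length table inconsistent, which mirrors the dependency between a block and its GoB Header.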

In such a dependent stream, as shown in FIG. 48, when the audio data at the beginning or end of a segment is replaced as in the method described in the second embodiment or the third embodiment, the GoB Header and the DSD_lossless_block generated after fade editing must be replaced together as a pair.

<5. Fifth embodiment>
<Lossless Compression DSD>
In each of the methods described in the first to third embodiments, the silence byte, which is a problem specific to fading DSD, may be recorded in the MPD and transmitted to the reproduction terminal. In DSD, the byte value of silence is not 0. Moreover, there is no single exact silence byte; multiple values can be treated as silence, and any DAC reproduces those values as nearly silent. This is because DSD data has forward and backward dependencies due to delta-sigma modulation, and the unit of sampling data is not a byte but a bit. For example, as shown in FIG. 49, the hexadecimal byte value 0x69, which can be treated as a silence byte, is 01101001 as a bit string, but if the position at which a byte is cut out of a run of these values shifts, the value becomes 0x96, 0xA5, and so on.
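The effect of the cut-out position can be checked with a few lines of code. The sketch below lists the byte value obtained at each of the eight bit offsets when one byte value is repeated endlessly; the function name `byte_windows` is hypothetical.

```python
from typing import List


def byte_windows(byte_value: int) -> List[int]:
    """Byte values seen when a repeated byte is cut out at bit offsets 0 to 7."""
    doubled = (byte_value << 8) | byte_value   # two copies back to back (16 bits)
    return [(doubled >> (8 - shift)) & 0xFF for shift in range(8)]


windows = byte_windows(0x69)
print([f"0x{w:02X}" for w in windows])
# ['0x69', '0xD2', '0xA5', '0x4B', '0x96', '0x2D', '0x5A', '0xB4']
```

Offsets 2 and 4 give 0xA5 and 0x96, the values mentioned above, so all of these bytes describe the same underlying bit pattern seen through a shifted byte boundary.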

Even though the sampling unit of DSD is 1 bit, DSD is handled in byte units in files and during transmission. This also affects the case where data fade-edited on the server is reproduced on the reproduction terminal. For example, assume a processing system in which the playback terminal keeps sending a byte string of 0x69 to the DAC as silent DSD data before starting playback, and then starts playback with fade-in data downloaded from the server. If the DSD data of the silent part on the server begins with 0x69, then when playback of the downloaded data starts, the boundary between the silence bytes of the playback terminal and the first byte of the downloaded audio data looks like the example shown in the lower part of FIG. 50. This is the data arrangement intended by the server, and silence is reproduced correctly.

However, if the beginning of the DSD data in the silent part on the server is 0xA5, the byte sequence at the start of playback becomes 0x69, 0xA5, as in the example shown in the lower part of FIG. This does not match the bit pattern of the silence data intended by the server, which may cause an unintended data discontinuity and noise.
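The two cases can be compared with a simple check of whether the 8-bit repeating pattern survives the boundary. This is only a rough proxy for "no discontinuity", chosen for illustration; the function names are hypothetical, and a real DAC responds to the bit stream rather than to this test.

```python
def bits(data: bytes) -> str:
    """Bit string of the data, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def pattern_continues(terminal_silence: int, server_first: int, repeats: int = 2) -> bool:
    """True if the terminal silence followed by the server data keeps a period of 8 bits."""
    stream = bits(bytes([terminal_silence] * repeats + [server_first] * repeats))
    return all(stream[i] == stream[i + 8] for i in range(len(stream) - 8))


same = pattern_continues(0x69, 0x69)      # server silence starts with 0x69
shifted = pattern_continues(0x69, 0xA5)   # server silence starts with 0xA5
```

With these inputs, `same` is `True` and `shifted` is `False`: the boundary bytes 0x69, 0xA5 form the bit string 0110100110100101, in which the repeating pattern is broken at the junction.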

Therefore, the DSD silence byte intended by the server is recorded in the MPD file and transmitted to the reproduction terminal. As shown in FIG. 52, in the MPD examples of the methods described in the first to third embodiments, an EssentialProperty element indicating that fade editing has been performed is added to each fade-edited Representation element and SubRepresentation element. By recording the silence byte intended by the server in the value attribute, in addition to the number of samples to be replaced, the silence byte can be transmitted to the playback terminal. By acquiring the silence byte intended by the server from the MPD file before starting playback, the playback terminal can keep sending that silence byte to the DAC before playback starts, which avoids the possibility of noise when playback starts with the fade-in.
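How a playback terminal might read such a property is sketched below. The scheme URI `urn:example:fade-edit:2017` and the value layout `"<samples>,<silence byte>"` are assumptions made purely for illustration, since neither is specified in this text; only the MPD namespace and the EssentialProperty element come from MPEG-DASH.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
FADE_SCHEME = "urn:example:fade-edit:2017"   # hypothetical scheme identifier

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation id="fadein-1" bandwidth="2822400">
        <EssentialProperty schemeIdUri="urn:example:fade-edit:2017" value="4096,0x69"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def read_fade_properties(mpd_text: str) -> Dict[str, Tuple[int, int]]:
    """Map Representation id to (number of replaced samples, silence byte)."""
    root = ET.fromstring(mpd_text)
    result: Dict[str, Tuple[int, int]] = {}
    for rep in root.iterfind(".//mpd:Representation", MPD_NS):
        for prop in rep.iterfind("mpd:EssentialProperty", MPD_NS):
            if prop.get("schemeIdUri") == FADE_SCHEME:
                samples, silence = prop.get("value").split(",")
                result[rep.get("id")] = (int(samples), int(silence, 16))
    return result


props = read_fade_properties(SAMPLE_MPD)
silence_byte = props["fadein-1"][1]
pre_roll = bytes([silence_byte]) * 4   # silence sent to the DAC before playback starts
```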

By applying the present technology as described above, it is possible to suppress an increase in the processing load and implementation load of fade processing on the playback terminal. In particular, when DSD is included in the delivery data, partial editing is difficult and the load increases, so the effect obtained by applying the present technology is greater. In addition, since the load of fade-out and fade-in processing is high, the increase in load can be further suppressed by executing the processing collectively on the server. Also, since the data size is large, the effect of reducing data on the server is greater. In principle, the present technology can also handle effects such as fade-in, fade-out, and wipes at the time of a resolution change or angle change of a moving image, and effects such as audio cross-fades. Although the amount of editing on the server side increases with the number of combinations of content before and after selection, there is no significant influence when the edited portion is small relative to the entire content or when the processing capacity of the server side is sufficient.

<6. Other>
<Example of application>
In the above, DSD (DSD data and DSD lossless streams) has been described as an example of the format of audio data, but the present technology can also be applied to audio data of any format other than DSD, for example AAC or PCM. Moreover, although MP4 has been described above as an example of the file format of audio data, the present technology is also applicable to any file format other than MP4. Also, although the above describes distribution of the MP4 file using MPEG-DASH, the present technology can also be applied to data distribution according to any standard other than MPEG-DASH.

Furthermore, although audio data has been described above as an example of target data of the present technology, the present technology can be applied to any data other than audio data. For example, the present technology can also be applied to video data.

Also, in the above, fade-in editing and fade-out editing have been described as examples of editing, but the present technology can be applied to any editing other than these. For example, when editing audio data, instead of fade-in editing or fade-out editing, cross-fade editing may be performed, in which the content data before switching is mixed with the content data after switching while the ratio of the content data after switching to the content data before switching is gradually increased. Also, when editing video data, instead of fade-out editing, editing may be performed such that the brightness of the image is gradually reduced. Also, instead of fade-in editing, editing may be performed such that the brightness of the image is gradually increased. Also, instead of fade-in editing or fade-out editing, cross-fade editing of an image may be performed.

Further, in the above description, a partial section of a predetermined length starting from the beginning of the segment or a partial section of a predetermined length ending at the end of the segment is described as the target of editing. However, editing may be performed on any section in the segment. For example, editing may be performed on both a partial section of a predetermined length starting from the beginning of the segment and a partial section of a predetermined length ending at the end of the segment. By editing such sections, it is possible to generate editing data suitable for reproduction in which content is switched frequently (for example, every segment), such as digest reproduction, zapping reproduction, or a slide show.

Further, for example, the editing may be performed on a part of the section not including both ends of the segment, or the entire segment may be edited. That is, editing data may be used for purposes other than switching of content data.

Further, although the editing described above is performed in the file generation device 301 in the above description, the present technology is not limited to this. For example, the editing may be performed in the distribution server 302 or in any other device. In such a case, the device that performs the editing may have the same editing function as the file generation device 301 described in each embodiment.

In the above description, the editing is performed before distribution of the content data, but the editing can be performed at any timing before the segment-unit content data to be edited is transmitted. For example, the editing of the segment may be performed during content data distribution, at a timing before the segment to be edited is distributed. Also, for example, the segment to be edited may be edited after the distribution server 302 receives a content data switching request from the reproduction terminal 303.

<Application field of this technology>
The systems, apparatuses, processing units, and the like to which the present technology is applied can be used in any field, such as traffic, medical care, crime prevention, agriculture, livestock, mining, beauty, factories, home appliances, weather, and nature monitoring.

For example, the present technology can also be applied to systems and devices that transmit images provided for viewing. Also, for example, the present technology can be applied to systems and devices provided for traffic. Furthermore, for example, the present technology can be applied to systems and devices provided for security. Also, for example, the present technology can be applied to systems and devices provided for sports. Furthermore, for example, the present technology can be applied to systems and devices provided for agriculture. Also, for example, the present technology can be applied to systems and devices provided for livestock industry. Furthermore, the present technology can also be applied to systems and devices that monitor natural conditions such as, for example, volcanoes, forests, and oceans. In addition, the present technology can be applied to, for example, a meteorological observation system or a meteorological observation apparatus that observes weather, air temperature, humidity, wind speed, daylight hours, and the like. Furthermore, the present technology can be applied to, for example, a system or device for observing the ecology of wildlife such as birds, fish, reptiles, amphibians, mammals, insects, plants and the like.

<Computer>
The series of processes described above can be performed by hardware or software. When the series of processes is performed by software, a program that configures the software is installed on a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.

FIG. 53 is a block diagram showing an example of a hardware configuration of a computer that executes the series of processes described above according to a program.

In a computer 1000 shown in FIG. 53, a central processing unit (CPU) 1001, a read only memory (ROM) 1002, and a random access memory (RAM) 1003 are mutually connected via a bus 1004.

An input / output interface 1010 is also connected to the bus 1004. An input unit 1011, an output unit 1012, a storage unit 1013, a communication unit 1014, and a drive 1015 are connected to the input / output interface 1010.

The input unit 1011 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like. The output unit 1012 includes, for example, a display, a speaker, and an output terminal. The storage unit 1013 includes, for example, a hard disk, a RAM disk, and a non-volatile memory. The communication unit 1014 includes, for example, a network interface. The drive 1015 drives removable media 1021 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer 1000 configured as described above, for example, the CPU 1001 loads the program stored in the storage unit 1013 into the RAM 1003 via the input / output interface 1010 and the bus 1004 and executes the program, whereby the series of processes described above is performed. The RAM 1003 also stores data necessary for the CPU 1001 to execute various processes.

The program executed by the computer 1000 can be provided by being recorded on, for example, the removable medium 1021 as a package medium or the like. In that case, the program can be installed in the storage unit 1013 via the input / output interface 1010 by attaching the removable medium 1021 to the drive 1015.

The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. In that case, the program can be received by the communication unit 1014 and installed in the storage unit 1013.

In addition, this program can be installed in advance in the ROM 1002, the storage unit 1013 or the like.

<Others>
Note that various pieces of information regarding encoded data (a bit stream) may be multiplexed with the encoded data and then transmitted or recorded, or may be transmitted or recorded as separate data associated with the encoded data without being multiplexed with it. Here, the term “associate” means, for example, that one piece of data can be used (linked) when the other piece of data is processed. That is, pieces of data associated with each other may be combined into one piece of data or may remain individual pieces of data. For example, information associated with encoded data (an image) may be transmitted on a transmission path different from that of the encoded data (image). Also, for example, information associated with encoded data (an image) may be recorded on a recording medium different from that of the encoded data (image), or in another recording area of the same recording medium. Note that this “association” may apply to a part of the data rather than the entire data. For example, an image and information corresponding to the image may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part of a frame.

Also, in the present specification, terms such as “combine”, “multiplex”, “add”, “unify”, “include”, “store”, and “insert” mean combining a plurality of objects into one, for example combining encoded data and metadata into one piece of data, and refer to one method of the “association” described above.

Further, the embodiments of the present technology are not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present technology.

For example, in the present specification, a system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing or not. Therefore, a plurality of devices housed in separate housings and connected via a network, and one device housing a plurality of modules in one housing, are both systems.

Further, for example, the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units). Conversely, the configuration described as a plurality of devices (or processing units) in the above may be collectively configured as one device (or processing unit). Further, it goes without saying that configurations other than those described above may be added to the configuration of each device (or each processing unit). Furthermore, part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit) if the configuration or operation of the entire system is substantially the same.

Also, for example, the present technology can have a cloud computing configuration in which one function is shared and processed by a plurality of devices via a network.

Also, for example, the program described above can be executed on any device. In that case, the device may have necessary functions (functional blocks and the like) so that necessary information can be obtained.

Further, for example, each step described in the above-described flowchart can be executed by one device or in a shared manner by a plurality of devices. Furthermore, in the case where a plurality of processes are included in one step, the plurality of processes included in one step can be executed by being shared by a plurality of devices in addition to being executed by one device.

In the program executed by the computer, the processes of the steps describing the program may be executed in chronological order according to the order described in the present specification, or may be executed in parallel or individually at necessary timing, such as when a call is made. That is, as long as no contradiction arises, the processing of each step may be performed in an order different from the order described above. Furthermore, the processes of the steps describing this program may be executed in parallel with the processes of another program, or may be executed in combination with the processes of another program.

In addition, as long as there is no contradiction, each of the plurality of present technologies described in this specification can be implemented independently. Of course, any number of the present technologies may be used in combination. For example, the present technology described in any of the embodiments can be implemented in combination with the present technology described in another embodiment. Also, any of the present technologies described above may be implemented in combination with other technologies not described above.

Note that the present technology can also have the following configurations.
(1) An information processing apparatus comprising a generation unit that, before content data for distribution is distributed, edits the content data to generate edited content data and adds the generated edited content data to the content data for distribution.
(2) The information processing apparatus according to (1), wherein the generation unit edits the content data in units of segments to generate the edited content data.
(3) The information processing apparatus according to (2), wherein the generation unit edits a partial section of the segment unit content data.
(4) The information processing apparatus according to (3), wherein the generation unit edits a partial section of a predetermined length from the start of the segment-unit content data, a partial section of a predetermined length up to the end of the segment-unit content data, or both.
(5) The information processing apparatus according to (3) or (4), wherein the generation unit converts the edited content data of the edited segment into a file separate from the content data before editing, and adds the file to the content data for distribution.
(6) The information processing apparatus according to any one of (3) to (5), wherein the generation unit converts the edited content data of the partial section of the edited segment into a file separate from the content data before editing, and adds the file to the content data for distribution.
(7) The information processing apparatus according to any one of (3) to (6), wherein the generation unit converts the edited content data of the partial section of the edited segment and the content data before editing into one file, and adds the file to the content data for distribution.
(8) The information processing apparatus according to any one of (3) to (7), wherein the content data is audio data.
(9) The information processing apparatus according to (8), wherein the generation unit performs fade-in editing on a partial section of a predetermined length from the start of the segment-unit audio data, or performs fade-out editing on a partial section of a predetermined length up to the end of the segment-unit audio data.
(10) An information processing method comprising, before content data for distribution is distributed, editing the content data to generate edited content data, and adding the generated edited content data to the content data for distribution.
(11) An information processing apparatus comprising a management information generation unit that, before content data for distribution is distributed, generates management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.
(12) The information processing apparatus according to (11), wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data as an adaptation set (Adaptation Set) different from that of the information for managing reproduction of the content data before editing.
(13) The information processing apparatus according to (11) or (12), wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data, which includes information indicating the length of the edited content data, as an adaptation set different from that of the information for managing reproduction of the content data before editing.
(14) The information processing apparatus according to any one of (11) to (13), wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data and the information for managing reproduction of the content data before editing as one adaptation set (Adaptation Set).
(15) An information processing method comprising, before content data for distribution is distributed, generating management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.
(16) An information processing apparatus comprising: an acquisition unit that acquires content data according to a predetermined parameter; and
a control unit that controls the acquisition unit so that, when the content data to be acquired is switched according to a change in the parameter, the acquisition unit acquires edited content data obtained by performing predetermined editing on the content data before switching and edited content data obtained by performing predetermined editing on the content data after switching.
(17) The information processing apparatus according to (16), wherein the acquisition unit acquires the content data in units of segments, and
when the content data to be acquired is switched according to a change in the parameter, the control unit causes acquisition of segment-unit edited content data obtained by performing predetermined editing on the segment-unit content data before switching and segment-unit edited content data obtained by performing predetermined editing on the segment-unit content data after switching.
(18) The information processing apparatus according to (17), wherein, when the content data to be acquired is switched according to a change in the parameter, the control unit causes acquisition of segment-unit edited content data in which predetermined editing has been performed on a partial section of a predetermined length up to the end of the segment-unit content data before switching, and segment-unit edited content data in which predetermined editing has been performed on a partial section of a predetermined length from the start of the segment-unit content data after switching.
(19) The information processing apparatus according to (18), wherein the content data is audio data, and
when the content data to be acquired is switched according to a change in the parameter, the control unit causes acquisition of segment-unit edited content data in which fade-out editing has been performed on a partial section of a predetermined length up to the end of the segment-unit content data before switching, and segment-unit edited content data in which fade-in editing has been performed on a partial section of a predetermined length from the start of the segment-unit content data after switching.
(20) An information processing method comprising: acquiring content data according to a predetermined parameter; and
when the content data to be acquired is switched according to a change in the parameter, acquiring edited content data obtained by performing predetermined editing on the content data before switching and edited content data obtained by performing predetermined editing on the content data after switching.

300 distribution system, 301 file generation device, 302 distribution server, 303 reproduction terminal, 304 network, 311 acquisition unit, 312 encoding unit, 313 segment file generation unit, 314 fade-out editing unit, 315 segment file generation unit, 316 fade-in editing unit, 317 segment file generation unit, 318 MPD generation unit, 319 upload unit, 321 edit data generation unit, 351 MPD acquisition unit, 352 MPD processing unit, 353 segment file acquisition unit, 354 selection unit, 355 buffer, 356 decoding unit, 357 output control unit, 421, 422 removal unit, 431 edit data generation unit, 451 fade edit partial segment replacement unit

Claims

An information processing apparatus comprising a generation unit that, before content data for distribution is distributed, edits the content data to generate edited content data and adds the generated edited content data to the content data for distribution.
The information processing apparatus according to claim 1, wherein the generation unit edits the content data in units of segments to generate the edited content data.
The information processing apparatus according to claim 2, wherein the generation unit edits a partial section of the segment-based content data.
The information processing apparatus according to claim 3, wherein the generation unit edits a partial section of a predetermined length from the start of the segment-based content data, a partial section of a predetermined length up to the end of the segment-based content data, or both.
The information processing apparatus according to claim 3, wherein the generation unit converts the edited content data of the edited segment into a file separate from the content data before editing, and adds the file to the content data for distribution.
The information processing apparatus according to claim 3, wherein the generation unit converts the edited content data of the partial section of the edited segment into a file separate from the content data before editing, and adds the file to the content data for distribution.
The information processing apparatus according to claim 1, wherein the generation unit converts the edited content data of the partial section of the edited segment and the content data before editing into one file, and adds the file to the content data for distribution.
The information processing apparatus according to claim 3, wherein the content data is audio data.
The information processing apparatus according to claim 8, wherein the generation unit performs fade-in editing on a partial section of a predetermined length from the start of the segment-based audio data, performs fade-out editing on a partial section of a predetermined length up to the end of the segment-based audio data, or both.
An information processing method, comprising editing the content data to generate edited content data and adding the generated edited content data to the distributed content data before distributing the distributed content data.
An information processing apparatus comprising a management information generation unit that generates, before distributing content data for distribution, management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.
The information processing apparatus according to claim 11, wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data as an adaptation set (Adaptation Set) different from that of the information for managing reproduction of the content data before editing.
The information processing apparatus according to claim 11, wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data, which includes information indicating the length of the edited content data, as an adaptation set (Adaptation Set) different from that of the information for managing reproduction of the content data before editing.
The information processing apparatus according to claim 11, wherein the management information is an MPD (Media Presentation Description), and
the management information generation unit is configured to generate the MPD with the information for managing reproduction of the edited content data and the information for managing reproduction of the content data before editing as one adaptation set (Adaptation Set).
An information processing method comprising generating, before distributing content data for distribution, management information including information for managing reproduction of the content data and information for managing reproduction of edited content data obtained by editing the content data.
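The arrangement in which the edited content data is managed in an adaptation set separate from that of the content data before editing can be sketched as follows. This is an illustrative sketch only: the scheme identifier `urn:example:dash:edited-audio`, the representation naming, and the function `build_mpd` are hypothetical, and the actual signaling used in the MPD is not specified by the claims above.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

# Hypothetical scheme identifier marking an adaptation set as edited content.
EDIT_SCHEME = "urn:example:dash:edited-audio"


def build_mpd(bitrates: Sequence[int],
              edit_kinds: Sequence[str] = ("fade-in", "fade-out")) -> ET.Element:
    """Build a skeletal MPD with one adaptation set for the content data
    before editing and one further adaptation set per kind of editing."""
    mpd = ET.Element("MPD", {"xmlns": "urn:mpeg:dash:schema:mpd:2011",
                             "type": "static"})
    period = ET.SubElement(mpd, "Period")

    # Adaptation set for the content data before editing.
    original = ET.SubElement(period, "AdaptationSet",
                             {"id": "0", "mimeType": "audio/mp4"})
    for bw in bitrates:
        ET.SubElement(original, "Representation",
                      {"id": f"audio-{bw}", "bandwidth": str(bw)})

    # Separate adaptation sets for the edited content data.
    for n, kind in enumerate(edit_kinds, start=1):
        edited = ET.SubElement(period, "AdaptationSet",
                               {"id": str(n), "mimeType": "audio/mp4"})
        ET.SubElement(edited, "SupplementalProperty",
                      {"schemeIdUri": EDIT_SCHEME, "value": kind})
        for bw in bitrates:
            ET.SubElement(edited, "Representation",
                          {"id": f"audio-{bw}-{kind}", "bandwidth": str(bw)})
    return mpd


if __name__ == "__main__":
    print(ET.tostring(build_mpd([64000, 128000]), encoding="unicode"))
```

The single-adaptation-set arrangement would instead place the edited representations alongside the unedited ones under the same `AdaptationSet` element.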
An information processing apparatus comprising: an acquisition unit that acquires content data according to a predetermined parameter; and
a control unit that controls the acquisition unit so that, when the content data to be acquired is switched according to a change in the parameter, the acquisition unit acquires edited content data in which predetermined editing has been performed on the content data before switching and edited content data in which predetermined editing has been performed on the content data after switching.
The information processing apparatus according to claim 16, wherein the acquisition unit acquires the content data in units of segments, and
when switching the content data to be acquired according to a change in the parameter, the control unit causes the acquisition unit to acquire segment-based edited content data in which predetermined editing has been performed on the segment-based content data before switching, and segment-based edited content data in which predetermined editing has been performed on the segment-based content data after switching.
The information processing apparatus according to claim 17, wherein, when switching the content data to be acquired according to a change in the parameter, the control unit causes the acquisition unit to acquire segment-based edited content data in which predetermined editing has been performed on a partial section of a predetermined length up to the end of the segment-based content data before switching, and segment-based edited content data in which predetermined editing has been performed on a partial section of a predetermined length from the start of the segment-based content data after switching.
The information processing apparatus according to claim 18, wherein the content data is audio data, and
when switching the content data to be acquired according to a change in the parameter, the control unit is configured to cause the acquisition unit to acquire segment-based edited content data in which fade-out editing has been performed on a partial section of a predetermined length up to the end of the segment-based content data before switching, and segment-based edited content data in which fade-in editing has been performed on a partial section of a predetermined length from the start of the segment-based content data after switching.
An information processing method comprising: acquiring content data according to a predetermined parameter; and
when switching the content data to be acquired according to a change in the parameter, acquiring edited content data in which predetermined editing has been performed on the content data before switching and edited content data in which predetermined editing has been performed on the content data after switching.
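The acquisition control recited above can be sketched on the playback side as follows. The sketch assumes that the parameter is a bit rate chosen per segment and that the bit rate selected for the next segment is already known when the current segment is requested; the function name `plan_requests` and the labels `"fade-in"` and `"fade-out"` are hypothetical.

```python
from typing import List, Sequence, Tuple

Request = Tuple[int, int, Tuple[str, ...]]


def plan_requests(selected: Sequence[int]) -> List[Request]:
    """For each segment index, decide which variant to acquire.

    selected[i] is the bit rate chosen for segment i. A segment just
    before a switch is requested in its fade-out edited form, and a
    segment just after a switch in its fade-in edited form. An empty
    tuple means the unedited segment is requested.
    """
    plan: List[Request] = []
    for i, bitrate in enumerate(selected):
        edits: List[str] = []
        if i > 0 and selected[i - 1] != bitrate:
            edits.append("fade-in")   # first segment after switching
        if i + 1 < len(selected) and selected[i + 1] != bitrate:
            edits.append("fade-out")  # last segment before switching
        plan.append((i, bitrate, tuple(edits)))
    return plan


if __name__ == "__main__":
    for request in plan_requests([128000, 128000, 64000, 64000]):
        print(request)
```

Because the edited segments are prepared before distribution, the terminal in this sketch only selects which data to request and performs no fade processing of its own.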