Ogg

From ActiveArchives


This page currently contains "working notes".

Ogg is a free media "container" format, designed from the ground up to hold free media formats for streamable, network-based playback. (Interestingly, the format apparently can also contain non-free formats such as MPEG-4.) An Ogg file can contain multiple "streams" of data, typically sound and video; text streams such as subtitles may also be included.

According to the Wikipedia page, the handling of metadata is still under discussion. It mentions the use of a "skeleton" stream, as well as metadata stored within the codec's own data, such as Vorbis comments (so the comments effectively become part of the Vorbis stream itself, in its comment header).
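Vorbis comments live in the second of the three Vorbis header packets and are just length-prefixed UTF-8 strings of the form `FIELD=value`. A minimal sketch of reading one, assuming the packet has already been reassembled from the Ogg pages (the function name `parse_vorbis_comment` is only illustrative):

```python
import struct
from typing import Dict, List, Tuple

def parse_vorbis_comment(packet: bytes) -> Tuple[str, Dict[str, List[str]]]:
    """Parse a Vorbis comment header packet into (vendor, {FIELD: [values]})."""
    # Vorbis header packets start with a type byte (3 = comment) and "vorbis".
    if packet[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header")
    pos = 7
    (vendor_len,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    vendor = packet[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments: Dict[str, List[str]] = {}
    for _ in range(count):
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        entry = packet[pos:pos + length].decode("utf-8")
        pos += length
        # Field names are case-insensitive, so normalise them to upper case.
        name, _, value = entry.partition("=")
        comments.setdefault(name.upper(), []).append(value)
    # The packet ends with a framing bit (lowest bit of the next byte) set.
    if pos >= len(packet) or not packet[pos] & 1:
        raise ValueError("missing framing bit")
    return vendor, comments
```

A field may legitimately repeat (several ARTIST entries, say), which is why values are collected into lists.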

Streams are divided into "pages" of data, and pages from different streams are typically interleaved in the file (for instance, audio and video pages might alternate so that a streaming application can sync and play them as they are received). Each page carries a checksum so that the player can validate it and discard it if it was corrupted in transit.
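The page checksum is a CRC-32 with the polynomial 0x04C11DB7 in its direct (non-reflected) form, initial value 0 and no final XOR, computed over the whole page with the checksum field itself set to zero. A minimal table-driven sketch of checking a page (function names are illustrative, not from any library):

```python
import struct

def _make_crc_table() -> list:
    # Precompute the CRC of every possible top byte for polynomial 0x04C11DB7.
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table

_CRC_TABLE = _make_crc_table()

def ogg_crc32(data: bytes) -> int:
    # Initial value 0 and no final XOR, as used by Ogg framing.
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[(crc >> 24) ^ byte]
    return crc

def page_is_valid(page: bytes) -> bool:
    """Recompute the CRC of a complete page with its checksum field zeroed."""
    (stored,) = struct.unpack_from("<I", page, 22)   # checksum occupies bytes 22..25
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc32(zeroed) == stored
```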

Figure: Ogg page header structure (http://en.wikipedia.org/wiki/File:Ogg_page_header_structure_(en).svg)

A serial number and a page sequence number in the page header identify each page as part of the series of pages making up a logical bitstream. Multiple bitstreams may be multiplexed in one file, with the pages of each bitstream ordered by the seek time of the data they contain. Bitstreams may also be appended to existing files, a process known as chaining, so that the bitstreams are decoded in sequence.
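The fixed part of the page header is 27 bytes, all little-endian: the capture pattern "OggS", a version byte, a header-type flag byte, the 64-bit granule position, the bitstream serial number, the page sequence number, the checksum and the number of segments, followed by a segment ("lacing") table whose values add up to the body length. A minimal sketch of walking the pages of a file held in memory; it does no resynchronisation after corrupt data, and the names are illustrative:

```python
import struct
from typing import Iterator, NamedTuple

HEADER = struct.Struct("<4sBBqIIIB")  # 27 bytes, all fields little-endian

class PageHeader(NamedTuple):
    header_type: int   # flags: 0x01 continued packet, 0x02 first page (BOS), 0x04 last page (EOS)
    granule_pos: int
    serial: int        # identifies the logical bitstream
    sequence: int      # page number within that bitstream
    body_len: int      # sum of the segment table
    page_len: int      # header + segment table + body

def parse_page(data: bytes, offset: int = 0) -> PageHeader:
    magic, version, htype, granule, serial, seq, _crc, nsegs = HEADER.unpack_from(data, offset)
    if magic != b"OggS" or version != 0:
        raise ValueError("not an Ogg page at offset %d" % offset)
    start = offset + HEADER.size
    body_len = sum(data[start:start + nsegs])   # lacing values are single bytes
    return PageHeader(htype, granule, serial, seq, body_len, HEADER.size + nsegs + body_len)

def iter_pages(data: bytes) -> Iterator[PageHeader]:
    offset = 0
    while offset < len(data):
        page = parse_page(data, offset)
        yield page
        offset += page.page_len
```

Grouping the yielded headers by `serial` and sorting each group by `sequence` recovers the individual logical bitstreams.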

The granule position is the page field that stores a time value. Its exact format and meaning are codec-dependent, but it typically contains an ascending sample or frame number. (Need to look at the format of Theora / Vorbis granule positions.)
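For Vorbis, the granule position is a PCM sample count, so turning it into seconds only needs the sample rate from the identification header. The framing spec also reserves the value -1 to mean that no packet finishes on the page. A minimal sketch covering just that audio case (the function name is hypothetical):

```python
from fractions import Fraction
from typing import Optional

NO_PACKET_FINISHES = -1  # special granule position: no packet ends on this page

def audio_granule_to_seconds(granule_pos: int, sample_rate: int) -> Optional[Fraction]:
    """Convert a sample-count granule position (e.g. Vorbis) to seconds."""
    if granule_pos == NO_PACKET_FINISHES:
        return None   # the page carries no timestamp of its own
    return Fraction(granule_pos, sample_rate)
```

Using `Fraction` keeps the result exact; video codecs such as Theora need a different, codec-specific decoding.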


Chaining and Grouping

Very interesting from the point of view of producing dynamically edited streams are the rules for combining different bitstreams into a single Ogg file via "chaining and grouping". However, each codec may (and probably does?!) impose strict requirements on how this mixing can occur. According to the spec, basic end-to-end splicing of media via "chaining" should be fairly generally possible. (Still unclear whether there is any sense of a "global" time / position.)

http://xiph.org/ogg/doc/oggstream.html
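In a grouped stream all beginning-of-stream (BOS) pages come together at the start, and in a chained file the next link begins with a new run of BOS pages after the previous link's data. A minimal sketch of splitting a page sequence into chain links on that basis, assuming the `(header_type, serial)` pairs have already been read from the page headers (`split_chain` is a hypothetical helper):

```python
from typing import List, Sequence, Tuple

BOS = 0x02  # header_type flag marking a logical stream's first page

def split_chain(pages: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Given (header_type, serial) for each page in file order, return the
    serial numbers of the grouped streams in each chained link."""
    links: List[List[int]] = []
    in_headers = False   # True while we are inside a run of BOS pages
    for htype, serial in pages:
        if htype & BOS:
            if not in_headers:
                links.append([])   # BOS after data pages: a new link starts
                in_headers = True
            links[-1].append(serial)
        else:
            in_headers = False
    return links
```

Each inner list is one group of concurrently multiplexed streams; the outer list is the chain, played in order.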

Multiplexing

The Ogg bitstream is intended to encapsulate chronological, time-linear mixed media into a single delivery stream or file.

http://xiph.org/ogg/doc/ogg-multiplex.html

Streams can be either continuous or discontinuous. A given codec may support both styles, but the style must be determined (and then fixed) for a given bitstream once the codec has read its initial header. This affects buffering and how the granule position is interpreted as time: for continuous streams (such as audio / video), the time refers to an end time, i.e. the time after the given page's data; for discontinuous streams (such as text subtitles), it is a start time.
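To make the end-time / start-time distinction concrete, here is a simplified sketch of interleaving pages from several logical streams by the time at which their data begins. It assumes granule positions have already been converted to seconds, treats a continuous page as starting where the previous page of the same stream ended, and ignores several details of the actual multiplexing rules in ogg-multiplex.html; the `Page` type and function names are hypothetical:

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

@dataclass
class Page:
    serial: int
    time: float        # granule position already converted to seconds (codec-specific)
    continuous: bool   # True for audio/video, False for e.g. subtitles

def _with_start_times(pages: List[Page]) -> Iterator[Tuple[float, Page]]:
    """Attach the time at which each page's data begins."""
    prev_end = 0.0
    for page in pages:
        if page.continuous:
            # Continuous: the page time is an END time, so the data starts
            # where the previous page of the same stream ended.
            start, prev_end = prev_end, page.time
        else:
            # Discontinuous: the page time is already a START time.
            start = page.time
        yield start, page

def multiplex(streams: Iterable[List[Page]]) -> List[Page]:
    """Interleave the pages of several logical streams by start time."""
    merged = heapq.merge(*(_with_start_times(s) for s in streams),
                         key=lambda entry: (entry[0], entry[1].serial))
    return [page for _, page in merged]
```

`heapq.merge` works here because each stream's start times are already ascending, so the whole file never has to be sorted at once.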

Theora's granule position encodes the frame number of the most recent keyframe plus the number of frames since that keyframe, so it's possible to calculate the absolute position of the preceding keyframe from any given page. This is to support correct display of frames when seeking.
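The keyframe number sits in the upper bits and the frame offset in the lower `KFGSHIFT` bits, where `KFGSHIFT` is read from the Theora identification header. A minimal sketch of splitting the two parts (the function name is illustrative; the exact mapping from frame number to presentation time differs slightly between Theora bitstream versions and is not handled here):

```python
from typing import Tuple

def split_theora_granulepos(granulepos: int, kfgshift: int) -> Tuple[int, int]:
    """Return (keyframe_number, frame_number) encoded in a Theora granule position."""
    keyframe = granulepos >> kfgshift                       # upper bits
    frames_since_keyframe = granulepos & ((1 << kfgshift) - 1)  # lower KFGSHIFT bits
    return keyframe, keyframe + frames_since_keyframe
```

For example, with `kfgshift = 6` the granule position `(100 << 6) | 5`, i.e. 6405, gives keyframe 100 and frame 105, so a decoder wanting frame 105 knows it must start decoding at frame 100.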

As a consequence, however, seeking to keyframes requires interaction with the codecs: the design decision seems to have been to keep the "framing" container structure as abstract as possible, in the hope of keeping it open to future codec designs.
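The division of labour can be sketched as a search over opaque granule positions that only becomes meaningful once a codec-supplied conversion to seconds is plugged in. This is a simplified sketch: it scans an in-memory list rather than bisecting over byte offsets in a file, and the sample rate, `KFGSHIFT` and frame-rate defaults are hypothetical stand-ins for values normally read from the codec headers:

```python
import bisect
from typing import Callable, Sequence

def find_page_for_time(granules: Sequence[int], target: float,
                       to_seconds: Callable[[int], float]) -> int:
    """Return the index of the first page whose granule time is >= target.

    The container only provides opaque integers; to_seconds supplies the
    codec-specific knowledge needed to compare them with a time."""
    times = [to_seconds(g) for g in granules]   # assumed ascending, no -1 entries
    return bisect.bisect_left(times, target)

def vorbis_seconds(gp: int, sample_rate: int = 44100) -> float:
    return gp / sample_rate                     # granule = sample count

def theora_seconds(gp: int, kfgshift: int = 6, fps: float = 25.0) -> float:
    keyframe = gp >> kfgshift                   # granule = keyframe + offset
    return (keyframe + (gp & ((1 << kfgshift) - 1))) / fps
```

Finding the page is only half of a video seek: a Theora player would then use the keyframe part of the granule position to back up to the keyframe before decoding, which is again codec knowledge.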

Skeletons

http://www.xiph.org/ogg/doc/skeleton.html

The Ogg Skeleton is a special type of stream that carries metadata about the various bitstreams in a file, and it seems to address some of the limitations of a "raw" Ogg file with audio and video streams. Of particular interest are a MIME type for each stream and the possibility of including a UTC time that represents the real-world date/time of the stream's "start".
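The UTC time and the presentation time live in the Skeleton's header packet, "fishead". A minimal sketch of decoding it, assuming the 64-byte Skeleton 3.0 layout as I read it (identifier, major/minor version, presentation time and base time as 64-bit numerator/denominator pairs, then a 20-byte UTC string); version 4 appends further fields that are ignored here, and the layout should be checked against skeleton.html before relying on it:

```python
import struct
from fractions import Fraction
from typing import NamedTuple, Optional, Tuple

FISHEAD = struct.Struct("<8sHHqqqq20s")   # assumed Skeleton 3.0 fishead: 64 bytes

class SkeletonHead(NamedTuple):
    version: Tuple[int, int]
    presentation_time: Optional[Fraction]
    base_time: Optional[Fraction]
    utc: Optional[str]

def _ratio(num: int, den: int) -> Optional[Fraction]:
    return Fraction(num, den) if den else None   # guard against a zero denominator

def parse_fishead(packet: bytes) -> SkeletonHead:
    ident, major, minor, pt_n, pt_d, bt_n, bt_d, utc = FISHEAD.unpack_from(packet)
    if ident != b"fishead\x00":
        raise ValueError("not a Skeleton fishead packet")
    # Assumption: an all-zero UTC field means no real-world start time was given.
    utc_text = utc.rstrip(b"\x00").decode("ascii") or None
    return SkeletonHead((major, minor), _ratio(pt_n, pt_d), _ratio(bt_n, bt_d), utc_text)
```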

In addition, and very interestingly with respect to on-the-fly dynamic editing, the skeleton data can be used to support extracting substreams. The idea is that a portion of a stream can be extracted with its original time data left intact. The skeleton track then provides two additional parameters, the presentation time and the basegranule, which allow the extracted pages to be mixed into a new stream at the right moment. The presentation time seems to be the time, relative to the stream, at which the data should cut in; this accounts for the stream containing data from before the cut, which is needed for proper playback exactly at the cut-in time. The base granule is the "granule number with which this logical bitstream starts in the remuxed stream [and] provides for each logical bitstream the accurate start time of its data stream".
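The per-stream values, including the basegranule and the MIME type (as a "Content-Type" message header), live in one "fisbone" packet per logical bitstream. A minimal sketch of decoding one, assuming the Skeleton 3.0 layout as I understand it (a 52-byte fixed part followed by CRLF-separated message headers) and a stream whose granule position is a plain count, as with Vorbis; for a stream like Theora the base granule would first need codec-specific decoding. Names are illustrative:

```python
import struct
from fractions import Fraction
from typing import Dict, NamedTuple

# identifier, header offset, serial, number of header packets, granule rate
# numerator/denominator, basegranule, preroll, granuleshift, 3 padding bytes
FISBONE = struct.Struct("<8sIIIqqqIB3x")   # assumed fixed part: 52 bytes

class Fisbone(NamedTuple):
    serial: int
    granule_rate: Fraction
    base_granule: int
    granule_shift: int
    headers: Dict[str, str]    # e.g. {"Content-Type": "audio/vorbis"}

def parse_fisbone(packet: bytes) -> Fisbone:
    (ident, hdr_offset, serial, _n_headers, rate_n, rate_d,
     base_granule, _preroll, granule_shift) = FISBONE.unpack_from(packet)
    if ident != b"fisbone\x00":
        raise ValueError("not a Skeleton fisbone packet")
    # The offset is counted from the position of the offset field (byte 8).
    text = packet[8 + hdr_offset:].decode("utf-8")
    headers = {}
    for line in text.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return Fisbone(serial, Fraction(rate_n, rate_d), base_granule, granule_shift, headers)
```

For a sample-counting stream, `base_granule / granule_rate` then gives the start time of the extracted data in seconds.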

And indeed, it would seem that the "bug" of oggz-chop keeping a stream's original time is a feature (though does it work without a skeleton track?).

Resources

Programming Ogg
