матрёшка / matroska


Matroska Specifications

Status of this document

This document is not the actual format specification; it is a working draft. (For a simplified diagram of the layout of a Matroska file, see the Diagram page.) Since it is fairly complete, it will be used as a reference for the beta development of libmatroska. An alternate version of the specification can be found here (PDF doc maintained by Alexander Noé). You can follow the changelog of this document from our SVN repository.

There will be several phases for this development:

We are currently in Round 3.

EBML principle

EBML is short for Extensible Binary Meta Language. EBML specifies a binary, octet (byte) aligned format inspired by the principles of XML. EBML itself is a generalized description of the technique of binary markup. Like XML, it is completely agnostic to any data that it might contain. The Matroska project is a specific implementation using the rules of EBML: it seeks to define a subset of the EBML language in the context of audio and video data (though it obviously isn't limited to this purpose). The format is made of two parts: the semantics and the syntax. The semantics specify a number of IDs and their basic types and are not included in the data file/stream. There is a specific project dealing with EBML in more detail and with more recent updates.

Just like XML, the specific "tags" (IDs in EBML parlance) used in an EBML implementation are arbitrary. However, the semantics of EBML outline general data types and IDs.

The known basic types are:

  • Signed Integer - Big-endian, any size from 1 to 8 octets
  • Unsigned Integer - Big-endian, any size from 1 to 8 octets
  • Float - Big-endian, defined for 4 and 8 octets (32, 64 bits)
  • String - Printable ASCII (0x20 to 0x7E), zero-padded when needed
  • UTF-8 - Unicode string, zero padded when needed (RFC 2279)
  • Date - signed 8 octets integer in nanoseconds with 0 indicating the precise beginning of the millennium (at 2001-01-01T00:00:00,000000000 UTC)
  • master-element - contains other EBML sub-elements of the next lower level
  • Binary - not interpreted by the parser
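As an illustration of the Date type described above, here is a minimal Python sketch that turns the 8-octet payload into a calendar date. The names `MATROSKA_EPOCH` and `decode_ebml_date` are hypothetical and not part of any Matroska library. Python's `datetime` only keeps microseconds, so the sub-microsecond part is rounded down.

```python
from datetime import datetime, timedelta, timezone

# Epoch stated in the type list: 2001-01-01T00:00:00 UTC.
MATROSKA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def decode_ebml_date(payload: bytes) -> datetime:
    """Decode an 8-octet EBML Date (signed, big-endian, nanoseconds).

    Hypothetical helper: the result is rounded down to whole
    microseconds because datetime has no nanosecond field.
    """
    if len(payload) != 8:
        raise ValueError("an EBML Date is exactly 8 octets")
    nanoseconds = int.from_bytes(payload, "big", signed=True)
    return MATROSKA_EPOCH + timedelta(microseconds=nanoseconds // 1000)
```

A payload of eight zero octets decodes to the epoch itself; negative values give dates before 2001.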

As well as defining standard data types, EBML uses a system of Elements to make up an EBML "document." Elements incorporate an Element ID, a descriptor for the size of the element, and the binary data itself. Further, Elements can be nested: an Element can contain Elements of a lower "level."

Elements are laid out as follows, beginning with the Element ID, followed by the Data Size, and then the non-interpreted binary data itself:

  • Element ID coded with a UTF-8 like system:
    bits, big-endian
    1xxx xxxx                                  - Class A IDs (2^7 -1 possible values) (base 0x8X)
    01xx xxxx  xxxx xxxx                       - Class B IDs (2^14-1 possible values) (base 0x4X 0xXX)
    001x xxxx  xxxx xxxx  xxxx xxxx            - Class C IDs (2^21-1 possible values) (base 0x2X 0xXX 0xXX)
    0001 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx - Class D IDs (2^28-1 possible values) (base 0x1X 0xXX 0xXX 0xXX)
    
    Some Notes:
    • The leading bits of the Class IDs are used to identify the length of the ID. The number of leading 0's + 1 is the length of the ID in octets. We will refer to the leading bits as the Length Descriptor.
    • Any ID where all x's are composed entirely of 1's is a Reserved ID, thus the -1 in the definitions above.
    • The Reserved IDs (all x set to 1) are the only IDs that may change the Length Descriptor.

  • Data size, in octets, is also coded with a UTF-8 like system:
    bits, big-endian
    1xxx xxxx                                                                              - value 0 to  2^7-2
    01xx xxxx  xxxx xxxx                                                                   - value 0 to 2^14-2
    001x xxxx  xxxx xxxx  xxxx xxxx                                                        - value 0 to 2^21-2
    0001 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                                             - value 0 to 2^28-2
    0000 1xxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                                  - value 0 to 2^35-2
    0000 01xx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                       - value 0 to 2^42-2
    0000 001x  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx            - value 0 to 2^49-2
    0000 0001  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx - value 0 to 2^56-2
    

    Since modern computers do not easily deal with data coded in sizes greater than 64 bits, any larger Element Sizes are left undefined at the moment. Currently, the Element Size coding allows for an Element to grow to about 7x10^16 octets, i.e. 72000 terabytes, which will be sufficient for the time being.

    There is only one reserved word for Element Size encoding, which is an Element Size encoded to all 1's. Such a coding indicates that the size of the Element is unknown, which is a special case that we believe will be useful for live streaming purposes. However, avoid using this reserved word unnecessarily, because it makes parsing slower and more difficult to implement.

  • Data
    • Integers are stored in their standard big-endian form (no UTF-like encoding), only the size may differ from their usual form (24 or 40 bits for example).
    • The Signed Integer is just the big-endian representation trimmed from some 0x00 and 0xFF where they are not meaningful (sign). For example -2 can be coded as 0xFFFFFFFFFFFFFE or 0xFFFE or 0xFE and 5 can be coded 0x000000000005 or 0x0005 or 0x05.
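The ID, size and integer rules above can be sketched in Python as follows. This is only an illustration, not how libmatroska does it, and all function names are hypothetical. It assumes the Length Descriptor stays part of the ID value (as the Class-ID column of the element table writes it), that IDs are at most 4 octets and sizes at most 8, and it reports the reserved all-ones size as `None`.

```python
from typing import Optional, Tuple


def vint_length(first_octet: int, max_length: int) -> int:
    """Length in octets = number of leading 0 bits + 1."""
    for length in range(1, max_length + 1):
        if first_octet & (0x80 >> (length - 1)):
            return length
    raise ValueError("no Length Descriptor within the allowed width")


def read_element_id(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (element_id, next_pos); the descriptor bits are kept in the ID."""
    length = vint_length(buf[pos], 4)
    if pos + length > len(buf):
        raise ValueError("truncated Element ID")
    return int.from_bytes(buf[pos:pos + length], "big"), pos + length


def read_data_size(buf: bytes, pos: int = 0) -> Tuple[Optional[int], int]:
    """Return (size, next_pos); size is None for the 'unknown size' coding."""
    length = vint_length(buf[pos], 8)
    if pos + length > len(buf):
        raise ValueError("truncated Data Size")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    all_ones = (1 << (7 * length)) - 1      # the x bits only
    value = raw & all_ones                  # strip the Length Descriptor
    return (None if value == all_ones else value), pos + length


def decode_sint(payload: bytes) -> int:
    """Signed Integer: big-endian two's complement, any width from 1 to 8."""
    return int.from_bytes(payload, "big", signed=True)
```

For example, `read_element_id(bytes.fromhex("1A45DFA3"))` yields the 4-octet EBML header ID, and `decode_sint` gives -2 for both `0xFFFE` and `0xFE`, matching the examples in the text.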

Elements semantic

Element Name Level Class-ID Mand. Multi. Range Default Element Type Description
EBML Basics
EBML 0 [1A][45][DF][A3] * * - - sub-elements Set the EBML characteristics of the data to follow. Each EBML document has to start with this.
EBMLVersion 1 [42][86] * - - 1 u-integer The version of EBML parser used to create the file.
EBMLReadVersion 1 [42][F7] * - - 1 u-integer The minimum EBML version a parser has to support to read this file.
EBMLMaxIDLength 1 [42][F2] * - - 4 u-integer The maximum length of the IDs you'll find in this file (4 or less in Matroska).
EBMLMaxSizeLength 1 [42][F3] * - - 8 u-integer The maximum length of the sizes you'll find in this file (8 or less in Matroska). This does not override the element size indicated at the beginning of an element. Elements that have an indicated size which is larger than what is allowed by EBMLMaxSizeLength shall be considered invalid.
DocType 1 [42][82] * - - matroska string A string that describes the type of document that follows this EBML header ('matroska' in our case).
DocTypeVersion 1 [42][87] * - - 1 u-integer The version of DocType interpreter used to create the file.
DocTypeReadVersion 1 [42][85] * - - 1 u-integer The minimum DocType version an interpreter has to support to read this file.
Global elements (used everywhere in the format)
CRC-32 1+ [BF] - - - - binary The CRC is computed on all the data from the last CRC element (or start of the upper level element), up to the CRC element, including other previous CRC elements. All level 1 elements should include a CRC-32.
Void 1+ [EC] - - - - binary Used to void damaged data, to avoid unexpected behaviors when using damaged data. The content is discarded. Also used to reserve space in a sub-element for later use.
signature
SignatureSlot 1+ [1B][53][86][67] - * - - sub-elements Contain signature of some (coming) elements in the stream.
SignatureAlgo 2+ [7E][8A] - - - - u-integer Signature algorithm used (1=RSA, 2=elliptic).
SignatureHash 2+ [7E][9A] - - - - u-integer Hash algorithm used (1=SHA1-160, 2=MD5).
SignaturePublicKey 2+ [7E][A5] - - - - binary The public key to use with the algorithm (in the case of a PKI-based signature).
Signature 2+ [7E][B5] - - - - binary The signature of the data (until a new.
SignatureElements 2+ [7E][5B] - - - - sub-elements Contains elements that will be used to compute the signature.
SignatureElementList 3+ [7E][7B] - * - - sub-elements A list consists of a number of consecutive elements that represent one case where data is used in signature. Ex: Cluster|Block|BlockAdditional means that the BlockAdditional of all Blocks in all Clusters is used for encryption.
SignedElement 4+ [65][32] - * - - binary An element ID whose data will be used to compute the signature.
end of signature
Segment
Segment 0 [18][53][80][67] * * - - sub-elements This element contains all other top-level (level 1) elements. Typically a Matroska file is composed of 1 segment.
Meta Seek Information
SeekHead 1 [11][4D][9B][74] - * - - sub-elements Contains the position of other level 1 elements.
Seek 2 [4D][BB] * * - - sub-elements Contains a single seek entry to an EBML element.
SeekID 3 [53][AB] * - - - binary The binary ID corresponding to the element name.
SeekPosition 3 [53][AC] * - - - u-integer The position of the element in the segment in octets (0 = first level 1 element).
Segment Information
Info 1 [15][49][A9][66] * * - - sub-elements Contains miscellaneous general information and statistics on the file.
SegmentUID 2 [73][A4] - - >0 - binary A randomly generated unique ID to identify the current segment between many others (128 bits).
SegmentFilename 2 [73][84] - - - - UTF-8 A filename corresponding to this segment.
PrevUID 2 [3C][B9][23] - - - - binary A unique ID to identify the previous chained segment (128 bits).
PrevFilename 2 [3C][83][AB] - - - - UTF-8 An escaped filename corresponding to the previous segment.
NextUID 2 [3E][B9][23] - - - - binary A unique ID to identify the next chained segment (128 bits).
NextFilename 2 [3E][83][BB] - - - - UTF-8 An escaped filename corresponding to the next segment.
TimecodeScale 2 [2A][D7][B1] * - - 1.000.000 u-integer Timecode scale in nanoseconds (1.000.000 means all timecodes in the segment are expressed in milliseconds).
Duration 2 [44][89] - - >0 - float Duration of the segment (based on TimecodeScale).
DateUTC 2 [44][61] - - - - date Date of the origin of timecode (value 0), i.e. production date.
Title 2 [7B][A9] - - - - UTF-8 General name of the segment.
MuxingApp 2 [4D][80] * - - - UTF-8 Muxing application or library ("libmatroska-0.4.3").
WritingApp 2 [57][41] * - - - UTF-8 Writing application ("mkvmerge-0.3.3").
Cluster
Cluster 1 [1F][43][B6][75] - * - - sub-elements The lower level element containing the (monolithic) Block structure.
Timecode 2 [E7] * - - - u-integer Absolute timecode of the cluster (based on TimecodeScale).
Position 2 [A7] - - - - u-integer Position of the Cluster in the segment (0 in live broadcast streams). It might help to resynchronise offset on damaged streams.
PrevSize 2 [AB] - - - - u-integer Size of the previous Cluster, in octets. Can be useful for backward playing.
BlockGroup 2 [A0] * * - - sub-elements Basic container of information containing a single Block or BlockVirtual, and information specific to that Block/VirtualBlock.
Block 3 [A1] * - - - binary (see Block Structure) Block containing the actual data to be rendered and a timecode relative to the Cluster Timecode.
BlockVirtual 3 [A2] - * - - binary (see Block Virtual) A Block with no data. It must be stored in the stream at the place the real Block should be in display order.
BlockAdditions 3 [75][A1] - - - - sub-elements Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data.
BlockMore 4 [A6] * * - - sub-elements Contain the BlockAdditional and some parameters.
BlockAddID 5 [EE] * - - >0 u-integer An ID to identify the BlockAdditional (0 is the main Block).
BlockAdditional 5 [A5] * - - - (see Block) Same structure as a Block interpreted by the codec as it wishes (using the ID).
BlockDuration 3 [9B] - - - TrackDuration u-integer The duration of the Block (based on TimecodeScale). This element is mandatory when DefaultDuration is set for the track. When not written and with no DefaultDuration, the value is assumed to be the difference between the timecode of this Block and the timecode of the next Block in "display" order (not coding order). This element can be useful at the end of a Track (as there is no other Block available), or when there is a break in a track like for subtitle tracks.
ReferencePriority 3 [FA] * - - 0 u-integer This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced.
ReferenceBlock 3 [FB] - * - - s-integer Timecode of another frame used as a reference (ie: B or P frame). The timecode is relative to the block it's attached to.
ReferenceVirtual 3 [FD] - - - - s-integer Relative position of the data that should be in position of the virtual block.
CodecState 3 [A4] - - - - binary The new codec state to use. Data interpretation is private to the codec. This information should always be referenced by a seek entry.
Slices 3 [8E] - * - - sub-elements Contains slices description.
TimeSlice 4 [E8] - * - - sub-elements Contains extra time information about the data contained in the Block. While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
LaceNumber 5 [CC] - - - 0 u-integer The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
FrameNumber 5 [CD] - - - 0 u-integer The number of the frame to generate from this lace with this delay (allow you to generate many frames from the same Block/Frame).
BlockAdditionID 5 [CB] - - - 0 u-integer The ID of the BlockAdditional element (0 is the main Block).
Delay 5 [CE] - - - 0 u-integer The (scaled) delay to apply to the element.
Duration 5 [CF] - - - 0 u-integer The (scaled) duration to apply to the element.
Track
Tracks 1 [16][54][AE][6B] - * - - sub-elements A top-level block of information with many tracks described.
TrackEntry 2 [AE] * * - - sub-elements Describes a track with all elements.
TrackNumber 3 [D7] * - >0 - u-integer The track number as used in the Block Header (using more than 127 tracks is not encouraged, though the design allows an unlimited number).
TrackUID 3 [73][C5] * - >0 - u-integer A unique ID to identify the Track. This should be kept the same when making a direct stream copy of the Track to another file.
TrackType 3 [83] * - 1-254 - u-integer A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x20: control).
FlagEnabled 3 [B9] * - 0-1 1 u-integer (1 bit) Set if the track is used.
FlagDefault 3 [88] * - 0-1 1 u-integer (1 bit) Set if the track is the default for its TrackType.
FlagLacing 3 [9C] * - 0-1 1 u-integer (1 bit) Set if the track may contain blocks using lacing.
MinCache 3 [6D][E7] * - - 0 u-integer The minimum number of frames a player should be able to cache during playback. If set to 0, the reference pseudo-cache system is not used.
MaxCache 3 [6D][F8] - - - - u-integer The maximum cache size required to store referenced frames and the current frame. 0 means no cache is needed.
DefaultDuration 3 [23][E3][83] - - >0 - u-integer Number of nanoseconds (i.e. not scaled) per frame.
TrackTimecodeScale 3 [23][31][4F] * - >0 1.0 float The scale to apply on this track to work at normal speed in relation with other tracks (mostly used to adjust video speed when the audio length differs).
TrackOffset 3 [53][7F] - - - 0 s-integer A value to add to the Block's Timecode. This can be used to adjust the playback offset of a track.
Name 3 [53][6E] - - - - UTF-8 A human-readable track name.
Language 3 [22][B5][9C] - - - eng string Specifies the language of the track in the Matroska languages form.
CodecID 3 [86] * - - - string An ID corresponding to the codec, see the codec page for more info.
CodecPrivate 3 [63][A2] - - - - binary Private data only known to the codec.
CodecName 3 [25][86][88] - - - - UTF-8 A human-readable string specifying the codec.
CodecSettings 3 [3A][96][97] - - - - UTF-8 A string describing the encoding setting used.
CodecInfoURL 3 [3B][40][40] - * - - string A URL to find information about the codec used.
CodecDownloadURL 3 [26][B2][40] - * - - string A URL to download the codec used.
CodecDecodeAll 3 [AA] * - 0-1 1 u-integer (1 bit) The codec can decode potentially damaged data.
TrackOverlay 3 [6F][AB] - - - - u-integer Specify that this track is an overlay track for the Track specified (in the u-integer).
video
Video 3 [E0] - - - - sub-elements Video settings.
FlagInterlaced 4 [9A] * - 0-1 0 u-integer (1 bit) Set if the video is interlaced.
StereoMode 4 [53][B8] - - 0-3 0 u-integer Stereo-3D video mode on 2 bits (0: mono, 1: right eye, 2: left eye, 3: both eyes).
PixelWidth 4 [B0] * - not 0 - u-integer Width of the encoded video frames in pixels.
PixelHeight 4 [BA] * - not 0 - u-integer Height of the encoded video frames in pixels.
PixelCropBottom 4 [54][AA] - - - 0 u-integer The number of video pixels to remove at the bottom of the image (for HDTV content).
PixelCropTop 4 [54][BB] - - - 0 u-integer The number of video pixels to remove at the top of the image.
PixelCropLeft 4 [54][CC] - - - 0 u-integer The number of video pixels to remove on the left of the image.
PixelCropRight 4 [54][DD] - - - 0 u-integer The number of video pixels to remove on the right of the image.
DisplayWidth 4 [54][B0] - - not 0 PixelWidth u-integer Width of the video frames to display.
DisplayHeight 4 [54][BA] - - not 0 PixelHeight u-integer Height of the video frames to display.
DisplayUnit 4 [54][B2] - - - 0 u-integer Type of the unit for DisplayWidth/Height (0: pixels, 1: centimeters, 2: inches).
AspectRatioType 4 [54][B3] - - - 0 u-integer Specify the possible modifications to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed).
ColourSpace 4 [2E][B5][24] - - - - binary Same value as in AVI (32 bits).
GammaValue 4 [2F][B5][23] - - >0 - float Gamma Value.
end video
audio
Audio 3 [E1] - - - - sub-elements Audio settings.
SamplingFrequency 4 [B5] * - >0 8000.0 float Sampling frequency in Hz.
OutputSamplingFrequency 4 [78][B5] - - >0 Sampling Frequency float Real output sampling frequency in Hz (used for SBR techniques).
Channels 4 [9F] * - not 0 1 u-integer Numbers of channels in the track.
ChannelPositions 4 [7D][7B] - - - - binary Table of horizontal angles for each successive channel, see appendix.
BitDepth 4 [62][64] - - not 0 - u-integer Bits per sample, mostly used for PCM.
end audio
content encoding
ContentEncodings 3 [6d][80] - - - - sub-elements Settings for several content encoding mechanisms like compression or encryption.
ContentEncoding 4 [62][40] * * - - sub-elements Settings for one content encoding like compression or encryption.
ContentEncodingOrder 5 [50][31] * - - 0 u-integer Tells when this modification was used during encoding/muxing starting with 0 and counting upwards. The decoder/demuxer has to start with the highest order number it finds and work its way down. This value has to be unique over all ContentEncodingOrder elements in the segment.
ContentEncodingScope 5 [50][32] * - not 0 1 u-integer A bit field that describes which elements have been modified in this way. Values can be OR'ed. Possible values:
1 - all frame contents,
2 - the track's private data
ContentEncodingType 5 [50][33] * - - 0 u-integer A value describing what kind of modification has been done. Possible values:
0 - compression,
1 - encryption
ContentCompression 5 [50][34] - - - - sub-elements Settings describing the compression used. Must be present if the value of ContentEncodingType is 0 and absent otherwise. Each block must be decompressable even if no previous block is available in order not to prevent seeking.
ContentCompAlgo 6 [42][54] * - - 0 u-integer The compression algorithm used. Algorithms that have been specified so far are:
0 - zlib,
1 - bzlib,
2 - lzo1x
ContentCompSettings 6 [42][55] - - - - binary Settings that might be needed by the decompressor.
ContentEncryption 5 [50][35] - - - - sub-elements Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise.
ContentEncAlgo 6 [47][e1] - - - 0 u-integer The encryption algorithm used. The value '0' means that the contents have not been encrypted but only signed. Predefined values:
1 - DES, 2 - 3DES, 3 - Twofish, 4 - Blowfish, 5 - AES
ContentEncKeyID 6 [47][e2] - - - - binary For public key algorithms this is the ID of the public key the data was encrypted with.
ContentSignature 6 [47][e3] - - - - binary A cryptographic signature of the contents.
ContentSigKeyID 6 [47][e4] - - - - binary This is the ID of the private key the data was signed with.
ContentSigAlgo 6 [47][e5] - - - 0 u-integer The algorithm used for the signature. A value of '0' means that the contents have not been signed but only encrypted. Predefined values:
1 - RSA
ContentSigHashAlgo 6 [47][e6] - - - 0 u-integer The hash algorithm used for the signature. A value of '0' means that the contents have not been signed but only encrypted. Predefined values:
1 - SHA1-160
2 - MD5
end content encoding
Cueing Data
Cues 1 [1C][53][BB][6B] - - - - sub-elements A top-level element to speed seeking access. All entries are local to the segment.
CuePoint 2 [BB] * * - - sub-elements Contains all information relative to a seek point in the segment.
CueTime 3 [B3] * - - - u-integer Absolute timecode according to the segment time base.
CueTrackPositions 3 [B7] * * - - sub-elements Contain positions for different tracks corresponding to the timecode.
CueTrack 4 [F7] * - >0 - u-integer The track for which a position is given.
CueClusterPosition 4 [F1] * - - - u-integer The position of the Cluster containing the required Block.
CueBlockNumber 4 [53][78] - - not 0 1 u-integer Number of the Block in the specified Cluster.
CueCodecState 4 [EA] - - - 0 u-integer The position of the Codec State corresponding to this Cue element. 0 means that the data is taken from the initial Track Entry.
CueReference 4 [DB] - * - - sub-elements The Clusters containing the required referenced Blocks.
CueRefTime 5 [96] * - - - u-integer Timecode of the referenced Block.
CueRefCluster 5 [97] * - - - u-integer Position of the Cluster containing the referenced Block.
CueRefNumber 5 [53][5F] - - not 0 1 u-integer Number of the referenced Block of Track X in the specified Cluster.
CueRefCodecState 5 [EB] - - - 0 u-integer The position of the Codec State corresponding to this referenced element. 0 means that the data is taken from the initial Track Entry.
Attachment
Attachments 1 [19][41][A4][69] - - - - sub-elements Contain attached files.
AttachedFile 2 [61][A7] * * - - sub-elements An attached file.
FileDescription 3 [46][7E] - - - - UTF-8 A human-friendly name for the attached file.
FileName 3 [46][6E] * - - - UTF-8 Filename of the attached file.
FileMimeType 3 [46][60] * - - - string MIME type of the file.
FileData 3 [46][5C] * - - - binary The data of the file.
FileUID 3 [46][AE] * - >0 - u-integer Unique ID representing the file, as random as possible.
Chapters
Chapters 1 [10][43][A7][70] - - - - sub-elements A system to define basic menus and partition data. For more detailed information, look at the Chapters Explanation.
EditionEntry 2 [45][B9] * * - - sub-elements Contains all information about a segment edition.
EditionUID 3 [45][BC] - - >0 - u-integer A unique ID to identify the edition. It's useful for tagging an edition.
EditionFlagHidden 3 [45][BD] * - 0-1 0 u-integer (1 bit) If an edition is hidden (1), it should not be available to the user interface (but still to Control Tracks).
EditionFlagDefault 3 [45][DB] * - 0-1 0 u-integer (1 bit) If a flag is set (1) the edition should be used as the default one.
EditionManaged 3 [45][DD] - - - 0 u-integer The type of edition defined: 0 is a standard chapter definition, 1 is an edition with chapters defined multiple times and the order to play them is enforced.
ChapterAtom 3+ [B6] * * - - sub-elements Contains the atom information to use as the chapter atom (apply to all tracks).
ChapterUID 4+ [73][C4] * - >0 - u-integer A unique ID to identify the Chapter.
ChapterTimeStart 4+ [91] * - - - u-integer Timecode of the start of Chapter (not scaled).
ChapterTimeEnd 4+ [92] - - - - u-integer Timecode of the end of Chapter (timecode excluded, not scaled).
ChapterFlagHidden 4+ [98] * - 0-1 0 u-integer (1 bit) If a chapter is hidden (1), it should not be available to the user interface (but still to Control Tracks).
ChapterFlagEnabled 4+ [45][98] * - 0-1 1 u-integer (1 bit) Specify whether the chapter is enabled. It can be enabled/disabled by a Control Track. When disabled, the movie should skip all the content between the TimeStart and TimeEnd of this chapter.
ChapterPhysicalEquiv 4+ [63][C3] - - - - u-integer Specify the physical equivalent of this ChapterAtom like "DVD" (60) or "SIDE" (50), see complete list of values.
ChapterTrack 4+ [8F] - - - - sub-elements List of tracks on which the chapter applies. If this element is not present, all tracks apply.
ChapterTrackNumber 5+ [89] * * >0 - u-integer UID of the Track to apply this chapter to. In the absence of a control track, choosing this chapter will select the listed Tracks and deselect unlisted tracks. Absence of this element indicates that the Chapter should be applied to any currently used Tracks.
ChapterDisplay 4+ [80] - * - - sub-elements Contains all possible strings to use for the chapter display.
ChapString 5+ [85] * - - - UTF-8 Contains the string to use as the chapter atom.
ChapLanguage 5+ [43][7C] * * - eng string The languages corresponding to the string, in the bibliographic ISO-639-2 form.
ChapCountry 5+ [43][7E] - * - - string The countries corresponding to the string, same 2 octets as in Internet domains.
Tagging
Tags 1 [12][54][C3][67] - * - - sub-elements Element containing elements specific to Tracks/Chapters. A list of valid tags can be found here.
Tag 2 [73][73] * * - - sub-elements Element containing elements specific to Tracks/Chapters.
Targets 3 [63][C0] * - - - sub-elements Contain all UIDs where the specified meta data apply. It is void to describe everything in the segment.
TargetTypeValue 4 [68][CA] - - - 50 u-integer A number to indicate the logical level of the target (see TargetType).
TargetType 4 [63][CA] - - - - string An informational string that can be used to display the logical level of the target like "ALBUM", "TRACK", "MOVIE", "CHAPTER", etc (see TargetType).
EditionUID 4 [63][C9] - * - 0 u-integer A unique ID to identify the EditionEntry(s) the tags belong to. If the value is 0 at this level, the tags apply to all editions in the Segment.
ChapterUID 4 [63][C4] - * - 0 u-integer A unique ID to identify the Chapter(s) the tags belong to. If the value is 0 at this level, the tags apply to all chapters in the Segment.
AttachmentUID 4 [63][C6] - * - 0 u-integer A unique ID to identify the Attachment(s) the tags belong to. If the value is 0 at this level, the tags apply to all the attachments in the Segment.
SimpleTag 3+ [67][C8] * * - - sub-elements Contains general information about the target.
TagName 4+ [45][A3] * - - - UTF-8 The name of the Tag that is going to be stored.
TagLanguage 4+ [44][7A] * - - und string Specifies the language of the tag specified, in the Matroska languages form.
TagDefault 4+ [44][84] * - 0-1 1 u-integer (1 bit) Indication to know if this is the default/original language to use for the given tag.
TagString 4+ [44][87] - - - - UTF-8 The value of the Tag.
TagBinary 4+ [44][85] - - - - binary The values of the Tag if it is binary. Note that this cannot be used in the same Tag as TagString or TagInteger.

All top-level elements (Segment and direct sub-elements) are coded on 4 octets, i.e. class D elements.

Table Notes

  • Multi: specifies that the element can be found multiple times in the upper level element
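As a small worked example of how the TimecodeScale, Cluster Timecode and Block timecode entries of the table combine, the following Python sketch computes an absolute time in nanoseconds. The function name `block_time_ns` is hypothetical, and the sketch deliberately ignores TrackTimecodeScale and TrackOffset (that is, it assumes their default values of 1.0 and 0).

```python
def block_time_ns(cluster_timecode: int, block_timecode: int,
                  timecode_scale: int = 1_000_000) -> int:
    """Absolute time of a Block in nanoseconds.

    cluster_timecode: the Cluster's Timecode element (unsigned, scaled).
    block_timecode:   the Block's signed int16 timecode, relative to the Cluster.
    timecode_scale:   TimecodeScale in nanoseconds (default 1.000.000 = 1 ms).
    Assumption: TrackTimecodeScale is 1.0 and TrackOffset is 0.
    """
    return (cluster_timecode + block_timecode) * timecode_scale
```

With the default scale, a Cluster at timecode 5000 and a Block at relative timecode -40 land at 4960 ms.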

Appendix

Profiles Grid

See the Profiles Page.

Audio Channel Position

See the Channel Position Page.

Language Codes

Language codes can be either the 3-letter bibliographic ISO-639-2 form (like "fre" for French), or a language code combined with a country code for regional variants of a language (like "fre-ca" for Canadian French).

Physical Types

Each level can have different meanings for audio and video. The ORIGINAL_MEDIUM tag can be used to specify a string for ChapterPhysicalEquiv = 60. Here is the list of possible levels for both audio and video :

ChapterPhysicalEquiv  Audio                                          Video                    Comment
70                    SET / PACKAGE                                  SET / PACKAGE            the collection of different media
60                    CD / 12" / 10" / 7" / TAPE / MINIDISC / DAT    DVD / VHS / LASERDISC    the physical medium like a CD or a DVD
50                    SIDE                                           SIDE                     when the original medium (LP/DVD) has different sides
40                    -                                              LAYER                    another physical level on DVDs
30                    SESSION                                        SESSION                  as found on CDs and DVDs
20                    TRACK                                          -                        as found on audio CDs
10                    INDEX                                          -                        the first logical level of the side/medium

Block Structure

Size = 1 + (1-8) + 4 + (4 + (4)) octets. So from 6 to 21 octets.

Bit 0 is the most significant bit.

Frames using references should be stored in "coding order". That means the references first and then the frames referencing them. A consequence is that timecodes may not be consecutive. But a frame with a past timecode must reference a frame already known, otherwise it's considered bad/void.

There can be many Blocks in a BlockGroup provided they all have the same timecode. It is used with different parts of a frame with different priorities.

Block Header
Offset Player Description
0x00+ must Track Number (Track Entry). It is coded in EBML like form (1 octet if the value is < 0x80, 2 if < 0x4000, etc) (most significant bits set to increase the range).
0x01+ must Timecode (relative to Cluster timecode, signed int16)
0x03+ -
Flags
Bit Player Description
7 must Gap - Set when the Track is empty after this Block ends (data should not be rendered from this timecode, cache can be flushed)
5-6 must Lacing
  • 00 : no lacing
  • 01 : Xiph lacing
  • 11 : EBML lacing
  • 10 : fixed-size lacing
4-0 - Reserved, set to 0
Lace (when lacing bit is set)
0x00 must Number of frames in the lace-1 (uint8)
0x01 / 0xXX must* Lace-coded size of each frame of the lace, except for the last one (multiple uint8). *This is not used with Fixed-size lacing as it is calculated automatically from (total size of lace) / (number of frames in lace).
- must Consecutive laced frames
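The Block Header layout above could be read with something like the following Python sketch. It is an illustration under stated assumptions rather than a reference parser: `BlockHeader` and `parse_block_header` are hypothetical names, and because the text says bit 0 is the most significant bit, the Gap flag (bit 7) is taken to be mask 0x01 and the Lacing field (bits 5-6) mask 0x06.

```python
import struct
from typing import NamedTuple

# Lacing codes from the Flags table (two-bit value of bits 5-6).
LACING_NAMES = {0b00: "none", 0b01: "xiph", 0b11: "ebml", 0b10: "fixed-size"}


class BlockHeader(NamedTuple):
    track_number: int
    timecode: int       # signed, relative to the Cluster Timecode
    gap: bool
    lacing: str
    header_size: int    # octets consumed by the Block head


def parse_block_header(block: bytes) -> BlockHeader:
    if not block:
        raise ValueError("empty Block")
    # Track Number: EBML-like coding, leading zero bits + 1 = length.
    length, mask = 1, 0x80
    while length <= 8 and not block[0] & mask:
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid Track Number coding")
    if len(block) < length + 3:
        raise ValueError("truncated Block head")
    raw = int.from_bytes(block[:length], "big")
    track_number = raw & ((1 << (7 * length)) - 1)
    # Timecode: signed int16, big-endian, right after the Track Number.
    (timecode,) = struct.unpack_from(">h", block, length)
    flags = block[length + 2]
    gap = bool(flags & 0x01)                     # bit 7 when bit 0 is the MSB
    lacing = LACING_NAMES[(flags >> 1) & 0b11]   # bits 5-6
    return BlockHeader(track_number, timecode, gap, lacing, length + 3)
```

For instance, the four octets `81 FF D8 02` would describe track 1, a relative timecode of -40 and Xiph lacing.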

Lacing

Lacing is a mechanism to save space when storing data. It is typically used for small blocks of data (referred to as frames in Matroska). There are 3 types of lacing: the Xiph one, inspired by what is found in the Ogg container; the EBML one, which is the same with sizes coded differently; and the fixed-size one, where the size is not coded. An example makes this clearer.

Let's say you want to store 3 frames of the same track. The first frame is 800 octets long, the second is 500 octets long and the third is 1000 octets long. As these data are small, you can store them in a lace to save space. They will then be stored in the same block as follows:

Xiph lacing

  • Block head (with lacing bits set to 01)
  • Lacing head: Number of frames in the lace -1, i.e. 2 (the 800 and 500 octets ones)
  • Lacing sizes: only the 2 first ones will be coded, 800 gives 255;255;255;35, 500 gives 255;245. The size of the last frame is deduced from the total size of the Block.
  • Data in frame 1
  • Data in frame 2
  • Data in frame 3

A frame with a size multiple of 255 is coded with a 0 at the end of the size, for example 765 is coded 255;255;255;0.
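The Xiph size coding can be sketched in Python as below. The helper names `xiph_encode_size` and `xiph_decode_sizes` are hypothetical; the logic follows the rule stated above, where a size is a run of 255s closed by an octet smaller than 255 (which is 0 for exact multiples of 255).

```python
from typing import List, Tuple


def xiph_encode_size(size: int) -> bytes:
    """Encode one frame size as 255;255;...;remainder."""
    full, rest = divmod(size, 255)
    return bytes([255] * full + [rest])


def xiph_decode_sizes(buf: bytes, count: int, pos: int = 0) -> Tuple[List[int], int]:
    """Decode `count` coded sizes starting at `pos`; return (sizes, next_pos).

    `count` is the number of coded sizes, i.e. frames in the lace minus 1,
    because the last frame's size comes from the total size of the Block.
    """
    sizes = []
    for _ in range(count):
        size = 0
        while True:
            octet = buf[pos]
            pos += 1
            size += octet
            if octet != 255:    # an octet below 255 closes the size
                break
        sizes.append(size)
    return sizes, pos
```

With the example frames, `xiph_encode_size(800)` gives 255;255;255;35 and `xiph_encode_size(500)` gives 255;245.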

EBML lacing

In this case the size is not coded as blocks of 255 bytes, but as a difference with the previous size and this size is coded as in EBML. The first size in the lace is unsigned as in EBML. The others use a range shifting to get a sign on each value :

1xxx xxxx                                                                   - value -(2^6-1) to 2^6-1 (i.e. 0 to 2^7-2 minus 2^6-1, half of the range)
01xx xxxx  xxxx xxxx                                                        - value -(2^13-1) to 2^13-1
001x xxxx  xxxx xxxx  xxxx xxxx                                             - value -(2^20-1) to 2^20-1
0001 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                                  - value -(2^27-1) to 2^27-1
0000 1xxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                       - value -(2^34-1) to 2^34-1
0000 01xx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx            - value -(2^41-1) to 2^41-1
0000 001x  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx - value -(2^48-1) to 2^48-1

  • Block head (with lacing bits set to 11)
  • Lacing head: Number of frames in the lace -1, i.e. 2 (the 800 and 500 octets ones)
  • Lacing sizes: only the 2 first ones will be coded, 800 gives 0x320 + 0x4000 = 0x4320, 500 is coded as -300 : - 0x12C + 0x1FFF + 0x4000 = 0x5ED3. The size of the last frame is deduced from the total size of the Block.
  • Data in frame 1
  • Data in frame 2
  • Data in frame 3
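The arithmetic of the EBML lacing example can be checked with a short Python sketch. All names here are hypothetical. It assumes the encoder always picks the shortest width that fits, and it stops at 7 octets for signed differences because the range listing above stops there.

```python
from typing import List


def encode_vint(value: int) -> bytes:
    """Unsigned size as in EBML: n octets hold 0 .. 2^(7n)-2."""
    for length in range(1, 9):
        if 0 <= value <= (1 << (7 * length)) - 2:
            marker = 1 << (7 * length)          # the Length Descriptor bit
            return (marker | value).to_bytes(length, "big")
    raise ValueError("size out of range")


def encode_signed_vint(delta: int) -> bytes:
    """Range-shifted difference: add 2^(7n-1)-1, then code as unsigned."""
    for length in range(1, 8):
        bias = (1 << (7 * length - 1)) - 1      # 2^6-1, 2^13-1, ...
        if -bias <= delta <= bias:
            marker = 1 << (7 * length)
            return (marker | (delta + bias)).to_bytes(length, "big")
    raise ValueError("difference out of range")


def ebml_lace_sizes(frame_sizes: List[int]) -> bytes:
    """Code every size except the last: first as-is, others as differences."""
    if len(frame_sizes) < 2:
        raise ValueError("a lace needs at least two frames")
    coded = encode_vint(frame_sizes[0])
    for prev, cur in zip(frame_sizes[:-1], frame_sizes[1:-1]):
        coded += encode_signed_vint(cur - prev)
    return coded
```

For the frames of 800, 500 and 1000 octets this produces the octets `43 20 5E D3`, matching 0x4320 and 0x5ED3 from the example.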

Fixed-size lacing

In this case only the number of frames in the lace is saved, the size of each frame is deduced from the total size of the Block. For example, for 3 frames of 800 octets each :

  • Block head (with lacing bits set to 10)
  • Lacing head: Number of frames in the lace -1, i.e. 2
  • Data in frame 1
  • Data in frame 2
  • Data in frame 3
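Splitting a fixed-size lace back into frames is a single division, as in this Python sketch. The function name `split_fixed_size_lace` is hypothetical, and `payload` is assumed to be the Block content that follows the Block head (the lacing head octet plus the laced frames).

```python
from typing import List


def split_fixed_size_lace(payload: bytes) -> List[bytes]:
    """Split the laced data into equally sized frames.

    payload[0] is the lacing head (number of frames - 1); the frame
    size is (total size of lace) / (number of frames in lace).
    """
    if not payload:
        raise ValueError("missing lacing head")
    frame_count = payload[0] + 1
    data = payload[1:]
    if len(data) % frame_count:
        raise ValueError("lace size is not a multiple of the frame count")
    size = len(data) // frame_count
    return [data[i * size:(i + 1) * size] for i in range(frame_count)]
```

For the example above, a lacing head of 2 followed by 2400 octets of data yields three frames of 800 octets each.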

Virtual Block

The data in matroska is stored in coding order. But that means if you seek to a particular point and a frame has been referenced far away, you won't know while playing and you might miss this frame (true for independent frames and overlapping of dependent frames). So the idea is to have a placeholder for the original frame in the timecode (display) order.

The structure is a scaled down version of the normal Block.

Virtual Block Header
Offset Player Description
0x00+ must Track Number (Track Entry). It is coded in EBML like form (1 octet if the value is < 0x80, 2 if < 0x4000, etc) (most significant bits set to increase the range).
0x01+ must Timecode (relative to Cluster timecode, signed int16)
0x03+ -
Flags
Bit Player Description
7 must Gap - Set when the Track is empty after this Block ends (data should not be rendered from this timecode, cache can be flushed)
6-0 - Reserved, set to 0