Understanding The Sample Tables: An Example
This discussion applies to the case where the mp4 file is self-contained, and has an 'mdat' box containing the media data and a 'moov' box containing the metadata that references the media data.
(diagram from ISO/IEC 14496-12 – MPEG-4 Part 12)
Annex A of ISO/IEC 14496-12 – MPEG-4 Part 12 describes how the media data is laid out within the file as an interleaved set of samples, and how the sample table box container (stbl) holds a set of tables that are used to locate individual samples within the file.
The text of the standard is perfectly well written, but a worked example makes the relationship between the different tables (stco, stsz, stsc etc.) much easier to understand. That is why I've written this page.
First of all some definitions from the standard:
- chunk: contiguous set of samples for one track
- sample: all the data associated with a single timestamp
A chunk may contain only one sample, and this is quite common, but chunks more usually seem to contain n samples, where n is a one- or two-digit number.
The data for the example comes from a real file that has two tracks (track 1 is an audio track, track 2 is a video track) and an mdat box located at 121915 bytes from the start of the file.
Each track has an stbl container and hence its own set of sample tables.
The chunk offset box (stco) gives a byte offset for each chunk from the start of the file.
For track 1 the start of the table looks like this:
Has header:
{"size": 1312, "type": "stco"}
Has values:
{
"version": 0,
"flags": "0x000000",
"entry_count": 324,
"entry_list": [
{ "chunk_offset": 121923 },
{ "chunk_offset": 897412 },
{ "chunk_offset": 1170432 },
{ "chunk_offset": 1426814 },
For track 2 the start of the table looks like this:
Has header:
{"size": 1224, "type": "stco"}
Has values:
{
"version": 0,
"flags": "0x000000",
"entry_count": 302,
"entry_list": [
{ "chunk_offset": 130635 },
{ "chunk_offset": 904603 },
{ "chunk_offset": 1177851 },
{ "chunk_offset": 1434346 },
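As a quick sanity check on the two headers shown above, the box size can be derived from the entry count. The following is a minimal Python sketch, not a full parser. It assumes a version 0 stco box with 32-bit offsets (as in both tables here) and an ordinary 8-byte box header. The helper names `stco_box_size` and `parse_stco_payload` are made up for this illustration.

```python
import struct
from typing import List


def stco_box_size(entry_count: int) -> int:
    """Expected total size in bytes of a version 0 stco box."""
    box_header = 8      # 32-bit size + 4-character type ("stco")
    version_flags = 4   # 8-bit version + 24-bit flags
    count_field = 4     # 32-bit entry_count
    return box_header + version_flags + count_field + 4 * entry_count


def parse_stco_payload(payload: bytes) -> List[int]:
    """Parse the bytes that follow the 8-byte box header of an stco box.

    Assumption: version 0, so every chunk offset is a big-endian
    unsigned 32-bit integer.
    """
    _version_flags, entry_count = struct.unpack_from(">II", payload, 0)
    return list(struct.unpack_from(">%dI" % entry_count, payload, 8))


# Build a tiny payload from the first four offsets of track 1 and parse it back.
sample_payload = struct.pack(">II4I", 0, 4, 121923, 897412, 1170432, 1426814)
sample_offsets = parse_stco_payload(sample_payload)
```

For the two tables above, `stco_box_size(324)` gives 1312 and `stco_box_size(302)` gives 1224, which matches the sizes reported in the box headers.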
Comparing the two tables shows how the chunks are interleaved. The first chunk of track 1 (byte offset 121923) starts immediately after the 8 bytes of the mdat header (121915 + 8). It is followed by the first chunk of track 2 (byte offset 130635), then the second chunk of track 1 (byte offset 897412), and so on. The first chunk does not always start immediately after the mdat header: sometimes there appears to be some kind of preamble at the start of the mdat before the referenced chunks.
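The interleaving can be checked by merging the two stco tables and sorting by file position. This is a minimal Python sketch using only the first four offsets of each track, as shown in the tables. The function name `interleaved_chunks` and the constant names are invented for this illustration.

```python
# First four chunk offsets of each track, copied from the stco tables above.
TRACK_OFFSETS = {
    1: [121923, 897412, 1170432, 1426814],   # audio
    2: [130635, 904603, 1177851, 1434346],   # video
}

MDAT_OFFSET = 121915      # position of the mdat box in the example file
MDAT_HEADER_SIZE = 8      # 32-bit size + 4-character type


def interleaved_chunks(track_offsets):
    """Return (offset, track_id, chunk_number) tuples in file order.

    Chunk numbers are 1-based, as in the standard.
    """
    layout = [
        (offset, track_id, index + 1)
        for track_id, offsets in track_offsets.items()
        for index, offset in enumerate(offsets)
    ]
    return sorted(layout)


layout = interleaved_chunks(TRACK_OFFSETS)
track_order = [track_id for _offset, track_id, _chunk in layout]

# In this file the first chunk follows the mdat header directly.
first_chunk_follows_header = layout[0][0] == MDAT_OFFSET + MDAT_HEADER_SIZE

for offset, track_id, chunk in layout:
    print("offset %8d: track %d, chunk %d" % (offset, track_id, chunk))
```

With these values the tracks alternate strictly (1, 2, 1, 2, ...). That is a property of this particular file, not something the format guarantees.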