1. Describing YUV 4:2:0 files and removing the color components
In our first example, a simple BS Schema is given together with an easy adaptation implemented in XSLT.
The bitstreams described in this example contain uncompressed data stored according to the YUV 4:2:0 file format (known as I420 on the fourcc website). The structure of bitstreams compliant with the YUV 4:2:0 file format is given in Figure 1. Note that each sample is represented by 8 bits.
The entire BS Schema for the YUV 4:2:0 file format uses only BSDL features belonging to the first version of the specification. Taking our BS Schema as input, the BintoBSD Parser generates a BSD with the following structure:
<bitstream>
<frame>
<y>0 101376</y>
<u>101376 25344</u>
<v>126720 25344</v>
</frame>
<frame>
<y>152064 101376</y>
<u>253440 25344</u>
<v>278784 25344</v>
</frame>
<!-- and so on -->
</bitstream>
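The offsets and lengths in the BSD above follow directly from the frame geometry. As a rough illustration (a Python sketch, not part of the BS Schema or the reference software), the following reproduces them for a CIF sequence, assuming 352x288 luma samples, 8 bits per sample, and planes stored in Y, U, V order. The function name `i420_layout` is made up for this example.

```python
def i420_layout(width, height, frame_count):
    """Return the (offset, length) of the Y, U and V planes of each frame."""
    luma = width * height                  # one 8-bit sample per pixel
    chroma = (width // 2) * (height // 2)  # 4:2:0: chroma halved in both directions
    frame_size = luma + 2 * chroma
    layout = []
    for n in range(frame_count):
        base = n * frame_size              # byte offset of frame n in the file
        layout.append({
            "y": (base, luma),
            "u": (base + luma, chroma),
            "v": (base + luma + chroma, chroma),
        })
    return layout


frames = i420_layout(352, 288, 2)  # CIF resolution, first two frames
for frame in frames:
    print(frame)
```

Each CIF frame therefore occupies 152064 bytes, which is why the second frame in the BSD starts at offset 152064.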
The uncompressed YUV file format is not a scalable format. Nevertheless, it is possible to execute a number of adaptations on this file format. In particular, it is possible to reduce the frame rate of the embedded video sequence and to remove the chroma (or color) components in order to obtain a monochrome video sequence. These two adaptations can be executed in the XML domain by transforming the generated BSDs. Therefore, both transformations have been implemented in XSLT.
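To make the idea of adapting in the XML domain concrete, here is a minimal Python sketch of the two transformations applied to a BSD of the shape shown earlier. It is only an illustration of the principle and is not the content of the XSLT stylesheets; it assumes element names without namespaces (real BSDs may use them) and counts frames from 1 when deciding which ones are even. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

BSD = """<bitstream>
  <frame><y>0 101376</y><u>101376 25344</u><v>126720 25344</v></frame>
  <frame><y>152064 101376</y><u>253440 25344</u><v>278784 25344</v></frame>
</bitstream>"""


def remove_chroma(root):
    """Drop the <u> and <v> elements of every frame, keeping only luma."""
    for frame in root.findall("frame"):
        for tag in ("u", "v"):
            for element in frame.findall(tag):
                frame.remove(element)
    return root


def halve_frame_rate(root):
    """Drop every even-numbered frame (counting from 1)."""
    for position, frame in enumerate(root.findall("frame"), start=1):
        if position % 2 == 0:
            root.remove(frame)
    return root


monochrome = remove_chroma(ET.fromstring(BSD))
half_rate = halve_frame_rate(ET.fromstring(BSD))
print(ET.tostring(monochrome, encoding="unicode"))
print(ET.tostring(half_rate, encoding="unicode"))
```

Neither function touches the bitstream itself; only the description changes, and the adapted bitstream is generated from it afterwards.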
- BS Schema for YUV 4:2:0 files containing a CIF-resolution video sequence (yuv.xsd, together with all XML Schemata included by our BS Schema): BSSchema_yuv.zip.
- MPEG-21 DIA reference software containing the BintoBSD Parser and BSDtoBin Parser (publicly available from ISO as ISO/IEC 21000-8:2006), needed to generate the desired BSDs: DIA-RefSW-A-1_0_4.zip.
- Example of a YUV 4:2:0 bitstream containing a video sequence of 120 frames: tempete_120.yuv.
- XSLT stylesheet that removes all even frames from the video sequence in the YUV 4:2:0 bitstreams: half_frame_rate.xsl.
- XSLT stylesheet that removes the chroma samples (i.e., the color components) from the YUV 4:2:0 bitstreams: removeChroma.xsl.
- SAXON, an XSLT transformation engine, can be downloaded from the Saxonica website: http://saxon.sourceforge.net/.
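After a BSD has been transformed, the adapted bitstream is produced from it by the BSDtoBin Parser. The Python sketch below is a heavily simplified stand-in for that step, not the reference software: it assumes every leaf element holds an "offset length" pair pointing into the original bitstream (as in the YUV BSD above) and simply concatenates those byte ranges in document order. The name `bsd_to_bytes` and the tiny 2x2-pixel test data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def bsd_to_bytes(bsd_text, original):
    """Concatenate the byte ranges referenced by the leaf elements of a BSD."""
    root = ET.fromstring(bsd_text)
    out = bytearray()
    for element in root.iter():               # document order
        text = (element.text or "").strip()
        if len(element) == 0 and text:        # leaf element with a payload reference
            offset, length = (int(value) for value in text.split())
            out += original[offset:offset + length]
    return bytes(out)


# Two 2x2 frames: 4 luma bytes plus 1 + 1 chroma bytes each (6 bytes per frame).
original = bytes(range(12))
adapted_bsd = """<bitstream>
  <frame><y>0 4</y></frame>
  <frame><y>6 4</y></frame>
</bitstream>"""

adapted = bsd_to_bytes(adapted_bsd, original)
print(list(adapted))
```

The actual BSDtoBin Parser is driven by the BS Schema and handles more datatypes than byte ranges; the sketch covers only the case needed for the YUV example.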
2. Temporal adaptation of H.264/MPEG-4 AVC bitstreams
The state-of-the-art H.264/MPEG-4 AVC standard does not contain any tools for providing spatial or SNR scalability; in that sense, it is a so-called single-layer coding specification. Temporal scalability is nevertheless supported by H.264/MPEG-4 AVC, although it came as a side effect of the improvements in coding efficiency. More precisely, temporal scalability in H.264/MPEG-4 AVC is based on the definition of sub-sequences during the encoding phase of a video sequence.
In order to exploit the temporal scalability of H.264/MPEG-4 AVC bitstreams, the high-level structure of the first version of H.264/MPEG-4 AVC has to be described by using the MPEG-21 BSDL language. The high-level structure of an H.264/MPEG-4 AVC bitstream is given in Figure 2.
We have described the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), the sub-sequence information SEI message, the sub-sequence layer characteristics SEI message, and the NAL unit header of each slice in our BS Schema. By describing these syntax elements in XML, it is possible to adapt temporally scalable H.264/MPEG-4 AVC bitstreams in an efficient manner. This efficient adaptation process is possible because a temporally scalable H.264/MPEG-4 AVC bitstream will have a sequence of NAL units as shown in Figure 3.
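As an illustration of the kind of syntax element described here, the one-byte NAL unit header of H.264/MPEG-4 AVC consists of forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits). The Python sketch below decodes it; it is independent of our BS Schema, the function name is hypothetical, and the type table lists only the NAL unit types mentioned in this section.

```python
# Subset of nal_unit_type values relevant to this section.
NAL_UNIT_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


for byte in (0x67, 0x68, 0x06, 0x65, 0x01):
    header = parse_nal_header(byte)
    name = NAL_UNIT_TYPES.get(header["nal_unit_type"], "other")
    print(f"0x{byte:02X}: {name}, nal_ref_idc={header['nal_ref_idc']}")
```

A slice with nal_ref_idc equal to 0 is not used as a reference by other pictures, which is what makes it a candidate for removal during temporal adaptation.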
- The BS Schema for H.264/MPEG-4 AVC bitstreams: h264_avc.zip.
- The following two classes have to be added to the classpath in the directory org/iso/mpeg/mpeg21/dia/bsdl/XSD/datatype. They are called from within the BS Schema and are needed to parse exponential Golomb coded syntax elements: UnsignedExpGolomb.class and SignedExpGolomb.class.
- Example of an H.264/MPEG-4 AVC bitstream containing 4 temporal layers and in which SEI messages are encapsulated such that the bitstream can be adapted in an efficient manner: coastguard.h264.
- XSLT stylesheet that removes the highest temporal layer such that the adapted bitstream has half of its original frame rate: removeLayer.xsl.
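For readers unfamiliar with exponential Golomb coding, the following Python sketch shows how the unsigned, ue(v), and signed, se(v), codes used by H.264/MPEG-4 AVC are decoded. It is not a port of UnsignedExpGolomb.class or SignedExpGolomb.class; the `BitReader` helper and the function names are invented for this example.

```python
class BitReader:
    """Reads bits from a bytes object, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bit(self):
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader):
    """Unsigned Exp-Golomb: n leading zeros, a 1 bit, then n suffix bits."""
    leading_zeros = 0
    while reader.read_bit() == 0:
        leading_zeros += 1
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)


def read_se(reader):
    """Signed Exp-Golomb: code numbers 0, 1, 2, 3, 4 map to 0, +1, -1, +2, -2."""
    code_num = read_ue(reader)
    magnitude = (code_num + 1) // 2
    return magnitude if code_num % 2 == 1 else -magnitude


# Bit pattern 1 | 010 | 011 | 00100, padded with zeros to two bytes.
data = bytes([0xA6, 0x40])
unsigned_reader = BitReader(data)
unsigned_values = [read_ue(unsigned_reader) for _ in range(4)]
signed_reader = BitReader(data)
signed_values = [read_se(signed_reader) for _ in range(4)]
print(unsigned_values, signed_values)
```

Because the length of each code word depends on its value, the fields that follow it cannot be located without decoding it first, which is why dedicated datatype classes are needed during BSD generation.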
3. BS Schema for the Scalable Extension of H.264/MPEG-4 AVC
The scalable extension of H.264/MPEG-4 AVC, abbreviated as SVC, is a coding specification that makes it possible to generate scalable bitstreams that are adaptable along three embedded scalability axes at the same time. The final version of the specification will be available in April 2007 as the third amendment to H.264/MPEG-4 AVC. The design of SVC allows temporal scalability using hierarchical B-slice-coded pictures, spatial scalability based on the use of a layered approach, and SNR scalability by relying on a bit plane encoder.
The syntax of the SVC standard is, of course, an extension of the H.264/MPEG-4 AVC specification. However, in order to describe SVC bitstreams in the XML domain, emulation prevention bytes must be interpreted correctly and context-related attributes must be taken into account. Without the interpretation of emulation prevention bytes, it is not possible to parse the SVC bitstreams correctly, and without the context-related attributes, the BSD cannot be generated in an acceptable amount of time. These extensions are currently not implemented in the publicly available reference software of DIA. As such, we can only show our BS Schema, which cannot be interpreted by the current publicly available software package. Once the reference software is compliant with the second amendment to DIA, we will give a complete example with bitstreams and stylesheet. In the meantime, we will only give our BS Schema for version 6 of the SVC specification (JSVM-6): jsvm.zip.
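To show what interpreting emulation prevention bytes involves, here is a small Python sketch that strips them from a NAL unit payload. In H.264/MPEG-4 AVC (and hence SVC), the encoder inserts a 0x03 byte after two consecutive zero bytes so that the payload cannot imitate a start code; a parser has to skip that byte before reading syntax elements. This is only an illustration of the mechanism, not the way the extended reference software implements it, and the function name is hypothetical.

```python
def remove_emulation_prevention(payload):
    """Drop every 0x03 byte that directly follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes seen just before this one
    for byte in payload:
        if zeros >= 2 and byte == 0x03:
            zeros = 0           # skip the emulation prevention byte
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


escaped = bytes([0x25, 0x00, 0x00, 0x03, 0x01, 0x00, 0x00, 0x03, 0x00])
raw = remove_emulation_prevention(escaped)
print(raw.hex())
```

Since removing these bytes shifts every subsequent bit position, variable-length fields after the first escaped sequence would be misread if this step were skipped.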
More information on the exploitation of temporal scalability in H.264/MPEG-4 AVC and the adaptation of SVC bitstreams and the needed extensions can be found in publication.