forked from mirror/libbpg
427 lines
16 KiB
Text
427 lines
16 KiB
Text
|
BPG Specification
|
||
|
|
||
|
version 0.9.3
|
||
|
|
||
|
Copyright (c) 2014 Fabrice Bellard
|
||
|
|
||
|
1) Introduction
|
||
|
---------------
|
||
|
|
||
|
BPG is a lossy and lossless picture compression format based on HEVC
|
||
|
[1]. It supports grayscale, YCbCr, RGB, YCgCo color spaces with an
|
||
|
optional alpha channel. CMYK is supported by reusing the alpha channel
|
||
|
to encode an additional white component. The bit depth of each
|
||
|
component is from 8 to 14 bits. The color values are stored either in
|
||
|
full range (JPEG case) or limited range (video case). The YCbCr color
|
||
|
space is either BT 601 (JPEG case), BT 709 or BT 2020.
|
||
|
|
||
|
The chroma can be subsampled by a factor of two in horizontal or both
|
||
|
in horizontal and vertical directions (4:4:4, 4:2:2 or 4:2:0 chroma
|
||
|
formats are supported). The chroma is sampled at the same position
|
||
|
relative to the luma as in the JPEG format [2].
|
||
|
|
||
|
Arbitrary metadata (such as EXIF, ICC profile, XMP) are supported.
|
||
|
|
||
|
2) Bitstream conventions
|
||
|
------------------------
|
||
|
|
||
|
The bit stream is byte aligned and bit fields are read from most
|
||
|
significant to least signficant bit in each byte.
|
||
|
|
||
|
- u(n) is an unsigned integer stored on n bits.
|
||
|
|
||
|
- ue7(n) is an unsigned integer of at most n bits stored on a variable
|
||
|
number of bytes. All the bytes except the last one have a '1' as
|
||
|
their first bit. The unsigned integer is represented as the
|
||
|
concatenation of the remaining 7 bit codewords. Only the shortest
|
||
|
encoding for a given unsigned integer shall be accepted by the
|
||
|
decoder (i.e. the first byte is never 0x80). Example:
|
||
|
|
||
|
Encoded bytes Unsigned integer value
|
||
|
0x08 8
|
||
|
0x84 0x1e 542
|
||
|
0xac 0xbe 0x17 728855
|
||
|
|
||
|
- ue(v) : unsigned integer 0-th order Exp-Golomb-coded (see HEVC
|
||
|
specification).
|
||
|
|
||
|
- b(8) is an arbitrary byte.
|
||
|
|
||
|
3) File format
|
||
|
--------------
|
||
|
|
||
|
3.1) Syntax
|
||
|
-----------
|
||
|
|
||
|
heic_file() {
|
||
|
|
||
|
file_magic u(32)
|
||
|
|
||
|
pixel_format u(3)
|
||
|
alpha1_flag u(1)
|
||
|
bit_depth_minus_8 u(4)
|
||
|
|
||
|
color_space u(4)
|
||
|
extension_present_flag u(1)
|
||
|
alpha2_flag u(1)
|
||
|
limited_range_flag u(1)
|
||
|
reserved_zero u(1)
|
||
|
|
||
|
picture_width ue7(32)
|
||
|
picture_height ue7(32)
|
||
|
|
||
|
picture_data_length ue7(32)
|
||
|
if (extension_present_flag)
|
||
|
extension_data_length ue7(32)
|
||
|
if (alpha1_flag || alpha2_flag)
|
||
|
alpha_data_length ue7(32)
|
||
|
|
||
|
if (extension_present_flag) {
|
||
|
extension_data()
|
||
|
}
|
||
|
|
||
|
hevc_header_and_data()
|
||
|
|
||
|
if (alpha1_flag || alpha2_flag) {
|
||
|
hevc_header_and_data()
|
||
|
}
|
||
|
|
||
|
}
|
||
|
|
||
|
extension_data()
|
||
|
{
|
||
|
for(i = 0; i < v; i++) {
|
||
|
extension_tag ue7(32)
|
||
|
extension_tag_length ue7(32)
|
||
|
for(j = 0; j < extension_tag_length; j++) {
|
||
|
extension_tag_data_byte b(8)
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
|
||
|
hevc_header_and_data()
|
||
|
{
|
||
|
hevc_header_length ue7(32)
|
||
|
log2_min_luma_coding_block_size_minus3 ue(v)
|
||
|
log2_diff_max_min_luma_coding_block_size ue(v)
|
||
|
log2_min_transform_block_size_minus2 ue(v)
|
||
|
log2_diff_max_min_transform_block_size ue(v)
|
||
|
max_transform_hierarchy_depth_intra ue(v)
|
||
|
sample_adaptive_offset_enabled_flag u(1)
|
||
|
pcm_enabled_flag u(1)
|
||
|
if (pcm_enabled_flag) {
|
||
|
pcm_sample_bit_depth_luma_minus1 u(4)
|
||
|
pcm_sample_bit_depth_chroma_minus1 u(4)
|
||
|
log2_min_pcm_luma_coding_block_size_minus3 ue(v)
|
||
|
log2_diff_max_min_pcm_luma_coding_block_size ue(v)
|
||
|
pcm_loop_filter_disabled_flag u(1)
|
||
|
}
|
||
|
strong_intra_smoothing_enabled_flag u(1)
|
||
|
sps_extension_present_flag u(1)
|
||
|
if (sps_extension_present_flag) {
|
||
|
sps_range_extension_flag u(1)
|
||
|
sps_extension_7bits u(7)
|
||
|
}
|
||
|
if (sps_range_extension_flag) {
|
||
|
transform_skip_rotation_enabled_flag u(1)
|
||
|
transform_skip_context_enabled_flag u(1)
|
||
|
implicit_rdpcm_enabled_flag u(1)
|
||
|
explicit_rdpcm_enabled_flag u(1)
|
||
|
extended_precision_processing_flag u(1)
|
||
|
intra_smoothing_disabled_flag u(1)
|
||
|
high_precision_offsets_enabled_flag u(1)
|
||
|
persistent_rice_adaptation_enabled_flag u(1)
|
||
|
cabac_bypass_alignment_enabled_flag u(1)
|
||
|
}
|
||
|
trailing_bits u(v)
|
||
|
|
||
|
hevc_data()
|
||
|
}
|
||
|
|
||
|
hevc_data()
|
||
|
{
|
||
|
for(i = 0; i < v; i++) {
|
||
|
hevc_data_byte b(8)
|
||
|
}
|
||
|
}
|
||
|
|
||
|
|
||
|
3.2) Semantics
|
||
|
--------------
|
||
|
|
||
|
'file_magic' is defined as 0x425047fb.
|
||
|
|
||
|
'pixel_format' indicates the chroma subsampling:
|
||
|
|
||
|
0 : Grayscale
|
||
|
1 : 4:2:0
|
||
|
2 : 4:2:2
|
||
|
3 : 4:4:4
|
||
|
|
||
|
The other values are reserved.
|
||
|
|
||
|
The chroma samples in the 4:2:0 and 4:2:2 formats are sampled
|
||
|
at the same position as JPEG [2].
|
||
|
|
||
|
'alpha1_flag' and 'alpha2_flag' give information about the alpha plane:
|
||
|
|
||
|
alpha1_flag=0 alpha2_flag=0: no alpha plane.
|
||
|
|
||
|
alpha1_flag=1 alpha2_flag=0: alpha present. The color is not
|
||
|
premultiplied.
|
||
|
|
||
|
alpha1_flag=1 alpha2_flag=1: alpha present. The color is
|
||
|
premultiplied. The resulting non-premultiplied R', G', B' shall
|
||
|
be recovered as:
|
||
|
|
||
|
if A != 0
|
||
|
R' = min(R / A, 1), G' = min(G / A, 1), B' = min(B / A, 1)
|
||
|
else
|
||
|
R' = G' = B' = 1 .
|
||
|
|
||
|
alpha1_flag=0 alpha2_flag=1: the alpha plane is present and
|
||
|
contains the W color component (CMYK color). The resulting CMYK
|
||
|
data can be recovered as follows:
|
||
|
|
||
|
C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - W) .
|
||
|
|
||
|
In case no color profile is specified, the sRGB color R'G'B'
|
||
|
shall be computed as:
|
||
|
|
||
|
R' = R * W, G' = G * W, B' = B * W .
|
||
|
|
||
|
'bit_depth_minus_8' is the number of bits used for each component
|
||
|
minus 8. In this version of the specification, bit_depth_minus_8
|
||
|
<= 6.
|
||
|
|
||
|
'extension_present_flag' indicates that extension data are
|
||
|
present.
|
||
|
|
||
|
'color_space' specifies how to convert the color planes to
|
||
|
RGB. It must be 0 when pixel_format = 0 (grayscale):
|
||
|
|
||
|
0 : YCbCr (BT 601, same as JPEG and HEVC matrix_coeffs = 5)
|
||
|
1 : RGB (component order: G B R)
|
||
|
2 : YCgCo (same as HEVC matrix_coeffs = 8)
|
||
|
3 : YCbCr (BT 709, same as HEVC matrix_coeffs = 1)
|
||
|
4 : YCbCr (BT 2020 non constant luminance system, same as HEVC
|
||
|
matrix_coeffs = 9)
|
||
|
5 : reserved for BT 2020 constant luminance system, not
|
||
|
supported in this version of the specification.
|
||
|
|
||
|
The other values are reserved.
|
||
|
|
||
|
YCbCr is defined using the BT 601, BT 709 or BT 2020 conversion
|
||
|
matrices.
|
||
|
|
||
|
For RGB, G is stored as the Y plane. B in the Cb plane and R in
|
||
|
the Cr plane.
|
||
|
|
||
|
YCgCo is defined as HEVC matrix_coeffs = 8, full range. Y is
|
||
|
stored in the Y plane. Cg in the Cb plane and Co in the Cr
|
||
|
plane.
|
||
|
|
||
|
If no color profile is present, the RGB output data are assumed
|
||
|
to be in the sRGB color space [6].
|
||
|
|
||
|
'limited_range_flag': opposite of the HEVC video_full_range_flag.
|
||
|
The value zero indicates that the full range of each color
|
||
|
component is used. The value one indicates that a limited range
|
||
|
is used:
|
||
|
|
||
|
- (16 << (bit_depth - 8) to (235 << (bit_depth - 8)) for Y
|
||
|
and G, B, R,
|
||
|
- (16 << (bit_depth - 8) to (240 << (bit_depth - 8)) for Cb and Cr.
|
||
|
|
||
|
For the YCgCo color space, the range limitation shall be done on
|
||
|
the RGB data.
|
||
|
|
||
|
The alpha (or W) plane always uses the full range.
|
||
|
|
||
|
'reserved_zero' must be 0 in this version.
|
||
|
|
||
|
'picture_width' is the picture width in pixels. The value 0 is
|
||
|
not allowed.
|
||
|
|
||
|
'picture_height' is the picture height in pixels. The value 0 is
|
||
|
not allowed.
|
||
|
|
||
|
'picture_data_length' is the picture data length in bytes.
|
||
|
|
||
|
'extension_data_length' is the extension data length in bytes.
|
||
|
|
||
|
'alpha_data_length' is the alpha data length in bytes.
|
||
|
|
||
|
'extension_data()' is the extension data.
|
||
|
|
||
|
'extension_tag' is the extension tag. The following values are defined:
|
||
|
|
||
|
1: EXIF data.
|
||
|
|
||
|
2: ICC profile (see [4])
|
||
|
|
||
|
3: XMP (see [5])
|
||
|
|
||
|
4: Thumbnail (the thumbnail shall be a lower resolution version
|
||
|
of the image and stored in BPG format).
|
||
|
|
||
|
The decoder shall ignore the tags it does not support.
|
||
|
|
||
|
'extension_tag_length' is the length in bytes of the extension tag.
|
||
|
|
||
|
'hevc_header_length' is the length in bytes of the following data
|
||
|
up to and including 'trailing_bits'.
|
||
|
|
||
|
'log2_min_luma_coding_block_size_minus3',
|
||
|
'log2_diff_max_min_luma_coding_block_size',
|
||
|
'log2_min_transform_block_size_minus2',
|
||
|
'log2_diff_max_min_transform_block_size',
|
||
|
'max_transform_hierarchy_depth_intra',
|
||
|
'sample_adaptive_offset_enabled_flag', 'pcm_enabled_flag',
|
||
|
'pcm_sample_bit_depth_luma_minus1',
|
||
|
'pcm_sample_bit_depth_chroma_minus1',
|
||
|
'log2_min_pcm_luma_coding_block_size_minus3',
|
||
|
'log2_diff_max_min_pcm_luma_coding_block_size',
|
||
|
'pcm_loop_filter_disabled_flag',
|
||
|
'strong_intra_smoothing_enabled_flag', 'sps_extension_flag'
|
||
|
'sps_extension_present_flag', 'sps_range_extension_flag'
|
||
|
'transform_skip_rotation_enabled_flag',
|
||
|
'transform_skip_context_enabled_flag',
|
||
|
'implicit_rdpcm_enabled_flag', 'explicit_rdpcm_enabled_flag',
|
||
|
'extended_precision_processing_flag',
|
||
|
'intra_smoothing_disabled_flag',
|
||
|
'high_precision_offsets_enabled_flag',
|
||
|
'persistent_rice_adaptation_enabled_flag',
|
||
|
'cabac_bypass_alignment_enabled_flag' are
|
||
|
the corresponding fields of the HEVC SPS syntax element.
|
||
|
|
||
|
'trailing_bits' has a value of 0 and has a length from 0 to 7
|
||
|
bits so that the next data is byte aligned.
|
||
|
|
||
|
'hevc_data()' contains the corresponding HEVC picture data,
|
||
|
excluding the first NAL start code (i.e. the first 0x00 0x00 0x01
|
||
|
or 0x00 0x00 0x00 0x01 bytes). The VPS and SPS NALs shall not be
|
||
|
included in the HEVC picture data. The decoder can recover the
|
||
|
necessary fields from the header by doing the following
|
||
|
assumptions:
|
||
|
|
||
|
- vps_video_parameter_set_id = 0
|
||
|
- sps_video_parameter_set_id = 0
|
||
|
- sps_max_sub_layers = 1
|
||
|
- sps_seq_parameter_set_id = 0
|
||
|
- chroma_format_idc: for picture data:
|
||
|
chroma_format_idc = pixel_format
|
||
|
for alpha data:
|
||
|
chroma_format_idc = 0.
|
||
|
- separate_colour_plane_flag = 0
|
||
|
- pic_width_in_luma_samples = ceil(picture_width/cb_size) * cb_size
|
||
|
- pic_height_in_luma_samples = ceil(picture_height/cb_size) * cb_size
|
||
|
with cb_size = 1 << log2_min_luma_coding_block_size
|
||
|
- bit_depth_luma_minus8 = bit_depth_minus_8
|
||
|
- bit_depth_chroma_minus8 = bit_depth_minus_8
|
||
|
- scaling_list_enabled_flag = 0
|
||
|
|
||
|
3.3) HEVC Profile
|
||
|
-----------------
|
||
|
|
||
|
Conforming HEVC bit streams shall conform to the Main 4:4:4 16 Still
|
||
|
Picture, Level 8.5 of the HEVC specification with the following
|
||
|
modifications.
|
||
|
|
||
|
- separate_colour_plane_flag shall be 0 when present.
|
||
|
|
||
|
- bit_depth_luma_minus8 <= 6
|
||
|
|
||
|
- bit_depth_chroma_minus8 = bit_depth_luma_minus8
|
||
|
|
||
|
- explicit_rdpcm_enabled_flag = 0 (does not matter for intra frames)
|
||
|
|
||
|
- extended_precision_processing_flag = 0
|
||
|
|
||
|
- cabac_bypass_alignment_enabled_flag = 0
|
||
|
|
||
|
- high_precision_offsets_enabled_flag = 0 (does not matter for intra frames)
|
||
|
|
||
|
- If the encoded image is larger than the size indicated by
|
||
|
picture_width and picture_height, the lower right part of the decoded
|
||
|
image shall be cropped. If a horizontal (resp. vertical) decimation by
|
||
|
two is done for the chroma and that the width (resp. height) is n
|
||
|
pixels, ceil(n/2) pixels must be kept as the resulting chroma
|
||
|
information.
|
||
|
|
||
|
4) Design choices
|
||
|
-----------------
|
||
|
|
||
|
(This section is informative)
|
||
|
|
||
|
- Our design principle was to keep the format as simple as possible
|
||
|
while taking the HEVC codec as basis. Our main metric to evaluate
|
||
|
the simplicity was the size of a software decoder which outputs 32
|
||
|
bit RGBA pixel data.
|
||
|
|
||
|
- Pixel formats: we wanted to be able to convert JPEG images to BPG
|
||
|
with as little loss as possible. So supporting the same color space
|
||
|
(CCIR 601 YCbCr) with the same range (full range) and most of the
|
||
|
allowed JPEG chroma formats (4:4:4, 4:2:2, 4:2:0 or grayscale) was
|
||
|
mandatory to avoid going back to RGB or doing a subsampling or
|
||
|
interpolation.
|
||
|
|
||
|
- Alpha support: alpha support is mandatory. We chose to use a
|
||
|
separate HEVC monochrome plane to handle it instead of another
|
||
|
format to simplify the decoder. The color is either
|
||
|
non-premultiplied or premultiplied. Premultiplied alpha usually
|
||
|
gives a better compression. Non-premultiplied alpha is supported in
|
||
|
case no loss is needed on the color components.
|
||
|
|
||
|
- Color spaces: In addition to YCbCr, RGB is supported for the high
|
||
|
quality or lossless cases. YCgCo is supported because it may give
|
||
|
slightly better results than YCbCr for high quality images. CMYK is
|
||
|
supported so that JPEGs containing this color space can be
|
||
|
converted. The alpha plane is used to store the W (1-K) plane. The
|
||
|
data is stored with inverted components (1-X) so that the conversion
|
||
|
to RGB is simplified. The support of the BT 709 and BT 2020 (non
|
||
|
constant luminance) YCbCr encodings and of the limited range color
|
||
|
values were added to reduce the losses when converting video frames.
|
||
|
|
||
|
- Bit depth: we decided to support the HEVC bit depths 8 to 14. The
|
||
|
added complexity is small and it allows to support high quality
|
||
|
pictures from cameras.
|
||
|
|
||
|
- Picture file format: keeping a completely standard HEVC stream would
|
||
|
have meant a more difficult parsing for the picture header which is
|
||
|
a problem for the various image utilities to get the basic picture
|
||
|
information (pixel format, width, height). So we added a small
|
||
|
header before the HEVC bit stream. The picture header is byte
|
||
|
oriended so it is easy to parse.
|
||
|
|
||
|
- HEVC bit stream: the standard HEVC headers (the VPS and SPS NALs)
|
||
|
give an overhead of about 60 bytes for no added value in the case of
|
||
|
picture compression. Since the alpha plane uses a different HEVC bit
|
||
|
stream, it also adds the same overhead again. So we removed the VPS
|
||
|
and SPS NALs and added a very small header with the equivalent
|
||
|
information (typically 4 bytes). We also removed the first NAL start
|
||
|
code which is not useful. It is still possible to reconstruct a
|
||
|
standard HEVC stream to feed an unmodified hardware decoder if needed.
|
||
|
|
||
|
- Extensions: the metadata are stored at the beginning of the file so
|
||
|
that they can be read at the same time as the header. Since metadata
|
||
|
tend to evolve faster than the image formats, we left room for
|
||
|
extension by using a (tag, lengh) representation. The decoder can
|
||
|
easily skip all the metadata because their length is explicitly
|
||
|
stored in the image header.
|
||
|
|
||
|
5) References
|
||
|
-------------
|
||
|
|
||
|
[1] High efficiency video coding (HEVC) version 2 (ITU-T Recommendation H.265)
|
||
|
|
||
|
[2] JPEG File Interchange Format version 1.02 ( http://www.w3.org/Graphics/JPEG/jfif3.pdf )
|
||
|
|
||
|
[3] EXIF version 2.2 (JEITA CP-3451)
|
||
|
|
||
|
[4] The International Color Consortium ( http://www.color.org/ )
|
||
|
|
||
|
[5] Extensible Metadata Platform (XMP) http://www.adobe.com/devnet/xmp.html
|
||
|
|
||
|
[6] sRGB color space, IEC 61966-2-1.
|