BPG Specification

version 0.9.5

Copyright (c) 2014-2015 Fabrice Bellard

1) Introduction
---------------

BPG is a lossy and lossless picture compression format based on HEVC
[1]. It supports grayscale, YCbCr, RGB and YCgCo color spaces with an
optional alpha channel. CMYK is supported by reusing the alpha channel
to encode an additional white component. The bit depth of each
component is from 8 to 14 bits. The color values are stored either in
full range (JPEG case) or limited range (video case). The YCbCr color
space is either BT 601 (JPEG case), BT 709 or BT 2020.

The chroma can be subsampled by a factor of two in the horizontal
direction or in both the horizontal and vertical directions (4:4:4,
4:2:2 and 4:2:0 chroma formats are supported). In order to be able to
transcode JPEG images or video frames without modification to the
chroma, both JPEG and MPEG2 chroma sample positions are supported.

Progressive decoding and display are supported by interleaving the
alpha and color data.

Arbitrary metadata (such as EXIF, ICC profile, XMP) are supported.

Animations are supported as an optional feature. Decoders not
supporting animation display the first frame of the animation.

2) Bitstream conventions
------------------------

The bit stream is byte aligned and bit fields are read from most
significant to least significant bit in each byte.

- u(n) is an unsigned integer stored on n bits.

- ue7(n) is an unsigned integer of at most n bits stored on a variable
  number of bytes. All the bytes except the last one have a '1' as
  their first bit. The unsigned integer is represented as the
  concatenation of the remaining 7 bit codewords. Only the shortest
  encoding for a given unsigned integer shall be accepted by the
  decoder (i.e. the first byte is never 0x80). Example:

  Encoded bytes       Unsigned integer value
  0x08                8
  0x84 0x1e           542
  0xac 0xbe 0x17      728855

- ue(v) : unsigned integer 0-th order Exp-Golomb-coded (see HEVC
  specification).

- b(8) is an arbitrary byte.

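(Informative example) The ue7(n) rule above can be sketched in Python
as follows. The function name read_ue7, its (value, next_offset) return
convention and its error handling are assumptions made for this
illustration; they are not part of the specification.

```python
def read_ue7(buf, pos=0, max_bits=32):
    """Decode one ue7(n) value from buf starting at offset pos.

    Returns (value, next_pos). Raises ValueError on a non-shortest
    encoding, on truncated input, or if the value needs more than
    max_bits bits.
    """
    if pos >= len(buf):
        raise ValueError("truncated ue7 value")
    if buf[pos] == 0x80:
        # A leading 0x80 only contributes seven zero bits, so a shorter
        # encoding of the same value exists.
        raise ValueError("non-shortest ue7 encoding")
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated ue7 value")
        byte = buf[pos]
        pos += 1
        # Append the 7 payload bits of this byte.
        value = (value << 7) | (byte & 0x7F)
        if value >> max_bits:
            raise ValueError("ue7 value exceeds %d bits" % max_bits)
        if not (byte & 0x80):
            # The last byte is the one whose first bit is 0.
            return value, pos
```

Applied to the three rows of the example table, read_ue7 returns the
values 8, 542 and 728855 and consumes 1, 2 and 3 bytes respectively.
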
3) File format
--------------

3.1) Syntax
-----------

heic_file() {

     file_magic                                         u(32)

     pixel_format                                       u(3)
     alpha1_flag                                        u(1)
     bit_depth_minus_8                                  u(4)

     color_space                                        u(4)
     extension_present_flag                             u(1)
     alpha2_flag                                        u(1)
     limited_range_flag                                 u(1)
     animation_flag                                     u(1)

     picture_width                                      ue7(32)
     picture_height                                     ue7(32)

     picture_data_length                                ue7(32)
     if (extension_present_flag)
         extension_data_length                          ue7(32)

     if (extension_present_flag) {
         extension_data()
     }

     hevc_header_and_data()
}

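(Informative example) A minimal Python sketch of reading the fixed
part of heic_file(), up to the optional extension_data_length. The
names parse_bpg_header and _ue7, the returned dictionary and the
"header_size" key are hypothetical and chosen for this illustration.
The sketch does not validate reserved values, zero dimensions or the
bit depth limit.

```python
import struct

def _ue7(buf, pos):
    # Compact ue7 reader: 7 payload bits per byte, first bit set on
    # every byte except the last. The shortest-encoding check is
    # omitted here for brevity.
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            return value, pos

def parse_bpg_header(buf):
    """Parse the leading fields of heic_file() into a dict."""
    (magic,) = struct.unpack_from(">I", buf, 0)
    if magic != 0x425047FB:
        raise ValueError("not a BPG file")
    b0, b1 = buf[4], buf[5]
    hdr = {
        # Bit fields are read from the most significant bit down.
        "pixel_format": b0 >> 5,
        "alpha1_flag": (b0 >> 4) & 1,
        "bit_depth": (b0 & 0x0F) + 8,
        "color_space": b1 >> 4,
        "extension_present_flag": (b1 >> 3) & 1,
        "alpha2_flag": (b1 >> 2) & 1,
        "limited_range_flag": (b1 >> 1) & 1,
        "animation_flag": b1 & 1,
    }
    pos = 6
    hdr["picture_width"], pos = _ue7(buf, pos)
    hdr["picture_height"], pos = _ue7(buf, pos)
    hdr["picture_data_length"], pos = _ue7(buf, pos)
    if hdr["extension_present_flag"]:
        hdr["extension_data_length"], pos = _ue7(buf, pos)
    # Offset of extension_data() if present, else of the HEVC headers.
    hdr["header_size"] = pos
    return hdr
```

Because every field before the HEVC data is either a fixed bit field or
a ue7 value, a utility that only needs the picture dimensions can stop
after the first few bytes of the file.
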
extension_data()
{
     while (more_bytes()) {
         extension_tag                                  ue7(32)
         extension_tag_length                           ue7(32)
         if (extension_tag == 5) {
             animation_control_extension(extension_tag_length)
         } else {
             for(j = 0; j < extension_tag_length; j++) {
                 extension_tag_data_byte                b(8)
             }
         }
     }
}

animation_control_extension(payload_length)
{
     loop_count                                         ue7(16)
     frame_period_num                                   ue7(16)
     frame_period_den                                   ue7(16)
     while (more_bytes()) {
         dummy_byte                                     b(8)
     }
}

hevc_header_and_data()
{
     if (alpha1_flag || alpha2_flag) {
         hevc_header()
     }
     hevc_header()
     hevc_data()
}

hevc_header()
{
     hevc_header_length                                 ue7(32)
     log2_min_luma_coding_block_size_minus3             ue(v)
     log2_diff_max_min_luma_coding_block_size           ue(v)
     log2_min_transform_block_size_minus2               ue(v)
     log2_diff_max_min_transform_block_size             ue(v)
     max_transform_hierarchy_depth_intra                ue(v)
     sample_adaptive_offset_enabled_flag                u(1)
     pcm_enabled_flag                                   u(1)
     if (pcm_enabled_flag) {
         pcm_sample_bit_depth_luma_minus1               u(4)
         pcm_sample_bit_depth_chroma_minus1             u(4)
         log2_min_pcm_luma_coding_block_size_minus3     ue(v)
         log2_diff_max_min_pcm_luma_coding_block_size   ue(v)
         pcm_loop_filter_disabled_flag                  u(1)
     }
     strong_intra_smoothing_enabled_flag                u(1)
     sps_extension_present_flag                         u(1)
     if (sps_extension_present_flag) {
         sps_range_extension_flag                       u(1)
         sps_extension_7bits                            u(7)
     }
     if (sps_range_extension_flag) {
         transform_skip_rotation_enabled_flag           u(1)
         transform_skip_context_enabled_flag            u(1)
         implicit_rdpcm_enabled_flag                    u(1)
         explicit_rdpcm_enabled_flag                    u(1)
         extended_precision_processing_flag             u(1)
         intra_smoothing_disabled_flag                  u(1)
         high_precision_offsets_enabled_flag            u(1)
         persistent_rice_adaptation_enabled_flag        u(1)
         cabac_bypass_alignment_enabled_flag            u(1)
     }
     trailing_bits                                      u(v)
}

hevc_data()
{
     for(i = 0; i < v; i++) {
         hevc_data_byte                                 b(8)
     }
}

frame_duration_sei(payloadSize)
{
     frame_duration                                     u(16)
}

3.2) Semantics
--------------

     'file_magic' is defined as 0x425047fb.

     'pixel_format' indicates the chroma subsampling:

       0 : Grayscale
       1 : 4:2:0. Chroma at position (0.5, 0.5) (JPEG chroma position)
       2 : 4:2:2. Chroma at position (0.5, 0) (JPEG chroma position)
       3 : 4:4:4
       4 : 4:2:0. Chroma at position (0, 0.5) (MPEG2 chroma position)
       5 : 4:2:2. Chroma at position (0, 0) (MPEG2 chroma position)

       The other values are reserved.

     'alpha1_flag' and 'alpha2_flag' give information about the alpha plane:

       alpha1_flag=0 alpha2_flag=0: no alpha plane.

       alpha1_flag=1 alpha2_flag=0: alpha present. The color is not
       premultiplied.

       alpha1_flag=1 alpha2_flag=1: alpha present. The color is
       premultiplied. The resulting non-premultiplied R', G', B' shall
       be recovered as:

         if A != 0
           R' = min(R / A, 1), G' = min(G / A, 1), B' = min(B / A, 1)
         else
           R' = G' = B' = 1 .

       alpha1_flag=0 alpha2_flag=1: the alpha plane is present and
       contains the W color component (CMYK color). The resulting CMYK
       data can be recovered as follows:

         C = (1 - R), M = (1 - G), Y = (1 - B), K = (1 - W) .

       In case no color profile is specified, the sRGB color R'G'B'
       shall be computed as:

         R' = R * W, G' = G * W, B' = B * W .

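(Informative example) The alpha and W formulas above, written as a
short Python sketch. The function names are hypothetical, and the
sketch assumes that all component values have already been normalized
to floating point numbers in the range 0 to 1.

```python
def unpremultiply(r, g, b, a):
    """Recover R', G', B' when alpha1_flag=1 and alpha2_flag=1."""
    if a != 0:
        return (min(r / a, 1.0), min(g / a, 1.0), min(b / a, 1.0))
    # With A = 0 the color is undefined; the rule above selects 1.
    return (1.0, 1.0, 1.0)

def rgbw_to_cmyk(r, g, b, w):
    """Recover C, M, Y, K when alpha1_flag=0 and alpha2_flag=1."""
    return (1.0 - r, 1.0 - g, 1.0 - b, 1.0 - w)

def rgbw_to_srgb(r, g, b, w):
    """Compute R', G', B' from R, G, B, W when no color profile is given."""
    return (r * w, g * w, b * w)
```

The min() in unpremultiply matters for lossy files: after compression a
color component can end up slightly larger than its alpha value, and
the division would otherwise produce a result above 1.
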
     'bit_depth_minus_8' is the number of bits used for each component
     minus 8. In this version of the specification, bit_depth_minus_8
     <= 6.

     'extension_present_flag' indicates that extension data are
     present.

     'color_space' specifies how to convert the color planes to
     RGB. It must be 0 when pixel_format = 0 (grayscale):

       0 : YCbCr (BT 601, same as JPEG and HEVC matrix_coeffs = 5)
       1 : RGB (component order: G B R)
       2 : YCgCo (same as HEVC matrix_coeffs = 8)
       3 : YCbCr (BT 709, same as HEVC matrix_coeffs = 1)
       4 : YCbCr (BT 2020 non constant luminance system, same as HEVC
           matrix_coeffs = 9)
       5 : reserved for BT 2020 constant luminance system, not
           supported in this version of the specification.

       The other values are reserved.

       YCbCr is defined using the BT 601, BT 709 or BT 2020 conversion
       matrices.

       For RGB, G is stored in the Y plane, B in the Cb plane and R in
       the Cr plane.

       YCgCo is defined as HEVC matrix_coeffs = 8. Y is stored in the
       Y plane, Cg in the Cb plane and Co in the Cr plane.

       If no color profile is present, the RGB output data are assumed
       to be in the sRGB color space [6].

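(Informative example) A Python sketch of the plane to RGB mapping for
each defined 'color_space' value. It assumes full range samples that
have been normalized so that the Y plane lies in 0 to 1 and the Cb and
Cr planes are centered on zero (-0.5 to 0.5). The function name and the
table are hypothetical. The luma coefficients Kr and Kb are taken from
the ITU-R BT 601, BT 709 and BT 2020 recommendations and are not listed
in this document.

```python
# (Kr, Kb) luma coefficients keyed by color_space. These values come
# from the ITU-R recommendations, not from this specification.
LUMA_COEFFS = {
    0: (0.299, 0.114),    # BT 601
    3: (0.2126, 0.0722),  # BT 709
    4: (0.2627, 0.0593),  # BT 2020 non constant luminance
}

def planes_to_rgb(color_space, p0, p1, p2):
    """Convert one sample (Y plane, Cb plane, Cr plane) to (R, G, B)."""
    if color_space == 1:
        # RGB: G is in the Y plane, B in the Cb plane, R in the Cr plane.
        return (p2, p0, p1)
    if color_space == 2:
        # YCgCo: Cg is in the Cb plane, Co in the Cr plane.
        t = p0 - p1
        return (t + p2, p0 + p1, t - p2)
    if color_space in LUMA_COEFFS:
        kr, kb = LUMA_COEFFS[color_space]
        r = p0 + 2.0 * (1.0 - kr) * p2
        b = p0 + 2.0 * (1.0 - kb) * p1
        # Y = Kr*R + Kg*G + Kb*B with Kg = 1 - Kr - Kb, solved for G.
        g = (p0 - kr * r - kb * b) / (1.0 - kr - kb)
        return (r, g, b)
    raise ValueError("reserved color_space value")
```

A real decoder would typically use fixed point arithmetic and clamp the
results to the valid range; both are left out to keep the mapping
itself visible.
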
     'limited_range_flag': opposite of the HEVC video_full_range_flag.
     The value zero indicates that the full range of each color
     component is used. The value one indicates that a limited range
     is used:

       - (16 << (bit_depth - 8)) to (235 << (bit_depth - 8)) for Y
         and G, B, R,
       - (16 << (bit_depth - 8)) to (240 << (bit_depth - 8)) for Cb and Cr.

     For the YCgCo color space, the range limitation shall be done on
     the RGB data.

     The alpha (or W) plane always uses the full range.

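(Informative example) The limited range bounds above for a given bit
depth, as a small Python sketch. The function name and the is_chroma
parameter are hypothetical; the full range is taken to be 0 to
2^bit_depth - 1.

```python
def component_range(bit_depth, limited_range_flag, is_chroma=False):
    """Return the (lowest, highest) code value of a color component.

    is_chroma selects the Cb/Cr bounds; Y, G, B and R use the other
    set. The alpha (or W) plane always uses the full range, so callers
    should pass limited_range_flag=0 for it.
    """
    if not limited_range_flag:
        return (0, (1 << bit_depth) - 1)
    shift = bit_depth - 8
    upper = 240 if is_chroma else 235
    return (16 << shift, upper << shift)
```

For example, with bit_depth = 10 the limited range is 64 to 940 for Y
and 64 to 960 for Cb and Cr.
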
     'animation_flag'. The value '1' indicates that more than one
     frame is encoded in the hevc data. In that case, the animation
     control extension must be present. If the decoder does not support
     animations, it shall decode the first frame only and ignore the
     animation information.

     'picture_width' is the picture width in pixels. The value 0 is
     not allowed.

     'picture_height' is the picture height in pixels. The value 0 is
     not allowed.

     'picture_data_length' is the picture data length in bytes. The
     special value of zero indicates that the picture data goes up to
     the end of the file.

     'extension_data_length' is the extension data length in bytes.

     'extension_data()' is the extension data.

     'extension_tag' is the extension tag. The following values are defined:

       1: EXIF data.

       2: ICC profile (see [4])

       3: XMP (see [5])

       4: Thumbnail (the thumbnail shall be a lower resolution version
          of the image and stored in BPG format).

       5: Animation control data.

     The decoder shall ignore the tags it does not support.

     'extension_tag_length' is the length in bytes of the extension tag.

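(Informative example) A Python sketch of walking the (tag, length)
pairs of extension_data(). The names iter_extensions,
parse_animation_control and _ue7 are hypothetical. The input is assumed
to be exactly the extension_data_length bytes that follow the file
header, so more_bytes() becomes a comparison against the buffer length.

```python
def _ue7(buf, pos):
    # Compact ue7 reader (shortest-encoding check omitted for brevity).
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            return value, pos

def iter_extensions(ext_data):
    """Yield (extension_tag, payload_bytes) for each extension."""
    pos = 0
    while pos < len(ext_data):  # more_bytes()
        tag, pos = _ue7(ext_data, pos)
        length, pos = _ue7(ext_data, pos)
        if pos + length > len(ext_data):
            raise ValueError("extension overruns the extension data")
        yield tag, bytes(ext_data[pos:pos + length])
        pos += length

def parse_animation_control(payload):
    """Decode the payload of tag 5; trailing dummy bytes are ignored."""
    loop_count, pos = _ue7(payload, 0)
    frame_period_num, pos = _ue7(payload, pos)
    frame_period_den, pos = _ue7(payload, pos)
    return loop_count, frame_period_num, frame_period_den
```

A caller that does not support a tag simply discards its payload, which
is how unknown tags can be ignored without knowing their contents.
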
     'loop_count' gives the number of times the animation shall be
     played. The value of 0 means infinite.

     'frame_period_num' and 'frame_period_den' encode the default
     delay between each frame as frame_period_num/frame_period_den
     seconds. The value of 0 for 'frame_period_num' or
     'frame_period_den' is forbidden.

     'hevc_header_length' is the length in bytes of the following data
     up to and including 'trailing_bits'.

     'log2_min_luma_coding_block_size_minus3',
     'log2_diff_max_min_luma_coding_block_size',
     'log2_min_transform_block_size_minus2',
     'log2_diff_max_min_transform_block_size',
     'max_transform_hierarchy_depth_intra',
     'sample_adaptive_offset_enabled_flag', 'pcm_enabled_flag',
     'pcm_sample_bit_depth_luma_minus1',
     'pcm_sample_bit_depth_chroma_minus1',
     'log2_min_pcm_luma_coding_block_size_minus3',
     'log2_diff_max_min_pcm_luma_coding_block_size',
     'pcm_loop_filter_disabled_flag',
     'strong_intra_smoothing_enabled_flag',
     'sps_extension_present_flag', 'sps_range_extension_flag',
     'transform_skip_rotation_enabled_flag',
     'transform_skip_context_enabled_flag',
     'implicit_rdpcm_enabled_flag', 'explicit_rdpcm_enabled_flag',
     'extended_precision_processing_flag',
     'intra_smoothing_disabled_flag',
     'high_precision_offsets_enabled_flag',
     'persistent_rice_adaptation_enabled_flag' and
     'cabac_bypass_alignment_enabled_flag' are
     the corresponding fields of the HEVC SPS syntax element.

     'trailing_bits' has a value of 0 and has a length from 0 to 7
     bits so that the next data is byte aligned.

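(Informative example) The hevc_header() fields mix u(n) and ue(v)
codes and end with 'trailing_bits'. The Python sketch below shows one
way to read them. The class name BitReader and its method names are
hypothetical; the ue(v) decoding follows the usual 0-th order
Exp-Golomb rule (count the leading zero bits, skip the '1' bit, then
read that many suffix bits).

```python
class BitReader:
    """Reads bits from most significant to least significant."""

    def __init__(self, data):
        self.data = data
        self.bitpos = 0  # number of bits consumed so far

    def u(self, n):
        """Read u(n): an unsigned integer stored on n bits."""
        value = 0
        for _ in range(n):
            byte = self.data[self.bitpos >> 3]
            bit = (byte >> (7 - (self.bitpos & 7))) & 1
            value = (value << 1) | bit
            self.bitpos += 1
        return value

    def ue(self):
        """Read ue(v): a 0-th order Exp-Golomb coded unsigned integer."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)

    def skip_trailing_bits(self):
        """Consume the 0 to 7 zero bits that restore byte alignment."""
        pad = -self.bitpos % 8
        if self.u(pad) != 0:
            raise ValueError("trailing_bits must be zero")
```

Reading past the end of the data raises IndexError in this sketch; a
production decoder would also check the consumed length against
'hevc_header_length'.
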
     'hevc_data()' contains the corresponding HEVC picture data,
     excluding the first NAL start code (i.e. the first 0x00 0x00 0x01
     or 0x00 0x00 0x00 0x01 bytes). The VPS and SPS NALs shall not be
     included in the HEVC picture data. The decoder can recover the
     necessary fields from the header by making the following
     assumptions:

     - vps_video_parameter_set_id = 0
     - sps_video_parameter_set_id = 0
     - sps_max_sub_layers = 1
     - sps_seq_parameter_set_id = 0
     - chroma_format_idc: for picture data:
                            chroma_format_idc = pixel_format
                          for alpha data:
                            chroma_format_idc = 0.
     - separate_colour_plane_flag = 0
     - pic_width_in_luma_samples = ceil(picture_width/cb_size) * cb_size
     - pic_height_in_luma_samples = ceil(picture_height/cb_size) * cb_size
       with cb_size = 1 << log2_min_luma_coding_block_size
     - bit_depth_luma_minus8 = bit_depth_minus_8
     - bit_depth_chroma_minus8 = bit_depth_minus_8
     - max_transform_hierarchy_depth_inter = max_transform_hierarchy_depth_intra
     - scaling_list_enabled_flag = 0
     - log2_max_pic_order_cnt_lsb_minus4 = 4
     - amp_enabled_flag = 1
     - sps_temporal_mvp_enabled_flag = 1


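(Informative example) The two picture size assumptions above, as a
Python sketch. The function name is hypothetical. The sketch assumes
that log2_min_luma_coding_block_size equals
log2_min_luma_coding_block_size_minus3 + 3, as the HEVC field naming
implies.

```python
def luma_picture_size(picture_width, picture_height,
                      log2_min_luma_coding_block_size_minus3):
    """Return (pic_width_in_luma_samples, pic_height_in_luma_samples)."""
    cb_size = 1 << (log2_min_luma_coding_block_size_minus3 + 3)

    def round_up(x):
        # Integer form of ceil(x / cb_size) * cb_size.
        return (x + cb_size - 1) // cb_size * cb_size

    return (round_up(picture_width), round_up(picture_height))
```

For a 542 x 8 picture with 8 x 8 minimum coding blocks this gives
544 x 8; the two extra columns are the part a decoder crops away after
decoding.
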
     Alpha data encoding:

     - If alpha data is present, all the corresponding NALs have
       nuh_layer_id = 1. NALs for color data shall have nuh_layer_id =
       0.
     - Alpha data shall use the same tile sizes as color data and
       shall have the same entropy_coding_sync_enabled_flag value as
       color data.
     - Alpha slices shall use the same number of coding units as color
       slices and should be interleaved with color slices. Alpha NALs
       shall come before the corresponding color NALs.

     Animation encoding:

     - The optional prefix SEI with payloadType = 257 (defined in
       frame_duration_sei()) specifies that the image must be repeated
       'frame_duration' times. 'frame_duration' shall not be zero. If
       the frame duration SEI is not present for a given frame,
       frame_duration = 1 shall be assumed by the decoder. If alpha
       data is present, the frame duration SEI shall be present only
       for the color data.

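(Informative example) A Python sketch that turns the frame period and
the per-frame 'frame_duration' values into display times. It reads
"repeated 'frame_duration' times" as "shown for frame_duration frame
periods", which is an interpretation made for this example. The
function name and the use of None for a missing SEI are hypothetical.

```python
from fractions import Fraction

def frame_start_times(frame_period_num, frame_period_den, frame_durations):
    """Return the start time in seconds of each frame, as exact fractions.

    frame_durations has one entry per frame: the 'frame_duration' SEI
    value, or None when the SEI is absent (1 is then assumed).
    """
    if frame_period_num == 0 or frame_period_den == 0:
        raise ValueError("frame_period_num and frame_period_den must not be 0")
    period = Fraction(frame_period_num, frame_period_den)
    times = []
    now = Fraction(0)
    for duration in frame_durations:
        if duration is None:
            duration = 1  # default when no frame duration SEI is present
        if duration == 0:
            raise ValueError("frame_duration shall not be zero")
        times.append(now)
        now += duration * period
    return times
```

Using exact fractions avoids the drift that accumulates when a period
such as 1/30 second is added repeatedly in floating point.
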
3.3) HEVC Profile
-----------------

Conforming HEVC bit streams shall conform to the Main 4:4:4 16 Still
Picture profile, Level 8.5 of the HEVC specification with the following
modifications.

- separate_colour_plane_flag shall be 0 when present.

- bit_depth_luma_minus8 <= 6

- bit_depth_chroma_minus8 = bit_depth_luma_minus8

- explicit_rdpcm_enabled_flag = 0 (does not matter for intra frames)

- extended_precision_processing_flag = 0

- cabac_bypass_alignment_enabled_flag = 0

- high_precision_offsets_enabled_flag = 0 (does not matter for intra frames)

- If the encoded image is larger than the size indicated by
  picture_width and picture_height, the lower right part of the decoded
  image shall be cropped. If a horizontal (resp. vertical) decimation by
  two is done for the chroma and the width (resp. height) is n
  pixels, ceil(n/2) pixels must be kept as the resulting chroma
  information.

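(Informative example) The ceil(n/2) cropping rule for the chroma
planes, as a Python sketch keyed on 'pixel_format'. The function name
and the use of None for grayscale are hypothetical.

```python
def chroma_plane_size(pixel_format, picture_width, picture_height):
    """Return the (width, height) of the chroma planes after cropping.

    Returns None for grayscale, which has no chroma planes.
    """
    if pixel_format == 0:
        return None
    if pixel_format in (1, 4):
        # 4:2:0: decimated by two in both directions, ceil(n/2) kept.
        return ((picture_width + 1) // 2, (picture_height + 1) // 2)
    if pixel_format in (2, 5):
        # 4:2:2: decimated by two horizontally only.
        return ((picture_width + 1) // 2, picture_height)
    if pixel_format == 3:
        # 4:4:4: no decimation.
        return (picture_width, picture_height)
    raise ValueError("reserved pixel_format value")
```

For a 5 x 3 picture this keeps 3 x 2 chroma samples in 4:2:0 and 3 x 3
in 4:2:2, so odd dimensions never lose their last chroma column or row.
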
When animations are present, the next frames shall be encoded with the
following changes:

- P slices are allowed (but B slices are not allowed).

- Only the previous picture can be used as reference (hence a DPB size
  of 2 pictures).

4) Design choices
-----------------

(This section is informative)

- Our design principle was to keep the format as simple as possible
  while taking the HEVC codec as a basis. Our main metric to evaluate
  the simplicity was the size of a software decoder which outputs 32
  bit RGBA pixel data.

- Pixel formats: we wanted to be able to convert JPEG images to BPG
  with as little loss as possible. So supporting the same color space
  (BT 601 YCbCr) with the same range (full range) and most of the
  allowed JPEG chroma formats (4:4:4, 4:2:2, 4:2:0 or grayscale) was
  mandatory to avoid going back to RGB or doing a subsampling or
  interpolation.

- Alpha support: alpha support is mandatory. We chose to use a
  separate HEVC monochrome plane to handle it instead of another
  format to simplify the decoder. The color is either
  non-premultiplied or premultiplied. Premultiplied alpha usually
  gives a better compression. Non-premultiplied alpha is supported in
  case no loss is needed on the color components. In order to allow
  progressive display, the alpha and color data are interleaved (the
  nuh_layer_id NAL field is 0 for color data and 1 for alpha
  data). The alpha and color slices should contain the same number of
  coding units and each alpha slice should come before the
  corresponding color slice. Since alpha slices are usually smaller
  than color slices, it allows a progressive display even if there is
  a single slice.

- Color spaces: In addition to YCbCr, RGB is supported for the high
  quality or lossless cases. YCgCo is supported because it may give
  slightly better results than YCbCr for high quality images. CMYK is
  supported so that JPEGs containing this color space can be
  converted. The alpha plane is used to store the W (1-K) plane. The
  data is stored with inverted components (1-X) so that the conversion
  to RGB is simplified. Support for the BT 709 and BT 2020 (non
  constant luminance) YCbCr encodings and for limited range color
  values was added to reduce the losses when converting video frames.

- Bit depth: we decided to support the HEVC bit depths 8 to 14. The
  added complexity is small and it allows supporting high quality
  pictures from cameras.

- Picture file format: keeping a completely standard HEVC stream would
  have meant a more difficult parsing of the picture header, which is
  a problem for the various image utilities that need the basic picture
  information (pixel format, width, height). So we added a small
  header before the HEVC bit stream. The picture header is byte
  oriented so it is easy to parse.

- HEVC bit stream: the standard HEVC headers (the VPS and SPS NALs)
  give an overhead of about 60 bytes for no added value in the case of
  picture compression. Since the alpha plane uses a different HEVC bit
  stream, it also adds the same overhead again. So we removed the VPS
  and SPS NALs and added a very small header with the equivalent
  information (typically 4 bytes). We also removed the first NAL start
  code which is not useful. It is still possible to reconstruct a
  standard HEVC stream to feed an unmodified hardware decoder if needed.

- Extensions: the metadata are stored at the beginning of the file so
  that they can be read at the same time as the header. Since metadata
  tend to evolve faster than the image formats, we left room for
  extension by using a (tag, length) representation. The decoder can
  easily skip all the metadata because their length is explicitly
  stored in the image header.

- Animations: they are interesting compared to WebM or MP4 short
  videos for the following reasons:
    * transparency is supported
    * lossless encoding is supported
    * the decoding resources are smaller than with a generic video
      player because only two frames need to be stored (DPB size = 2).
    * the animations are expected to be small so the decoder can cache
      all the decoded frames in memory.
    * the animation can be decoded as a still image if the decoder
      does not support animations.
  Compared to the other animated image formats (GIF, APNG, WebP), the
  compression ratio is usually much higher because of the HEVC inter
  frame prediction.

5) References
-------------

[1] High efficiency video coding (HEVC) version 2 (ITU-T Recommendation H.265)

[2] JPEG File Interchange Format version 1.02 ( http://www.w3.org/Graphics/JPEG/jfif3.pdf )

[3] EXIF version 2.2 (JEITA CP-3451)

[4] The International Color Consortium ( http://www.color.org/ )

[5] Extensible Metadata Platform (XMP) ( http://www.adobe.com/devnet/xmp.html )

[6] sRGB color space, IEC 61966-2-1