FLAC

https://xiph.org/flac/
https://xiph.org/flac/documentation_format_overview.html
https://xiph.org/flac/format.html

BASIC STRUCTURE

The basic structure of a FLAC stream is:
• The four-byte string "fLaC"
• The STREAMINFO metadata block
• Zero or more other metadata blocks
• One or more audio frames
The first four bytes identify the FLAC stream. The metadata that follows contains all the information about the stream except for the audio data itself. After the metadata comes the encoded audio data.
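To make this layout concrete, here is a minimal Python sketch that checks the "fLaC" marker and walks the metadata block headers until the one flagged as last. It relies on the header layout from the FLAC format specification (a 1-bit last-block flag, a 7-bit block type and a 24-bit body length). The function name `list_metadata_blocks` is illustrative only, and the sketch assumes the whole stream is already in memory and well formed.

```python
# Block type numbers from the FLAC format specification.
BLOCK_TYPES = {0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
               4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE"}


def list_metadata_blocks(data: bytes):
    """Return ([(type_name, body_length, body_offset), ...], first_frame_offset)."""
    if data[:4] != b"fLaC":
        raise ValueError("not a native FLAC stream")
    pos = 4
    blocks = []
    while True:
        header = data[pos]
        is_last = bool(header & 0x80)   # 1-bit "last metadata block" flag
        block_type = header & 0x7F      # 7-bit block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit body length
        if not blocks and block_type != 0:
            raise ValueError("the first metadata block must be STREAMINFO")
        name = BLOCK_TYPES.get(block_type, f"UNKNOWN({block_type})")
        blocks.append((name, length, pos + 4))
        pos += 4 + length               # skip header and body; unknown types are skipped too
        if is_last:
            return blocks, pos          # audio frames start right after the last block
```

Because every block carries its own length, a decoder can skip block types it does not understand without parsing them, which is what the METADATA section below relies on.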

METADATA

FLAC defines several types of metadata blocks (see the format page for the complete list). Metadata blocks can be any length and new ones can be defined. A decoder is allowed to skip any metadata types it does not understand. Only one is mandatory: the STREAMINFO block. This block has information like the sample rate, number of channels, etc., and data that can help the decoder manage its buffers, like the minimum and maximum data rate and minimum and maximum block size. Also included in the STREAMINFO block is the MD5 signature of the unencoded audio data. This is useful for checking an entire stream for transmission errors.
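As an illustration, the sketch below decodes the 34-byte STREAMINFO body. The field widths (16/16/24/24/20/3/5/36 bits, then a 128-bit MD5) follow the FLAC format specification, where channel count and bits per sample are stored minus one. The function name and the returned dictionary keys are hypothetical choices for this example.

```python
def parse_streaminfo(body: bytes) -> dict:
    """Decode a STREAMINFO block body (34 bytes, big-endian bit fields)."""
    if len(body) != 34:
        raise ValueError("STREAMINFO body must be 34 bytes")
    bits = int.from_bytes(body[:18], "big")  # first 144 bits hold the fixed-width fields

    def take(offset: int, width: int) -> int:
        # `offset` is counted from the most significant bit of the 144-bit value.
        return (bits >> (144 - offset - width)) & ((1 << width) - 1)

    return {
        "min_blocksize": take(0, 16),
        "max_blocksize": take(16, 16),
        "min_framesize": take(32, 24),      # 0 means "unknown"
        "max_framesize": take(56, 24),      # 0 means "unknown"
        "sample_rate": take(80, 20),
        "channels": take(100, 3) + 1,       # stored as channels - 1
        "bits_per_sample": take(103, 5) + 1,  # stored as bps - 1
        "total_samples": take(108, 36),     # 0 means "unknown"
        "md5": body[18:34].hex(),           # MD5 of the unencoded audio data
    }
```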

Other blocks allow for padding, seek tables, tags, cuesheets, and application-specific data. There are flac options for adding PADDING blocks or specifying seek points. FLAC does not require seek points for seeking but they can speed up seeks, or be used for cueing in editing applications.
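To show why seek points speed up seeking, here is a small sketch. Per the format specification, a SEEKTABLE body is a sorted list of 18-byte seek points (64-bit sample number, 64-bit byte offset from the first frame header, 16-bit sample count), and placeholder points use the sample number 0xFFFFFFFFFFFFFFFF. The helper names are hypothetical; a real decoder would still decode forward from the returned offset to the exact target sample.

```python
import bisect
import struct

PLACEHOLDER = 0xFFFFFFFFFFFFFFFF  # sample number used by placeholder seek points


def parse_seektable(body: bytes):
    """Return [(sample_number, byte_offset, frame_samples), ...] without placeholders."""
    points = []
    for i in range(0, len(body) - len(body) % 18, 18):
        sample, offset, nsamples = struct.unpack(">QQH", body[i:i + 18])
        if sample != PLACEHOLDER:
            points.append((sample, offset, nsamples))
    return points


def seek_hint(points, target_sample):
    """Return (sample, byte_offset) of the last seek point at or before the target,
    or (0, 0), meaning "start from the first frame", if there is none."""
    samples = [p[0] for p in points]
    i = bisect.bisect_right(samples, target_sample) - 1
    if i < 0:
        return 0, 0
    return points[i][0], points[i][1]
```

Without a seek table, a decoder can still seek by bisecting the file and resynchronizing on frame headers (see FRAMING); the table just saves those probes.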

Also, if you need a custom metadata block, you can define your own and register an application ID with the FLAC project. Then you can reserve a PADDING block of the correct size when encoding, and overwrite the padding block with your APPLICATION block after encoding. The resulting stream will be FLAC compatible; decoders that are aware of your metadata can use it, and the rest will safely ignore it.
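The overwrite trick only works if the replacement occupies exactly the bytes of the reserved PADDING block. Below is a minimal sketch of building such a replacement: an APPLICATION block (type 2, a 4-byte application ID followed by data) plus, if space is left over, a smaller PADDING block (type 1). The ID `b"XMPL"` used in the tests is a made-up placeholder, not a registered ID, and the function names are hypothetical.

```python
def block_header(block_type: int, length: int, is_last: bool) -> bytes:
    """4-byte metadata block header: last flag, 7-bit type, 24-bit body length."""
    return bytes([(0x80 if is_last else 0x00) | block_type]) + length.to_bytes(3, "big")


def application_over_padding(padding_length: int, app_id: bytes, payload: bytes,
                             padding_was_last: bool) -> bytes:
    """Bytes that exactly replace a PADDING block whose body is `padding_length` bytes."""
    if len(app_id) != 4:
        raise ValueError("application ID must be exactly 4 bytes")
    app_body = app_id + payload
    leftover = padding_length - len(app_body)
    if leftover < 0:
        raise ValueError("reserved PADDING block is too small")
    if leftover == 0:
        return block_header(2, len(app_body), padding_was_last) + app_body
    if leftover < 4:
        # A new PADDING block needs at least its own 4-byte header.
        raise ValueError("leftover space too small for a PADDING header")
    # The last-block flag moves to whichever block now comes last.
    return (block_header(2, len(app_body), False) + app_body
            + block_header(1, leftover - 4, padding_was_last) + bytes(leftover - 4))
```

Keeping the total size unchanged means no audio bytes have to move, which is the point of reserving the padding at encode time.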

AUDIO DATA

After the metadata comes the encoded audio data. Audio data and metadata are not interleaved. Like most audio codecs, FLAC splits the unencoded audio data into blocks, and encodes each block separately. The encoded block is packed into a frame and appended to the stream. The reference encoder uses a single block size for the whole stream but the FLAC format does not require it.
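A fixed-block-size split like the reference encoder's can be sketched in a few lines: every channel is cut at the same sample positions, and only the final block may be shorter. This is an illustrative helper (the name is hypothetical), not code from the flac tools.

```python
def split_into_blocks(channels, block_size):
    """Split equal-length channels into blocks of `block_size` samples.
    Each block is a list with one slice per channel; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    return [[ch[start:start + block_size] for ch in channels]
            for start in range(0, n, block_size)]
```

For example, 10000 stereo samples with the default block size of 4096 give three blocks of 4096, 4096 and 1808 samples per channel.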

BLOCKING

The block size is an important encoding parameter. If it is too small, the frame overhead will lower the compression. If it is too large, the modeling stage of the compressor will not be able to generate an efficient model. Understanding FLAC's modeling will help you improve compression for some kinds of input by varying the block size. In the most general case, using linear prediction on 44.1 kHz audio, the optimal block size is between 2 and 6 ksamples; flac defaults to a block size of 4096 in this case. With the fast fixed predictors, a smaller block size is usually preferable because the frame header is smaller.

INTER-CHANNEL DECORRELATION

In the case of stereo input, once the data is blocked it is optionally passed through an inter-channel decorrelation stage. The left and right channels are converted to mid and side channels through the following transformation: mid = (left + right) / 2, side = left - right. Because left + right and left - right always have the same parity, the bit dropped by the integer division in mid can be restored from side, so this is a lossless process, unlike joint stereo. For normal CD audio this can result in significant extra compression. flac has two options for this: -m always compresses both the left-right and mid-side versions of the block and keeps the smallest frame, while -M adaptively switches between left-right and mid-side.
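The following sketch shows why the mid-side transform is lossless even though mid is computed with an integer division. Here `>> 1` is used as the floor division, and the dropped low bit of left + right is recovered from the parity of side. Note that side needs one more bit of precision than the input samples. Function names are illustrative.

```python
def to_mid_side(left, right):
    """Forward transform: mid = floor((L + R) / 2), side = L - R."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform; exact because L + R and L - R share the same parity."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)   # rebuild L + R, restoring the bit dropped by >> 1
        left.append((total + s) >> 1)   # (L + R) + (L - R) = 2L
        right.append((total - s) >> 1)  # (L + R) - (L - R) = 2R
    return left, right
```

For example, `to_mid_side([3, -3], [0, 2])` gives `([1, -1], [3, -5])`, and `from_mid_side` turns that back into the original pair of channels.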

MODELING

In the next stage, the encoder tries to approximate the signal with a function in such a way that when the approximation is subtracted, the result (called the residual, residue, or error) requires fewer bits per sample to encode. The function's parameters also have to be transmitted, so they should not be so complex as to eat up the savings. FLAC has two methods of forming approximations: 1) fitting a simple polynomial to the signal; and 2) general linear predictive coding (LPC). I will not go into the details here, only some generalities that involve the encoding options.
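The polynomial method corresponds to FLAC's fixed predictors of order 0 to 4, whose integer coefficients are given in the format specification. The sketch below computes their residuals and picks an order by the smallest sum of absolute residuals. That selection rule is a simple heuristic assumed for this example, not necessarily the exact rule the reference encoder uses.

```python
# Prediction coefficients of FLAC's fixed predictors: entry j multiplies x[n - 1 - j].
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residual(x, order):
    """Residual of the fixed predictor of the given order (0-4).
    The first `order` samples are warm-up samples and produce no residual."""
    coeffs = FIXED_COEFFS[order]
    return [x[n] - sum(c * x[n - 1 - j] for j, c in enumerate(coeffs))
            for n in range(order, len(x))]


def pick_fixed_order(x):
    """Heuristic: smallest sum of |residual|, compared over the same samples (n >= 4)."""
    def cost(order):
        residual = fixed_residual(x, order)
        return sum(abs(e) for e in residual[4 - order:])
    return min(range(5), key=cost)
```

For a quadratic ramp such as `x[n] = n * n`, the order-2 residual is the constant 2 and the order-3 residual is all zeros, so order 3 is chosen.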

First, fixed polynomial prediction (specified with -l 0) is much faster, but less accurate, than LPC. The higher the maximum LPC order, the slower but more accurate the model will be. However, there are diminishing returns with increasing orders. Also, at some point (usually around order 9) the part of the encoder that guesses the best order to use will start to get it wrong and the compression will actually decrease slightly; at that point you will have to use the exhaustive search option -e to overcome this, which is significantly slower.
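The diminishing returns can be seen with a textbook LPC sketch: autocorrelation followed by the Levinson-Durbin recursion, which yields the prediction-error energy for every order up to the maximum. This is a floating-point illustration under simplifying assumptions (no analysis window, no coefficient quantization, which a real FLAC encoder needs) and a synthetic test signal. The error energy can only stay the same or shrink as the order grows, and the shrinkage tails off.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r[lag] = sum of x[i] * x[i - lag] for lag = 0..max_lag (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Return (a, errors): a[j - 1] weights x[n - j] in the order-`max_order` predictor,
    and errors[p] is the prediction-error energy of the best order-p predictor."""
    a = []
    err = r[0]
    errors = [err]
    for i in range(1, max_order + 1):
        if err <= 0:
            # Already predicted perfectly; higher orders cannot do better.
            errors.extend([0.0] * (max_order - i + 1))
            break
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for order i
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        errors.append(err)
    return a, errors


if __name__ == "__main__":
    rng = random.Random(0)
    # Two sinusoids plus a little noise, standing in for one block of audio.
    signal = [1000 * math.sin(0.05 * n) + 300 * math.sin(0.31 * n) + rng.uniform(-20, 20)
              for n in range(4096)]
    _, errors = levinson_durbin(autocorrelation(signal, 12), 12)
    for order, e in enumerate(errors):
        print(f"order {order:2d}: relative error {e / errors[0]:.6f}")
```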

Second, the parameters for the fixed predictors can be transmitted in 3 bits whereas the parameters for the LPC model depend on the bits-per-sample and LPC order. This means the frame header length varies depending on the method and order you choose and can affect the optimal block size.

RESIDUAL CODING

Once the model is generated, the encoder subtracts the approximation from the original signal to get the residual (error) signal. The error signal is then losslessly coded. To do this, FLAC takes advantage of the fact that the error signal generally has a Laplacian (two-sided geometric) distribution, and that there is a set of special Huffman codes called Rice codes that can encode this kind of signal efficiently, quickly, and without needing a dictionary.
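Here is a minimal Rice coder, written with strings of '0'/'1' characters for readability rather than speed. Signed residuals are first folded to unsigned values (0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4); each value u is then written as a unary quotient u >> k (that many zeros followed by a one) plus the k low-order bits. The parameter search by total bit count and the upper limit of 14 are assumptions chosen to match a 4-bit parameter field; the function names are hypothetical.

```python
def zigzag(e):
    """Fold a signed residual into an unsigned integer: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (e << 1) if e >= 0 else ((-e << 1) - 1)


def unzigzag(u):
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)


def rice_encode(values, k):
    bits = []
    for e in values:
        u = zigzag(e)
        bits.append("0" * (u >> k) + "1")   # unary quotient: q zeros, then a 1
        if k:
            bits.append(format(u & ((1 << k) - 1), f"0{k}b"))  # k low-order bits
    return "".join(bits)


def rice_decode(bitstring, k, count):
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bitstring[pos] == "0":
            q += 1
            pos += 1
        pos += 1                            # skip the terminating 1
        low = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | low))
    return out


def best_rice_parameter(values, max_k=14):
    """Pick the k that minimizes the total encoded length in bits."""
    return min(range(max_k + 1),
               key=lambda k: sum((zigzag(e) >> k) + 1 + k for e in values))
```

No table or dictionary is transmitted: the single parameter k fully determines the code.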

Rice coding involves finding a single parameter that matches a signal's distribution, then using that parameter to generate the codes. As the distribution changes, the optimal parameter changes, so FLAC supports a method that allows the parameter to change as needed. The residual can be broken into several contexts or partitions, each with its own Rice parameter. flac allows you to specify how the partitioning is done with the -r option. The residual can be broken into 2^n partitions by using the option -r n,n. The parameter n is called the partition order. Furthermore, the encoder can be made to search through partition orders m to n, taking the best one, by specifying -r m,n. Generally, the choice of n does not affect encoding speed but m,n does. The larger the difference between m and n, the more time it will take the encoder to search for the best order. The block size will also affect the optimal order.
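The effect of `-r m,n` can be sketched as a brute-force search: for each partition order from m to n, cut the residual into 2^order equal partitions, give each its own best Rice parameter, and keep the cheapest order. Simplifying assumptions: partitions are equal-sized (in real FLAC the first partition is shorter by the predictor order), each parameter costs 4 bits, and other fixed header bits are ignored. Function names are hypothetical.

```python
def zigzag(e):
    """Fold a signed residual into an unsigned integer."""
    return (e << 1) if e >= 0 else ((-e << 1) - 1)


def rice_bits(values, k):
    """Length in bits of the Rice code of `values` with parameter k."""
    return sum((zigzag(e) >> k) + 1 + k for e in values)


def partition_cost(residual, order, max_param=14, param_bits=4):
    """Bits for 2**order equal partitions, each with its own best Rice parameter."""
    parts = 1 << order
    if len(residual) % parts:
        raise ValueError("residual length must divide evenly into partitions")
    size = len(residual) // parts
    total = 0
    for p in range(parts):
        chunk = residual[p * size:(p + 1) * size]
        total += param_bits + min(rice_bits(chunk, k) for k in range(max_param + 1))
    return total


def search_partition_order(residual, m, n):
    """Like -r m,n: try every partition order from m to n and keep the cheapest."""
    costs = {order: partition_cost(residual, order) for order in range(m, n + 1)}
    best = min(costs, key=costs.get)
    return best, costs
```

With a residual that is quiet for 256 samples and then loud for 256, order 1 wins (3336 bits) over order 0 (4868 bits), because one shared parameter fits neither half well; order 2 (3344 bits) only adds parameter overhead. Each extra order searched means another full pass over the residual, which is why a wider m,n range costs encoding time.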

FRAMING

An audio frame is preceded by a frame header and trailed by a frame footer. The header starts with a sync code, and contains the minimum information necessary for a decoder to play the stream, like sample rate, bits per sample, etc. It also contains the block or sample number and an 8-bit CRC of the frame header. The sync code, frame header CRC, and block/sample number allow resynchronization and seeking even in the absence of seek points. The frame footer contains a 16-bit CRC of the entire encoded frame for error detection. If the reference decoder detects a CRC error it will generate a silent block.
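The resynchronization pieces can be sketched as follows. The frame sync code is the 14-bit pattern 0b11111111111110, which on a byte boundary means a 0xFF byte followed by a byte whose top six bits are 111110. The header CRC-8 uses polynomial x^8 + x^2 + x + 1 and the frame CRC-16 uses x^16 + x^15 + x^2 + 1, both MSB-first with an initial value of 0. Confirming a candidate would additionally require parsing the header to find where its CRC-8 byte sits, which this sketch leaves out; the function names are illustrative.

```python
def crc8(data: bytes) -> int:
    """CRC-8, polynomial 0x07, init 0 (frame header, up to but excluding the CRC byte)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def crc16(data: bytes) -> int:
    """CRC-16, polynomial 0x8005, init 0 (whole frame, up to but excluding the CRC-16)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc


def find_sync_candidates(data: bytes):
    """Byte offsets where the 14-bit sync code 0b11111111111110 begins."""
    return [i for i in range(len(data) - 1)
            if data[i] == 0xFF and (data[i + 1] & 0xFC) == 0xF8]
```

A random 0xFF 0xF8 pair inside audio data is possible, which is why a decoder checks the header CRC-8 (and the plausibility of the block/sample number) before trusting a candidate.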

MISCELLANEOUS

As a convenience, the reference decoder knows how to skip ID3v1 and ID3v2 tags. Note however that the FLAC specification does not require compliant implementations to support ID3 in any form and their use is strongly discouraged.
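Skipping such tags before looking for "fLaC" is straightforward. An ID3v2 tag starts with "ID3" and a 10-byte header whose last four bytes are a "synchsafe" size (7 bits per byte) that excludes the header and an optional 10-byte footer (flag 0x10 in ID3v2.4). An ID3v1 tag is the final 128 bytes of a file, starting with "TAG". The sketch below is an assumption-level illustration of that, not the reference decoder's code.

```python
def skip_id3v2(data: bytes) -> int:
    """Return the offset just past any ID3v2 tags at the start of `data`."""
    pos = 0
    while data[pos:pos + 3] == b"ID3" and len(data) >= pos + 10:
        flags = data[pos + 5]
        size = 0
        for b in data[pos + 6:pos + 10]:
            size = (size << 7) | (b & 0x7F)    # synchsafe integer: 7 bits per byte
        footer = 10 if flags & 0x10 else 0     # ID3v2.4 "footer present" flag
        pos += 10 + size + footer
    return pos


def strip_id3v1(data: bytes) -> bytes:
    """Drop a trailing 128-byte ID3v1 tag, if present."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return data[:-128]
    return data
```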

flac has a verify option, -V, that verifies the output while encoding. With this option, a decoder is run in parallel with the encoder and its output is compared against the original input. If a difference is found, flac will stop with an error.

Reference: https://xiph.org/flac/documentation_format_overview.html