
Benchmark of Nayuki’s FLAC encoder

Introduction

I built my own FLAC encoder starting from basic principles, in order to gain a deep understanding of how the FLAC format works. My codebase is still a work in progress with inadequate documentation, but runnable code can be found in its GitHub repository. The code is considered experimental, and any output should be verified with the Xiph.Org official FLAC tool’s test option (flac -t YourFile.flac) before any serious usage/distribution/archival.

The main feature I wanted to explore in my encoder was the use of variable block sizes. Xiph.Org’s documentation states in multiple places[0][1] that the reference encoder only implements logic to generate fixed/constant block sizes[2]. What I knew was that when given a set of possible block sizes to choose from (e.g. {2048, 3072, 4096, 6144} samples), it is possible to apply dynamic programming to find the optimal sequence of block sizes to use to encode the whole audio file.
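The dynamic programming idea can be sketched as follows. This is a minimal illustration and not the code from my encoder: the function name, the `cost` callback, and the toy signal are all hypothetical, and block sizes are expressed as multiples of a base size so that every block boundary falls on a grid. In a real encoder, `cost(begin, length)` would mean actually encoding that block and measuring its size, which is where nearly all the CPU time goes.

```python
import math
from typing import Callable, List, Sequence, Tuple


def optimal_block_sizes(
    num_samples: int,
    base_size: int,
    size_multiples: Sequence[int],
    cost: Callable[[int, int], int],
) -> Tuple[int, List[int]]:
    """Pick the cheapest sequence of block sizes covering num_samples samples.

    Allowed block sizes are base_size * m for each m in size_multiples.
    cost(begin, length) is a hypothetical callback returning the encoded size
    of the block covering samples [begin, begin + length).
    The final block may be truncated at the end of the audio.
    """
    num_units = -(-num_samples // base_size)  # ceiling division
    best = [math.inf] * (num_units + 1)  # best[i]: cheapest cost of the first i units
    prev = [0] * (num_units + 1)         # prev[i]: start unit of the last block in that solution
    best[0] = 0
    for start in range(num_units):
        if best[start] == math.inf:
            continue  # no sequence of allowed blocks ends exactly here
        begin = start * base_size
        for m in size_multiples:
            end = min(start + m, num_units)  # clamp so the last block can be short
            length = min(end * base_size, num_samples) - begin
            total = best[start] + cost(begin, length)
            if total < best[end]:
                best[end] = total
                prev[end] = start
    # Walk the back-pointers to recover the chosen block sizes in order.
    sizes: List[int] = []
    end = num_units
    while end > 0:
        start = prev[end]
        sizes.append(min(end * base_size, num_samples) - start * base_size)
        end = start
    sizes.reverse()
    return best[num_units], sizes


# Toy signal: 4096 quiet samples followed by 6144 loud samples.
samples = [1] * 4096 + [1000] * 6144


def toy_cost(begin: int, length: int) -> int:
    """Made-up cost model: a 64-bit header plus a fixed bit width per sample."""
    peak = max(abs(x) for x in samples[begin:begin + length])
    return 64 + length * (peak.bit_length() + 1)


total, sizes = optimal_block_sizes(len(samples), 1024, [2, 3, 4, 6], toy_cost)
print(total, sizes)  # prints 75904 [4096, 6144]
```

With the multiples [2, 3, 4, 6] and a base of 1024, the candidate sizes are the {2048, 3072, 4096, 6144} set mentioned above. The search evaluates the cost function once per (grid position, candidate size) pair, so its running time grows with the number of candidate sizes rather than with the number of possible size sequences.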

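In this benchmark I tested the output file size and execution time of FLAC encoders from three vendors: Xiph.Org (the official reference encoder, version 1.3.2), Flake (version 0.11), and Nayuki (my own encoder).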

Summary of the best compression results for each vendor, under the subset and lax modes:

| Vendor | Subset/Lax | Song A | Song B | Song C | Song D |
|---|---|---|---|---|---|
| Xiph.Org | Subset | 11 196 245 | 11 539 714 | 38 167 311 | 31 371 358 |
| Flake | Subset | 11 227 995 | 11 563 030 | 38 235 155 | 31 402 095 |
| Nayuki | Subset | 11 198 081 | 11 333 596 | 38 227 336 | 31 366 408 |
| Xiph.Org | Lax | 11 105 532 | 11 532 241 | 38 093 818 | 31 232 660 |
| Flake | Lax | 11 092 458 | 11 433 924 | 38 132 915 | 31 115 846 |
| Nayuki | Lax | 11 032 067 | 11 288 787 | 38 081 574 | 31 032 849 |

We can see that in the subset format, Xiph.Org and Nayuki each take the lead half of the time, and Flake trails behind noticeably. When there are no format constraints, Flake is often better than Xiph.Org, and Nayuki’s encoder consistently produces the smallest output – though at the cost of immense CPU time.

The official FLAC tool produces the subset format by default, and this format is recommended for best compatibility with embedded music players that have limited CPU/memory. Because the official tool forces you to add the --lax option to escape the subset format constraints, the developers evidently don’t recommend doing so. Therefore in this benchmark, I clearly label the results achieved under the subset format versus the results achieved under no constraints (“lax”). Having access to the full/lax FLAC format is important because it dramatically raises the maximum LPC order, Rice partition order, and block size. In practice this saves around 100 KB for a typical 5-minute song (or roughly 0.3 absolute percentage points).
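To make the subset-versus-lax comparison concrete, here is a small sketch that computes the savings from the summary table, both in bytes and in absolute percentage points of the raw audio size. The dictionary layout and the function name are just for illustration; the numbers are copied from the summary table and from the "Raw samples" row of the file size table.

```python
# Raw audio size in bytes for each song ("Raw samples" row of the file size table).
RAW_BYTES = {"A": 24_550_176, "B": 38_542_224, "C": 47_987_856, "D": 51_767_520}

# Best file size in bytes per vendor and mode (summary table).
BEST = {
    ("Xiph.Org", "subset"): {"A": 11_196_245, "B": 11_539_714, "C": 38_167_311, "D": 31_371_358},
    ("Flake", "subset"): {"A": 11_227_995, "B": 11_563_030, "C": 38_235_155, "D": 31_402_095},
    ("Nayuki", "subset"): {"A": 11_198_081, "B": 11_333_596, "C": 38_227_336, "D": 31_366_408},
    ("Xiph.Org", "lax"): {"A": 11_105_532, "B": 11_532_241, "C": 38_093_818, "D": 31_232_660},
    ("Flake", "lax"): {"A": 11_092_458, "B": 11_433_924, "C": 38_132_915, "D": 31_115_846},
    ("Nayuki", "lax"): {"A": 11_032_067, "B": 11_288_787, "C": 38_081_574, "D": 31_032_849},
}


def lax_savings(vendor: str, song: str) -> tuple:
    """Return (bytes saved, percentage points of raw size) for lax vs. subset."""
    saved = BEST[(vendor, "subset")][song] - BEST[(vendor, "lax")][song]
    return saved, 100.0 * saved / RAW_BYTES[song]


for vendor in ("Xiph.Org", "Flake", "Nayuki"):
    for song in "ABCD":
        saved, points = lax_savings(vendor, song)
        print(f"{vendor:8s} Song {song}: {saved:7d} bytes, {points:.2f} points")
```

The savings vary considerably by song and by encoder, so the 100 KB figure is best read as a rough typical value rather than a guarantee.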

Comment: Soft pop music with one voice and a small set of instruments – it compresses more readily than loud pop music.

Note: The links to YouTube copies of these songs are for listening only, not for encoder testing. Lossy and lossless copies of clips #2 and #3 can be found online (as they are pop music). But the exact versions of clips #0 and #1 produced by Philips Classics are hard or impossible to find – if you search online you will almost certainly encounter a different performance/recording of the same piece of classical music.

File size benchmark

The following table shows the output .flac file size (in bytes) when encoded with various parameters/switches. Note that the default padding is disabled (which saves about 8 KB). You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for largest/worst to blue for smallest/best.

| Vendor | Profile | Song A | Song B | Song C | Song D |
|---|---|---|---|---|---|
| Raw samples | | 24 550 176 | 38 542 224 | 47 987 856 | 51 767 520 |
| Xiph.Org | 00 (Subset) | 12 012 649 | 12 765 650 | 40 260 345 | 34 644 421 |
| Xiph.Org | 01 (Subset) | 12 041 679 | 12 605 764 | 39 654 273 | 33 656 746 |
| Xiph.Org | 02 (Subset) | 11 989 312 | 12 584 084 | 39 642 052 | 33 514 701 |
| Xiph.Org | 03 (Subset) | 11 331 657 | 12 124 438 | 39 134 639 | 32 687 770 |
| Xiph.Org | 04 (Subset) | 11 335 394 | 11 992 890 | 38 318 774 | 31 703 722 |
| Xiph.Org | 05 (Subset) | 11 292 888 | 11 916 555 | 38 314 870 | 31 615 686 |
| Xiph.Org | 06 (Subset) | 11 260 189 | 11 735 452 | 38 301 526 | 31 532 011 |
| Xiph.Org | 07 (Subset) | 11 228 542 | 11 759 302 | 38 215 718 | 31 449 049 |
| Xiph.Org | 08 (Subset) | 11 217 563 | 11 688 041 | 38 210 760 | 31 427 443 |
| Xiph.Org | 09 (Subset) | 11 217 561 | 11 688 041 | 38 210 760 | 31 427 443 |
| Xiph.Org | 10 (Subset) | 11 219 012 | 11 694 989 | 38 202 896 | 31 417 135 |
| Xiph.Org | 11 (Subset) | 11 202 102 | 11 567 040 | 38 198 970 | 31 382 581 |
| Xiph.Org | 12 (Subset) | 11 196 245 | 11 539 714 | 38 167 311 | 31 371 358 |
| Flake | 00 (Subset) | 11 366 474 | 11 918 232 | 38 465 695 | 31 836 311 |
| Flake | 01 (Subset) | 11 305 452 | 11 704 346 | 38 354 061 | 31 596 711 |
| Flake | 02 (Subset) | 11 276 927 | 11 703 583 | 38 278 805 | 31 478 365 |
| Flake | 03 (Subset) | 11 251 067 | 11 619 430 | 38 241 267 | 31 406 576 |
| Flake | 04 (Subset) | 11 248 041 | 11 593 401 | 38 241 150 | 31 404 713 |
| Flake | 05 (Subset) | 11 231 746 | 11 573 035 | 38 245 383 | 31 411 903 |
| Flake | 06 (Subset) | 11 227 995 | 11 563 030 | 38 235 155 | 31 402 095 |
| Nayuki | 00 (Subset) | 11 948 303 | 12 501 207 | 39 562 748 | 33 409 321 |
| Nayuki | 01 (Subset) | 11 264 792 | 11 402 799 | 38 329 603 | 31 502 982 |
| Nayuki | 02 (Subset) | 11 223 624 | 11 367 675 | 38 246 673 | 31 390 886 |
| Nayuki | 03 (Subset) | 11 217 348 | 11 356 585 | 38 244 567 | 31 382 724 |
| Nayuki | 04 (Subset) | 11 200 349 | 11 341 761 | 38 229 105 | 31 371 984 |
| Nayuki | 05 (Subset) | 11 199 483 | 11 338 107 | 38 228 320 | 31 369 881 |
| Nayuki | 06 (Subset) | 11 198 810 | 11 335 321 | 38 227 899 | 31 368 308 |
| Nayuki | 07 (Subset) | 11 198 081 | 11 333 596 | 38 227 336 | 31 366 408 |
| Xiph.Org | 13 (Lax) | 11 157 753 | 11 714 779 | 38 157 743 | 31 334 704 |
| Xiph.Org | 14 (Lax) | 11 157 751 | 11 714 779 | 38 157 743 | 31 334 704 |
| Xiph.Org | 15 (Lax) | 11 139 539 | 11 560 395 | 38 197 497 | 31 315 586 |
| Xiph.Org | 16 (Lax) | 11 113 210 | 11 559 055 | 38 145 088 | 31 258 435 |
| Xiph.Org | 17 (Lax) | 11 114 838 | 11 578 463 | 38 108 506 | 31 232 660 |
| Xiph.Org | 18 (Lax) | 11 128 191 | 11 610 087 | 38 097 994 | 31 235 965 |
| Xiph.Org | 19 (Lax) | 11 105 532 | 11 532 241 | 38 093 818 | 31 241 266 |
| Flake | 07 (Lax) | 11 134 291 | 11 592 342 | 38 183 267 | 31 218 520 |
| Flake | 08 (Lax) | 11 130 256 | 11 566 606 | 38 180 273 | 31 210 587 |
| Flake | 09 (Lax) | 11 120 167 | 11 548 126 | 38 188 437 | 31 227 148 |
| Flake | 10 (Lax) | 11 120 457 | 11 541 471 | 38 178 257 | 31 215 731 |
| Flake | 11 (Lax) | 11 097 885 | 11 471 292 | 38 145 971 | 31 139 616 |
| Flake | 12 (Lax) | 11 097 541 | 11 459 784 | 38 141 017 | 31 130 928 |
| Flake | 13 (Lax) | 11 102 137 | 11 453 551 | 38 137 336 | 31 122 361 |
| Flake | 14 (Lax) | 11 092 458 | 11 433 924 | 38 132 915 | 31 115 846 |
| Nayuki | 08 (Lax) | 11 157 900 | 11 370 585 | 38 222 635 | 31 294 038 |
| Nayuki | 09 (Lax) | 11 115 962 | 11 377 733 | 38 261 969 | 31 248 579 |
| Nayuki | 10 (Lax) | 11 184 068 | 11 317 177 | 38 182 939 | 31 319 823 |
| Nayuki | 11 (Lax) | 11 050 146 | 11 309 086 | 38 126 946 | 31 078 568 |
| Nayuki | 12 (Lax) | 11 172 036 | 11 305 836 | 38 158 824 | 31 301 848 |
| Nayuki | 13 (Lax) | 11 033 035 | 11 291 614 | 38 081 695 | 31 034 557 |
| Nayuki | 14 (Lax) | 11 032 067 | 11 288 787 | 38 081 574 | 31 032 849 |

Notes: The size of the raw samples includes only the audio data, but not even critical metadata like sample rate, bit depth, and number of channels. Dividing the .flac file size by the raw samples size gives the compression ratio, where smaller is better (e.g. 38081695÷47987856 ≈ 0.794 for Song C). A minimal WAV file is always 44 bytes larger than the raw samples alone, and does contain critical metadata necessary for playback.
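The ratio calculation in the notes above can be reproduced with a few lines. This is only a sketch with made-up function names; the 44-byte figure is the minimal WAV header size mentioned in the notes.

```python
WAV_HEADER_BYTES = 44  # extra bytes in a minimal WAV file, as stated in the notes


def compression_ratio(flac_bytes: int, raw_bytes: int) -> float:
    """Compressed size divided by raw sample size; smaller is better."""
    return flac_bytes / raw_bytes


def minimal_wav_bytes(raw_bytes: int) -> int:
    """Size of the smallest WAV file holding the given raw sample data."""
    return raw_bytes + WAV_HEADER_BYTES


# Song C, Nayuki profile #13 versus the raw samples.
ratio = compression_ratio(38_081_695, 47_987_856)
print(f"{ratio:.3f}")  # prints 0.794
```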

Xiph.Org: The preset compression levels from -0 to -8 (represented in my profiles #00 to #08) are all so fast that you might as well use -8 to get the best compression. Looking at profiles #15 to #18, increasing the block size from 3072 to 8192 samples sometimes helps and sometimes hurts. By contrast, Nayuki’s encoder manages to use plenty of large 12288-sample blocks in the most efficient encoding of all the test clips. Exhaustive LPC order search (-e) is one of the better things you can do (e.g. #14 vs. #16). The -p option helps somewhat (#11 vs. #12, and #16 vs. #19). Increasing the maximum Rice partition order basically does nothing (#08 vs. #09, and #13 vs. #14).

Flake: Within the subset format, the best setting for Flake (even with variable block sizes enabled) cannot match Xiph.Org’s smaller output size. However, Flake pulls ahead under the lax format, and its best setting usually beats Xiph.Org under lax mode. We can see in profiles #08 to #10 that -v 0 (constant block size) is sometimes better than -v 1 (variable block size heuristic 1). In these three profiles -v 2 (variable block size heuristic 2) is the best of the three for Songs B and C but falls between the other two for Songs A and D, and it takes 5× more CPU time.

Nayuki, subset: When using the same constraints as Xiph.Org, such as the subset format with constant block size, Nayuki’s encoder basically cannot compete with Xiph.Org on output size (see Xiph.Org #12 vs. Nayuki #03). Because both encoders are exhaustively searching all the available LPC orders (up to 12), Rice partition orders (up to 8), and stereo coding modes, it implies that the only place where Nayuki’s encoder lags behind Xiph.Org is the calculation/choice of LPC coefficients. When only constrained to the subset format but allowing variable block size (with a maximum of 4608 samples), Nayuki’s encoder gains some ground (see Xiph.Org #12 vs. Nayuki #07), but the inconsistent results don’t justify the expenditure of CPU time.

Nayuki, lax: Under lax mode with the full power of the FLAC file format, Nayuki’s variable block size encoding can finally beat Xiph.Org’s best output consistently (Xiph.Org #19 vs. Nayuki #14), but again at a huge cost of CPU time. Presumably Nayuki’s encoder is still disadvantaged compared to Xiph.Org in terms of LPC coefficients; if an encoder can combine the qualities of near-optimal variable block sizing with superior LPC coefficient selection, it would be interesting to see how much smaller the output could be.

There are tools to examine what block sizes are used in a FLAC file, which is helpful for telling whether a file uses variable block size encoding. You can use the official FLAC tool’s -a option to analyze an existing .flac file, which generates a megabyte-long text file with information about every subframe of encoded audio. You can also use Nayuki’s ShowFlacFileStats tool, which prints histograms/graphs to the console.

Encoding time benchmark

The table below shows the number of seconds to encode the uncompressed WAV file to a FLAC file. You can hover over cells in the “Profile” column to see the detailed list of encoding options. Cells are color-coded from red for slowest/worst to blue for fastest/best.

| Vendor | Profile | Song A | Song B | Song C | Song D |
|---|---|---|---|---|---|
| Audio length | | 139.2 | 218.5 | 272.0 | 293.5 |
| Xiph.Org | 00 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
| Xiph.Org | 01 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
| Xiph.Org | 02 (Subset) | 0.2 | 0.3 | 0.5 | 0.5 |
| Xiph.Org | 03 (Subset) | 0.2 | 0.3 | 0.4 | 0.4 |
| Xiph.Org | 04 (Subset) | 0.2 | 0.3 | 0.5 | 0.5 |
| Xiph.Org | 05 (Subset) | 0.3 | 0.4 | 0.6 | 0.6 |
| Xiph.Org | 06 (Subset) | 0.3 | 0.5 | 0.8 | 0.8 |
| Xiph.Org | 07 (Subset) | 0.4 | 0.6 | 0.8 | 0.8 |
| Xiph.Org | 08 (Subset) | 0.5 | 0.8 | 1.2 | 1.2 |
| Xiph.Org | 09 (Subset) | 0.6 | 1.0 | 1.4 | 1.4 |
| Xiph.Org | 10 (Subset) | 0.6 | 1.0 | 1.4 | 1.4 |
| Xiph.Org | 11 (Subset) | 2.7 | 4.1 | 5.7 | 6.0 |
| Xiph.Org | 12 (Subset) | 20.1 | 31.6 | 42.6 | 44.7 |
| Xiph.Org | 13 (Lax) | 2.1 | 2.7 | 4.3 | 4.8 |
| Xiph.Org | 14 (Lax) | 2.3 | 3.0 | 4.9 | 5.2 |
| Xiph.Org | 15 (Lax) | 30.1 | 44.4 | 63.4 | 66.6 |
| Xiph.Org | 16 (Lax) | 30.4 | 44.5 | 64.0 | 67.1 |
| Xiph.Org | 17 (Lax) | 29.9 | 43.6 | 62.8 | 65.7 |
| Xiph.Org | 18 (Lax) | 30.2 | 43.7 | 63.7 | 66.4 |
| Xiph.Org | 19 (Lax) | 225.3 | 330.2 | 474.1 | 497.7 |
| Flake | 00 (Subset) | 0.6 | 0.9 | 1.3 | 1.4 |
| Flake | 01 (Subset) | 0.8 | 1.2 | 1.7 | 1.8 |
| Flake | 02 (Subset) | 0.9 | 1.4 | 1.9 | 2.0 |
| Flake | 03 (Subset) | 1.1 | 1.7 | 2.2 | 2.4 |
| Flake | 04 (Subset) | 1.6 | 2.6 | 3.4 | 3.6 |
| Flake | 05 (Subset) | 1.7 | 2.6 | 3.4 | 3.7 |
| Flake | 06 (Subset) | 8.6 | 14.0 | 17.6 | 19.2 |
| Flake | 07 (Lax) | 1.9 | 3.0 | 4.0 | 4.4 |
| Flake | 08 (Lax) | 5.0 | 7.9 | 10.1 | 11.0 |
| Flake | 09 (Lax) | 5.0 | 7.9 | 10.2 | 11.1 |
| Flake | 10 (Lax) | 25.5 | 40.9 | 51.9 | 56.5 |
| Flake | 11 (Lax) | 4.9 | 7.8 | 10.0 | 10.9 |
| Flake | 12 (Lax) | 25.2 | 40.4 | 51.2 | 55.8 |
| Flake | 13 (Lax) | 4.8 | 7.7 | 9.7 | 10.6 |
| Flake | 14 (Lax) | 24.3 | 39.0 | 49.5 | 53.9 |
| Nayuki | 00 (Subset) | 7.7 | 10.3 | 17.9 | 17.3 |
| Nayuki | 01 (Subset) | 9.9 | 14.5 | 23.5 | 23.1 |
| Nayuki | 02 (Subset) | 13.6 | 20.3 | 31.2 | 31.3 |
| Nayuki | 03 (Subset) | 121.4 | 193.4 | 268.4 | 283.0 |
| Nayuki | 04 (Subset) | 357.8 | 561.3 | 764.9 | 795.2 |
| Nayuki | 05 (Subset) | 642.9 | 1003.1 | 1344.1 | 1426.9 |
| Nayuki | 06 (Subset) | 1180.3 | 1862.0 | 2456.9 | 2583.3 |
| Nayuki | 07 (Subset) | 2160.2 | 3465.1 | 4357.3 | 4678.9 |
| Nayuki | 08 (Lax) | 26.4 | 40.8 | 58.1 | 60.2 |
| Nayuki | 09 (Lax) | 40.5 | 63.0 | 87.0 | 91.1 |
| Nayuki | 10 (Lax) | 110.2 | 172.0 | 234.4 | 245.1 |
| Nayuki | 11 (Lax) | 369.3 | 583.9 | 773.1 | 821.0 |
| Nayuki | 12 (Lax) | 831.1 | 1305.7 | 1758.2 | 1844.7 |
| Nayuki | 13 (Lax) | 2775.7 | 4398.4 | 5816.8 | 6187.1 |
| Nayuki | 14 (Lax) | 4430.0 | 7017.4 | 9189.0 | 9754.3 |

Notes: The audio length can be used to gauge how fast an encoder is relative to real time. In terms of raw data rate, all the test clips are 44100 Hz, 16-bit, stereo (CD quality). Benchmarks were performed on Windows 8.1 Pro 64-bit, Intel Core i5-4690, 3.50 GHz, single core operation. For Xiph.Org FLAC 1.3.2, the 64-bit EXE was used. Flake 0.11 only has a 32-bit EXE. Nayuki’s FLAC encoder was run on the Oracle Java 1.8.0_45 64-bit JVM. Files are read from and written to an SSD, so I/O time is considered negligible.
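As a rough illustration of how to read the table relative to real time, the sketch below derives the audio length from the raw sample size and computes a speed factor. The function names are made up for this example; the constants follow from the CD-quality format stated in the notes (44100 Hz, 2 channels, 2 bytes per sample).

```python
BYTES_PER_SECOND = 44_100 * 2 * 2  # sample rate × channels × bytes per 16-bit sample


def audio_seconds(raw_bytes: int) -> float:
    """Playing time of raw CD-quality sample data."""
    return raw_bytes / BYTES_PER_SECOND


def realtime_factor(audio_secs: float, encode_secs: float) -> float:
    """How many seconds of audio are encoded per second of run time."""
    return audio_secs / encode_secs


# Song A: the raw size from the file size table gives the audio length listed above.
print(f"{audio_seconds(24_550_176):.1f}")    # prints 139.2

# Song A under three profiles from the timing table.
print(f"{realtime_factor(139.2, 0.5):.1f}")     # Xiph.Org #08: prints 278.4
print(f"{realtime_factor(139.2, 225.3):.2f}")   # Xiph.Org #19: prints 0.62
print(f"{realtime_factor(139.2, 4430.0):.3f}")  # Nayuki #14: prints 0.031
```

A factor above 1 means faster than real time, and a factor below 1 means the encoder takes longer than the song lasts.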

Xiph.Org’s encoder is fast for all but the ludicrous exhaustive searches. The strongest standard (non-custom) option of -8 runs well over 100× real time speed. Even in lax mode with the maximum LPC order of 32 (-l 32) and exhaustive LPC order search (-e), it still runs several times faster than real time. Only when exhaustive LPC coefficient optimization is enabled (-p) does the amount of time start to become unreasonable (e.g. profile #19 in my trials). Beyond the settings I explored, not many options remain to increase this encoder’s search space and run time – the only ones left are the block size and list of apodization functions. All in all, the reference encoder implementation is a solid, efficient piece of engineering.

Flake is also quite fast, in more or less the same order of magnitude as Xiph.Org’s encoder. Indeed the author stated fast encoding as a feature on their project page. I would say that for the same computation time it produces roughly the same compression ratio as Xiph.Org. It appears that the most intensive setting is still faster than some of Xiph.Org’s intensive settings, and I have exhausted the ways to make Flake search more thoroughly. The variable block size option -v 2 is about 5 times slower than -v 1 (see profile #05 vs. #06, #13 vs. #14, etc.), but almost always produces smaller files. The option value -v 1 takes the same amount of time as -v 0 (constant block size), but sometimes produces bigger files and sometimes smaller ones (see #04 vs. #05, and #08 vs. #09).

Nayuki’s encoder is slow for a myriad of reasons. It is written in Java instead of C plus assembly/SIMD. It hasn’t implemented multithreading for the embarrassingly parallel parts of the search. As a budding project it focuses on correctness and just getting a working implementation at all. It uses 64-bit arrays to support encoding 32-bit integer audio files (whereas Xiph.Org only supports up to 24 bits), but this is an overhead because 32-bit arrays would be enough to support 16- and 24-bit audio (and I don’t think anyone actually uses 32-bit integer audio). And it relies heavily on exhaustive search (for LPC order, Rice partition order, block size switching, etc.), without trying to implement any heuristics to reduce the search space (the other encoders deploy numerous heuristics).

Encoding profiles used

This table lists the details for the profiles used in benchmarking. For Xiph.Org and Flake, the command line options/flags are shown (plus --totally-silent or -q to suppress needless printing for batch processing). For Nayuki’s encoder, values passed into the Java API are shown (it doesn’t support command line options); their meanings won’t be explicitly explained until a proper article for the encoder/library is written.

Choosing which set of options to test/report is a search problem in a multi-dimensional space. Using human-based heuristics, I only explored a small part of the space to comparatively illustrate key points about how certain options (e.g. block size, LPC order) affect output size and run time.

Note that I left a good chunk of options for Nayuki’s encoder unexplored due to immense increases in computation time. The running time is more or less independent of baseSize, but it is proportional to the sum of the sizeMultiples list; together, these two parameters determine the set of usable block sizes. Also, the run time is more or less proportional to maxLpcOrder minus minLpcOrder, and proportional to 2 to the power of lpcRoundVars (exponential time!).
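The proportionality rules above can be written as a rough cost model. This is a sketch for comparing two option sets against each other, not a prediction in seconds: the function is hypothetical, the parameter names are snake_case versions of the encoder options named above, and the example values are invented rather than taken from the benchmark profiles.

```python
from typing import Sequence


def relative_run_time(
    size_multiples: Sequence[int],
    min_lpc_order: int,
    max_lpc_order: int,
    lpc_round_vars: int,
) -> int:
    """Unitless run-time estimate following the stated proportionalities.

    baseSize is deliberately absent because the run time is described as
    more or less independent of it. The LPC term is taken literally as
    max minus min, so it is only meaningful when max_lpc_order > min_lpc_order.
    """
    return sum(size_multiples) * (max_lpc_order - min_lpc_order) * 2 ** lpc_round_vars


# Invented option sets, only to show how the factors combine.
baseline = relative_run_time([1, 2], min_lpc_order=2, max_lpc_order=12, lpc_round_vars=0)
wider = relative_run_time([1, 2, 3], min_lpc_order=2, max_lpc_order=12, lpc_round_vars=3)
print(wider / baseline)  # prints 16.0
```

In this example, adding one more size multiple doubles the sum from 3 to 6, and raising lpcRoundVars from 0 to 3 multiplies the cost by 8, which is why the exponential parameter dominates so quickly.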

Conclusion

Xiph.Org’s FLAC encoder is a fine piece of software in many ways – feature-rich, very fast, and with a good ratio of bytes saved to CPU time spent. I showed that it is possible in practice to beat its compressed output size by about 100 KB per song, by applying variable block sizes intelligently. However, Nayuki’s FLAC encoder is slow, and the tiny savings in file size are probably not worth the extra CPU time. It is questionable whether Nayuki’s encoder serves any practical purpose other than to show what magnitude of benefit is achieved by switching block sizes in a near-optimal way.

This work nearly exhausts all the possibilities that the current FLAC file format offers for shrinking audio data. Certainly the stereo coding modes, LPC order, Rice partition order, and Rice partition parameters are all searched exhaustively. The only places with room for improvement (however small) are the choice of LPC coefficients (where Xiph.Org currently beats Nayuki), the precision of LPC coefficients (probably not important), and trying a more fine-grained set of variable block sizes (say {100,200,300,...,19900} versus {5000,10000,15000}). There already exist other lossless audio codecs with slightly better compression than FLAC, albeit with tradeoffs in CPU time, licensing, etc. It would be interesting to see an exploration in adding new coding methods to FLAC, such as Huffman coding (instead of Rice coding), better modeling of non-stationary distributions, and who knows what.

Footnotes

[0]: “--blocksize=#: [...] The reference encoder uses the same block size for the entire stream.” (Source)

[1]: “Blocking: [...] Though FLAC allows the block size to vary within a stream, the reference encoder uses a fixed block size.” (Source)

[2]: Except that the final block may be shorter than the constant block size. This desirable behavior ensures that the FLAC file has the exact same number of samples as the input, unlike some other codecs which may silently introduce padding.