Re: .deb format: let's use 0.939, zstd, drop bzip2

On Wed, May 08, 2019 at 10:45:21PM +0300, Adrian Bunk wrote:
> On Wed, May 08, 2019 at 07:38:26PM +0200, Adam Borowski wrote:
> > So let's pick compressors to enable. For compression ratio, xz still wins
> > (at least among popular compressors). But there's a thing to say about
> > zstd: firefox.deb zstd -19 takes to unpack:
> > * 2.644s .xz, stock dpkg
> > * 2.532s .xz, my tool (libarchive based)
> > * 0.290s .zst, my tool
> > * 0.738s .gz, stock dpkg
> > * 0.729s .gz 0.939, stock dpkg
> > File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.
> >
> > XFCE install total: 723M xz, 773M zstd, 963M gzip.
> >
> > Thus, even though we'd want to stick with xz for the official archive, speed
> > gains from zstd are so massive that it's tempting to add support for it,
> > at least for non-official uses, possibly also for common Build-Depends.
> >...
>
> Is this single-threaded or parallel?
This one was single-threaded (that is: dpkg-deb was run without any extra
arguments, with all cores available to the decompressor).
Most other tests were on a fully loaded processor, with one (hyper-)thread
available per task.
> pbzip2 decompression speed scales nicely with the number of CPUs,
> and in general for anyone interested in massive speed gains the
> way forward would be towards parallel decompression.
Just tested pbzip2 on its own, without dpkg, the command line being:
    time (pbzip2 -cdfr <TARBALL | tar xf -)
* 5.018s
File size: 54841864 (without ar and control).
So it's incredibly slow for very weak compression.
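
To put the firefox.deb figures from this thread side by side, here is a rough
sketch in Python. The sizes and timings are the ones quoted above; the
MEASURED table layout and the relative() helper are made up for illustration.
The bz2 row is only approximately comparable: its size excludes the ar wrapper
and control member, and its timing comes from pbzip2 piped into tar rather
than from dpkg.

```python
# Figures quoted in this thread for firefox.deb:
# format -> (size in bytes, unpack time in seconds).
MEASURED = {
    "xz":   (44506304, 2.644),  # stock dpkg
    "zstd": (47959544, 0.290),  # libarchive-based tool
    "gz":   (60628216, 0.738),  # stock dpkg
    "bz2":  (54841864, 5.018),  # pbzip2 | tar, no ar/control members
}


def relative(baseline="xz"):
    """Return {format: (size ratio, time ratio)} against the baseline format."""
    base_size, base_time = MEASURED[baseline]
    return {
        name: (size / base_size, seconds / base_time)
        for name, (size, seconds) in MEASURED.items()
    }


if __name__ == "__main__":
    for name, (size_ratio, time_ratio) in relative().items():
        print(f"{name:5s} size x{size_ratio:.2f}  time x{time_ratio:.2f}")
```

On these numbers zstd comes out about 8% larger than xz while unpacking in
roughly a ninth of the time, and bz2 is both larger and slower than xz.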
> > But, the dlopen idea shows a potential victim: bzip2. Let's kill it.
> >
> > Stats for Buster's packages:
> > .deb format:
> >
> > With not a single package in the archive still using bz2,
>
> You were only looking at binary packages,
> for source packages bz2 is still pretty common.
Well yeah, but that's dpkg-dev, where size of the toolchain matters little.
I don't think anyone is going to build packages on a machine without
adequate storage. On the other hand, runtime often means a tiny router or a
massively oversubscribed container host.
But my main point was not to help bitty boxes, but to slow the growth of
bloat somehow. When we add libraries, it's good to retire outdated ones
sometimes.
> > removing support
> > would be reasonable. It'd be okay to give a clear error message telling the
> > user to install libbz2-1.0 (dlopen) or bzip2 (pipe) -- so folks can still
> > unpack historic .debs if need be.
>
> It would be neither reasonable nor okay to create such hassle for users
> for no benefits at all.
>
> And if the tiny 75 kB libbz2 would be considered a problem,
> the huge 650 kB libzstd would obviously never be an option
> for packages in the archive.
It's already in, thus the effective cost is not 650kB but 0. On the other
hand, the utility of libbz2 is only unpacking very old .debs. That's
something useful, but in no way needed on every machine.
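
A minimal sketch of the "dlopen, else pipe, else clear error" fallback
discussed here, written in Python for brevity; dpkg itself is C and would call
dlopen(3) directly. The function name, the exception class and the wording of
the message are hypothetical, and the soname libbz2.so.1.0 in the usage note
is an assumption about what the libbz2-1.0 package ships.

```python
import ctypes
import shutil


class MissingDecompressor(RuntimeError):
    """Raised when neither the library nor the command-line tool is present."""


def find_decompressor(soname, lib_package, tool, tool_package):
    """Return ("dlopen", handle) or ("pipe", path); raise with a hint otherwise."""
    try:
        # ctypes.CDLL wraps dlopen() and raises OSError if loading fails.
        return ("dlopen", ctypes.CDLL(soname))
    except OSError:
        pass

    # Fall back to piping through the external tool, if it is on PATH.
    path = shutil.which(tool)
    if path is not None:
        return ("pipe", path)

    raise MissingDecompressor(
        f"cannot decompress this member: install {lib_package} (dlopen) "
        f"or {tool_package} (pipe)"
    )
```

It would be called as find_decompressor("libbz2.so.1.0", "libbz2-1.0",
"bzip2", "bzip2"), so a machine that has neither the library nor the tool gets
a message naming the packages to install instead of an obscure failure.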
I just checked Stretch: not a single .bz2, neither control nor data. I'm not
going to download all of Jessie just to check -- but even assuming something
was left by Jessie's time, by Bullseye, trying to install such a .deb will
mean mixing packages 3 releases apart.
Also, many other tools keep depending on libbz2, so it'll likely remain
present on most systems (even if gpgv (transitively-Required) also drops the
dependency). And if it declines in popularity -- it'll likely remain in
the archive for a long long time, just like ncompress and arj do.
Compressors are easy to keep on life support, and important enough that
none which have seen some real use would be dropped.
Meow!
--
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity. You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so. I recommend Skepticism
⠈⠳⣄⠀⠀⠀⠀ (funeral doom metal).