R.Daneel approves of these ideas! Some of them needed patches to my scraper
(where did LocationID go?) but overall I'm very happy to see these
changes.
There are many ways to safely inspect tarballs, even to get around the
zip bomb. I won't claim this is perfect, but it works for Aur3 and
me.
Listing paths has already been mentioned. Dotfiles, dotdirs, src/,
pkg/ are all simple red flags. For that matter, *any* directories are
often a sign something is wrong. As mentioned, you can get the size
of files before extracting. I don't know enough about tars to know if
an attacker could lie about the size. But even if they can,
time/memory quotas greatly limit the damage a DoS could achieve.
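
As a rough sketch of the listing idea (this is not the actual scraper
code; the function name, the warning strings and the size limit are
made up for illustration), Python's tarfile module can walk the
headers without extracting anything to disk. The sizes it reports are
whatever the headers declare.

```python
import io
import tarfile

# Names called out above as simple red flags.
RED_FLAG_NAMES = {"src", "pkg"}


def inspect_tarball(fileobj, max_total_size=10 * 1024 * 1024):
    """Return (warnings, declared_total) by reading tar headers only.

    max_total_size is an arbitrary illustrative limit, not a recommendation.
    """
    warnings = []
    declared_total = 0
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            # Split the path into components, ignoring "" and ".".
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if any(p.startswith(".") for p in parts):
                warnings.append("dot path: " + member.name)
            if any(p in RED_FLAG_NAMES for p in parts):
                warnings.append("src/pkg path: " + member.name)
            if member.isdir():
                warnings.append("directory: " + member.name)
            # Size as declared in the header; nothing is extracted.
            declared_total += member.size
    if declared_total > max_total_size:
        warnings.append("declared size %d exceeds limit" % declared_total)
    return warnings, declared_total
```

Whether a flagged directory is really a problem is left to the caller;
the sketch only reports what it sees.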
Files can also be processed as streams. I originally did binary
detection via "file" (which needs the contents to be extracted; it
can't be streamed through stdin) but have since implemented a
stream-based UTF8 detector. Stream processing gets around disk
attacks. Make the stream processor interruptible (when the time quota is
exceeded) and it can return an estimate. By the way, I am not
suggesting binary detection. It is just an example of something that
lends itself very well to this method.
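
To illustrate the streaming approach, here is a minimal sketch of an
interruptible UTF-8 check. It is an assumption about how such a
detector could look, not the one described above: the function name,
the quota and the chunk size are all hypothetical. It relies on the
standard incremental decoder so that multi-byte sequences split across
chunks are handled.

```python
import codecs
import io
import time


def looks_like_utf8(stream, time_quota=1.0, chunk_size=64 * 1024):
    """Return (is_utf8_so_far, bytes_checked, finished).

    If the time quota runs out, finished is False and the first value
    is only an estimate based on the bytes checked so far.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    deadline = time.monotonic() + time_quota
    checked = 0
    while True:
        if time.monotonic() >= deadline:
            return True, checked, False  # interrupted: estimate only
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        try:
            decoder.decode(chunk)
        except UnicodeDecodeError:
            return False, checked, True
        checked += len(chunk)
    try:
        # Flush: a truncated multi-byte sequence at EOF is an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False, checked, True
    return True, checked, True
```

The stream can be any binary file-like object, for example the one
returned by tarfile's extractfile(), so the contents never need to
touch the disk.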
-Kyle
http://kmkeen.com