Tar archive can contain multiple file entries with the same name. GNU tar info page reads:
[...] files are
extracted from an archive in the order in which they were archived.
Thus, when the archive is extracted, a file archived later in time will
replace a file of the same name which was archived earlier
[...]
When you extract the archive, the older version of the file
will be extracted first, and then replaced by the newer version when it is extracted.
It looks like first Archive::Tar creates the symlink (first entry) and then writes the regular file content into the symlink (second entry).
GNU tar indeed replaces the symlink with a regular file.

> Attached patch simply removes every existing (non-directory) file
> that's going to be extracted. Even in non-secure mode.
>
> Is it fine, or too strict, or even dangerous?
>

Doesn't this just shorten the race and yet still have the same problem?
If the attacker does something like:
perl -e 'symlink "attack_vector" "to.extract" while 1'
Then I would think there's a reasonable chance that the symlink will get created between the unlink and the open while Archive::Tar is busy checking the length, if it is a file, and instantiating an IO::File object.
I don't think the unlink is a negative, in that it avoids a case where you might run out of space to extract two copies of the file, but shouldn't this use a tempfile with a reasonably random name to avoid the race and then atomically rename it to the correct name?

Talking to ewilhelm at pdx.pm, the solution is probably to use the O_EXCL flag for the open. This is not guaranteed to work on NFS, so you would also need a lockfile for that. No idea what guarantees you're looking for with this.
https://man.openbsd.org/open#O_EXCL

>
> Talking to ewilhelm at pdx.pm, the solution is probably to use the
> O_EXCL flag for the open. This is not guaranteed to work on NFS, so
> you would also need a lockfile for that. No idea what guarantees
> you're looking for with this.
>
> https://man.openbsd.org/open#O_EXCL

Here is a patch that adds the O_EXCL flag to the open call. Removing the unlink patch with just this one shows that it works, as the test suite extracts over files.

>
>
> Doesn't this just shorten the race and yet still have the same problem?
>
> If the attacker does something like:
> perl -e 'symlink "attack_vector" "to.extract" while 1'

The attack referred to in this ticket is not to do with people
extracting into an attacker controlled directory, but people extracting
an attacker-controlled archive. So there is no change for the attacker
to run this code.
Show quoted text

> Then I would think there's a reasonable chance that the symlink will get created between the unlink and the open while Archive::Tar is busy checking the length, if it is a file, and instantiating an IO::File object.
>
> I don't think the unlink is a negative, in that it avoids a case where you might run out of space to extract two copies of the file, but shouldn't this use a tempfile with a reasonably random name to avoid the race and then atomically rename it to the correct name?

This isn't to say that the attack you describe isn't valid, but it would
be much, much harder to pull off, since it would require the victim to be
unpacking an archive - and this isn't a bug in Archive-Tar so much as a
user behaviour problem, right?
Or am I misunderstanding completely?
Dominic.

> On Wed, Jun 13, 2018 at 11:17:51PM -0400, Andrew Fresh via RT wrote:
> This isn't to say that the attack you describe isn't valid, but it
> would
> be much, much harder to pull off, since it would require the victim to
> be
> unpacking an archive - and this isn't a bug in Archive-Tar so much as
> a
> user behaviour problem, right?
>
> Or am I misunderstanding completely?
>
> Dominic.

I think you are correct, I must have I misunderstood the problem when first reading the ticket. The existing patch should solve the problem of a malicious tar file extracting outside of the current working directory with a symlink. Thank you for explaining again.