The issue is that Perl's internal representation of a file
offset is 32-bit. So when Perl tries to seek to a particular
location in a very large file, well, Perl doesn't understand
what position 5_003_904_123 is.

There are two solutions. One is to compile a new version
of Perl that does understand how to handle large files. The
other is to keep Perl from trying to seek at all. The
easiest way to do the latter is to have Perl read from a
pipe rather than opening the file directly. That can be
done by writing your script so that it reads from STDIN, or
by converting your opens so that they look kind of like
this:
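
Here is a minimal sketch, assuming the path to the big file
is in $file (the variable name is just for illustration);
the data streams in through the pipe, so Perl never seeks:

    open(FILE, "cat $file |")
        or die "Cannot open pipe from cat for $file: $!";
    while (<FILE>) {
        # handle each line here, exactly as you would have
        # with a plain open of the file itself
    }
    close(FILE);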

A lot of languages claim to have "no arbitrary size limits"
for strings and files, or claim that the sizes of their
thingies are limited only by available memory or hard disk
space.

What I have learned from Tilly's post is that we Perl
advocates cannot make such a claim. Perl programs can break
when confronted with a file larger than 2 GB. And since
hard disks often come with much more space than that, and
since the poster obviously needs to work with such files,
this deficiency is not a trivial one.

"Nothing is difficult for the man who doesn't have to do it himself." All hail the worthy work of the Perl Porters who got us where we are today (with a little help from ActiveState and thus B. Gates.) I intend no disparagement of their magnificent work.

I wouldn't take this particular one too badly. When Perl 5
came out it wasn't clear how the industry would handle the
32-bit barrier on file sizes, so there was no way to write
Perl support for it. You can hardly blame people for not
writing support for what didn't yet exist.

According to Dominic Dunlop, Perl had limited support for
64-bit files in 5.005_03, and it is (as noted above) a
compile-time option in 5.6. But that compile-time option
will not work on all platforms, and not all people on
platforms that do support it have used it. And note that
support for 64-bit files also needs to be present in the
operating system. If you are running Linux, that support
first appears in 2.4. If you are running FreeBSD, it has
been there for a few years now.
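
If you want to check whether your own build has it, a quick
sketch (assuming a 5.6-era perl, where Config.pm exposes
uselargefiles and lseeksize) would be something like:

    use Config;

    # 'define' if this perl was compiled with large file
    # support, undef otherwise.
    print "uselargefiles: ",
        (defined $Config{uselargefiles}
            ? $Config{uselargefiles} : "undef"),
        "\n";

    # Size in bytes of the offset type used by seek/tell;
    # 8 means 64-bit offsets, 4 means you stop at 2 GB.
    print "lseeksize: $Config{lseeksize}\n";

You can get the same answers from the command line with
perl -V:uselargefiles and perl -V:lseeksize.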

Anyways all 32-bit computer applications have arbitrary
limits imposed on them by the hardware. And the above
question is the leading edge of a trainwreck we will see in
slow motion over the next few years. The problem is that
if your naming scheme is 32 bits, then it only has about
4 billion names. Waste a bit here or there, and you are
limited to 1 or 2 billion. Segment your architecture in
some way, and you find that real-world limits tend to hit at
1, 2, 3, or 4 GB. Often with a hack (such as large file
support or Intel's large RAM support) you can push that off
in particular places. But, for instance, Perl on a 32-bit
platform will never support manipulating a string of length
3 GB. It isn't going to happen. And Perl is not alone.
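
The arithmetic behind the "about 4 billion names" and the
1-2 GB limits above is just powers of two. A quick
illustration (the %.0f is deliberate so the output is right
even on a perl with 32-bit integers):

    printf "2**32 = %.0f names in a 32-bit scheme\n", 2**32;
    printf "2**31 = %.0f, the familiar ~2 GB signed limit\n",
        2**31;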

But thanks to Moore's law, it is only a question of time
before people want to do exactly that. And so, as users'
needs keep crossing the magic threshold, people will at
first find their workarounds, and then will have to switch
to 64-bit platforms. Which won't be pretty, but it will
happen. And the trillion-dollar question is whose 64-bit
chip is going to win. Right now people tend to use Alphas.
AMD's
proposal is (I have heard) technically worse but makes
for the easiest upgrade from x86. Intel has a huge amount
of marketing muscle. In 5 years the answer will seem
obvious in retrospect and everyone else is going to be
playing catch-up. And playing catch-up for a very long
time - the 128-bit conversion is decades off and there is
no guarantee that Moore's law will continue until then.