This reads in the 2.5GB file, and serializes the output matrix. The input file is read in "lazily", so no intermediate data-structures are built and minimal memory is used. The initial load takes a long time, but each subsequent load (of the serialized file) is fast. Please let me if you have tips!

If you're specifying the number of columns a-priori, why not do something more like this: gist.github.com/2465280 ? On a side note, to make an array from a generator, use np.fromiter.
–
Joe KingtonApr 22 '12 at 16:56