Hi all,
I've got a fairly large (but not huge, 58 MB) tab-separated text file, with
approximately 200 columns and 56k rows of numbers and strings.
Here's the snippet of my code that creates a 2-D NumPy array from the data file:
####
import sys
from numpy import array

# Split every input line into a list of tab-separated fields
data = [x.strip().split('\t') for x in sys.stdin.readlines()]
data = array(data)
###
It causes the following error:
data = array(data)
ValueError: setting an array element with a sequence
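For context, this message most often means NumPy could not fit the nested lists into a regular 2-D shape, for example because some rows have a different number of fields. With the code above, one way that can happen is `strip()`, which removes trailing tab characters as well as the newline, so a row whose last fields are empty comes back shorter. A minimal sketch reproducing the error with made-up data; `build_or_error` is a hypothetical helper, and `dtype=float` is used so the failure is the same across NumPy versions (without it, older releases built a 1-D object array from ragged input instead of raising):

```python
import numpy as np

def build_or_error(rows):
    """Try to build a 2-D float array; return (array, None) or (None, message)."""
    try:
        return np.array(rows, dtype=float), None
    except ValueError as err:
        return None, str(err)

# Hypothetical data: the second row is one field short ("ragged").
ragged = [[1.0, 2.0, 3.0],
          [4.0, 5.0]]

arr, msg = build_or_error(ragged)
print(msg)  # NumPy's ValueError text; exact wording varies by version
```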
If I take the first 40,000 lines of the file, it works fine.
If I take the last 40,000 lines, it also works fine, so it doesn't seem to be
a problem with the file itself.
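Since the first and last 40,000 of the 56k lines overlap, every row was loaded in one of the two tests, which does point away from bad rows; a direct count of fields per line is still a cheap way to confirm it. A minimal sketch, assuming tab-separated input; `field_count_summary` is a hypothetical helper name:

```python
import io
from collections import Counter

def field_count_summary(lines):
    """Count how many lines have each number of tab-separated fields.

    Only the line terminator is removed, so empty trailing fields are counted.
    Also records the first line number at which each field count appears.
    """
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        n = len(line.rstrip("\r\n").split("\t"))
        counts[n] += 1
        first_seen.setdefault(n, lineno)
    return counts, first_seen

# Hypothetical sample: the last line is missing a field.
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t\n6\t7\n")
counts, first_seen = field_count_summary(sample)
print(counts)      # lines per field count
print(first_seen)  # first line number for each field count
```

Run against the real data (for example `field_count_summary(sys.stdin)`), a single key in `counts` means every row has the same number of fields.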
I've found a few other posts complaining of the same problem, but none of
their fixes work.
It seems like a memory problem to me. This was reinforced when I tried to
break the dataset into 3 chunks and stack the resulting arrays: I got an
error message saying "memory error".
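Stacking chunks still needs the chunks and the stacked result in memory at the same time. One alternative (not something from this thread) is to read the file twice: once to learn the row count and longest field, then again to copy each row straight into a preallocated array, so the whole file never exists as a list of Python lists. A minimal sketch, assuming every row should have `n_cols` fields and that storing everything as fixed-width strings is acceptable; `load_tsv_preallocated` and the `open_input` callable are hypothetical names:

```python
import io
import numpy as np

def load_tsv_preallocated(open_input, n_cols):
    """Two-pass loader: size the array first, then fill it row by row.

    open_input: zero-argument callable returning a fresh text stream
    (hypothetical interface for this sketch).
    """
    # Pass 1: count rows and find the longest field, so the string
    # dtype width can be fixed once, up front.
    n_rows = 0
    max_len = 1
    with open_input() as f:
        for line in f:
            fields = line.rstrip("\r\n").split("\t")
            n_rows += 1
            max_len = max(max_len, max(len(s) for s in fields))

    # Pass 2: allocate the final array once and copy each row into it.
    out = np.empty((n_rows, n_cols), dtype="U%d" % max_len)
    with open_input() as f:
        for i, line in enumerate(f):
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != n_cols:
                raise ValueError("line %d has %d fields, expected %d"
                                 % (i + 1, len(fields), n_cols))
            out[i] = fields
    return out

text = "a\tbb\tc\n1\t2\t333\n"
arr = load_tsv_preallocated(lambda: io.StringIO(text), 3)
print(arr.shape, arr.dtype)
```

On real data this would be called as `load_tsv_preallocated(lambda: open("data.txt"), n_cols)` with a hypothetical file name and the file's actual column count; it needs a real file rather than `sys.stdin`, since a pipe can only be read once. Because the dtype width is set by the longest field, a single long string widens every cell.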
Also, I don't really understand why reading in this 57 MB text file takes
up ~2 GB of RAM.
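A rough sketch of where the memory can go, assuming 64-bit CPython 3 and a current NumPy (Python 2 in 2009 had different per-object sizes, so the numbers are only indicative). About 56,000 × 200 ≈ 11.2 million fields in a 57 MB file is roughly 5 bytes of text per field, tab included, but each field becomes a separate Python string object with tens of bytes of header, plus an 8-byte pointer in its row list. While `array()` runs, a second copy also exists as a fixed-width string array whose width is set by the longest field. `memory_breakdown` and the sample data are hypothetical:

```python
import sys
import numpy as np

def memory_breakdown(rows):
    """Approximate bytes used by a list of lists of str vs. its NumPy copy.

    sys.getsizeof reports each object's own size only, so the outer list,
    every row list and every string are summed explicitly. Strings that
    CPython shares between rows are counted more than once, so this is
    an estimate, not an exact figure.
    """
    python_bytes = sys.getsizeof(rows)
    for row in rows:
        python_bytes += sys.getsizeof(row)
        python_bytes += sum(sys.getsizeof(s) for s in row)
    # Fixed-width unicode dtype ('<U<width>'), 4 bytes per character
    arr = np.array(rows)
    return python_bytes, arr.nbytes, arr.dtype

# Hypothetical sample: 1,000 rows x 200 columns of short numeric strings.
sample = [[str(i * 200 + j) for j in range(200)] for i in range(1000)]
py_bytes, np_bytes, dtype = memory_breakdown(sample)
per_field_py = py_bytes / (1000 * 200)
per_field_np = np_bytes / (1000 * 200)
print("Python objects: %.0f bytes/field" % per_field_py)
print("NumPy array (%s): %.0f bytes/field" % (dtype, per_field_np))

# Scale to ~11.2 million fields; both copies exist while array() runs.
n_fields = 56000 * 200
print("estimated peak: %.2f GB" % ((per_field_py + per_field_np) * n_fields / 1e9))
```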
Any advice? Thanks in advance
Dave