Hi All,
I am trying to put together some rules for parsing text strings/files in
fromfile and fromstring so that the two are consistent. Tickets relevant to
this are #1116 <http://projects.scipy.org/numpy/ticket/1116> and
#883 <http://projects.scipy.org/numpy/ticket/883>. The question here is
the interpretation of the separators, not the parsing of the numbers
themselves. Below is the current behavior of fromstring, fromfile, and
Python's split for content of "", "1", "1 1", and " " respectively.
fromstring :
In [5]: fromstring("", sep=" ")
Out[5]: array([ 0.])
In [6]: fromstring("1", sep=" ")
Out[6]: array([ 1.])
In [7]: fromstring("1 1", sep=" ")
Out[7]: array([ 1., 1.])
In [8]: fromstring(" ", sep=" ")
Out[8]: array([ 0.])
fromfile (the file "tmp" holds each content in turn):
In [1]: fromfile("tmp", sep=" ")
Out[1]: array([], dtype=float64)
In [2]: fromfile("tmp", sep=" ")
Out[2]: array([ 1.])
In [3]: fromfile("tmp", sep=" ")
Out[3]: array([ 1., 1.])
In [4]: fromfile("tmp", sep=" ")
Out[4]: array([ 0.])
split:
In [9]: "".split(" ")
Out[9]: ['']
In [10]: "1".split(" ")
Out[10]: ['1']
In [11]: "1 1".split(" ")
Out[11]: ['1', '1']
In [12]: " ".split(" ")
Out[12]: ['', '']
Differences:
1) When the string/file is empty, fromfile returns an empty array, split
returns a list containing one empty string, and fromstring converts the
empty string to a default value. Which should we use?
2) When the string/file contains only a single separator, fromfile and
fromstring both return a single value, while split returns two empty
strings. Which should we use?
My preference would be to return an empty array whenever the string/file is
empty, but I don't feel strongly about that. I think the single separator
should definitely produce two values.
Also, wouldn't a missing value be better interpreted as nan than zero in the
float case?
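
To make the preferences above concrete, here is a rough pure-Python sketch of
the semantics I have in mind. parse_floats is a hypothetical helper written
only for illustration, not an existing numpy function, and it assumes an
explicit single-character separator as in the examples above.

```python
import math


def parse_floats(text, sep=" "):
    """Hypothetical helper (not part of numpy) sketching the proposed rules."""
    # Empty input gives an empty result instead of a default value.
    if text == "":
        return []
    # Split on every separator, so N separators always yield N + 1 fields.
    fields = text.split(sep)
    # A missing (blank) field becomes nan rather than 0.
    return [float(f) if f.strip() else math.nan for f in fields]


for content in ["", "1", "1 1", " "]:
    print(repr(content), parse_floats(content))
```

With these rules, "" yields no values and " " yields two missing values, which
matches split on the separator count while keeping the empty case empty.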
Chuck