Hi,
On Tue, Mar 15, 2011 at 5:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
>> On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgohlke@uci.edu> wrote:
>>>>>> On 3/15/2011 5:13 PM, Matthew Brett wrote:
>>> Hi,
>>>>>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.brett@gmail.com> wrote:
>>>> Hi,
>>>>>>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<pav@iki.fi> wrote:
>>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote:
>>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port,
>>>>>> but is this the right fix to this problem? I was confused by the
>>>>>> presence of the old PyString_AsString function.
>>>>>>>>>> It's not a correct fix. The original code seems also wrong ("index" can
>>>>> either be Unicode or Bytes/String object), and will probably bomb when
>>>>> indexing with Unicode strings on Python 2. The right thing to do is to
>>>>> make it show the repr of the "index" object.
>>>>>>>> OK - I realize I'm being very lazy here but, do you mean:
>>>>>>>> PyErr_Format(PyExc_ValueError,
>>>>>> "field named %s not found.",
>>>>>> PyString_AsString(PyObject_Repr(index)));
>>>>>> Being less lazy, and having read the cpython source, and read
>>> Christoph's mail more carefully, I believe Christoph's patch is
>>> correct...
>>>>>> Unicode indexing of structured array fields doesn't raise an error on
>>> python 2.x; I assume because PyString_AsString is returning a char*
>>> using the Unicode default encoding, as per the docs.
>>>>>>> I think the patch is correct for Python 3 but, as Pauli pointed out, the
>> original code can crash also under Python 2.x when indexing with an
>> unicode string that contains non-ascii7 characters, which seems much
>> less likely and apparently has been undetected for quite a while. For
>> example, this crashes for me on Python 2.7:
>>>> import numpy as np
>> a = np.zeros((1,), dtype=[('f1', 'f')])
>> a[u's'] = 1 # works
>> a[u'µ'] = 1 # crash
>>>> So, the proposed patch works for Python 3, but there could be a better
>> patch fixing also the corner case on Python 2.
>> Thanks. How about something like the check further up the file:
>> if (PyUnicode_Check(op)) {
> temp = PyUnicode_AsUnicodeEscapeString(op);
> }
> PyErr_Format(PyExc_ValueError,
> "field named %s not found.",
> PyBytes_AsString(temp));
>> ? I'm happy to submit a pull request with tests if that kind of thing
> looks sensible to y'all,
That fix works for 2.6 but crashes 3.2 when it is rejecting (by design
I think) field indexing with byte strings.
I've put a test function in this branch which exercises the routes I
could think of. It will crash current 2.x because of the unicode
error that Pauli pointed out, but I think it corresponds to the
expected behavior:
https://github.com/matthew-brett/numpy/compare/numpy:master...matthew-brett:struct-arr-fields
Do these tests look correct? I'd be happy to hear why the code above
does not work, it wasn't obvious to me.
See you,
Matthew