Hi
On 01/05/18 22:28, Andrew Bartlett via samba-technical wrote:
> G'Day Noel,
>
> Thanks so much for continuing the python3 work. This is really
> important and I'm so glad to be able to pass on the baton here.
Well, I hope I am not going to be alone in working on this, and I hope
everyone who was contributing will continue to do so. I don't really have
the background knowledge (or even the python skills), but I'm happy to keep
pushing on as best and as hard as I can.
> One thing that came up in a discussion in the Catalyst office regarding
> this work is worth raising more broadly.
>
> It is exceedingly common in Samba's use of ldb to use:
>
> username = str(res[0]["samAccountName"])
>
> This works because of
>
> static PyObject *py_ldb_msg_element_str(PyLdbMessageElementObject *self)
> {
> 	struct ldb_message_element *el = pyldb_MessageElement_AsMessageElement(self);
>
> 	if (el->num_values == 1)
> 		return PyStr_FromStringAndSize((char *)el->values[0].data, el->values[0].length);
> 	else
> 		Py_RETURN_NONE;
> }
Not always :-/ It seems some attributes are not strings, e.g. guids can
be binary, and the same goes for security descriptors. These can fail with
str(res[0]["blah"]), as there can easily be a decode error before the py c
code even returns (I've already had to deal with this in my WIP).
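To illustrate the failure mode, here is a minimal python3 sketch that uses plain bytes objects as stand-ins for attribute values. The byte values and the helper name try_decode are made up for the example and are not part of the ldb api.

```python
def try_decode(raw):
    """Return raw decoded as UTF-8, or None if it is not valid UTF-8.

    Hypothetical helper for illustration only; not part of the ldb api.
    """
    try:
        return raw.decode("utf8")
    except UnicodeDecodeError:
        return None


# Stand-in for a text attribute such as samAccountName.
text_value = b"username"

# Stand-in for a binary attribute such as a guid: 16 arbitrary bytes.
# 0xff can never appear in valid UTF-8, so decoding is guaranteed to fail.
binary_value = bytes([0xFF, 0xFE, 0x10, 0x20] * 4)

print(try_decode(text_value))
print(try_decode(binary_value))
```

So any blanket "decode everything" approach needs some way of leaving the binary attributes alone.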
> However equally common is:
>
> username = str(res[0]["samAccountName"][0])
Probably more common is just the plain res[0]["samAccountName"][0]. I think
the str doesn't do anything in this case, and the majority of the code I
have seen doesn't enclose the value in the 'str' function.
> This works because in python2 it just returns the string. However in
> python3 I'm told it will return "b'username'" (not so helpful).
>
> As all strings in LDAP are UTF8 (I'm willing to assert that for sanity)
> I think we need the MessageElement to contain not byte buffers, but a
> subclass of byte buffers that have a string function that
> automatically produces a utf8 string for str().
not sure exactly what you mean here because doesn't decode provide the
same functionality?
e.g. res[0]["samAccountName"][0].decode('utf8')
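For reference, this is the difference between the two calls under python3, using a plain bytes literal as a stand-in for what res[0]["samAccountName"][0] would return (the value itself is made up):

```python
# Stand-in for res[0]["samAccountName"][0] under python3.
value = b"username"

# str() on bytes gives the repr-style text, not the decoded string.
as_str = str(value)

# An explicit decode gives the text that python2 code was expecting.
decoded = value.decode("utf8")

print(repr(as_str))
print(repr(decoded))
```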
Or do you mean change the api so that res[0]["samAccountName"][0] will
now return an object that provides a 'str' method *and* additionally
some sort of 'to_bytes' [1] type method? This would mean we would have
to modify
- res[0]["blah"][0]
+ str(res[0]["blah"][0])
with the exception of those attributes that we require binary content
for, where we would have to change
- res[0]['binaryAttr'][0]
+ res[0]['binaryAttr'][0].to_bytes()
However, there doesn't really seem to be much difference in effort here
compared with just adding the decode where necessary, like
- res[0]['blah'][0]
+ res[0]['blah'][0].decode('utf8')
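To make the comparison concrete, here is a rough pure-python sketch of how I read the "subclass of byte buffers" idea. The class name LdbValue and the to_bytes method are hypothetical; the real change would presumably live in the pyldb C code and may well look different.

```python
class LdbValue(bytes):
    """Hypothetical bytes subclass whose str() decodes as UTF-8.

    Illustration only; this is not the existing ldb api.
    """

    def __str__(self):
        # Raises UnicodeDecodeError for values that are not valid UTF-8.
        return self.decode("utf8")

    def to_bytes(self):
        # Hand back the raw content for attributes that really are binary.
        return bytes(self)


name = LdbValue(b"username")
blob = LdbValue(b"\xff\xfe\x00\x01")  # made-up binary content

print(str(name))             # the subclass approach
print(name.decode("utf8"))   # the explicit decode approach
print(blob.to_bytes())
```

Since the subclass is still a bytes object, the explicit .decode('utf8') keeps working on it, so the two approaches would not exclude each other. A binary value still raises on str(), though, so callers would still need to know which attributes are binary.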
Now, I readily admit I am not really a python programmer, nor do I have
a huge amount of knowledge of the samba python api, so I guess I am
missing something?
Also, if anyone has an easy list of which attributes definitely have
binary content, that would be useful.
> Do you think you could have a look at that? Otherwise, converting
> samba-tool and our other ldb-calling code is going to get very tricky.
Yep, I am already experiencing that. I've already converted a hunk of
the samba_tool tests (those exercising the api) to python3. You can see
the progress at https://github.com/samba-team/samba/pull/161 (please note,
this is a WIP branch; there's only a pull request for visibility and CI
exposure). The string/binary issue around attributes is annoying. I'd
welcome any more input, suggestions or other possible solutions there.
Noel
[1] I expected python3 to provide a 'tp_bytes' type c-function hook, since
afaik in native python you can define a '__bytes__' method. However, such a
hook doesn't seem to exist.
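For comparison, this is the native python side of that footnote: a class that defines '__bytes__' is honoured by bytes(). The class name Value is made up for the example, and this says nothing about how the pyldb C type would expose the same thing.

```python
class Value:
    """Hypothetical wrapper holding raw attribute content."""

    def __init__(self, raw):
        self._raw = raw

    def __bytes__(self):
        # Called by bytes(obj) in python3.
        return self._raw

    def __str__(self):
        # Called by str(obj); assumes the content is UTF-8 text.
        return self._raw.decode("utf8")


v = Value(b"username")
print(bytes(v))
print(str(v))
```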