[Discourse.ros.org] [Client Libraries] Python3 and strings

Hi everyone,

I am currently writing some Python 2/Python 3 libraries to work with ROS messages, and I need some information.

How should the `string` message field be treated in Python 3?
There is no information about that at http://wiki.ros.org/msg, but in Python 3 we need to specify an encoder/decoder whenever we convert a string into a sequence of bytes and vice versa...
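
To make the question concrete, this is the explicit conversion Python 3 asks for (a minimal illustration, not ROS code; UTF-8 is picked here only as an example codec):

```python
# Python 3: text (str) and raw bytes are distinct types.
text = "h\u00e9llo"                 # str, a sequence of Unicode code points
payload = text.encode("utf-8")      # bytes, as they would travel on the wire
restored = payload.decode("utf-8")  # back to str, using the same codec

# The accented character takes two bytes in UTF-8, so the lengths differ.
print(type(payload).__name__, len(text), len(payload))
```

Decoding with a different codec than the one used for encoding either raises an error or silently yields the wrong characters, which is why the codec needs to be agreed on.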

[Discourse.ros.org] [Client Libraries] Python3 and strings

That page does mention:
> unicode strings are currently not supported as a ROS data type. utf-8 should be used to be compatible with ROS string serialization. In python 2, this encoding is automatic for unicode objects, but decoding must be done manually. In python 3, both encoding and decoding are automatic.

So on the user side, you just need to make sure that the encoding for the string you're sending is utf-8.

With that in mind, the block quoted above from the msg wiki page seems sufficient to me.

[Discourse.ros.org] [Client Libraries] Python3 and strings

We are obviously using the Unicode codec `UTF-8` to encode and decode it, and the matching Python type is a [unicode string](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str). So looking at this code, I would say:
"A `string` field in a ROS message is a unicode string, and will be encoded/decoded using UTF-8 for serialization/deserialization."
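
For reference, a minimal sketch of what that sentence would mean in code, assuming the ROS 1 wire layout of a little-endian 32-bit length followed by the raw bytes. `serialize_string` and `deserialize_string` are hypothetical names for illustration, not genpy functions:

```python
import struct


def serialize_string(value):
    """Hypothetical helper: encode text as UTF-8 and prefix its byte length."""
    data = value.encode("utf-8")
    # Assumed layout: uint32 little-endian length, then the encoded bytes.
    return struct.pack("<I", len(data)) + data


def deserialize_string(buf):
    """Hypothetical helper: read the length prefix, then decode as UTF-8."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length].decode("utf-8")


wire = serialize_string("h\u00e9llo")
print(deserialize_string(wire) == "h\u00e9llo")
```

Note that the length prefix counts bytes, not characters, so it has to be computed after encoding.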

[Discourse.ros.org] [Client Libraries] Python3 and strings

As per your recommendation, I think mentioning a UTF-8 string as the serialization type would be fine (though I am not sure that is the right thing for the C++ client library). However, it would be better to use and recommend the `str` type for Python 2, since there is no automatic decoding into a unicode string in Python 2. So it would just be type `str` for both Python 2 and 3.

[Discourse.ros.org] [Client Libraries] Python3 and strings

But `str` in Python 3 is `unicode` in Python 2, and serializing data differently between Python versions will break things in many places ("why is my message garbled on this node and not that one?").
We could do that, but it would require a big warning everywhere we mention this topic...

=> I **could not find any REP** specifying message serialization, how to map the types of the supported languages, and how deserialization fits in. It seems it's something we need in order to drive the implementation (especially given that ROS supports multiple languages) and to prevent "incomplete features" as much as possible.

The current serialization code **breaks**:
- when we pass a `bytes` in Python 3 (no `encode` method); fix attempt [here](https://github.com/ros/genpy/pull/85/files)
- when we pass a `unicode` in Python 2 (the receiving end loses the encoding)
- when we pass a `str` in Python 3 (a receiving Python 2 end loses the encoding)
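
A rough way to reproduce the first bullet under Python 3, together with one possible normalization. `naive_encode` and `to_wire_bytes` are hypothetical helpers written for this post, not the genpy code or the content of the linked pull request:

```python
def naive_encode(value):
    # Assumes every string field is text and therefore has an .encode method.
    return value.encode("utf-8")


def to_wire_bytes(value):
    """Hypothetical helper: accept text or already-encoded bytes."""
    if isinstance(value, bytes):
        # Already bytes: pass through untouched (the caller owns the encoding).
        return value
    return value.encode("utf-8")


failure = None
try:
    naive_encode(b"abc")  # bytes has no .encode in Python 3
except AttributeError as exc:
    failure = type(exc).__name__

print(failure, to_wire_bytes(b"abc"), to_wire_bytes("\u00e9"))
```

This only addresses the sending side; it does nothing for a receiver that hands back undecoded bytes, which is what the second and third bullets are about.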

=> we need a solution (design fix) that integrates properly for all supported languages...

[Discourse.ros.org] [Client Libraries] Python3 and strings

I think fully supporting unicode strings would require a lot of effort, more so on the C++ client libraries.
Sadly this is not a solution, but for now the recommendation of just sticking to ASCII strings would prevent the issues mentioned in the second and third bullets from affecting user code.
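
A small guard in that spirit, as a sketch only. `is_ascii` is a hypothetical helper, not part of any ROS client library:

```python
def is_ascii(value):
    """Hypothetical check: True if the text can be encoded as plain ASCII."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("hello"), is_ascii("h\u00e9llo"))
```

User code could run such a check before filling a `string` field, since ASCII text is byte-for-byte identical under UTF-8 and so reads the same whether or not the receiving end decodes it.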