I have a legacy application in which the UI and business logic are already reasonably well separated. There is a proposal to separate them even further: turn the core application into a "service" (without UI) and write a kind of "UI Server" as part of it, to which various UIs (potentially in various languages, and for various devices and operators) can connect to get/set the data that drives the application via those UIs.

On the surface, this seems to make a nice separation between the business logic, which is currently completely ignorant of the possibility of Unicode, and the UI, which is going to need to get very familiar with Unicode to support multiple languages in the UI(s).

Now this application essentially monitors a production process, and has very little (if any) traffic directly from UI to database via the business layer. It strikes me that the "natural language" of this process might as well be Chemistry or Mathematics, so internally I should stick to the language that best describes it, so long as I can translate from that language into anything any UI requires (which I believe should be possible). This leads me to prefer the simplicity, familiarity, and least-work path of retaining old-fashioned 8-bit chars over moving to Unicode.

Are there any technical reasons to reject keeping the business logic part of an application ignorant of Unicode like this? And even if the "natural language" of the system were "English-without-too-many-strings-or-dates" rather than Chemistry, would that make a difference?

4 Answers

You can actually have both. Unicode supports an encoding where all characters are represented as a (variable length) sequence of 8-bit units: UTF-8.

Assuming that you refer to ASCII with your 'old-fashioned 8-bit chars', then you can almost trivially support UTF-8, because UTF-8 is a proper superset of ASCII: All characters in the original 7-bit ASCII set are present in UTF-8 with the same code value. The characters in the various 'extended ASCII' sets (as well as the rest of the characters in Unicode) are encoded as multi-byte values. This encoding is done in such a way that for each individual byte you can tell if it represents 1) a single-byte character, 2) the start of a multi-byte character, or 3) another byte in a multi-byte character.
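To make that concrete, here is a small Python sketch (the sample strings are invented; the byte patterns are the standard UTF-8 ones). It shows that ASCII text is byte-for-byte unchanged under UTF-8, and that the high bits of each byte tell you which of the three roles it plays:

```python
text_ascii = "Process temperature: 450 K"
text_mixed = "Prozess-Temperatur: 450 K (Überhitzung)"

# ASCII-only text produces exactly the same bytes under UTF-8 as under ASCII.
assert text_ascii.encode("utf-8") == text_ascii.encode("ascii")

def byte_role(b: int) -> str:
    """Classify one UTF-8 byte by its high bits."""
    if (b & 0b1000_0000) == 0:
        return "single-byte character (plain ASCII)"
    if (b & 0b1100_0000) == 0b1000_0000:
        return "continuation byte of a multi-byte character"
    return "lead byte of a multi-byte character"

for b in text_mixed.encode("utf-8"):
    print(f"0x{b:02x} {byte_role(b)}")
```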

The only area in which you must be careful when going from ASCII (or another fixed-length encoding) to UTF-8 (and Unicode in general) is text/string processing. Because UTF-8 is a variable-length encoding, any algorithm that assumes a fixed-length encoding can easily give incorrect results.
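For example (a minimal Python sketch; the label is made up), truncating by byte count, which is harmless with a fixed 8-bit encoding, can split a multi-byte character:

```python
label = "Éthanol"                 # 'É' occupies 2 bytes in UTF-8
data = label.encode("utf-8")

print(len(label), len(data))      # 7 characters, but 8 bytes

# Fixed-length assumption: "the first byte is the first character".
try:
    print(data[:1].decode("utf-8"))
except UnicodeDecodeError as err:
    print("split a multi-byte character:", err)

# Correct approach: slice the decoded string, not the raw bytes.
print(label[:1])                  # 'É'
```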

Are there any technical reasons to reject keeping the business logic part of an application ignorant of Unicode like this?

While you're reasonably certain that your system doesn't need Unicode (now):

1. I don't see what you gain by precluding it. Unless you're using an environment with horrible Unicode support, I imagine it'll be more work to go that route. And even if you really don't ever need it, most other people viewing this answer probably will.

2. There's likely a performance hit in transitioning between encodings at the UI/BL boundary.

3. There may be serialization challenges in transitioning between encodings, depending on your boundary transport (see the sketch after this list).
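A rough Python sketch of what that boundary conversion could look like (the field name, the Latin-1 assumption for the business layer, and the JSON transport are all just illustrative assumptions):

```python
import json

# Hypothetical boundary: the business layer hands over 8-bit (Latin-1) bytes, and
# the "UI Server" decodes them and re-encodes them as UTF-8 JSON for whatever UI connects.
def to_ui_message(field: str, raw_value: bytes, encoding: str = "latin-1") -> bytes:
    value = raw_value.decode(encoding)                 # one transcoding step per value
    return json.dumps({field: value}).encode("utf-8")  # and one serialization step

print(to_ui_message("vessel_label", b"R\xe9acteur 2"))  # 'Réacteur 2' in Latin-1 bytes
```

Every value crossing the boundary pays for a decode, a re-encode and the serialization itself, which is where points 2 and 3 bite.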

And even if the "natural language" of the system were "English-without-too-many-strings-or-dates" rather than Chemistry, would that make a difference?

Sure. In something like a physical science, where the papers and discussions all take place in English (do they?), you can be reasonably certain that nobody is going to come in and demand localization within the software's lifetime. In "English with few strings" software, success will likely lead to non-English companies wanting to use the application, which in turn leads to localization demands.

I have edited to clarify that the purpose of the UI needing "to get very familiar with Unicode" is so that it can support multiple languages. So you may like to edit your answer to provide more detail on points 2 & 3, and de-emphasize your localization concerns :-)
– omatai Jul 22 '14 at 2:48

You're speaking of internationalization, at least for the UI. That means user names, etc.:
Have a user named André Svónögrödäß (yes, him) and store his profile on the server: BOINK! Translations and transcriptions everywhere, or you tell him that his name is "wrong" while the other 99.9% of software in the world accepts his name.
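To make that concrete, a minimal Python sketch (the handling options shown are just the typical ones): an ASCII-only business layer can only reject or mangle such a name, while a Unicode-aware one stores it unchanged:

```python
name = "André Svónögrödäß"

# An 8-bit, ASCII-only layer has only bad options:
try:
    name.encode("ascii")                              # rejected outright
except UnicodeEncodeError as err:
    print("rejected:", err)

print(name.encode("ascii", "replace").decode())       # or mangled: 'Andr? Sv?n?gr?d??'

# A Unicode-aware layer just stores it and gives it back intact:
stored = name.encode("utf-8")
assert stored.decode("utf-8") == name
```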

Also: can you 110% guarantee that no business data around the globe will ever be dependent on Unicode?
I'd weigh the cost of implementing it now (with a slight risk of not really needing it) against the cost of implementing it later, when the users start complaining.

It does not always mean user names. Maybe it only means "roles" like "user" and "technician". And maybe if all you're ever doing is monitoring the manufacture of bottles, you can describe all their colours, styles and sizes with product numbers. So long as your UI can translate such data into any language, why should you force the business logic to speak Unicode? Nice user name example, by the way :-)
– omatai Jul 22 '14 at 21:56

@omatai What you're saying is true, but then you'd trade implementing Unicode support in the business area for implementing a "translation" layer between business and UI. I'd still say: why not make it natively Unicode?
– Mark Jul 23 '14 at 4:45

I'm not sure I like this answer, and even though it's my own, I'm not going to prefer it. But I can't rule out in my own mind that it is a good thing to do in some situations, so I'll offer it.

If you've seen the movie "Gravity", you'll have seen Sandra Bullock's character trying to control a spacecraft whose controls are labelled in Russian. But those controls could equally be labelled with numbers. Now it would be entirely possible to write the kind of user manual used in the film in any language at all, referring only to the numbers of the controls. Indeed, those control labels could all be small displays which display a number until a language is identified, at which point they could display language-appropriate labels.

So here's the whole point of the example: no matter how you change the labels, the spacecraft is still going to fly the same. (It also helps to ground this in a non-English-centric example.)

My question (and this answer) stems from recognising that there are certain applications (such as production process control and monitoring) where the essential application (or the spaceship in this example) doesn't conform strongly to the normal "database/business logic/UI" model of doing things. Instead, maybe a "state machine/business and-or process logic/UI/database" model of things makes more sense.

That is to say, we all aim for the "holy grail" of separation between UI and business logic... but then we hamper ourselves in that effort by storing elements in the database that are not essential to the business logic in the slightest. But where else would we store them if there is only one database, and it is only accessed via the business logic?

So maybe there is a case to be made (in certain applications that suit it) for keeping the business logic independent of the database that supports the UI, and the UI independent of the database (or state machine) that supports the business logic.

This would allow the business logic to be written in a more natural language (which may be FORTRAN, PLC ladder diagrams, LISP, "Chemistry", etc., and which may be a terrible language to write a UI in), while writing a UI for multiple languages and/or devices in something that is suited to that purpose.
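A minimal sketch of that split (the status codes, label tables and languages are all invented for illustration): the business logic only ever emits plain numeric/ASCII codes, and each UI carries its own Unicode label tables, exactly like the numbered spacecraft controls above:

```python
# Business-logic side: plain ASCII codes and numbers, no Unicode anywhere.
def read_process_state() -> dict:
    return {"status_code": 17, "vessel": "R2", "temp_k": 451.3}

# UI side: per-language Unicode label tables (hypothetical translations).
STATUS_LABELS = {
    "en": {17: "Overheating"},
    "de": {17: "Überhitzung"},
    "zh": {17: "过热"},
}

def render(state: dict, lang: str) -> str:
    label = STATUS_LABELS[lang].get(state["status_code"], str(state["status_code"]))
    return f"{state['vessel']}: {label} ({state['temp_k']} K)"

print(render(read_process_state(), "de"))   # R2: Überhitzung (451.3 K)
```

Only the UI-side tables ever need to change when a new language or device turns up; the business logic keeps speaking "Chemistry".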

As I say, I'm not sure I like this solution, but I can see situations where it is well worth consideration.