The idea being that one could start a business capitalizing on the relatively cheap availability of video conferencing tools to sell distributed interpretation services.

Well, I talked to my sister about this idea. She’s a nurse.

The concept is D.O.A., and here’s why: there are strict rules about how the interaction between doctors, patients, and interpreters is to take place. Specifically, the interpreter is not allowed to be a “participant” in the conversation: the interpreter must not speak directly to the patient. The patient looks only at the doctor, never at the interpreter.

That’s a rule.

Which defeats the whole point of the webcam idea. Perhaps the VOIP aspect would still be doable, however.

I find myself working with an awful lot of languages (you’ll see why when Jonas and I launch our project), and I often have to learn just enough characters to determine that a particular script seems to be rendering correctly. We have to know if rendering problems are caused by some kind of configuration problem that we can fix, or if it’s something out of our control: “Sorry, no hieroglyphics in Unicode, not our problem!”

Debugging such stuff is not the same thing as actually being able to read in all these languages: in most cases it’s enough to learn just a bit about how the script is put together and how characters combine, and perhaps a few words for testing purposes.

So here’s an example of a typical problem that I face. Compare the two screenshot clips I took this morning. I added the red-bordered boxes to point out the differences:

Even if you don’t know Devanāgarī from a salad fork, it doesn’t take much to guess that something is askew in my Firefox’s rendering of that page. (Never mind the fact that the word “Hindi” is actually spelled incorrectly… Doh!) Opera seems to get it right.

Now I’m not going to get into the details of how Devanagari works in Hindi at the moment (primarily because I don’t know much, heheh). The main problem for me is that there are so many possible causes for any problem in text rendering. Is this a configuration problem on my end, or is it some pernicious software problem buried in a library underneath the text?

1. The font could be bad.

2. The browser?

3. Is it the case that my operating system is missing some library? (Linux, in my case.) If so, what library? Can I upgrade something to fix it? Who ya gonna call?

In this particular case, the comparison above leads me to suspect #2, of course. But you get the picture here: these kinds of problems are a mess. Particularly in the open source world, it’s hard to know what to do in this situation. And I’m moderately techie. Imagine what a run-of-the-mill user faces.
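One cheap check that helps narrow things down is to look at the code points themselves, which says whether the text is right before any font or browser gets involved. Here is a minimal sketch; `codePoints` is a made-up helper name, and the sample string is the Hindi word for “Hindi” written out as escapes.

```javascript
// Hypothetical helper: list the Unicode code points in a string, so the
// underlying text can be checked independently of the font or the browser.
function codePoints(text) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
const hindi = '\u0939\u093F\u0928\u094D\u0926\u0940';
console.log(codePoints(hindi).join(' '));
// prints "U+0939 U+093F U+0928 U+094D U+0926 U+0940"
```

If the code points are what you expect and the page still looks wrong, the problem is somewhere in the rendering stack rather than in the text.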

I was chatting with Chad Fowler and he made an interesting observation: for the development of any given application, in order to be sure, really sure, that everything is okay for every particular writing system, each development group would have to have someone who can read each language. Which, er, ain’t gonna happen.

And it shouldn’t really have to: the operating system is supposed to abstract the basic rendering of text away from coding.

OSX is pretty darn good at this. But then, it’s also a very closed system: it’s all tested, Apple owns and delivers a wide variety of high-quality (proprietary) fonts with its machines, and there are far fewer points of variation than you’ll see in your average Linux distribution.

Matters in Windows are less variable than Linux, but more complex than OSX, as Michael Kaplan can attest in great detail at his excellent blog.

I think these complexities are what make many programmers reticent about Unicode: they’ve been burned in the past by encoding matters, gotten a glimpse of the gruesome entrails underlying text rendering on their platform, and decided, “I just don’t have time to really learn how all these text rendering variables fit together.”

And quite frankly, despite being something of a Unicode zealot myself, I can sympathize.

I’ve recently become a fan of Sitepoint’s books on programming. They’re very cleanly put together, and generally speaking seem to be quite up to date. Here are a couple of titles I went ahead and took the plunge on:

I like this book quite a bit. The CSS reference in the back is almost worth the price of admission… there are references online (duh) but I guess I’m just still a sucker for paper. There’s a lot of useful info on styling text, which turns out to have more tricks available than I’d ever heard of. One thing about this book that annoyed me intensely was in chapter 6, “Putting Things in Their Place,” when he gives a JavaScript solution to the problem of getting columns to flow to equal heights. Admittedly, he gives an alternative, but there are a lot of pure CSS solutions to this problem out there, and one would think that if there’s a reliable one, this would be the book to find it in. So yeah, that bit rubbed me the wrong way.

I’ve been looking forward to this one for quite a while. At that link you can get the first four chapters for free. To be honest, I debated whether to buy the book, because judging from the table of contents, it seems that most of the stuff that I had doubts about was in the free sample chapters. But I’m a big fan of the author and editor: Stuart Langridge through the ridiculously awesome LugRadio (or listen on Odeo) and Javascript/Python guru Simon Willison. So in the end I felt pretty good about picking up a copy. Haven’t started digging in yet. One nit to pick: forty smackers is a lot to ask for a book that’s just 300 pages. Not saying it won’t turn out to be worth it in the end, but dag.

Now, here’s the weird part: I was just trying to type up a list of stuff, in a run-of-the-mill text file (in my favorite text editor, gedit). And I had this urge to reorganize the list drag-and-drop style. Except, my word processor couldn’t do that. Kinda backwards, eh? Usually web interfaces are thought of as the relatively impoverished cousins of desktop apps…

Yes, friends and neighbors, the browser will eat the desktop, sooner or later.

On second thought…
I guess gedit sort of does have drag and drop: you can highlight a line and then drag and drop that. But it’s still not as simple as Backpack’s lists, because it’s really hard to grab or skip newlines just by highlighting–with an HTML list there are bullets.

Try typing the string jxauxdo in that box. And press “Trovu” if you like; that will search Google for ĵaŭdo (Esperanto for “Thursday”). Notice that jx → ĵ and ux → ŭ “on the fly,” as you type. (Come to think of it, maybe “transliteration” isn’t the right word for this process…)

So, backing up a bit, Esperanto has a few odd characters in its orthography:

Even today those characters are relatively rare in fonts–if you can’t see them I imagine this post may not make too terribly much sense. 8^)

The good doktoro even got a little flak back in the day, for choosing to include such unusual characters in a supposedly universal language. Nowadays, however, they’re all in Unicode–here’s the full info for ŝ, for example:

U+015D LATIN SMALL LETTER S WITH CIRCUMFLEX ŝ

But pragmatically speaking, there’s still a problem with input. Suppose you are a gold-star-wearing green-flag-waving Esperanto aficionado, and you want to post something on the internet. How do you actually type these characters? The “right” answer is that you install a keyboard layout for the language in question, and you memorize its layout.

This is a pain, of course.

And it’s nothing new: in the (typographical) bad old days of all-ASCII USENET, Unicode wasn’t widely available, and what people would generally do (for many languages, not just Esperanto) was come up with all-ASCII transliteration systems. The “x-system” added to the table above was probably the most popular. It so happens that there is no letter x in Esperanto, so it didn’t cause any massive problems with ambiguity.

And the function gets called with an onkeyup="xAlUtf8(this.value)" inside the input tag.
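For the curious, an x-system converter along these lines could look roughly like the sketch below. This is not the actual `xAlUtf8` source; `xSystemToUnicode` and `X_SYSTEM` are names invented for illustration, and the mapping just follows the x-system described above (a letter followed by x becomes the accented letter).

```javascript
// Hypothetical mapping for the x-system: base letter -> accented letter.
const X_SYSTEM = {
  c: '\u0109', g: '\u011D', h: '\u0125', j: '\u0135', s: '\u015D', u: '\u016D',
  C: '\u0108', G: '\u011C', H: '\u0124', J: '\u0134', S: '\u015C', U: '\u016C',
};

// Replace every "letter + x" pair with the matching Esperanto character.
// The "i" flag lets both "jx" and "JX" match; the case of the first
// letter decides whether the result is upper or lower case.
function xSystemToUnicode(text) {
  return text.replace(/([cghjsu])x/gi, (match, letter) => X_SYSTEM[letter]);
}

console.log(xSystemToUnicode('jxauxdo')); // prints "ĵaŭdo"
```

Because Esperanto has no letter x, a plain search-and-replace like this is safe; no legitimate word gets mangled.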

(Using onkeyup is actually sort of verboten these days–it should be done unobtrusively, etc.)
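The unobtrusive version would attach the handler from script instead of from the tag. A minimal sketch, assuming a converter function that takes a string and returns a string; `attachConverter` is a hypothetical name, and the element id in the usage comment is made up too.

```javascript
// Hypothetical helper: run `convert` over an input's value on every keyup,
// without putting an onkeyup attribute in the HTML.
function attachConverter(input, convert) {
  input.addEventListener('keyup', () => {
    const converted = convert(input.value);
    // Only write back when something changed, to avoid needlessly
    // resetting the field (and the caret position) on every keystroke.
    if (converted !== input.value) {
      input.value = converted;
    }
  });
}

// In a page, usage might look like this (the id "query" is an assumption):
//   attachConverter(document.getElementById('query'), myConverter);
```

The markup then stays a plain input tag, and the page still works as an ordinary search box if the script never loads.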

So anyway, that’s a pretty interesting way to enter some unusual characters. It’s interesting to muse on just how far one could take this approach. Would it be possible to create a script that would handle an entire writing system? Say, a script that would convert an entire textarea from an ASCII-based transliteration to Unicode characters, on the fly? Japanese and Chinese are definitely excluded from this approach (every Chinese character in RAM? Er, no.) but people who use those languages generally already have keyboard input taken care of.

That would be neat: you could, for instance, have textareas where users without keyboard layouts could input something in Amharic or Persian or whatever without having the keyboard layout actually installed.

But as it stands, it’s just simple substitution, and no string which is to be substituted can be a substring of another such string. In order to handle a more generalized set of substitutions, you’d probably need to use a trie structure. (There’s a nice trie implementation in Python by James Tauber.)
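To make the trie idea concrete, here is a rough sketch of longest-match substitution, where one key is allowed to be a prefix of another. Everything in it is an assumption for illustration: `buildTrie` and `transliterate` are made-up names, and the three-entry table is a toy Latin-to-Cyrillic mapping, not a real transliteration standard.

```javascript
// Build a trie from a { source: target } table. Each node keeps its
// children in a Map and, if a key ends there, the replacement in `output`.
function buildTrie(table) {
  const root = { children: new Map(), output: null };
  for (const [source, target] of Object.entries(table)) {
    let node = root;
    for (const ch of source) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), output: null });
      }
      node = node.children.get(ch);
    }
    node.output = target;
  }
  return root;
}

// Walk the text; at each position follow the trie as far as it goes and
// use the longest key that matched. Unmatched characters pass through.
function transliterate(text, trie) {
  const chars = Array.from(text);
  let result = '';
  let i = 0;
  while (i < chars.length) {
    let node = trie;
    let best = null;
    for (let j = i; j < chars.length; j++) {
      node = node.children.get(chars[j]);
      if (!node) break;
      if (node.output !== null) best = { output: node.output, length: j - i + 1 };
    }
    result += best ? best.output : chars[i];
    i += best ? best.length : 1;
  }
  return result;
}

// Toy table: "s" is a prefix of "sh", which is a prefix of "shch".
const trie = buildTrie({ s: '\u0441', sh: '\u0448', shch: '\u0449' });
console.log(transliterate('shchs', trie)); // prints "щс"
```

Simple substitution would turn the leading s into its own character before ever seeing the h; the trie walk holds off until it knows how long the match really is.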

I’m sure there are complications that would arise from what’s called “font shaping” — that is, how operating systems combine adjacent characters. In Arabic or Thai, for instance, characters vary depending on which characters they’re adjacent to. How does this process affect text in textareas, for instance, or text which is mushed around with Javascript?

If you right click on any text and choose “Inspect element,” the DOM Inspector will show you where that element is. Then if you choose the little dropdown as shown, and select “Computed style” like this:

…you can look up the value of font-family for whatever element it was that you selected. This is easier than trying to work it out by looking through reams of stylesheets and HTML, methinks.
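The same lookup can be done from script with the standard `getComputedStyle` call. A small sketch; `fontFamilyOf` is a hypothetical wrapper, and the `view` parameter is only there so the function can be handed `window` in a page.

```javascript
// Hypothetical wrapper: return the computed font-family for an element.
// `view` is the window object that owns the element.
function fontFamilyOf(element, view) {
  return view.getComputedStyle(element).getPropertyValue('font-family');
}

// In a page, something like:
//   fontFamilyOf(document.querySelector('h1'), window);
```

Note that this reports the font-family list the stylesheets resolve to, which is not necessarily the font that actually ends up drawing each glyph.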

This is forward-thinking on the part of the Indian government; for a long time it seemed to be the case that the only major website that encoded Hindi in UTF-8 was a foreign site, BBCHindi. Most news sites in Hindi use any of a bewildering array of proprietary encodings, with a proprietary font to accompany it. (Intended presumably to lock in users).

So if you’re like me, you like to look at lots of code and compare stuff. In the case of Rails, I wanted to get a general feel, for instance, for the sorts of stuff that goes in /app/controllers or /app/models in various projects.

It so happens that Jesse Ruderman has written some navigation bookmarklets that work great for navigating around those Rails projects: if you drag those two linked words (“increment” and “decrement”) to your toolbar and then visit the first project, you can click them to navigate around.