Reflecting on “Computing’s Final Frontier”

In the March issue of PC Magazine, John Dvorak comments on four areas of computer technology in his column entitled “Computing’s Final Frontier”: voice recognition; machine translation (MT); optical character recognition (OCR); and spell-checkers. Basically he’s decrying how little progress has been made on these in recent years relative to the vast improvements in computer capacities.

I’d like to comment briefly on all four. Two of those – voice recognition, or actually speech recognition, and MT – are areas that I think have particular importance and potential for non-dominant languages (what I’ve referred to elsewhere as “MINELs,” for minority, indigenous, national, endangered or ethnic, and local languages) including African languages on which I’ve been focusing. OCR is key to such work as getting out-of-print books in MINELs online. And spell-checkers are fundamental.

Voice recognition. Dvorak seems to see the glass half empty. I can’t claim to know the technology as he does, and maybe my expectations are too low, but from what I’ve seen of Dragon NaturallySpeaking, the accuracy of speech recognition in that specific task environment is quite excellent. We may do well to separate out two kinds of expectations: one, the ability of software to act as an accurate and dutiful (though at times perhaps a bit dense) scribe, and the other as something that can really analyze the language. For some kinds of production, the former is already useful. I’ll come back to the topic of software and language analysis towards the end of this post.

Machine translation. I’ve had a lot of conversations with people about MT, and a fair amount of experience with some uses of it. I’m convinced of its utility even today with its imperfections. It’s all too easy, however, to point out the flaws and express skepticism. Of course anyone who has used MT even moderately has encountered some hilarious results (mine include English to Portuguese “discussion on fonts” becoming the equivalent of “quarrels in baptismal sinks,” and the only Dutch to English MT I ever did which yielded “butt zen” from what I think was a town name). But apart from such absurdities, MT can do a lot – I’ll enjoy the laughs MT occasionally provides and take advantage of the glass half full here too.

But some problems with MT results are not just inadequacies of the programs. From my experience using MT, I’ve come to appreciate the fact that the quality of writing actually makes a huge difference in MT output. Run-on sentences, awkward phrasing, poor punctuation and simple spelling errors can confuse people, so how can MT be expected to do better?

Dvorak also takes a cheap shot when he considers it a “good gag” to translate with MT through a bunch of languages back to the original. Well, you can get the same effect with the old grapevine game of whispering a message through a line of people and seeing what you get at the end – in the same language! At my son’s school they did a variant of this with a simple drawing seen and resketched one student at a time until it got through the class. Even if MT got closer to human accuracy you’d still have such corruption of information.
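The grapevine point can be put in rough numbers. The sketch below assumes, purely for illustration, that each retelling or translation hop independently preserves a fixed fraction of the message; the function name and the 95% figure are hypothetical, not measurements of any MT system.

```python
def surviving_fraction(per_hop_fidelity, hops):
    """Fraction of the message left intact after `hops` retellings.

    Assumes every hop preserves the same fraction and that losses are
    independent, which is a simplification made for this illustration.
    """
    return per_hop_fidelity ** hops


# Even a hop that keeps 95% of the message loses a lot over a long chain.
for hops in (1, 5, 10):
    print(hops, round(surviving_fraction(0.95, hops), 3))
```

Under these assumptions, ten hops at 95% each leave only about 60% of the original intact, whether the hops are people or programs.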

A particularly critical role I see for MT is in streamlining the translation of various materials into MINELs and among related MINELs, using work systems that involve perhaps different kinds of MT software as well as people to refine the products and feed back into improvements. In my book, “smart money” would take this approach. MT may never replace the human translator, but it can do a lot that people can’t.

Optical character recognition. Dvorak finds fault with OCR, but I have to say that I’ve been quite impressed with what I’ve seen. The main problems I’ve had have been with extended Latin characters and limited dictionaries – and both of those are because I’m using scanners at commercial locations, not on machines where I can make modifications. In other words I’d be doing better than 99% accuracy for a lot of material if I had my own scanners.

On the other hand, when there are extraneous marks – even minor ones – in the text, the OCR might come up with the kind of example Dvorak gives of symbols mixed up with letters. If you look at the amazing work that has been done with Google Patent Search, you’ll notice on older patents a fair amount of misrecognized character strings (words). So I’d agree that it seems like one ought to be able to program the software to be able to sort out characters and extraneous marks through some systematic analysis (a series of algorithms?) – picking form out of noise, referencing memory of texts in the language, etc.

In any event, enhancing OCR would help considerably with more digitization, especially as we get to digitizing publications in extended Latin scripts on stenciled pages and poor quality print of various sorts too often used for materials in MINELs.

Spell-checkers. For someone like me concerned with less-resourced languages, the issues with spell-checkers are different and more basic – so let me get that out of the way first. For many languages it is necessary to get a dictionary together first, and that may have complications like issues of standard orthographies and spellings, variant forms, and even dictionary resources being copyrighted.

In the context of a super-resourced language like English, Dvorak raises a very valid criticism here regarding how the wrong word correctly spelled is not caught by the checker. However, it seems to me that the problem would be appropriately addressed by a grammar-checker, which should spot words out of context.

This leads to the question of why we don’t have better grammar-checkers. I recall colleagues raving in the mid-90s about the then new WordPerfect Grammatik, but it didn’t impress me then (nevertheless, one article in 2005 found it was further along than Word’s grammar checker). The difference is more than semantic – grammar checkers rely on analysis of language, which is a different matter than checking character strings against dictionary entries (i.e., spell-checkers).
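To make the distinction concrete, here is a minimal sketch of a spell-checker as pure dictionary lookup. The tiny word list and the function name are assumptions for illustration; no real product is claimed to work exactly this way.

```python
import re

# Toy word list standing in for a real dictionary (an assumption for illustration).
DICTIONARY = {"i", "we", "went", "to", "their", "there", "house", "the", "is", "a"}


def misspelled_words(text, dictionary=DICTIONARY):
    """Return the words in `text` that are not found in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in dictionary]


# A misspelling is caught because the string is not in the word list.
print(misspelled_words("I went to thier house"))
# The wrong word, correctly spelled, passes: every string is a valid entry.
print(misspelled_words("I went to there house"))
```

Because the check looks at each string in isolation, the second sentence passes; catching it needs some analysis of the words around it.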

Although this is not my area of expertise, it seems that the real issue beneath all of the shortcomings Dvorak discusses is the applications of analysis of language in computing (human language technology). Thus some of the solutions could be related – algorithms for grammar checking could spot properly-spelled words out of place and also be used in OCR to analyze a sentence with an ambiguous word/character string. These may in turn relate to the quality of speech recognition. The problems in MT are more daunting but in some ways related. So, a question is, are the experts in each area approaching these with reference to the others, or as discrete and separate problems?

A final thought is that this “final frontier” – what I have sometimes referred to as “cutting edge” technologies – is particularly important for speakers of less-resourced languages in multilingual societies. MT can save costs and make people laugh in the North, but it has the potential to help save languages and make various kinds of information available to people who wouldn’t have it otherwise. Speech recognition is useful in the North, but in theory could facilitate the production of a lot of material in diverse languages that might not happen otherwise (it’s a bit more complex than that, but I’ll come back to it another time). OCR adds increments to what is available in well-resourced languages, but can make a huge difference in available materials for some less-resourced languages, for which older publications are otherwise locked away in distant libraries.

So, improvement and application of these cutting edge technologies is vitally important for people / markets not even addressed by PC Magazine. I took issue with some of what Dvorak wrote in this column but ultimately his main point is spot on in ways he might not have been thinking of.

Thanks Vadim for your reply. I looked at your Digital Sonata site and noted that you are doing some work on, among other languages, Swahili (using data from Kamusi). I would be interested in knowing more about what you’re doing with Nazrene in Nairobi and the languages involved.

I am myself a linguist working in this field, and to be honest I find Mr. Dvorak’s opinions somewhat uninformed. It is true that the final solution has not been found in any of the areas he mentions, but it seems to me that he tried out some programs in the 80’s and formed an opinion that he has stuck to since.

I am myself working in a Machine Translation project where spoken language is first translated by (professional) humans to another language and then by machine to a third language. Everyone evaluating this has noted that a lot of the errors stem from the human translation, and not the automatic.

These techniques work, but there are still restrictions. Speech recognition needs either to be in a limited domain or trained for one particular person, and machine translation does not produce perfect translations, but it enables people to understand the text or acts as the starting point for post-editing.

Thanks for your reply, Soren. I wonder if there is anywhere a broad comparison/contrast of human translation and MT errors. I mean, we all know there are so many ways to make mistakes, but what are the tendencies/patterns, and what do they mean for translation strategies using both? I personally find it useful to use even the mediocre free MT for English <-> French and then clean up, as it saves time and reduces inserting my interpretations into the text.

An analogy would be the difference in types of errors between typos and scanos. Spell-checking as far as I’ve ever seen is not geared to interpreting OCR errors which tend to be different than typing errors. I wonder if anyone in that field has analyzed scanning errors for spell-checkers.

Speech recognition is something that some of us hang a lot of hopes on, and I know of some interesting project proposals for African languages. However the dimensions you mention are critical to ultimate usability. Actually there are other issues too, like the availability and quality of corpora for many languages. What’s encouraging is that good people want to work on all this – what’s needed is resources to support the work (same old story).

Well, I’ve got Dvorak beat by over a decade in terms of involvement in speech recognition (though not in continuous involvement in it). I was in the Army for two years in the late 60s with the group in charge of disbursing *all* government money for speech research, and a civilian co-worker and I took a trip around the US to report first-hand on speech recognition research. A lot of smart guys were doing it then, and I can tell you that a substantial portion of them were charlatans (smart charlatans, but charlatans). The problem then was training the system, which had to be repeated for each person. There were some partial successes, enough that the Post Office had bought a system so that package sorters could call out the ZIP code digits, which could be recognized moderately well.

So much money could be made with a system that really worked that people were very interested in maximizing their results in their reports — whence the charlatans. I left the Army early in 1970 firmly convinced that speech recognition worthy of the name would *never* become practical. While still convinced of that (!), apparently such systems as Dragon Naturally Speaking have made great strides since 1970, though I still read curmudgeonly reviews (like Dvorak’s) of these systems. I have at least gotten to the point that I am willing to entertain the notion that statistical systems well-informed by linguistics could save at least some people some transcribing work. I really got so jaded by over-reporting of results in the 1960s, though, that I just can’t go any farther than that.

The problem with OCR, as far as I’ve been able to see as a user of some of the best systems available (OmniPage and IRIS, though I haven’t necessarily used the most recent versions of these lately) is that, unlike how I imagine speech recognition to be currently, they don’t use *enough* AI — most of the recognition errors they make would go away even with just the use of a dictionary. Of course, as you bemoan, the old computer adage applies: Garbage In, Garbage Out, although even that is fixable (or at least betterable) with some decent AI.
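As a rough sketch of the dictionary idea, the code below repairs a token only when a single substitution from a list of OCR-style confusions turns it into a known word. The confusion pairs, the word list, and the function names are all hypothetical; this is not how OmniPage, IRIS, or any other product is known to work.

```python
# Hypothetical OCR confusion pairs (misread -> intended); illustrative, not measured.
CONFUSIONS = [("rn", "m"), ("1", "l"), ("0", "o"), ("vv", "w")]

# Toy word list standing in for a real dictionary.
WORDS = {"modern", "machine", "translation", "will", "follow"}


def correct_token(token, words=WORDS, confusions=CONFUSIONS):
    """Return a dictionary word reachable by one confusion fix, else the token unchanged."""
    if token.lower() in words:
        return token  # already a known word, e.g. "modern" keeps its "rn"
    for wrong, right in confusions:
        start = token.find(wrong)
        while start != -1:
            candidate = token[:start] + right + token[start + len(wrong):]
            if candidate.lower() in words:
                return candidate
            start = token.find(wrong, start + 1)
    return token


def correct_line(line):
    """Apply correct_token to each whitespace-separated token."""
    return " ".join(correct_token(t) for t in line.split())


print(correct_line("modern rnachine translation wi1l f0llow"))
```

The dictionary check comes first so that legitimate words containing a confusable sequence are left alone. Tokens needing more than one fix are returned unchanged in this sketch.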

Thanks Jim, I appreciate the “ruminations” and background. I always find it helpful to have such perspectives in understanding where things are (and might be) going.

I agree with your comments on OCR. I wonder if the ideal approach in this would be to have a single package for OCR, spell-checking, and grammar-checking (as well as perhaps something like the Controlled English example above), using the same AI applications to handle the range of text-handling tasks within a particular language (adjusted as necessary to task context).

As a user, not developer, I have been involved for a number of years in both voice recognition (Dragon Naturally Speaking since ver. 3) and machine translation (Systran from 1999-2002). I am favorably impressed with recent progress in both areas.

On the voice recognition front, I am amazed at and fully satisfied with the accuracy and speed of the latest speaker-dependent, continuous speech algorithms. I now use NaturallySpeaking v9.5 and the voice recognition aspects of the program have been practical for daily use for several years. Sadly, ever since Dragon Systems was sold to ScanSoft, now Nuance, several years ago, the program has become virtually unusable due to problems unrelated to the voice recognition algorithms. (Ever since version 7 it has become a resource hog that routinely crashes, often affecting other programs – even on my new 2GHz duo core machine – so I rarely use it anymore…)

On the machine translation front, I oversaw the application of Systran’s Enterprise translation system in four language pairs (English -> French, German, Spanish, and Italian) for the real-time translation of personalized email newsletters at NCR Corporation. I wrote several papers documenting that experience which many users deemed successful, but which was later discontinued due to corporate budget cuts. The most valuable insight I gained was this: the target users of machine translation are usually not the people who are asked to evaluate it. Evaluators are, understandably, fluent in both the source and target languages and, as experts, they are alternately amused and appalled by the mistakes MT systems make. Users, on the other hand, are not fluent in the source language (if they were, they would not be using MT) and for them, even a moderately intelligible translation may be much better than none at all. As I wrote in one article on this subject:

“Although most of us do not expect business documents to be pure poetry, when we are presented with information that could plausibly have been written by a 5 year-old, we are apt to be vocal in our displeasure or snide in our laughter. On the other hand, if we find ourselves in a job in which much of what we are expected to do is communicated in a language that we do not understand well, we may be more tolerant of odd words, poor syntax, and awkward style in order to have access to useful information.” (from “Nutzlos, Bien Pratique, or Muy Util?: Business Users Speak Out on the Value of Pure Machine Translation” – see http://www.roi-learning.com/dvm/pubs/articles/tatc-24/)

Thanks Verne, you make some excellent points. I actually have used Systranet.com for translation of Chinese to English – the results are obviously way far from production, but more than sufficient to facilitate understanding of the gist. In tandem with hanzi <-> pinyin converters, Chinese IME, and an online dictionary (plus my bare minimal knowledge of Chinese) it is possible to do simple research on Chinese sites. That’s a big deal, with MT at the center.

I would like to communicate about Systran EN, FR, PT pairings which are significant in some panAfrican email communications.

I realize that what I am going to write will look like hype, but I cannot help it: I cannot but express my sheer amazement at the complaints about OCR quality in Dvorak’s original article and the meek objections on the part of those who replied.
As long as you use ABBYY FineReader you get 97-99% accuracy in recognizing typed text in several dozen languages, and that has become a kind of standard level for OCR here in Russia, so any grumbling about OCR quality is just incomprehensible.

Thanks Sergei, glad you brought this up. I am familiar with the ABBYY product by name and reputation, but have never used it. One of its strengths from what I hear is handling extended-Latin scripts such as are used in some African languages.

I may be wrong but I think that most folks will agree that OCR of clean copy is remarkably good. Where I have seen problems is where there are some extraneous marks that, even when minimal, can throw the software way off. I would like to work more with OCR (my own copy once I can afford the hardware & software) to test the capabilities in some of my work, but that’s for later.

Two years later…I still find Dragon insufficient (I get about 90-95% accuracy, but I need at least 99.9% to make it better/faster than transcribing myself), OCR chaotic without serious human intervention, MT usable only if you know both languages anyways, and I’m in desperate need of something better than Grammatik…
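The gap between those accuracy figures is easier to see as errors to fix. A quick sketch, assuming "accuracy" means the share of words recognized correctly (the function name is hypothetical):

```python
def errors_per_thousand(accuracy):
    """Expected number of wrong words in 1,000 dictated words.

    Assumes `accuracy` is a word-level rate between 0 and 1.
    """
    return round(1000 * (1 - accuracy))


for accuracy in (0.90, 0.95, 0.999):
    print(f"{accuracy:.1%} accurate -> {errors_per_thousand(accuracy)} errors per 1,000 words")
```

On that reading, 90-95% means 50 to 100 corrections per thousand words, against a single correction at 99.9%.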

When you consider how far computer software really has come, taking into account the ability to use server clusters with 100000s of cores, I’d have to say that I agree with all of Dvorak’s points as laid out by you. However, I agree with you even more so that grammar checkers are rudimentary. I have yet to find anything better than Grammatik, and that isn’t intended as a compliment of Grammatik. I think Google has the answer, but just has to want to do it. I believe Google does more complex language analysis on webpage content than any current language software programs. Google just has to assign a few programmers to tweak some of their analyses and integrate it into their word processor and I can see myself finally having a reason to use their Web office tools.

With the aid of a server warehouse, if someone were willing to dedicate the time, it should not be all that difficult to write a database with every possible correct grammar permutation. It would take some man hours to set up and there would be errors, but it would just be a matter of really wanting to do it. The technology is there. For that matter, Google could take a few million of the books it has scanned from reputable publishers and use the sentences in those to create comparative analyses of text, making the assumption that any sets of text that occur X times are likely to be acceptable. There would be errors at first, but that’s what beta testing is all about.
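The "occurs X times" idea can be sketched with word-pair counts. This is a toy version under stated assumptions: a three-sentence stand-in for the scanned books, pairs of adjacent words as the unit of comparison, and a threshold of 2 for X. All names and values here are hypothetical.

```python
from collections import Counter


def ngrams(words, n):
    """All runs of n consecutive words, as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_counts(sentences, n=2):
    """Count every n-word sequence seen in the reference sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def unattested(sentence, counts, n=2, min_count=2):
    """Return the sequences in `sentence` seen fewer than `min_count` times."""
    return [g for g in ngrams(sentence.lower().split(), n) if counts[g] < min_count]


# Tiny stand-in for text drawn from published books.
corpus = [
    "we went to their house",
    "we went to their house yesterday",
    "there is a cat",
]
counts = build_counts(corpus)

print(unattested("we went to their house", counts))  # nothing flagged
print(unattested("we went to there house", counts))  # pairs around "there" flagged
```

With a corpus this small almost any new sentence would be flagged, which is the "errors at first" problem in miniature. The approach only becomes useful with a very large body of text.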

Another option would be to go in and manually program each dictionary word as an object with assigned properties dictating each word’s use. This would take longer, but is likely to be more accurate. This option could also be combined with the previous method for more comprehensive accuracy. Whenever disagreements occur between what is drawn from published works from respected publishers and a word’s object properties, an editor would have to analyze it manually and determine if the word needs its properties changed or if the usage of the words in the books should be labeled as incorrect usage.
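One possible reading of "each dictionary word as an object with assigned properties" is sketched below. The single property used (part of speech), the lexicon, and the table of allowed neighbors are all hypothetical simplifications; a real grammar would need far richer properties and rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    text: str
    pos: str  # part of speech, e.g. "pron", "verb", "prep", "det", "adv", "noun"


# Hand-built toy lexicon (an assumption for illustration).
LEXICON = {e.text: e for e in [
    WordEntry("we", "pron"), WordEntry("went", "verb"), WordEntry("to", "prep"),
    WordEntry("their", "det"), WordEntry("there", "adv"), WordEntry("house", "noun"),
]}

# Hypothetical rule table: parts of speech allowed to sit next to each other.
ALLOWED = {("pron", "verb"), ("verb", "prep"), ("prep", "det"),
           ("det", "noun"), ("verb", "adv")}


def violations(sentence):
    """Return adjacent word pairs whose parts of speech are not in ALLOWED.

    Raises KeyError for words missing from the toy lexicon.
    """
    entries = [LEXICON[w] for w in sentence.lower().split()]
    return [(a.text, b.text) for a, b in zip(entries, entries[1:])
            if (a.pos, b.pos) not in ALLOWED]


print(violations("we went to their house"))  # no pairs flagged
print(violations("we went to there house"))  # pairs around "there" flagged
```

A pair flagged here but common in published text would be exactly the kind of disagreement that goes to an editor for a manual decision.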

So it really shouldn’t be that complex to create a grammar checker by using computers to analyze the language and come up with a set of language rules against which writing can then be checked.

In the meantime, I absolutely find those technologies lacking. But I think the grammar checker is the most lagged when you consider what technology can do now.

Voice recognition is extremely complex and I think we’re about a decade away from it being usable without human intervention.

MT is complex because most aspects of language really have to be interpreted, rather than translated. That means computers will have to actually have a type of artificial intelligence/neural network capable of cognitively grasping a sentence and then writing another sentence with the same meaning. (The other option is a monster database of translations for every likely permutation. This database would be too enormous for current practical application.) I think we’re at least a decade away from accurate MT, though current MT is sometimes the best option.

OCR is another cognitive task, but fortunately with a more limited number of permutations. Printed text with common fonts is already faster to scan and edit than to type. However, reliable OCR that can make up for splotches and missing pixels, figure out good handwriting, and read uncommon fonts is probably quite a ways away from the present, though not as far as voice recognition.