BibleTech: 2009

I’m considering putting in a proposal for BibleTech: 2009, but I’m not sure yet. I need to decide soon because the deadline is Monday.

If I make a presentation proposal, it would probably focus on the application of SIL’s Language Explorer to Koine Greek. I would probably focus on morphology and parsing, since that’s what I’ve done the most on, though I might move into the program’s potential for word studies and dictionary creation as well, since I’ve dabbled quite a bit with that (though that’s the dabbling that I lost when my computer crashed a few weeks ago). I’m thinking of a proposal something like:

At BibleTech: 2008, SIL’s software development team presented some of the software they are developing for linguists and translators to use in the field. The focus of their presentation was specifically on programs that related to the task of translation. Thus the Data Notebook, Graphite, Translator’s Workplace, and the Translation Editor were the focus of their presentation. Language Explorer, on the other hand, is a program developed more specifically for the linguistic work that accompanies a translation. This includes language analysis, morphology, syntax, discourse analysis, and dictionary making for the many languages that have not been studied, much less received a translation of the Bible.

But exactly because of the program’s FLExibility (FLEx = FieldWorks Language Explorer) for the description of any language, it is perfect for the analysis of Biblical languages as well. This paper seeks to show the value of FLEx for the study of Biblical languages with an eye toward lexicography and especially the program’s potential for morphological parsing of Koine Greek texts through morpheme-by-morpheme interlinearization and the development of a morpheme lexicon.

Now I’m not sure if this is the best expression of what I’m thinking. Essentially I want to present the work that I’ve done (and hopefully will continue to do between now and March 27th) with an eye toward the possibility of actually being able to automate the process (which FLEx is designed to do). My end hope is that I’ll have a rough morphology of Koine Greek built/written that would make it possible for me to parse various Greek texts that do not have morphological databases readily available – such as the Duke Papyri, the Packard Greek Epigraphy or perhaps even some of the earlier texts from Migne’s Patrologia Graeca. I would love to parse Chrysostom, though that might require some adaptation. How much the language had changed in those few hundred years between Paul and Bishop John is beyond my knowledge.

But all of that is a long way off. Right now, I’ve only finished indicative ω verbs (both regular and contract). Goals for March would probably be the completion of verbal inflectional morphology and then also at least one of the noun declensions – though I’ve been in communication with one SIL linguist who believes that he can explain Greek noun variation through phonology rather than declensions. I haven’t seen how just yet, but I’m very curious and we’re continuing to communicate.
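To give a rough idea of what morpheme-by-morpheme parsing of these verbs involves, here is a minimal Python sketch. It is not how FLEx works internally, and it is not the morphology described above: the ending table, the gloss labels, and the function name `segment` are all assumptions made for illustration. It covers only the present active indicative of a regular, uncontracted ω verb such as λύω, and it returns every analysis the table allows, since a real parser has to cope with ambiguous forms.

```python
import unicodedata

# Present active indicative endings of a regular (uncontracted) ω verb
# such as λύω. The glosses are illustrative labels, not FLEx categories.
ENDINGS = {
    "ω": "1sg",
    "εις": "2sg",
    "ει": "3sg",
    "ομεν": "1pl",
    "ετε": "2pl",
    "ουσι": "3pl",
    "ουσιν": "3pl",  # same ending with movable nu
}


def segment(form):
    """Return every (stem, ending, gloss) split that the ending table allows."""
    # Normalize so precomposed and decomposed accents compare equal.
    form = unicodedata.normalize("NFC", form)
    analyses = []
    for ending, gloss in ENDINGS.items():
        stem = form[: -len(ending)]
        # Require a non-empty stem so a bare ending is not parsed as a verb.
        if form.endswith(ending) and stem:
            analyses.append((stem, ending, gloss))
    return analyses
```

Calling `segment("λύομεν")` gives a single analysis, with the ending ομεν glossed as 1pl, while a noun such as λόγος gets no analysis at all. Contract verbs, other tenses and moods, and accent shifts are all outside what this toy table can handle.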

Sounds a great idea! SIL does amazing linguistic work on target languages, and needs to match it with the source languages. But I guess you will have quite a lot of ambiguities to deal with. And don’t go down the rabbit trail which a previous SIL person went down of trying to turn parsing of Greek into an automatic translation program – it didn’t work!

> make it possible for me to parse various Greek texts that do not have morphological databases readily available – such as the Duke Papyri, the Packard Greek Epigraphy…

I’d suggest that you not begin with papyri and epigraphical materials. I think you’d find too many lacunae and irregular forms to implement a “first draft” parser. The papyri are notorious for containing non-standard forms and just plain lots of spelling mistakes. Auto parsing is always challenging and subject to a LOT of manual revision, but if you want to experiment with that, tackle an edited text first–one that’s closer to a formal or literary genre rather than the scribbles of some of the papyri.

Peter: Are you serious, someone tried to do that? I don’t think you need to worry. I won’t head down that path.

Dr. Decker: Any possibility of looking at the papyri would be an incredibly long way off that, right now, is no more than a distant desire. And now that you’ve reminded me of the issue of spelling, it might be less than that.

Here is part of what I wrote about this project in 2002, for a lesson at SIL ETP and following a discussion of Babelfish:

A linguist called Tom Pittman, who has worked in association with SIL, has tried to develop a rather different method of computer translation called BibleTrans. This approach is based on the idea that the hard part of the translation process is the exegesis, and that re-expression in the target language is easy; also that, in the case of Bible translation, the exegesis (in principle) only has to be done once, and the re-expression many times. So he has attempted to define a language-independent abstract representation of the meaning of the original text, in such a way that it can be re-expressed easily in any target language. The step of representing the meaning (Tom speaks of “encoding the meaning”), in the “semantic database” is hard and cannot be done by computer, but only has to be done once. Re-expression in each of many target languages is intended to be easy; it is supposed to be simple to define a set of rules for each target language for correct expression of the meaning. …

Apparently the BibleTrans project has recently (March 2002) been terminated, with only a small part of the “semantic database” complete. …

The most that this approach can possibly claim to do is to produce a rough first draft in each target language.

The BibleTrans project website is no longer available.

So this project was not based on an automatic parse of the Greek, but I think that was its starting point.

But I can think of one target language (apart from modern Greek) for which an automated adaptation of the Greek just might produce a viable first draft translation: Russian! Greek and Russian have so much in common syntactically that an adaptation might at least be comprehensible. But it would need to be done with a highly complex analysis and synthesis tool, working not just at the word level and able to do some reordering.

Well, Mike, I will believe that even the most sophisticated machine translation is useful for Bible translation when I see programs like BabelFish able to translate even simple Bible texts meaningfully from any one language to any other (which is not very closely related). I did a test of this in 2002 and was appalled at how bad the results were even from German to English. Here (repeated today) is its version of Psalm 23, showing a complete failure to parse the German:

The gentleman is not my Hirte, me anything will lack. It feasts me on a green Aue and leads me to the fresh water. It refresh my soul. It lead me on right road around its name sake. And whether I already walked, am afraid in finstern the valley I no misfortune; because you are with me, your putting and staff comfort me. You prepare a table in the face of my enemies before me. You salbest my head with oil and gives me fully. Good and mercy will follow me my life long, and I will remain in the house of the gentleman always.

I tried the same German text in Google Translate, and it does somewhat better, at least putting the “not” in the right place:

The Lord is my shepherd, I shall not want. He graze me on a green and Aue führet me to fresh water. He erquicket my soul. He führet me on the right road to his name’s sake. And whether I have wandered in the dark valley, I fear no accident, because thou art with me thy rod mating and comfort me. You already have a table before me in the face of my enemies. You anoint my head with oil and give me fully. Good and mercy will follow me my life, and I will remain in the house of the Lord forever.

Google also offers quite an impressive selection of languages. But it doesn’t do too well with Ephesians, presumably because it is expecting modern Greek. And this is what it did with the original Hebrew of Psalm 23:

1 song Uncle Jehovah Aray not Ahsser:
2 Irvicni the veld – who Mngeota Inalane:
3 mental Ishobev Engeni Evmogali – justice for his name:
4 and that – not go Evgye shadow of death – Ira that bad – you Amdy שבטך ומשענתך Hama Engemni:
5 תערך ago | שלחן against Carary Esnat Kossy head with oil saturation:
But 6 | Good benevolence Eradpuni all – days of my life in Oshevti – Jehovah longevity: