Louw&Nida: The Best Ontology

Apart from all the consistency checks we have put in to make it harder to build incorrect semantic trees, BibleTrans does not much care what ontology we use to encode the semantic data: everything is just numbers and (user-supplied) rules for generating text from those numbers. The first working prototype of BibleTrans in 1996 used an ad-hoc ontology similar in concept to Wierzbicka molecules, but such an approach is hard to keep robust, complete, and consistent. I met Steve Beale shortly after I had this prototype running. He was working on a PhD in machine translation (MT) at the time, and he provided me with numerous valuable insights into the nature of MT ontologies. He recommended that I use the L&N lexicon as a base ontology in BibleTrans, and I adopted that recommendation. It is still a good choice, as I explain here.
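The "just numbers plus user-supplied rules" idea can be sketched roughly as follows. This is only an illustration, not BibleTrans's actual data format: the concept numbers merely mimic L&N's domain.entry numbering style, and the rules are invented.

```python
# Minimal sketch, assuming a tree of concept numbers and per-concept rules.
# Concept numbers and rules below are invented for illustration only.

# A semantic tree reduced to nested tuples, each headed by a concept number.
tree = ("33.69", ("12.1",))   # hypothetical: a "proclaim" event with one argument

# User-supplied rules: one target-language realization per concept number.
rules = {
    "33.69": lambda args: "preach about " + " ".join(args),
    "12.1":  lambda args: "God",
}

def generate(node):
    """Walk the numeric tree, applying the matching rule at each node."""
    head, *children = node
    return rules[head]([generate(child) for child in children])

print(generate(tree))  # -> preach about God
```

The engine itself stays ontology-agnostic: swapping in a different ontology only changes which numbers appear in the tree and which rules the user supplies.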

The main advantage that Louw&Nida brings to Bible translation is that the concepts are complete and reasonably well defined. If you need to represent an idea expressed in the Biblical text, there is probably a numbered L&N item expressing that concept. However, the L&N lexicon deals only with semantic concepts directly tied to Greek lexical words; substantial semantic content is also carried by case and tense morphology, as well as by word order and discourse genre. We needed a little more.

In 1999 Tod Allman, who had his own prototype translation engine similar in concept to BibleTrans, arranged a meeting between the three of us, at which we agreed on a unified ontology based on the L&N lexicon, enhanced by what we called the Allman-Beale-Pittman format, or ABP for short. BibleTrans still uses this ontology essentially unchanged, because nothing better exists.

Steve Beale, however, is trying to generate grammars automatically from an existing translated corpus, and L&N lacks the atomicity and orthogonality to make that feasible. In my opinion, that is a problem inherent in natural language and human variability, which can be overcome mechanically (that is, using computers) only by examining corpora consisting of many millions of words, if at all. Of course, among the first one million words translated into any language is always the complete Bible, which makes Beale's efforts somewhat futile for Bible translation. When I last looked at his work, Beale had achieved small successes, but only by using his own hand-translated documents, which naturally embody the same mental model used by his generation engine. In any case, three months after our 1999 meeting, Beale told me ``I am really getting to hate L&N. It is not an ontology!'' Of course it is an ontology, and a very good one for Bible translation, just not for the artificial intelligence kinds of things Beale is trying to do.

At about the same time, Tod Allman broke away for his own reasons, and he and Beale developed an English-like ontology based on Wierzbicka molecules. Combined with his existing word-substitution translation engine, this became Allman's PhD eleven years later.

It is customary in scientific literature, though not particularly in the public interest, to omit mention of failures and blind alleys. I was not surprised, therefore, to find no reference to L&N in either Beale's or Allman's published documents. But BibleTrans represents ``prior art'' in Allman's case, and is so similar in concept to his own work that it should have been cited to show how his implementation differs from it, as I have done here. BibleTrans is the better technology, and I have no such fear of comparison.

Allman's TBTA offers two significant advantages over BibleTrans. First, BibleTrans has a much higher initial development cost before we can start translating Old Testament texts, because Louw&Nida covers only New Testament Greek; it would need to be extended for Hebrew and Aramaic. Second, the English-like TBTA ontology affords a much lower buy-in cost. The ontology is not inherently restricted to a predefined set of concepts like L&N, so any time a database encoder gets stuck, or is too lazy to find an existing concept among the many thousands of previously defined terms, they can invent and add a new and potentially inconsistent concept to the ontology. That makes the encoding process go much faster, but at a horrendous cost in the back-end translation grammars, which must deal with all these inconsistencies. Allman reports 50 chapters of the Bible already encoded, which is reasonable for one person working alone, but it is less than 5% of the total. We have only 8 chapters for BibleTrans (again, only one person), but we spent proportionately less time doing it. Both TBTA and BibleTrans must aggregate the efforts of several people to encode the whole Bible, and that is where the differences will start to become apparent. Perhaps Allman can keep a tight rein on his encoding team, but I found it difficult to maintain consistency over the tiny ontology I worked with (alone) before switching to L&N.

The cost of the back-end grammars is already becoming apparent. Allman reports about 300 grammar rules in a typical translation grammar for his few test chapters, whereas BibleTrans needs about half that. His actual English translation of Philippians is comparable to that produced by BibleTrans. There is insufficient data at this time to determine which engine is easier to use.

It is important to recognize that any translation will introduce inaccuracies in mapping the ontology of the source language into that of the target language, but the L&N concepts used in the BibleTrans semantic database are still first-century Jewish Greek, so the losses in the first stage of translation (the database encoding) are minimal. All of the BibleTrans translation errors come from the single conversion from this ontology to the target language. TBTA is like the so-called ``front translation'' method of two-step translation, which is lossy in both steps. Compared to manual translation, I suppose the extra errors from TBTA are tolerable.

Tom Pittman