Apart from all the
consistency checks we have put in to make it harder to build incorrect
semantic trees, BibleTrans does not much care what ontology we use to encode
the semantic data: everything is just numbers and (user-supplied) rules
for generating text from those numbers. The first working prototype of
BibleTrans in 1996 used an ad-hoc ontology similar in concept to Wierzbicka
molecules, but such an approach is difficult to keep robust, complete,
and consistent. I met Steve Beale shortly after I had this prototype running.
He was working on a PhD in machine translation (MT) at the time, and provided
me with numerous valuable insights concerning the nature of MT ontology.
He recommended that I use the L&N lexicon as the base ontology in
BibleTrans, and I adopted his recommendation at that time. That is still
a good choice, as I explain here.
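To make that claim concrete, here is a minimal sketch (in Python) of what
``numbers plus user-supplied rules'' amounts to. The concept numbers and
rule text below are invented for illustration and are not the actual
BibleTrans data format nor checked against the real L&N lexicon; the
point is only that the engine walks a tree of numeric concepts and applies
whatever text-generation rules the translator supplies.

    # A semantic "tree" here is just nested (number, children) pairs.
    # The numbers stand in for ontology concept numbers (invented for
    # this sketch); the rules are whatever the translator supplies
    # for the target language.

    semantic_tree = (0, [            # 0: hypothetical clause node
        (93.169, []),                #    hypothetical concept: God
        (25.43,  []),                #    hypothetical concept: love
        (9.1,    []),                #    hypothetical concept: people
    ])

    # User-supplied rules: concept number -> target-language text.
    # A real grammar also controls word order, agreement, and so on.
    rules = {0: "", 93.169: "God", 25.43: "loves", 9.1: "the people"}

    def generate(node):
        number, children = node
        pieces = [rules.get(number, "?")] + [generate(c) for c in children]
        return " ".join(p for p in pieces if p)

    print(generate(semantic_tree))   # -> God loves the people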
The main advantage that
Louw&Nida brings to Bible translation is that the concepts are complete
and reasonably well-defined. If you need to represent an idea expressed
in the Biblical text, there is probably an L&N numbered item expressing
that concept. However, the L&N lexicon deals only with semantic concepts
directly tied to Greek lexical words; there is substantial semantic content
in the case and tense morphology, as well as in the word order and discourse
genre. We needed a little more.
In 1999 Tod Allman, who
had his own prototype translation engine similar in concept to BibleTrans,
arranged a meeting of the three of us, at which we agreed on a unified
ontology based on the L&N lexicon, enhanced by what
we called the Allman-Beale-Pittman format, or ABP for short. BibleTrans
still uses this ontology essentially unchanged, because nothing better
exists.
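The shape of that enhancement can be sketched as follows. The field names
and feature values here are hypothetical illustrations of the idea, not
the actual ABP encoding: each node keeps its L&N concept number but also
carries the grammatical and discourse information that the lexicon alone
cannot express.

    # Hedged sketch (Python), reusing the invented concept numbers above:
    # an L&N number per node, plus features for the case/tense morphology,
    # word order, and discourse genre that L&N by itself does not cover.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class SemanticNode:
        ln_number: float                          # L&N concept number
        features: Dict[str, str] = field(default_factory=dict)
        children: List["SemanticNode"] = field(default_factory=list)

    # "God loved the people" -- same concepts as before, plus tense on
    # the event and semantic roles on the participants (all illustrative).
    clause = SemanticNode(0.0, {"genre": "narrative"}, [
        SemanticNode(93.169, {"role": "agent"}),
        SemanticNode(25.43,  {"tense": "past", "aspect": "perfective"}),
        SemanticNode(9.1,    {"role": "patient"}),
    ])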
Steve Beale, however,
is trying to generate grammars automatically from an existing translated
corpus, and L&N lacks the atomicity and orthogonality to make that
feasible. In my opinion, that is a problem inherent in natural language
and human variability, which can be overcome mechanically (that is, using
computers) only by examining corpora consisting of many millions of words,
if at all. Of course, among the first million words translated into
any language is always the complete Bible, which makes Beale's efforts
somewhat futile for Bible translation. When I last looked at his work,
Beale had achieved small successes, but only by using his own hand-translated
documents, which naturally embody the same mental model used by his generation
engine. In any case, three months after our 1999 meeting, Beale told me
``I am really getting to hate L&N. It is not an ontology!'' Of course
it is an ontology, and a very good one for Bible translation, just
not for the artificial intelligence kinds of things Beale is trying to
do.
About the same time,
Tod Allman broke away for his own reasons, and together with Beale he
developed an English-like ontology based on Wierzbicka molecules. Combining
it with his existing word-substitution translation engine, Allman turned
the result into a PhD eleven years later.
It is customary in scientific
literature, but not particularly in the public interest, to omit mention
of failures and blind alleys. I was not surprised, therefore, to find no
reference to L&N in either Beale's or Allman's published documents.
But BibleTrans represents ``prior art'' in Allman's case, and is so similar
in concept to his own work that it should have been cited to show how his
implementation differs from it, as I have done here. BibleTrans is the
better technology, and I have no fear of the comparison.
Allman's
TBTA offers two significant advantages over BibleTrans.
First, BibleTrans has a much higher initial development cost before we
can start translating Old Testament texts, because Louw&Nida only covers
New Testament Greek; it would need to be extended for Hebrew and Aramaic.
Second, the English-like TBTA ontology affords
a much lower buy-in cost. The ontology is not inherently restricted to
a predefined set of concepts like L&N, so any time a database encoder
gets stuck or is too lazy to find an existing concept among the many thousands
of previously defined terms, they can invent and add to the ontology a
new and potentially inconsistent concept. That makes the encoding process
go much faster, but at a horrendous cost in the back-end translation grammars,
which must deal with all these inconsistencies. Allman reports 50 chapters
from the Bible already encoded, which is reasonable for one guy doing it,
but it is less than 5% of the total. We have only 8 chapters for BibleTrans
(again, only one person), but we spent correspondingly less time doing it. But
both TBTA and BibleTrans must aggregate the efforts
of several people to encode the whole Bible, and that is where the differences
will start to become apparent. Perhaps Allman can keep a tight rein on
his encoding team, but I found it difficult to maintain consistency over
the tiny ontology I worked with (alone), before switching to L&N.
The cost of the back-end
grammars is already becoming apparent. Allman reports about 300 grammar
rules in a typical translation grammar for his few test chapters, whereas
BibleTrans needs about half that. His actual English translation of
Philippians is comparable to the one produced by BibleTrans. There is insufficient
data at this time to determine which engine is easier to use.
It is important to recognize
that any translation will introduce inaccuracies in mapping the ontology
of the source language into that of the target language, but the L&N
concepts used in the BibleTrans semantic database are still those of
first-century Jewish Greek, so the losses in the first stage of translation (the database
encoding) are minimal. All of the BibleTrans translation errors come from
the single conversion from this ontology to the target language. TBTA
is like the so-called ``front translation'' method of two-step translation,
which is lossy in both steps. Compared to manual translation, I suppose
the extra errors from TBTA are tolerable.
Tom Pittman