Details on How BibleTrans Works


To properly understand BibleTrans, you need to consider three parts: the semantic database, the target language grammar, and the translation engine.
 

Semantic Database

There are two key concepts to understanding the BibleTrans semantic database: the ontology, and the morphology.

Ontology refers to the set of concepts we use to encode the Biblical text, sort of like an alphabet or vocabulary, a finite set of numbers that we can arrange in infinite different ways to communicate all the ideas of Scripture. Yes, they are numbers. This is, after all, a computer thing. Everything inside the computer is numbers -- even the text is just a sequence of numbers representing the letters of the alphabet (65=A, 66=B, etc.), and similarly pictures are just a set of numbers representing the color of the pixel at each point on the screen.

For the BibleTrans ontology we use the Louw&Nida Greek-English Lexicon by Semantic Domains, where every different sense of every Greek word is assigned a different number: 12.1 is God (Greek θεός, theos), 25.43 is love (Greek ἀγάπη, agape), and so on. To this we have added a couple hundred structural concepts not represented by any Greek word. Because Drs Louw and Nida did a very good job of isolating each separate concept to its own unique number, we get a completely unambiguous representation of what the Greek says.
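As a rough illustration of an ontology made of numbers, here is a minimal Python sketch. Only the two concept numbers 12.1 and 25.43 come from the text above; the table layout, the function names, and the choice to store each number as a (domain, entry) pair are assumptions made for this example, not a description of how BibleTrans stores its data.

```python
# Hypothetical miniature ontology. Concept numbers are kept as
# (domain, entry) pairs rather than floats, so that an entry such as
# 12.10 cannot be confused with 12.1.
ONTOLOGY = {
    (12, 1): "God",
    (25, 43): "love",
}


def parse_concept(text):
    """Turn a printed concept number like '25.43' into a (domain, entry) pair."""
    domain, entry = text.split(".")
    return int(domain), int(entry)


def gloss(text):
    """Look up the human-readable label for a printed concept number."""
    return ONTOLOGY.get(parse_concept(text), "<unknown>")


print(gloss("12.1"))
print(gloss("25.43"))
```

The labels exist only for the person reading the data; everything the program compares or looks up is a number.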

Morphology is about shape, in this case how the concepts relate to each other. In the Greek text (and English and every other spoken language) the concepts/words are just strung together like beads on a string, but when we think about it, that's not what we are thinking. We think in sentences and noun phrases and questions+replies and so on. In the English sentence "I was walking past the pink house on the corner," the two words "was walking" are spoken together because the verb "to walk" does not by itself completely express the fact that this sentence is about the extended duration of time that I was walking in a certain place, and that it was sometime in the past. You can also say "I walked past the pink house," and the focus of the sentence no longer emphasizes the duration of the stroll, but only that I walked (not riding a bicycle or some other means of locomotion). Furthermore, this is the pink house on the corner, not some other house possibly on some other part of the street or some other color, and we know from the form of the sentence that the hearer is expected to know which house I am referring to. "The pink house on the corner" is a noun phrase, and "was walking" is a verb phrase. In grade school, some of us had to learn how to diagram sentences. It seemed onerous and artificial at the time, but that is really what is implicitly going on inside our heads, even if we don't use words like "noun phrase" to describe it.

Those diagrams we learned how to make on the blackboard in grade school describe the shape of English sentences. If we substitute Louw&Nida numbers for the words, the connecting lines are the essence of the BibleTrans semantic database morphology. Here is a piece of John 3:16:

We call this a "tree" because of the branching relationship linking the concept numbers together. In this picture you also see differently shaped icons at each node (branching point) of the tree, and labels on the icons and on the linking lines. Those text labels are for the convenience of the person looking at the tree data. There are only numbers in the database. Even the links are numbers.
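A tree of this kind can be sketched as a small data structure. In the Python sketch below, the concept numbers 0.5 (Proposition), 0.3 (Thing), 12.1 (God) and 25.43 (love) are taken from this page; the numeric link codes, the `Node` class, and the way the verb is attached are hypothetical and serve only to show that both nodes and links can be plain numbers.

```python
from dataclasses import dataclass, field

# Hypothetical link codes. The text says only that links are stored as numbers.
AGENT, ACTION, HEAD = 1, 2, 3


@dataclass
class Node:
    concept: tuple                                  # (domain, entry), e.g. (12, 1)
    children: list = field(default_factory=list)    # list of (link_code, Node)

    def add(self, link, child):
        """Attach a child under the given link code and return the child."""
        self.children.append((link, child))
        return child


def flatten(node, depth=0):
    """Walk the tree depth-first, yielding (depth, concept) pairs."""
    yield depth, node.concept
    for _link, child in node.children:
        yield from flatten(child, depth + 1)


root = Node((0, 5))                       # Proposition
thing = root.add(AGENT, Node((0, 3)))     # Thing (noun phrase)
thing.add(HEAD, Node((12, 1)))            # God
root.add(ACTION, Node((25, 43)))          # love

walk = list(flatten(root))
for depth, concept in walk:
    print("  " * depth + "%d.%d" % concept)
```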
 

Target Language Grammar

When adults first learn a language, they normally start thinking about how to say each idea they have learned so far (and often how that is different from their native language). After a while it becomes second nature and they just say it, which is more the way children learn a language from the start. BibleTrans is like the adult learner, in that we tell it how to say each of the concepts in the BibleTrans ontology, and give it some rules for putting these concepts together. For example, the typical English sentence starts with a noun phrase (the subject), then the verb with all its helper words, then possibly a direct object. In German the structure is similar, except that the compound verbs are split apart and the important verb is moved to the end of the sentence. It sounds strange to us, but perfectly natural to them. In BibleTrans you just tell it where the parts of each sentence go, something like rearranging the parts of this string by just dragging them into position:
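One way to picture a word-order rule is as an ordered list of slot names. The Python sketch below rests on that assumption; the template names and the `linearize` function are invented for illustration and are not the BibleTrans grammar format.

```python
# Hypothetical word-order templates: each is an ordered list of slot names.
ENGLISH_MAIN = ["subject", "verb", "object"]
VERB_FINAL = ["subject", "object", "verb"]    # a language that puts the verb last


def linearize(template, parts):
    """Emit the filled slots in template order, skipping any that are empty."""
    return " ".join(parts[slot] for slot in template if parts.get(slot))


parts = {"subject": "the boy", "verb": "hit", "object": "the girl"}
english = linearize(ENGLISH_MAIN, parts)
verb_final = linearize(VERB_FINAL, parts)
print(english)
print(verb_final)
```

Changing the language's word order then means editing the template, not the code that fills it.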

Other parts of the grammar consist in putting the proper endings on words to express tense of verbs, agreement with nouns, and person and (in languages where it matters) gender and honorifics. These are most easily summarized in paradigm tables, and that is also how the linguist would put that data into BibleTrans.
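A paradigm table translates naturally into a lookup table. The following Python sketch covers only regular English verbs in the present and past tenses; the table layout and the "*" wildcard column are assumptions for this example, and a real grammar would need many more rows and irregular forms.

```python
# Hypothetical paradigm table for regular English verbs.
# Rows are tenses; columns are (person, number); "*" is a catch-all column.
PARADIGM = {
    "present": {(3, "sg"): "s", "*": ""},
    "past": {"*": "ed"},
}


def inflect(root, tense, person, number, table=PARADIGM):
    """Attach the ending that the table gives for this tense and agreement."""
    row = table[tense]
    ending = row.get((person, number), row["*"])
    return root + ending


print(inflect("walk", "present", 3, "sg"))
print(inflect("walk", "past", 1, "pl"))
```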

Finally, there is vocabulary. For most Louw&Nida word concepts in the ontology, we need only to say "this is a verb of conjugation 3, and the root is spelled..." or "this is a feminine noun spelled..." and so on. This goes fairly quickly after the basic forms of the language have been specified.
 

BibleTrans Translation Engine

The BibleTrans translation engine is like a little robot that walks around in the semantic tree, and at each node looks in its grammar to see what to do with that concept. That's all! All the complexity is in the grammar, where it should be. For example, in the John 3:16 example (see the picture above), the engine arrives at the root proposition (a formal word meaning sentence clause) and gets ready to do a sentence clause. This requires that it know who is the subject (God) and the direct object (people of the world), and of course what God did to those people (love). This grammar knows that the subject comes first (in some languages it's in a different position), then the verb, then the object, followed by subordinate clauses, in this case what God did as a result of loving the people of the world (but that's getting ahead of the story).
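The "little robot" idea can be sketched as a tree walker that looks up one rule per concept number. Everything in this Python sketch apart from the concept numbers 0.5, 12.1 and 25.43 is hypothetical: the dictionary-based tree, the rule signatures, and the role names are assumptions chosen to keep the example short, and the real rules are described on this page as far larger.

```python
# Hypothetical grammar: one rule (a function) per concept number.
def rule_proposition(node, emit, visit):
    # English order assumed here: subject, then verb, then object.
    for role in ("agent", "action", "patient"):
        if role in node["links"]:
            visit(node["links"][role])


def rule_word(word):
    """Build a rule that simply says one word."""
    def rule(node, emit, visit):
        emit(word)
    return rule


GRAMMAR = {
    (0, 5): rule_proposition,
    (12, 1): rule_word("God"),
    (25, 43): rule_word("loved"),
}


def translate(tree, grammar):
    """Walk the tree; at each node, run the rule filed under its concept number."""
    out = []

    def visit(node):
        grammar[node["concept"]](node, out.append, visit)

    visit(tree)
    return " ".join(out)


tree = {"concept": (0, 5), "links": {
    "action": {"concept": (25, 43), "links": {}},
    "agent": {"concept": (12, 1), "links": {}},
}}
sentence = translate(tree, GRAMMAR)
print(sentence)
```

Note that `translate` itself knows nothing about English; the word order lives entirely in the rule for concept 0.5.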

The subject is a proper noun in English, so it gets capitalized, which would happen anyway because it's the first word of the sentence (the grammar must know these things!), and it's a masculine person, so it also gets assigned to the 3rd-singular-masc pronoun for later reference.

Now we come to the verb. This is distant past tense, and it includes a modifier expressing intensity of quality. In English some adverbs come before the verb, and others come after the verb. The grammar must say which. The translation engine just steps through the word order template and plugs in each part where it goes.

The BibleTrans engine generates output text just the way you would speak it, each word in order. It never goes back to fix a mistake or missing word. Instead it plans ahead like those German speakers, holding in mind words that come later until it is time to say them. If you think about it, you will realize that we do that in English too, but mostly without thinking about it. Suppose you are watching the kids on a playground, and one of them strikes another. Your thinking might go (very quickly) like this: "Oh, the girl! The boy! He hit her!" But when you get around to speaking it, you say "The boy hit the girl," even though you were thinking about the girl first and had to hold her in mind until you got to that part of the sentence. In BibleTrans we do this with what we call variables, places to hold parts of the sentence until it's time to say them. In fact, the engine generally goes on a fact-finding mission around the sentence first, to see what it needs to know before it says anything at all. That way, if there are agreement issues, it knows about them soon enough to do the right thing. This is less obvious in English, which is mostly uninflected; in other languages it's very important.
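The playground example can be written out as two passes: a fact-finding pass that fills variables, then a speaking pass that emits words strictly in order. This Python sketch is only an illustration of that idea; the function name and the role labels are hypothetical.

```python
def plan_then_speak(observations):
    """observations arrive in 'thinking' order as (role, text) pairs."""
    variables = {}                        # holding places for parts of the sentence
    for role, text in observations:       # pass 1: fact-finding, nothing is said yet
        variables[role] = text

    output = []                           # pass 2: speak left to right, no revisions
    for slot in ("agent", "verb", "patient"):
        output.append(variables[slot])
    return " ".join(output)


spoken = plan_then_speak([
    ("patient", "the girl"),              # noticed first
    ("agent", "the boy"),
    ("verb", "hit"),
])
print(spoken)
```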

Let's follow the BibleTrans translation engine as it walks through the John 3:16 tree, which we have reproduced in more detail here:

The engine begins with some initialization that does not concern us here, and proceeds to examine the John 3:16 Proposition node. This is concept number 0.5, so it opens the (very large and complex) rule number 0.5 and sets up the parameters for a conventional sentence main clause. After a quick scan through the subtrees to see what's there, it knows that the subject comes from the Agent subtree, and the direct object comes from the Patient subtree.

The subject is first. The 0.3 Thing node is a prototypical noun phrase, which the long and complex rule 0.3 sets up, again by scanning through its subtrees to see what it has. This is a relatively simple noun phrase, with no adjectives (which come before the head noun in English), and no relative clauses (which come at the end of the noun phrase in English). The lexical entry rule 12.1 God tells us that (in English) "God" is a proper name and takes no article. Other languages (like Greek) are different. The 94.265 node is significant in English only with demonstratives like "this" and "that"; other languages use different syntax depending on whether the noun is near or far or even out of sight, so the information must be available in the tree for those languages. So for us today the noun phrase rule has very little to do: just spell out the name "God" and set the verb agreement to 3rd-person-singular and the 3rd-sing-masc pronoun to noun #1, then return to the proposition rule.
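The bookkeeping done by the noun phrase rule can be sketched as follows. The facts about "God" (proper name, no article, third person singular, masculine) come from the paragraph above; the lexicon layout, the `state` dictionary, and the slot names are hypothetical stand-ins for whatever the real rule 0.3 uses.

```python
# Hypothetical lexical entries; only the facts about "God" come from the text.
LEXICON = {
    (12, 1): {"word": "God", "proper": True, "gender": "masc", "number": "sg"},
}


def noun_phrase(concept, state, definite=True):
    """Spell out a simple noun phrase and record facts that later rules need."""
    entry = LEXICON[concept]
    words = []
    if not entry["proper"]:               # proper names take no article in English
        words.append("the" if definite else "a")
    words.append(entry["word"])

    # Side effects: verb agreement and pronoun assignment for later reference.
    state["agreement"] = (3, entry["number"])
    slot = "3pl" if entry["number"] == "pl" else "3sg-" + entry["gender"]
    state["pronouns"][slot] = concept
    return " ".join(words)


state = {"agreement": None, "pronouns": {}}
phrase = noun_phrase((12, 1), state)
print(phrase, state)
```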

Following the form for generic main clauses, the BibleTrans translation engine now advances to the verb. We must be concerned with tense, aspect, and mode. Tense has to do with when this event took place. This tree node is marked for the unknown past, because the Greek text does not inform us as to exactly when God was loving the people of the world (that may be a hot theological question, but it's not a translation issue). Aspect is about the duration and kind of action, whether the action of the verb is instantaneous or over a period of time, whether we are looking at starting or finishing, and so on. The Greek text uses the aorist tense, which tends to disregard the durational aspect of the verb action, but we suppose that God's love did in fact extend over a period of time, so it is marked as durative. This is called "implicit information" because it is not explicit in the text. Mode refers to whether this is a question or contrafactual (subjunctive in English), or something like that; this clause is marked as normal. There is a separate rule that knows how to inflect verbs for past, present, or future, and for subject agreement in the present tense.

The next part of the clause is the direct object. If there were an indirect object already assigned to a pronoun, that would come before the direct object (as in "Bill hit me the ball"), but that is not the case here. The direct object is the people of the world (the Greek word is kosmos, but it obviously is not the physical universe that God loved, but rather all the people in it), and they are semantically plural even though the Greek word is singular, so the tree is so marked. The default definite referential (not shown, because it is the default) causes the noun phrase rule to generate a definite article, so we have so far: "God loved the people of the world..." This is assigned to the 3rd-plural pronoun, which is independent from the 3rd-sing-masc pronoun (as well as the 3rd-sing-fem pronoun, which is so far still unassigned). Note that "people of the world" is generated as if it were a single word, because as far as BibleTrans is concerned, it is.

Adverbs of degree can precede the verb in English, but it's not necessary, so this English grammar doesn't bother, and it comes here after the direct object.

The last part of this clause is the words "so that" connecting the main clause to the subordinate clause, which tells the reader what the result of God's love was. This is another proposition, and it restarts the 0.5 rule at the beginning, but this time for a subordinate clause. The subject is again God, but now the pronoun generator notices that it has been assigned to a pronoun, so it substitutes that pronoun ("he") for the subject. Similarly, when it gets to the Son, it is (implicitly, because the Greek doesn't say) God's son, and the pronoun generator generates the correct pronoun in the possessive case, "his".
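The pronoun generator's behavior can be approximated with a small tracker that remembers which referent holds each pronoun slot. The slot names follow the ones used on this page (3rd-sing-masc, 3rd-sing-fem, 3rd-plural); the class, its method, and the table of English forms are assumptions for this Python sketch.

```python
# English pronoun forms for the three third-person slots named in the text.
PRONOUN_FORMS = {
    "3sg-masc": {"subject": "he", "object": "him", "possessive": "his"},
    "3sg-fem": {"subject": "she", "object": "her", "possessive": "her"},
    "3pl": {"subject": "they", "object": "them", "possessive": "their"},
}


class PronounTracker:
    """Hypothetical helper: remembers which referent owns each pronoun slot."""

    def __init__(self):
        self.assigned = {}                # slot -> referent

    def refer(self, referent, slot, full_text, case="subject"):
        """Use a pronoun if the slot already points at this referent."""
        if self.assigned.get(slot) == referent:
            return PRONOUN_FORMS[slot][case]
        self.assigned[slot] = referent    # first mention: say it in full
        return full_text


tracker = PronounTracker()
first = tracker.refer("God", "3sg-masc", "God")
second = tracker.refer("God", "3sg-masc", "God")
owner = tracker.refer("God", "3sg-masc", "God's", case="possessive")
print(first, second, owner)
```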

The tense of this second clause is set to historic past, certainly for us reading this verse, and possibly also for Nicodemus when Jesus was speaking to him. However, it's not clear if this is part of what Jesus said, or a commentary by the evangelist (the red letters in your Bible are a later addition, inserted by the editors, not part of the Greek). In any case, the Greek verb is past tense, so we preserve that fact in translation.

As we noted earlier, when the indirect object is a pronoun, it comes before the direct object in English, which is the case here, because the 3rd-plural pronoun was assigned to the people of the world, and the tree specifies (again implicitly) that God gave his son to them.
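The ordering rule for objects can be stated in a few lines. This Python sketch assumes a simplified English grammar: a pronoun indirect object goes before the direct object, and any other indirect object follows it with "to". The function name and the pronoun list are hypothetical, and the "to" fallback is an assumption about English rather than something this page specifies.

```python
# Object-case pronouns that trigger the "indirect object first" order.
PRONOUNS = {"me", "you", "him", "her", "it", "us", "them"}


def order_objects(verb, direct, indirect=None):
    """Place the indirect object before or after the direct object."""
    if indirect is None:
        return f"{verb} {direct}"
    if indirect in PRONOUNS:
        return f"{verb} {indirect} {direct}"       # "gave them his son"
    return f"{verb} {direct} to {indirect}"        # assumed fallback order


clause = order_objects("gave", "his son", "them")
print(clause)
```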

This clause also has a subordinate clause, the purpose for this gift, which is that everybody who believes would not perish but have eternal life. It's not necessary to go through all that now, because you can see how it's going.

 

rev. 04 Feb 7