The BibleTrans Virtual Machine


This document describes the virtual machine (VM) language that BibleTrans compiles rules into for execution during translation. Both the compiler and the execution engine are in the ExecEng program that is run as a separate thread within BibleTrans.

The VM comprises three components: a store arranged as a stack with deep name-search ability; an execution unit that fetches byte-codes from "Code" resources in the selected language file and executes them; and some minimal I/O ability, with input from Tree resource data in a selected Tree data file and output to structured text resources, again in the language file. The Tree data is described elsewhere, mostly in the online documentation for building semantic trees.
 

Stack

The stack aggregates 32-bit tagged data into a LIFO stack, using two integers to hold the type tag and name, plus the data itself. Names are numerical indexes, the line numbers of a text list of names stored in the Vars#2 resource in the language file. An index to those lines is stored in the Vars#1 resource in the same file. The name list is built when the grammar is compiled; thereafter all reference to those names is by number, which is very fast.

The tags support three fundamental data types: integer, tree, and text, with a null tag serving as integer 0, a null tree node, or the empty text string. Short strings of five characters or fewer are packed into the stack item directly; longer strings are indexed and reference-counted in a string variable in memory (which disappears when the translation finishes). Individual strings are also removed from the list when the last reference to them disappears. Booleans are like C, zero/null being false, anything else (nominally 1) true.

There are also four special tags for marking subroutine (rule) frames.

All variables are stored on the stack -- possibly in multiple instances, with global variables at the bottom, then other instances allocated when that category is opened. When the compiled code calls for a variable by name, the stack is searched from top to bottom for the first item with that name, and its value is returned. A store into a named variable replaces the existing value only if the variable is in the current subroutine frame; otherwise (and for explicitly new variables) the value with its name and type is pushed on top of the stack. Normally a rule begins by pushing default values for all its local variables. These are automatically removed when the subroutine exits. The SetVar rules allow the first named variable deeper in the stack to be altered.
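
The lookup and store rules above can be illustrated with a small Python model. This is only a sketch, not the ExecEng implementation: the class and method names are hypothetical, a single FRAME tag stands in for the four special frame tags, and reading an unknown name is assumed to yield null.

```python
from dataclasses import dataclass

FRAME = "frame"  # hypothetical stand-in for the special frame-marker tags


@dataclass
class Item:
    tag: str       # "int", "tree", "text", "null", or FRAME
    name: int      # index into the name list (0 is used here for "unnamed")
    value: object


class Stack:
    def __init__(self):
        self.items = []  # the end of the list is the top of the stack

    def open_frame(self, rule):
        self.items.append(Item(FRAME, 0, rule))

    def close_frame(self):
        # Pop everything down to and including the nearest frame marker.
        while self.items:
            if self.items.pop().tag == FRAME:
                break

    def push(self, tag, name, value):
        self.items.append(Item(tag, name, value))

    def load(self, name):
        # Search from the top down for the first item with this name.
        for item in reversed(self.items):
            if item.tag != FRAME and item.name == name:
                return item.value
        return None  # assumption: an unknown name reads as null

    def store(self, name, tag, value):
        # Overwrite only if the variable lives in the current frame.
        for item in reversed(self.items):
            if item.tag == FRAME:
                break
            if item.name == name:
                item.tag, item.value = tag, value
                return
        self.push(tag, name, value)  # otherwise push a new instance on top

    def set_var(self, name, tag, value):
        # SetVar-style update: alter the first match anywhere in the stack.
        for item in reversed(self.items):
            if item.tag != FRAME and item.name == name:
                item.tag, item.value = tag, value
                return True
        return False
```

In this model a store from inside a rule frame shadows a global of the same name instead of changing it, and the shadow disappears when the frame closes; only set_var reaches the deeper instance.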

The stack can be viewed in the Debugger window, and also in the DebugLog log file when logging it is enabled.
 

Predefined Variables

Some variable names are defined for use within the translation engine. Some of these are private to the engine, and are distinguished by a space in the name (which cannot occur in user-defined variables). However, it may be useful to know about these when examining the stack. Other variables have predefined names but only do something if the user has declared them somewhere in the grammar. These are of course ordinary names.

The following are the predefined variables:
 

Inaccessible Predefined Variables

Node # -- This variable contains a reference to the current active node in the tree, when calling a lexical rule.

Pass # -- This variable is blank when prescanning a subtree for analysis, and then either 1 or -4 when generating text. The -4 is generally the second pass through the previously prescanned subtree, while 1 is the default single-pass value.

Pron # -- This variable holds the value returned by the pronoun selector, until it is time to set the selected pronoun to the current noun number.

Cont # -- Two variables carry context information into a syntax line rule. This variable is all zero except for a bit that tells the syntax line which of its items is a tree list variable loaded with a content subtree.

Line # -- This variable is the second of two parameters to a syntax line rule. It holds a number representing the selected line, when there is more than one to choose from. This variable also holds a copy of the tree list variable sent to built-in rule "Do Tree List".

if res -- This variable holds the value returned from a Conditional Value rule, until it is copied into a variable or used in constructing another conditional value.

is Prop -- This variable is used to separate Thing-modifier adpositions from Proposition-modifier adpositions, which have separate shape connection rules even though they have the same shape in the tree.
 

User-Accessible Predefined Variables

Gloss -- When you add a Gloss item to your syntax line, the value of this variable will be displayed on a second (interlinear) line in the structured output window as a gloss for the text generated at the same time.

LexRuleForm -- This variable is preloaded with the lexical rule form number (between 1 and 31) whenever a lexical rule is called, but only if LexRuleForm was previously blank. If your grammar calls for different lexical rule forms for the same class of tree node, and the different forms correspond to different syntactic structures in the generated text, you can use the rule form number stored in this variable to select your syntax line or in a conditional value to effect the syntax variations. If you are re-using this variable, you may need to clone it or explicitly clear it to blank for a new value to be set.

ListCount, ListPosition -- When a Variable Connection rule is linked to a variable used in a Syntax Line, and that variable is also connected to a "content" or "modifiers" slot in a Tree Connection (node shape) rule for that syntax line, then the variable connection rule gets called individually for each node in the subtree list. Two variables track the position in that list of nodes. ListCount holds the total number of nodes in the list, and ListPosition starts at 1 for the first subtree, and increments for each subsequent node until the last, when ListPosition = ListCount. You can use these variables to generate different syntax for the first or last items in the list (or any in between, if you so wish).

PronSuppress -- When you check one or more of the checkboxes at the bottom of the Setting Up Pronouns page, the bits of this variable will reflect the checked pronouns. You can change this variable during translation to alter which pronouns activate from time to time.

Something -- If you define this variable in your grammar, then BibleTrans will load it with the concept number (either 3 or some value 91-96) of anything that could be the parent node over 0.220 Something. Inside the lexical rule for 0.220 you can test this variable to see whether it is a semantic role (meaning that the role is completely unspecified), or a 0.3 Thing node, which means that the noun here is not particularly known, but does have modifiers, for example if it is the placeholder for a content question (as would be translated "what?" in something like "What is under the bed?")

CurrentVerse# -- If you enable the automatic Generate Verse Numbers checkbox in Verse Numbers and Punctuation, then this variable holds the current verse number, after it has been generated into the output text. There are places in the BibleTrans tree where the verse reference is encoded multiple times, or you could in the course of generation repeat a subtree with a verse reference; this prevents the verse from showing up several times in the output. Alternatively, you could set a blank here to force it to repeat.

NewParagraph, NewSentence -- These two variables hold 1 (true) when the corresponding 0.312 Paragraph or 0.313 Sentence nodes have been seen in the current subtree proposition, and are cleared to blank (false) at the sentence boundary during generation, if the automatic Capitalize First Word checkbox in Verse Numbers and Punctuation has been enabled, or if there are any non-blank entries in the sentence punctuation table. You can set these variables explicitly to force a new sentence or new paragraph where there is none in the semantic tree.

IllocutionSeen -- This variable captures the illocutionary force of the current proposition, as encoded in the marker nodes 0.122 Imperative - 0.124 Yes-No Interrogative, to select an appropriate entry from the sentence punctuation table in Verse Numbers and Punctuation. A value of 1 represents 0.122 Imperative, and 2 represents some form of question; zero is neither. This value is multiplied by 4 after it is evaluated, because the next sentence analysis occurs before the prior sentence's final punctuation is generated.

ReciprocalNoun -- If you define this variable in your grammar, then BibleTrans will load it with a reference to whatever 0.3 Thing has a noun number matching that of the 0.91 Agent (or 0.96 Participant). This happens during the proposition prescan, when the 0.210 Reciprocal marker is encountered. Otherwise the variable is untouched (presumably blank as initialized).
 

VM Opcodes

Most VM operations are a single byte and may operate on whatever is at the top of the stack. A few operations take a 1- or 2-byte operand immediately following the opcode in the code; 2-byte operands can be integers, variable name indices, or subroutine name references. The mnemonic names shown here are used in the DebugLog disassembly during compilation or execution (when enabled). The operations are listed here in numerical order, with unused operators omitted:

00 Nop -- No Operation
Do nothing. This operation should never happen in properly compiled code.

01 nnnn Lino -- Line Number
This is used to identify a source line number, so that the Debugger can stop here and mark the correct "line" of the rule source code.

02 nnnn OpFr -- Open Frame
A new frame is opened for the indexed rule, before parameters for that rule invocation are pushed onto the stack above it, then a CallFr jumps to the rule code. When the rule exits, everything down to the frame marker is popped off.

03 CallFr -- Call Framed Rule
This assumes a prior OpFr, and jumps to the designated rule.

04 Stop -- Pause
This allows programmed (static) breakpoints. Most breakpoints are dynamic, linked to line numbers or variables or tree nodes or particular output sites.

05 CallLN -- Call Lexical Rule
This assumes that a Tree node is on top of the stack, and opens up a frame for the lexical rule corresponding to that tree node, then jumps to it. If the top of stack is null, then no rule is called. In either case, the tree node or null is removed from the stack top when it finishes.

06 AnoLst -- Iterate List of Tree Nodes
This is like CallLN, except that the called lexical rule returns to this same opcode with its sibling tree node on the stack, so that a list of nodes (typically a noun or proposition modifier list) has each node's lexical rule called in succession.

07 EnoLst -- Iterate All but Last of a List of Tree Nodes
This is like AnoLst except that it stops and leaves the last node on the list (or null, if the list is empty) on the top of stack. This can be used when additional punctuation or a conjunction is needed before the last item in a list.
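
The difference between AnoLst and EnoLst can be sketched in Python as follows. The Node class and the function names are hypothetical, and the lexical rule is modelled as an ordinary callback rather than a re-entered opcode.

```python
class Node:
    """Hypothetical tree node with a link to its next sibling."""

    def __init__(self, label, sibling=None):
        self.label = label
        self.sibling = sibling  # next node in the list, or None


def ano_lst(node, call_rule):
    # Call the lexical rule for every node in the sibling list.
    while node is not None:
        call_rule(node)
        node = node.sibling


def eno_lst(node, call_rule):
    # Call the rule for all but the last node; hand back the last (or None).
    while node is not None and node.sibling is not None:
        call_rule(node)
        node = node.sibling
    return node


def join_with_and(first):
    # Example use: put a conjunction before the final item of a list.
    out = []
    last = eno_lst(first, lambda n: out.append(n.label + ","))
    if last is not None:
        if out:
            out.append("and")
        out.append(last.label)
    return " ".join(out)
```

Here join_with_and plays the part of a rule that uses EnoLst, inserts its own conjunction, and then processes the remaining node itself.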

08 OK -- OK
The current rule is terminated in success. Execution resumes after the calling operation.

09 Done -- Done
The translation is terminated successfully and all temporary windows closed.

0A nnnn Jump -- Jump +/-n bytes
The immediate operand is added to the current position to jump somewhere else in the same code resource.

0B nnnn BrF -- Branch if False
The top of stack is popped; if zero or null, the immediate operand is added to the current position to jump somewhere else in the same code resource. Otherwise the next operation in sequence is executed.
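
A minimal fetch-and-execute loop for a handful of these opcodes might look like the following Python sketch. The opcode numbers come from this list, but two details are assumptions not stated here: that the 2-byte operand is a big-endian signed integer, and that the jump offset is added to the position just past the operand.

```python
import struct

# Opcode values as listed in this document.
OK, JUMP, BRF, INT, NULL = 0x08, 0x0A, 0x0B, 0x0F, 0x10


def run(code):
    """Execute a tiny byte-code fragment and return the final stack."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OK:
            return stack
        if op == NULL:
            stack.append(None)
            continue
        # The remaining opcodes handled here carry a 2-byte operand.
        # Assumption: big-endian, signed.
        (n,) = struct.unpack_from(">h", code, pc)
        pc += 2
        if op == INT:
            stack.append(n)
        elif op == JUMP:
            pc += n  # assumption: relative to the byte after the operand
        elif op == BRF:
            if not stack.pop():  # zero or null counts as false
                pc += n
        else:
            raise ValueError(f"opcode {op:02X} not modelled in this sketch")
```

For example, the fragment Null, BrF +3, int 1, int 2, OK skips the first int and finishes with only 2 on the stack.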

0C nnnn NuVar -- New Variable
The given name is added to the value on top of the stack, making it a named variable.

0D nnnn Sto -- Store into Named Variable
The stack is popped and the named variable is replaced by that value.

0E nnnn Ld -- Load from Named Variable
The value of the named variable is pushed onto the stack.

0F nnnn int -- Integer Constant
The immediate value is pushed onto the stack as an integer.

10 Null -- Null
A null is pushed onto the stack.

11 nnnn str -- String Literal
The immediate value indexes an entry in the string literal table of the current code resource, and that string is pushed onto the stack.

13 nn Tree -- Tree Part
The immediate byte selects one of eight parts (integer or Tree, or possibly comment string) of the Tree node on the stack top, and that part replaces the node there.

14 Pack -- Pack
Two integers are popped off the stack and assembled into a single integer, which is pushed back on. The former stack top is the low half.

15 Swap -- Swap
The top two items on the stack are exchanged.

16 Pop -- Pop
The top of the stack is popped and discarded.

17 Dupe -- Duplicate
A copy of the top of the stack is pushed on top of it.

1A Rot3 -- Rotate Top 3 Items
The top item on the stack is removed and inserted under the third. If the stack starts with [A,B,C,D...] (A on top), after Rot3 the stack will be [B,C,A,D...].
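
The stack-shuffling operators can be modelled on a Python list whose last element is the top of the stack. This is a sketch only; in particular, Pack is assumed here to combine two 16-bit halves into one 32-bit value, which the description above does not state explicitly.

```python
def swap(s):
    # Exchange the top two items.
    s[-1], s[-2] = s[-2], s[-1]


def dupe(s):
    # Push a copy of the top item.
    s.append(s[-1])


def rot3(s):
    # Remove the top item and re-insert it under the third.
    top = s.pop()
    s.insert(len(s) - 2, top)


def pack(s):
    # The former stack top becomes the low half.
    # Assumption: 16-bit halves of a 32-bit integer.
    low = s.pop()
    high = s.pop()
    s.append(((high & 0xFFFF) << 16) | (low & 0xFFFF))
```

With the list written bottom-first, rot3 turns [D, C, B, A] into [D, A, C, B], which is the [B,C,A,D...] result described for Rot3 when read from the top down.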
 

Output

The next five operators generate output text:

1B Pgph -- Paragraph
This starts a new paragraph in the output stream.

1C Capz -- Capitalize
This capitalizes the next word in the output stream, if that makes any sense.

1D NoWds -- No Word Space
Successive Emit operations usually generate separate words in the output text. This operator eliminates the word space between the two Emits surrounding it, so they come out as a single word.

1E Emit -- Emit Text
The top item on the stack is emitted as a number or whatever text is there. You do not need to insert spaces in the emitted text; they are inserted automatically between Emits unless you use the NoWds operation to prevent it.

1F Gloss -- Gloss
In the structured output window there are two lines for emitted text. The top line is the normal Emit text, and below it is whatever is output using the Gloss operator. In the plain text window, the gloss and structure are omitted unless you specify that the gloss is to be included inline.
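
The word-spacing behaviour of Emit, NoWds, and Capz can be sketched with a small Python class. The class and its flag names are hypothetical, and details such as how a paragraph break is represented in the output are assumptions.

```python
class Output:
    """Hypothetical model of the plain-text output stream."""

    def __init__(self):
        self.text = ""
        self.glue = False  # set by NoWds: suppress the next word space
        self.cap = False   # set by Capz: capitalize the next word

    def no_wds(self):
        self.glue = True

    def capz(self):
        self.cap = True

    def pgph(self):
        self.text += "\n"  # assumption: a paragraph is a line break

    def emit(self, value):
        word = str(value)  # numbers are emitted as their digits
        if self.cap:
            word = word[:1].upper() + word[1:]
            self.cap = False
        # Insert a word space unless at the start, after a paragraph
        # break, or when NoWds came between the two Emits.
        if self.text and not self.text.endswith("\n") and not self.glue:
            self.text += " "
        self.text += word
        self.glue = False
```

In this model, Capz, Emit "walk", NoWds, Emit "ed", Emit "home" yields the text "Walked home".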
 

ALU

The remaining operators mostly do arithmetic and other kinds of actions on stack data. Apart from type coercions, most of the operations directly match their corresponding Turk/2 operators. Most of these also match the on-line operator descriptions in the BibleTrans Conditional Value Rule documentation.

20 + -- Plus
The top two numbers are popped off the stack, added, and the sum pushed back on. If either value is text that looks like a number, it is converted to a number.

21 - -- Minus
The top two numbers are popped off the stack, the top is subtracted from the second, and the difference pushed back on. If either value is text that looks like a number, it is converted to a number.

22 * -- Multiply
The top two numbers are popped off the stack, multiplied, and the product pushed back on. If either value is text that looks like a number, it is converted to a number.

23 / -- Divide
The top two numbers are popped off the stack, the top is divided into the second, and the quotient pushed back on. If either value is text that looks like a number, it is converted to a number.

24 % -- Modulo
The top two numbers are popped off the stack, the top is divided into the second, and the remainder pushed back on. If either value is text that looks like a number, it is converted to a number.

25 & -- AND
The top two numbers are popped off the stack, and logically (bitwise) ANDed, and the result pushed back on. If either value is text that looks like a number, it is converted to a number.

26 | -- OR
The top two numbers are popped off the stack, logically (bitwise) ORed, and the result pushed back on. If either value is text that looks like a number, it is converted to a number.

27 ^ -- XOR
The top two numbers are popped off the stack, logically (bitwise) exclusive-ORed, and the result pushed back on. If either value is text that looks like a number, it is converted to a number.
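
The operand order and text-to-number coercion shared by these operators can be sketched in Python as below. The helper names are hypothetical. The sketch does not model 32-bit overflow, and it raises an error for text that does not look like a number, a case the descriptions above leave unspecified.

```python
import operator


def to_number(value):
    # Null serves as integer 0; numeric-looking text is converted.
    if value is None:
        return 0
    if isinstance(value, int):
        return value
    return int(value)  # ValueError for non-numeric text (unspecified case)


def binary(stack, op):
    # Pop the top and the second item, then push op(second, top).
    top = to_number(stack.pop())
    second = to_number(stack.pop())
    stack.append(op(second, top))
```

For instance, with 10 pushed first and the text "3" pushed second, binary(stack, operator.sub) leaves 7: the top is subtracted from the second.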

29 Cat -- Catenate
The top two text strings are popped off the stack, concatenated (top to the right), and the result pushed back on. If either value is not text, it is converted to text.

2A < -- Less
The top two values are popped off the stack and compared; if the top is greater than the next, 1 is pushed back on, otherwise null is pushed. If only one of the values is a number but the other is text that looks like a number, it is converted to a number; otherwise the number is converted to text before comparing. In a numerical compare, 123 is greater than 15 but as text "123" is less than "15".

2B >= -- Greater or Equal
The top two values are popped off the stack and compared; if the top is not greater than the next, 1 is pushed back on, otherwise null is pushed.

2C <= -- Less or Equal
The top two values are popped off the stack and compared; if the top is not less than the next, 1 is pushed back on, otherwise null is pushed.

2D > -- Greater
The top two values are popped off the stack and compared; if the top is less than the next, 1 is pushed back on, otherwise null is pushed.

2E = -- Equal
The top two values are popped off the stack and compared; if they are equal, 1 is pushed back on, otherwise null is pushed.

2F != -- Unequal (/=)
The top two values are popped off the stack and compared; if they are unequal, 1 is pushed back on, otherwise null is pushed. If only one of the values is a number but the other is text that looks like a number, it is converted to a number; otherwise the number is converted to text before comparing.
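
The mixed-type coercion rule for comparisons can be sketched in Python as follows. The function names are hypothetical, only the Less operator is shown, and null operands are left out of the model.

```python
def looks_numeric(value):
    if not isinstance(value, str):
        return False
    try:
        int(value)
        return True
    except ValueError:
        return False


def coerce_pair(a, b):
    a_is_num, b_is_num = isinstance(a, int), isinstance(b, int)
    if a_is_num != b_is_num:  # exactly one of the two is a number
        text = b if a_is_num else a
        if looks_numeric(text):
            return int(a), int(b)  # compare numerically
        return str(a), str(b)      # otherwise compare as text
    return a, b                    # same kind: compare as-is


def less(stack):
    top = stack.pop()
    second = stack.pop()
    second, top = coerce_pair(second, top)
    # "second < top" holds when the top is greater than the next.
    stack.append(1 if top > second else None)
```

With 15 and 123 on the stack as numbers the result is 1, but with "15" and "123" as text the result is null, because "123" sorts before "15".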

30 Len -- Length
The text string at the top of the stack is popped off and replaced with the number of characters in it. Null is unchanged. If the top value is a number, it is replaced with the number of digits, plus 1 if negative.

31 Offs -- Offset
The top two text strings are popped off the stack; if the top string contains the second, the offset of the match (the number of characters to its left) is pushed; otherwise -1 is pushed.

32 Subst -- Substring
The text string at the top of the stack is popped off and a substring extracted from it; the length is the number second on the stack, and the offset is the number below that. All three values are replaced by the resulting substring.

33 Replc -- Replace
Four values are popped off the stack, the top three as in Subst, and then another text string to replace the substring identified by the top three. The new composite is pushed in their place.
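
The stack layout for Offs, Subst, and Replc can be sketched in Python, using a list whose last element is the stack top. Two assumptions are made here: offsets in Subst and Replc are zero-based (matching the Offs definition) and non-negative, and no range checking is done.

```python
def offs(stack):
    top = stack.pop()
    second = stack.pop()
    # str.find returns the zero-based offset, or -1 if not found.
    stack.append(top.find(second))


def subst(stack):
    text = stack.pop()
    length = stack.pop()
    offset = stack.pop()
    stack.append(text[offset:offset + length])


def replc(stack):
    text = stack.pop()
    length = stack.pop()
    offset = stack.pop()
    new = stack.pop()  # the fourth value: the replacement text
    stack.append(text[:offset] + new + text[offset + length:])
```

For example, with offset 1, length 3, and the text "hello" pushed in that order, subst leaves "ell" on the stack.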

34 ItmNo -- Item Number
The top two text strings are popped off the stack, then if the top string is a comma-delimited list of items, and the second is one of those items, the item number is pushed; otherwise zero is pushed. Item 1 is all the text from the front to the first comma, item 2 the text between the first and second commas, etc.

35 DelItm -- Delete Item
The text string at the top of the stack and the number below it are both popped off; if the top is a comma-delimited list of items, and the second is the item number of one of them, that item is deleted and the remaining string pushed back onto the stack. Item 1 is all the text from the front to the first comma, item 2 the text between the first and second commas, etc.

36 Item -- Item Of
The text string at the top of the stack and the number below it are both popped off; if the top is a comma-delimited list of items, and the second is the item number of one of them, that item is extracted and pushed back onto the stack. Item 1 is all the text from the front to the first comma, item 2 the text between the first and second commas, etc.

37 CntItm -- Count Items
The text string at the top of the stack is popped off and replaced with the number of commas in it plus 1, except that a null is replaced by zero.
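
The comma-delimited item operators can be sketched as plain Python functions (rather than stack operations, for brevity). The function names are hypothetical, and the handling of out-of-range item numbers is an assumption, since the descriptions above do not say what happens in that case.

```python
def cnt_itm(text):
    # Number of commas plus 1, except that null (empty) gives zero.
    return 0 if not text else text.count(",") + 1


def item(text, n):
    # Item numbers start at 1.
    parts = text.split(",")
    return parts[n - 1] if 1 <= n <= len(parts) else None  # assumption


def itm_no(text, wanted):
    # Item number of `wanted` in the list, or zero if it is absent.
    parts = text.split(",")
    return parts.index(wanted) + 1 if wanted in parts else 0


def del_itm(text, n):
    # Delete item n; an out-of-range n leaves the list unchanged (assumption).
    parts = text.split(",")
    if 1 <= n <= len(parts):
        del parts[n - 1]
    return ",".join(parts)
```

With the list "a,b,c", item 2 is "b", the item number of "c" is 3, and deleting item 2 leaves "a,c".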

38 SubTr -- SubTree (kid)
The tree node at the top of the stack is popped and replaced by its immediate subtree, or null if it's not a tree or has no subtree.

39 NxtNo -- SiblingTree (bro)
The tree node at the top of the stack is popped and replaced by the next sibling Tree in its list, if any, or else null.

3A PutItm -- Put Into Item
Three values are popped off the stack, the top two as in Item, and then another text string to replace the item identified by the top two. The new composite is pushed in their place.

3C LNinTr -- L&N in Tree (LNin)
The tree node at the top of the stack and the number under it are both popped and replaced by true (1) if the number represents an L&N concept in the tree, not more than 9 subtrees deep, and false (null) if no such concept number can be found.

3D Bref -- Bible Reference (ref)
The tree node at the top of the stack is popped and replaced by the Bible reference as a text string of three numbers separated by commas, book (1-66), chapter, verse, or else null if there is no verse attached to this node.

3E NouNo -- Noun Number (noun#)
The tree node at the top of the stack is popped and replaced by the noun number from the ThingList, if it has one, or else null (zero).

3F UpNo -- Parent (dad)
The tree node at the top of the stack is popped and replaced by its immediate parent tree, or null if it's not a tree or is the book root.

41 LookTab -- Table Lookup
The top number is popped off the stack and taken as the index number of a lookup table; that table's access values are fetched and the corresponding table value is pushed.

42 DWIM -- Do What I Mean
The value at the top of the stack is popped; if it is a tree node, that tree's lexical rule is called (the same as CallLN); otherwise the value is (converted to text if necessary, then) emitted, same as Emit.

43 GetLN -- L&N from Tree (L&N)
The tree node at the top of the stack is popped and replaced by the Louw&Nida concept number of that node, packed into a single number, D*1000+C.
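
The packing formula can be illustrated with a short Python sketch. It reads D as the part of the concept number before the point and C as the part after it, which is an interpretation of the formula rather than something spelled out here; the function names are hypothetical.

```python
def pack_ln(d, c):
    # D*1000+C, as given in the GetLN description.
    return d * 1000 + c


def unpack_ln(packed):
    # Inverse of pack_ln, assuming 0 <= C < 1000.
    return divmod(packed, 1000)
```

Under this reading, 0.220 Something packs to 220 and 0.91 Agent packs to 91, which agrees with the values 3 and 91-96 mentioned for the Something variable.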

44 NxTrLs -- Next Tree from List (head, tail)
The value at the top of the stack, which should be a text string list of tree nodes formed by the TrLsApd operator, is popped off and replaced with the first tree node in the list under the remainder of the list with that node removed. Discarding the top leaves the node, implementing head; keeping it but discarding the tree node implements tail.

45 TrLsApd -- Tree List Append
The tree node at the top of the stack and the text string below it are both popped off and replaced with the string extended by adding the text representation of the tree node to its end with a '+' separator.
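
The way NxTrLs and TrLsApd cooperate can be sketched in Python. The text representation of a tree node is not described here, so this sketch uses decimal integers as hypothetical node references, and it assumes the '+' separator appears only between entries.

```python
def tr_ls_apd(stack):
    node = stack.pop()
    lst = stack.pop() or ""  # null counts as an empty list
    stack.append((lst + "+" + str(node)) if lst else str(node))


def nx_tr_ls(stack):
    lst = stack.pop() or ""
    head, _, rest = lst.partition("+")
    stack.append(int(head) if head else None)  # first node goes underneath
    stack.append(rest or None)                 # remainder ends up on top
```

After nx_tr_ls, popping the top leaves the first node (head), while swapping and then popping leaves the remainder (tail).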

46 PN? -- Pronoun?
The value at the top of the stack, which should be a noun number from the ThingList, or else a Thing tree node or its noun, is popped off and replaced with the pronoun number if there is a pronoun with this noun number, or else null if no pronoun refers to this Thing.

47 PWS -- PreWalk Setup
The tree node at the top of the stack is popped and replaced by the beginning of its modifier list (just past the head noun or verb) if it's a Thing or Proposition; it reaches through a semantic role marker to the underlying Thing, or if it is partway through a modifier list, just keeps that node; otherwise null is pushed instead. This prepares for the "(prewalk)" SetVar operator.

48 CntNds -- Count Nodes
The tree node at the top of the stack is popped and replaced by the number of tree nodes connected to it as siblings (including itself: a result of 1 means the node has no forward siblings). Any nodes to which this is a sibling are not counted.

49 Nn#inTr -- Noun Number in Tree
The tree node at the top of the stack and the number below it are both popped off and replaced with true (1) if that number is a noun somewhere in the first 9 levels of subtree within that node, or false (null) if not.

4A JumpLN -- Jump to L&N Lexical Rule
This is essentially the same as CallLN, except that the tree node is not made current. It is only used to process built-in Lexical Rule 0.311 ImplicitInfo if it has something to do.

4B xTab -- Table Access
This is only used to fetch an entry from the punctuation table, but works more or less like LookTab.

4C i7FF -- Infinity
This pushes the most positive number possible, 2147483647.

4D nnnn xSto -- Indexed Store
4E nnnn xLd -- Indexed Load
This pops a pronoun index number off the stack, then stores to or loads from the pronoun so indexed, but otherwise behaves like Sto or Ld. The immediate value nnnn should be the variable reference of the first pronoun (all pronouns are sequential), and a zero on the stack top would access it.

4F nnnn xRng -- Index In Range
If the top of stack is not a number between zero and nnnn, it is replaced with null; otherwise it is duplicated. The next operation would be a BrF to test either the copy or null, followed by code to pop and process the original value, which the BrF jumps over if not in range.

50 RcpTh -- Reciprocal Thing
The tree node at the top of the stack is popped. It should be a Proposition properly marked by a 0.210 Reciprocal modifier. This operation searches for a semantic role whose Thing noun number matches that of the first semantic role (typically 0.91 Subject), and stores that Thing node into predefined Tree variable "ReciprocalNoun". This operator is used in the default built-in lexical rule for 0.210 Reciprocal.

51 Shft -- Shift
The top two numbers are popped off the stack, and replaced by the top value shifted left the number of bit positions indicated in the second. If that number is negative, the shift is to the right.
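
The operand roles in Shft can be sketched in Python. The sketch does not truncate results to 32 bits, and it uses an arithmetic (sign-preserving) right shift; both choices are assumptions, since the description does not cover them.

```python
def shft(stack):
    value = stack.pop()  # the top value is the one shifted
    count = stack.pop()  # the second gives the number of bit positions
    if count >= 0:
        stack.append(value << count)
    else:
        stack.append(value >> -count)  # negative count shifts right
```

So with 4 pushed first and 1 on top, the result is 16; with -2 pushed first and 16 on top, the result is 4.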

52 Svnt -- Selected Variant
When translation begins, a list of selected variants is made from resource Adat#3008. If the translation has not chosen to display all variants, then whenever a 0.310 Variant Interpretation node is encountered, the selection is chosen from the list (if it can be found), or else the first subtree is chosen, and pushed onto the stack.

53 LastN -- Last N Characters
Like Subst, the text string at the top of the stack is popped off and a substring extracted from it; the length is the number second on the stack, which is the length from the end of the string to preserve. Both values are replaced by the resulting substring.

54 Sto0 -- Store Replacing Zero
This is essentially the same as Sto, except that it will not replace a non-zero value. It is used for initializing global variables used in rules.

55 CmpTree -- Compare Tree (CmpTr)
The tree node at the top of the stack (which should be one of the propositions in a compare relation) is popped and replaced by the one role or adposition subtree from that proposition whose Thing differs from the other proposition in the relation (in predefined variable "Node#"), or null if there is more than one difference. A different adposition over the same or different noun number counts as a single difference.

56 Nshape -- Node Shape
The tree node at the top of the stack is popped and replaced by the node shape bits from the Tree. There will be irrelevant bits in the upper positions of the number (see Tree Nodes definition), which can be removed by the AND operator.

58 RelProTr -- Relative Propositions List (RelPrs)
The tree node at the top of the stack is popped and replaced by a list of the relative clause subtrees under that node.

59 Trecur -- Tree Recursion Check
A badly written grammar might attempt to call the same rule from within itself recursively, with no way out of the loop. This operation marks each tree node on entry to its lexical rule, then unmarks it on the way out. If it comes to a node already marked, the translation is aborted.
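
The mark-on-entry, unmark-on-exit check can be sketched in Python as below. The names are hypothetical, a set of active nodes stands in for whatever per-node mark the engine actually uses, and an exception stands in for aborting the translation.

```python
class TranslationAborted(Exception):
    pass


marked = set()  # stand-in for the per-node recursion mark


def call_rule(node, rule):
    if node in marked:
        # The same node was entered again before its rule finished.
        raise TranslationAborted(f"recursive entry to node {node!r}")
    marked.add(node)       # mark on entry
    try:
        rule(node)
    finally:
        marked.discard(node)  # unmark on the way out
```

A rule that calls call_rule again on its own node triggers the abort, while ordinary nested calls on different nodes pass through unaffected.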

5A UpLvl -- Uplevel Variable Reference
This is used for accessing a variable deeper in the stack than its first reference. It's being reworked...

2012 October 1