This document describes the virtual machine (VM)
language that BibleTrans compiles rules into for execution during translation.
Both the compiler and the execution engine are in the ExecEng program that
is run as a separate thread within BibleTrans.
The VM comprises three components: a store
arranged as a stack with deep name-search ability,
an execution unit that fetches byte-codes from
"Code" resources in the selected language file and executes them,
plus some minimal I/O ability, input from Tree resource data in a selected
Tree data file, and output to structured text resources
again in the language file. The Tree data is described elsewhere, mostly
in the online documentation for building semantic trees.
The stack item tags support three fundamental data types: integer, tree, and text, with a null tag serving as integer 0, a null tree node, or the empty text string. Short strings of five characters or fewer are packed into the stack item directly; longer strings are indexed and reference-counted in a string variable in memory (which disappears when the translation finishes). Individual strings are also removed from the list when the last reference to them disappears. Booleans work as in C: zero or null is false, and anything else (nominally 1) is true.
There are also four special tags for marking subroutine (rule) frames.
All variables are stored on the stack, possibly in multiple instances: global variables are at the bottom, and other instances are allocated when that category is opened. When the compiled code calls for a variable by name, the stack is searched from top to bottom for the first item with that name, and its value is returned. A store into a named variable replaces the existing value only if the variable is in the current subroutine frame; otherwise (and for explicitly new variables) the value is pushed on top of the stack with its name and type. Normally a rule begins by pushing default values for all its local variables. These are automatically removed when the subroutine exits. The SetVar rules allow the first named variable deeper in the stack to be altered.
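The lookup and store rules can be sketched in Python. This is only an illustrative model, not the ExecEng implementation: the class VarStack, its method names, and the FRAME sentinel are hypothetical, and the real stack also holds unnamed operands and typed items that are left out here.

```python
FRAME = object()  # hypothetical stand-in for a subroutine frame marker


class VarStack:
    """Toy model of the named-variable stack; the top of stack is the list end."""

    def __init__(self):
        self.items = []  # (name, value) tuples, or FRAME markers

    def open_frame(self):
        self.items.append(FRAME)

    def close_frame(self):
        # Pop everything down to and including the nearest frame marker.
        while self.items and self.items.pop() is not FRAME:
            pass

    def load(self, name):
        # Search from top to bottom for the first item with this name.
        for item in reversed(self.items):
            if item is not FRAME and item[0] == name:
                return item[1]
        return None  # assumption: an unknown name reads as null

    def store(self, name, value):
        # Replace only if the variable lives in the current frame...
        for i in range(len(self.items) - 1, -1, -1):
            item = self.items[i]
            if item is FRAME:
                break
            if item[0] == name:
                self.items[i] = (name, value)
                return
        # ...otherwise push a new instance that shadows any deeper one.
        self.items.append((name, value))

    def set_var(self, name, value):
        # SetVar-style update: alter the first match anywhere in the stack.
        for i in range(len(self.items) - 1, -1, -1):
            item = self.items[i]
            if item is not FRAME and item[0] == name:
                self.items[i] = (name, value)
                return True
        return False
```

In this model a store inside a rule never disturbs a global of the same name; the shadowing instance disappears again when close_frame runs.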
The stack can be viewed in the Debugger window, and also in the DebugLog
log file when logging is enabled.
The following are the predefined variables:
Pass # -- This variable is blank when prescanning a subtree for analysis, and then either 1 or -4 when generating text. The -4 is generally the second pass through the previously prescanned subtree, while 1 is the default single-pass value.
Pron # -- This variable holds the value returned by the pronoun selector, until it is time to set the selected pronoun to the current noun number.
Cont # -- Two variables carry context information into a syntax line rule. This variable is all zero except for a bit that tells the syntax line which of its items is a tree list variable loaded with a content subtree.
Line # -- This variable is the second of two parameters to a syntax line rule. It holds a number representing the selected line, when there is more than one to choose from. This variable also holds a copy of the tree list variable sent to built-in rule "Do Tree List".
if res -- This variable holds the value returned from a Conditional Value rule, until it is copied into a variable or used in constructing another conditional value.
is Prop -- This variable is used to separate Thing-modifier
adpositions from Proposition-modifier adpositions, which have separate
shape connection rules even though they have the same shape in the tree.
LexRuleForm -- This variable is preloaded with the lexical rule form number (between 1 and 31) whenever a lexical rule is called, but only if LexRuleForm was previously blank. If your grammar calls for different lexical rule forms for the same class of tree node, and the different forms correspond to different syntactic structures in the generated text, you can use the rule form number stored in this variable to select your syntax line or in a conditional value to effect the syntax variations. If you are re-using this variable, you may need to clone it or explicitly clear it to blank for a new value to be set.
ListCount, ListPosition -- When a Variable Connection rule is linked to a variable used in a Syntax Line, and that variable is also connected to a "content" or "modifiers" slot in a Tree Connection (node shape) rule for that syntax line, then the variable connection rule gets called individually for each node in the subtree list. Two variables track the position in that list of nodes. ListCount holds the total number of nodes in the list, and ListPosition starts at 1 for the first subtree, and increments for each subsequent node until the last, when ListPosition = ListCount. You can use these variables to generate different syntax for the first or last items in the list (or any in between, if you so wish).
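As a rough illustration of how ListCount and ListPosition can drive different syntax for the first, middle, and last items, the following Python sketch joins a list with commas and a final "and". The function render and the English separators are hypothetical; in BibleTrans the same decisions would be made inside the variable connection rule.

```python
def render(nodes):
    """Join items the way a rule might, keyed on position in the list."""
    list_count = len(nodes)  # plays the role of ListCount
    pieces = []
    for list_position, node in enumerate(nodes, start=1):  # ListPosition
        if list_position > 1:
            # Last item gets a conjunction, middle items a comma.
            pieces.append(" and " if list_position == list_count else ", ")
        pieces.append(node)
    return "".join(pieces)
```

The test for the last item is simply ListPosition = ListCount, as described above.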
PronSuppress -- When you check one or more of the checkboxes at the bottom of the Setting Up Pronouns page, the bits of this variable will reflect the checked pronouns. You can change this variable during translation to alter which pronouns activate from time to time.
Something -- If you define this variable in your grammar, then BibleTrans will load it with the concept number (either 3 or some value 91-96) of anything that could be the parent node over 0.220 Something. Inside the lexical rule for 0.220 you can test this variable to see whether it is a semantic role (meaning that the role is completely unspecified), or a 0.3 Thing node, which means that the noun here is not particularly known, but does have modifiers, for example if it is the placeholder for a content question (as would be translated "what?" in something like "What is under the bed?").
CurrentVerse# -- If you enable the automatic Generate Verse Numbers checkbox in Verse Numbers and Punctuation, then this variable holds the current verse number, after it has been generated into the output text. There are places in the BibleTrans tree where the verse reference is encoded multiple times, or you could in the course of generation repeat a subtree with a verse reference; this prevents the verse from showing up several times in the output. Alternatively, you could set a blank here to force it to repeat.
NewParagraph, NewSentence -- These two variables hold 1 (true) when the corresponding 0.312 Paragraph or 0.313 Sentence nodes have been seen in the current subtree proposition, and are cleared to blank (false) at the sentence boundary during generation, if the automatic Capitalize First Word checkbox in Verse Numbers and Punctuation has been enabled, or if there are any non-blank entries in the sentence punctuation table. You can set these variables explicitly to force a new sentence or new paragraph where there is none in the semantic tree.
IllocutionSeen -- This variable captures the illocutionary force of the current proposition, as encoded in the marker nodes 0.122 Imperative through 0.124 Yes-No Interrogative, to select an appropriate entry from the sentence punctuation table in Verse Numbers and Punctuation. A value of 1 represents 0.122 Imperative, and 2 represents some form of question; zero is neither. This value is multiplied by 4 after it is evaluated, because the next sentence analysis occurs before the prior sentence-final punctuation is generated.
ReciprocalNoun -- If you define this variable in your
grammar, then BibleTrans will load it with a reference to whatever 0.3
Thing has a noun number matching that of the 0.91 Agent (or 0.96 Participant).
This happens during the proposition prescan, when the 0.210 Reciprocal
marker is encountered. Otherwise the variable is untouched (presumably
blank as initialized).
00 Nop -- No Operation
Do nothing. This operation should never happen in properly compiled
code.
01 nnnn Lino -- Line Number
This is used to identify a source line number, so that the Debugger
can stop here and mark the correct "line" of the rule source code.
02 nnnn OpFr -- Open Frame
A new frame is opened for the indexed rule, before parameters for that
rule invocation are pushed onto the stack above it, then a CallFr
jumps to the rule code. When the rule exits, everything down to the frame
marker is popped off.
03 CallFr -- Call Framed Rule
This assumes a prior OpFr, and jumps to the designated
rule.
04 Stop -- Pause
This allows programmed (static) breakpoints. Most breakpoints are dynamic,
linked to line numbers or variables or tree nodes or particular output
sites.
05 CallLN -- Call Lexical Rule
This assumes that a Tree node is on top of the stack, and opens up
a frame for the lexical rule corresponding to that tree node, then jumps
to it. If the top of stack is null, then no rule is called. In either case,
the tree node or null is removed from the stack top when it finishes.
06 AnoLst -- Iterate List of Tree Nodes
This is like CallLN, except that the called lexical
rule returns to this same opcode with its sibling tree node on the stack,
so that a list of nodes (typically a noun or proposition modifier list)
has each node's lexical rule called in succession.
07 EnoLst -- Iterate All but Last of a List of Tree Nodes
This is like AnoLst except that it stops and leaves
the last node on the list (or null, if the list is empty) on the top of
stack. This can be used when additional punctuation or a conjunction is
needed before the last item in a list.
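The difference between AnoLst and EnoLst can be sketched in Python as two loops over a sibling chain. The Node class, its bro link, and the call_rule callback are hypothetical stand-ins for the tree node, its sibling pointer, and the lexical rule call; the return value of eno_lst models the node left on the stack top.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    bro: Optional["Node"] = None  # next sibling, or None at the end of the list


def ano_lst(node: Optional[Node], call_rule: Callable[[Node], None]) -> None:
    """Call the lexical rule for every node in the sibling list."""
    while node is not None:
        call_rule(node)
        node = node.bro


def eno_lst(node: Optional[Node], call_rule: Callable[[Node], None]) -> Optional[Node]:
    """Call the rule for all but the last node; return the last (or None)."""
    if node is None:
        return None
    while node.bro is not None:
        call_rule(node)
        node = node.bro
    return node
```

A caller of eno_lst can emit a conjunction and then process the returned node itself.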
08 OK -- OK
The current rule is terminated in success. Execution resumes after
the calling operation.
09 Done -- Done
The translation is terminated successfully and all temporary windows
closed.
0A nnnn Jump -- Jump +/-n bytes
The immediate operand is added to the current position to jump somewhere
else in the same code resource.
0B nnnn BrF -- Branch if False
The top of stack is popped; if zero or null, the immediate operand
is added to the current position to jump somewhere else in the same code
resource. Otherwise the next operation in sequence is executed.
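A minimal fetch-and-execute loop shows how the relative operands of Jump and BrF might be applied. Several details here are assumptions rather than facts from this document: the operand is taken to be a two-byte, big-endian, signed value, and the offset is taken relative to the byte following the operand. The function run is hypothetical and handles only a handful of opcodes (int, Null, Jump, BrF, OK).

```python
import struct


def run(code: bytes) -> list:
    """Interpret a tiny subset of the byte-codes and return the final stack."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        imm = 0
        if op in (0x0A, 0x0B, 0x0F):
            # Assumed operand layout: 16-bit signed, big-endian.
            (imm,) = struct.unpack_from(">h", code, pc)
            pc += 2
        if op == 0x0A:      # Jump: add the operand to the current position
            pc += imm
        elif op == 0x0B:    # BrF: pop, branch if zero or null
            if not stack.pop():
                pc += imm
        elif op == 0x0F:    # int: push the immediate value
            stack.append(imm)
        elif op == 0x10:    # Null
            stack.append(None)
        elif op == 0x08:    # OK: end of rule
            break
        else:
            raise ValueError(f"opcode {op:#04x} not modelled in this sketch")
    return stack
```

Under these assumptions, a BrF with operand 3 skips exactly one following three-byte int instruction when the popped value is zero or null.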
0C nnnn NuVar -- New Variable
The given name is added to the value on top of the stack, making it
a named variable.
0D nnnn Sto -- Store into Named Variable
The stack is popped and the named variable is replaced by that value.
0E nnnn Ld -- Load from Named Variable
The value of the named variable is pushed onto the stack.
0F nnnn int -- Integer Constant
The immediate value is pushed onto the stack as an integer.
10 Null -- Null
A null is pushed onto the stack.
11 nnnn str -- String Literal
The immediate value indexes an entry in the string literal table of the
current code resource, which is pushed onto the stack.
13 nn Tree -- Tree Part
The immediate byte selects one of eight (integer or Tree, or possibly
comment string) parts of the Tree node on the stack top, and replaces it
there.
14 Pack -- Pack
Two integers are popped off the stack and assembled into a single integer,
which is pushed back on. The former stack top is the low half.
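The operand order of Pack can be illustrated as follows. The width of each half is not stated here; this sketch assumes 16-bit halves of a 32-bit integer, and the function name pack is hypothetical.

```python
def pack(stack: list) -> None:
    """Pop two integers and push them combined; the former top is the low half."""
    low = stack.pop()    # former stack top
    high = stack.pop()   # value that was beneath it
    # Assumption: each half is 16 bits wide.
    stack.append(((high & 0xFFFF) << 16) | (low & 0xFFFF))
```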
15 Swap -- Swap
The top two items on the stack are exchanged.
16 Pop -- Pop
The top of the stack is popped and discarded.
17 Dupe -- Duplicate
A copy of the top of the stack is pushed on top of it.
1A Rot3 -- Rotate Top 3 Items
The top item on the stack is removed and inserted under the third.
If the stack starts with [A,B,C,D...] (A on top), after Rot3
the stack will be [B,C,A,D...].
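The same rotation can be checked with a Python list whose last element is the top of stack (so the listing order is reversed relative to the notation above). The function name rot3 is just for illustration.

```python
def rot3(stack: list) -> None:
    """Remove the top item and reinsert it under the third item."""
    top = stack.pop()
    # After the pop, the old second and third items are the last two
    # elements; inserting before them puts the old top beneath both.
    stack.insert(len(stack) - 2, top)
```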
1B Pgph -- Paragraph
This starts a new paragraph in the output stream.
1C Capz -- Capitalize
This capitalizes the next word in the output stream, if that makes
any sense.
1D NoWds -- No Word Space
Successive Emit operations usually generate separate
words in the output text. This operator eliminates the word space between
the two Emits surrounding it, so they come out as a single
word.
1E Emit -- Emit Text
The top item on the stack is emitted as a number or whatever text is
there. You do not need to insert spaces in the emitted text; they are
automatically inserted between Emits unless you use the NoWds
operation to prevent it.
1F Gloss -- Gloss
In the structured output window there are two lines for emitted text.
The top line is the normal Emit text, and below it is whatever
is output using the Gloss operator. In the plain text window,
the gloss and structure are omitted unless you specify that the gloss is
to be included inline.
20 + -- Plus
The top two numbers are popped off the stack, added, and the sum pushed
back on. If either value is text that looks like a number, it is converted
to a number.
21 - -- Minus
The top two numbers are popped off the stack, the top is subtracted
from the second, and the difference pushed back on. If either value is
text that looks like a number, it is converted to a number.
22 * -- Multiply
The top two numbers are popped off the stack, multiplied, and the product
pushed back on. If either value is text that looks like a number, it is
converted to a number.
23 / -- Divide
The top two numbers are popped off the stack, the top is divided into
the second, and the quotient pushed back on. If either value is text that
looks like a number, it is converted to a number.
24 % -- Modulo
The top two numbers are popped off the stack, the top is divided into
the second, and the remainder pushed back on. If either value is text that
looks like a number, it is converted to a number.
25 & -- AND
The top two numbers are popped off the stack, logically (bitwise) ANDed,
and the result pushed back on. If either value is text that looks like
a number, it is converted to a number.
26 | -- OR
The top two numbers are popped off the stack, logically (bitwise) ORed,
and the result pushed back on. If either value is text that looks like
a number, it is converted to a number.
27 ^ -- XOR
The top two numbers are popped off the stack, logically (bitwise) exclusive-ORed,
and the result pushed back on. If either value is text that looks like
a number, it is converted to a number.
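The operand order and text-to-number conversion shared by the arithmetic and bitwise operators can be sketched in Python. The names to_int, binary, and OPS are hypothetical. Two behaviors are not specified in this document and are assumptions of the sketch: text that does not look like a number raises an error, and division rounds as Python's floor division does (this differs from C only when an operand is negative).

```python
import operator

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}


def to_int(value):
    """Convert a stack value to an integer; null counts as 0."""
    if value is None:
        return 0
    if isinstance(value, int):
        return value
    return int(value.strip())  # raises ValueError for non-numeric text


def binary(stack: list, op: str) -> None:
    """Pop two operands, apply 'second op top', push the result."""
    top = to_int(stack.pop())
    second = to_int(stack.pop())
    stack.append(OPS[op](second, top))
```

For example, with 10 pushed first and the text "3" pushed second, Minus leaves 7 on the stack, because the top is subtracted from the second.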
29 Cat -- Catenate
The top two text strings are popped off the stack, concatenated (top
to the right), and the result pushed back on. If either value is not text,
it is converted to text.
2A < -- Less
The top two values are popped off the stack and compared; if the top
is greater than the next, 1 is pushed back on, otherwise null is pushed.
If only one of the values is a number but the other is text that looks
like a number, it is converted to a number; otherwise the number is converted
to text before comparing. In a numerical compare, 123 is greater than 15
but as text "123" is less than "15".
2B >= -- Greater or Equal
The top two values are popped off the stack and compared; if the top
is not greater than the next, 1 is pushed back on, otherwise null is pushed.
2C <= -- Less or Equal
The top two values are popped off the stack and compared; if the top
is not less than the next, 1 is pushed back on, otherwise null is pushed.
2D > -- Greater
The top two values are popped off the stack and compared; if the top
is less than the next, 1 is pushed back on, otherwise null is pushed.
2E = -- Equal
The top two values are popped off the stack and compared; if they are
equal, 1 is pushed back on, otherwise null is pushed.
2F != -- Unequal (/=)
The top two values are popped off the stack and compared; if they are
unequal, 1 is pushed back on, otherwise null is pushed. If only one of
the values is a number but the other is text that looks like a number,
it is converted to a number; otherwise the number is converted to text
before comparing.
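The comparison operators all evaluate "second OP top" after the coercion described for Less and Unequal. The Python sketch below models that rule for integer and text operands only (null operands are left out); the names coerce, compare, and CMP are hypothetical.

```python
import operator
import re

CMP = {
    "<": operator.lt, ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "=": operator.eq, "!=": operator.ne,
}


def looks_numeric(text: str) -> bool:
    return re.fullmatch(r"-?\d+", text) is not None


def coerce(a, b):
    """Bring a mixed number/text pair to a common type before comparing."""
    if isinstance(a, int) and isinstance(b, str):
        return (a, int(b)) if looks_numeric(b) else (str(a), b)
    if isinstance(a, str) and isinstance(b, int):
        return (int(a), b) if looks_numeric(a) else (a, str(b))
    return a, b  # both numbers or both text: compare as they are


def compare(stack: list, op: str) -> None:
    """Pop two values and push 1 if 'second op top' holds, else null."""
    top = stack.pop()
    second = stack.pop()
    second, top = coerce(second, top)
    stack.append(1 if CMP[op](second, top) else None)
```

With 15 pushed first and 123 on top, Less pushes 1; with the texts "123" and "15" in the same positions it also pushes 1, because "123" sorts before "15" as text.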
30 Len -- Length
The text string at the top of the stack is popped off and replaced
with the number of characters in it. Null is unchanged. If the top value
is a number, it is replaced with the number of digits, plus 1 if it is
negative.
31 Offs -- Offset
The top two text strings are popped off the stack, then if the top
string contains the second, the offset (the number of characters to its
left) to it is pushed; otherwise -1 is pushed.
32 Subst -- Substring
The text string at the top of the stack is popped off and a substring
extracted from it; the length is the number second on the stack, and the
offset is the number below that. All three values are replaced by the resulting
substring.
33 Replc -- Replace
Four values are popped off the stack, the top three as in Subst,
and then another text string to replace the substring identified by the
top three. The new composite is pushed in their place.
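The stack layouts of Len, Offs, Subst, and Replc can be summarized in a short Python model. The function names are hypothetical, and the sketch assumes that Subst and Replc use the same zero-based offset convention that Offs reports (the number of characters to the left); out-of-range offsets are not handled.

```python
def op_len(stack: list) -> None:
    value = stack.pop()
    # Null stays null; a number counts its digits plus any minus sign.
    stack.append(None if value is None else len(str(value)))


def offs(stack: list) -> None:
    top = stack.pop()
    second = stack.pop()
    stack.append(top.find(second))  # -1 when top does not contain second


def subst(stack: list) -> None:
    text = stack.pop()
    length = stack.pop()
    offset = stack.pop()
    stack.append(text[offset:offset + length])


def replc(stack: list) -> None:
    text = stack.pop()
    length = stack.pop()
    offset = stack.pop()
    new = stack.pop()  # replacement text, fourth from the top
    stack.append(text[:offset] + new + text[offset + length:])
```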
34 ItmNo -- Item Number
The top two text strings are popped off the stack, then if the top
string is a comma-delimited list of items, and the second is one of those
items, the item number is pushed; otherwise zero is pushed. Item 1 is all
the text from the front to the first comma, item 2 the text between the
first and second commas, etc.
35 DelItm -- Delete Item
The text string at the top of the stack and the number below it are
both popped off; if the top is a comma-delimited list of items, and the
second is the item number of one of them, that item is deleted and the
remaining string pushed back onto the stack. Item 1 is all the text from
the front to the first comma, item 2 the text between the first and second
commas, etc.
36 Item -- Item Of
The text string at the top of the stack and the number below it are
both popped off; if the top is a comma-delimited list of items, and the
second is the item number of one of them, that item is extracted and pushed
back onto the stack. Item 1 is all the text from the front to the first
comma, item 2 the text between the first and second commas, etc.
37 CntItm -- Count Items
The text string at the top of the stack is popped off and replaced
with the number of commas in it plus 1, except that a null is replaced by zero.
38 SubTr -- SubTree (kid)
The tree node at the top of the stack is popped and replaced by its
immediate subtree, or null if it's not a tree or has no subtree.
39 NxtNo -- SiblingTree (bro)
The tree node at the top of the stack is popped and replaced by the
next sibling Tree in its list, if any, or else null if there is none.
3A PutItm -- Put Into Item
Three values are popped off the stack, the top two as in Item,
and then another text string to replace the item identified by the top
two. The new composite is pushed in their place.
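The comma-delimited item operators (ItmNo, DelItm, Item, CntItm, and PutItm) can be modelled as plain Python functions on values rather than on the stack. The function names are hypothetical, and the handling of an out-of-range item number (null for Item, unchanged text for DelItm and PutItm) is an assumption, since this document does not say what happens in that case.

```python
def cnt_itm(text):
    """Number of items: commas plus 1, or 0 for null or empty text."""
    return 0 if not text else text.count(",") + 1


def item(text, n):
    parts = text.split(",")
    return parts[n - 1] if 1 <= n <= len(parts) else None


def itm_no(text, wanted):
    """1-based position of 'wanted' in the list, or 0 if absent."""
    parts = text.split(",")
    return parts.index(wanted) + 1 if wanted in parts else 0


def del_itm(text, n):
    parts = text.split(",")
    if not 1 <= n <= len(parts):
        return text  # assumption: leave the list unchanged
    del parts[n - 1]
    return ",".join(parts)


def put_itm(text, n, new):
    parts = text.split(",")
    if not 1 <= n <= len(parts):
        return text  # assumption: leave the list unchanged
    parts[n - 1] = new
    return ",".join(parts)
```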
3C LNinTr -- L&N in Tree (LNin)
The tree node at the top of the stack, and the number under it are
both popped and replaced by true (1) if the number represents a L&N
concept in the tree, not more than 9 subtrees deep, and false (null) if
no such concept number can be found.
3D Bref -- Bible Reference (ref)
The tree node at the top of the stack is popped and replaced by the
Bible reference as a text string of three numbers separated by commas,
book (1-66), chapter, verse, or else null if there is no verse attached
to this node.
3E NouNo -- Noun Number (noun#)
The tree node at the top of the stack is popped and replaced by the
noun number from the ThingList, if it has one, or else null (zero).
3F UpNo -- Parent (dad)
The tree node at the top of the stack is popped and replaced by its
immediate parent tree, or null if it's not a tree or is the book root.
41 LookTab -- Table Lookup
The top number is popped off the stack and taken as the index number
of a lookup table; that table's access values are fetched and the corresponding
table value is pushed.
42 DWIM -- Do What I Mean
The value at the top of the stack is popped; if it is a tree node,
that tree's lexical rule is called (the same as CallLN);
otherwise the value is (converted to text if necessary, then) emitted,
same as Emit.
43 GetLN -- L&N from Tree (L&N)
The tree node at the top of the stack is popped and replaced by the Louw&Nida
concept number of that node packed into a single number, D*1000+C.
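The D*1000+C packing is simple arithmetic, shown here in Python. The helper names pack_ln and unpack_ln are hypothetical, and the sketch assumes the concept part C is always below 1000 so that the two parts can be separated again.

```python
def pack_ln(domain: int, concept: int) -> int:
    """Pack a concept number D.C into the single integer D*1000+C."""
    return domain * 1000 + concept


def unpack_ln(packed: int) -> tuple:
    """Split a packed value back into (D, C), assuming C < 1000."""
    return divmod(packed, 1000)
```

Under this packing, 0.220 Something becomes 220 and 0.3 Thing becomes 3, which matches the values 3 and 91-96 mentioned for the Something variable.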
44 NxTrLs -- Next Tree from List (head, tail)
The value at the top of the stack, which should be a text string list
of tree nodes formed by the TrLsApd operator, is popped
off and replaced with the first tree node in the list under the remainder
of the list with that node removed. Discarding the top leaves the node,
implementing head; keeping it but discarding the tree node
implements tail.
45 TrLsApd -- Tree List Append
The tree node at the top of the stack and the text string below it
are both popped off and replaced with the string extended by adding the
text representation of the tree node to its end with a '+' separator.
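The way TrLsApd and NxTrLs cooperate to give head and tail can be sketched in Python. Tree nodes are represented by made-up text labels such as "T1", because the actual text representation of a node is not described here; the sketch also assumes the '+' separator appears only between entries, and the function names are hypothetical.

```python
def tr_ls_apd(stack: list) -> None:
    """Pop a node and the list below it; push the list with the node appended."""
    node = stack.pop()
    lst = stack.pop() or ""  # a null list starts out empty
    stack.append(f"{lst}+{node}" if lst else str(node))


def nx_tr_ls(stack: list) -> None:
    """Pop a list; push its first node, then the remainder on top of it."""
    lst = stack.pop() or ""
    head, _, rest = lst.partition("+")
    stack.append(head or None)  # first node, beneath the remainder
    stack.append(rest or None)  # remainder of the list, on top
```

After nx_tr_ls, popping the top leaves the first node (head), while keeping the top and discarding the node beneath it leaves the remainder (tail).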
46 PN? -- Pronoun?
The value at the top of the stack, which should be a noun number from
the ThingList, or else a Thing tree node or its noun, is popped off and
replaced with the pronoun number if there is a pronoun with this noun
number, or else null if no pronoun refers to this Thing.
47 PWS -- PreWalk Setup
The tree node at the top of the stack is popped and replaced by the
beginning of its modifier list (just past the head noun or verb) if it's
a Thing or Proposition; it reaches through a semantic role marker to the
underlying Thing, or if it is partway through a modifier list, just keeps
that node; otherwise null is pushed instead. This prepares for the "(prewalk)"
SetVar operator.
48 CntNds -- Count Nodes
The tree node at the top of the stack is popped and replaced by the
number of tree nodes connected to it as siblings (including itself: a result
of 1 means the node has no forward siblings). Any nodes to which this is
a sibling are not counted.
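Counting forward siblings amounts to walking the sibling chain, as in this Python sketch. The Node class and its bro link are hypothetical stand-ins for a tree node and its next-sibling pointer; returning 0 for a null input is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    bro: Optional["Node"] = None  # next sibling, or None at the end


def cnt_nds(node: Optional[Node]) -> int:
    """Count this node plus all its forward siblings."""
    count = 0
    while node is not None:
        count += 1
        node = node.bro
    return count
```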
49 Nn#inTr -- Noun Number in Tree
The tree node at the top of the stack and the number below it are both
popped off and replaced with true (1) if that number is a noun somewhere
in the first 9 levels of subtree within that node, or false (null) if not.
4A JumpLN -- Jump to L&N Lexical Rule
This is essentially the same as CallLN, except that
the tree node is not made current. It is only used to process built-in
Lexical Rule 0.311 ImplicitInfo if it has something to do.
4B xTab -- Table Access
This is only used to fetch an entry from the punctuation table, but
works more or less like LookTab.
4C i7FF -- Infinity
This pushes the most positive number possible, 2147483647.
4D nnnn xSto -- Indexed Store
4E nnnn xLd -- Indexed Load
This pops off a pronoun index number, then loads or stores to the pronoun
so indexed, but otherwise like Sto or Ld.
The immediate value nnnn should be the variable reference
of the first pronoun (all pronouns are sequential), and a zero on the stack
top would access it.
4F nnnn xRng -- Index In Range
If the top of stack is not a number between zero and nnnn, it is replaced
with null; otherwise it is duplicated. The next operation would be a BrF
to test either the copy or null, followed by code to pop and process the
original value, which the BrF jumps over if not in range.
50 RcpTh -- Reciprocal Thing
The tree node at the top of the stack is popped. It should be a Proposition
properly marked by a 0.210 Reciprocal modifier. This operation searches
for a semantic role whose Thing noun number matches that of the first semantic
role (typically 0.91 Subject), and stores that Thing node into predefined
Tree variable "ReciprocalNoun". This operator is used in the default built-in
lexical rule for 0.210 Reciprocal.
51 Shft -- Shift
The top two numbers are popped off the stack, and replaced by the top
value shifted left the number of bit positions indicated in the second.
If that number is negative, the shift is to the right.
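The operand order of Shft (the value on top, the shift count beneath it) can be illustrated in Python. The function name shft is hypothetical. Python integers are unbounded, so this sketch does not model any 32-bit overflow, and it assumes the right shift preserves the sign.

```python
def shft(stack: list) -> None:
    """Pop value (top) and count (second); push value shifted by count bits."""
    value = stack.pop()
    count = stack.pop()
    # A negative count shifts right instead of left.
    stack.append(value << count if count >= 0 else value >> -count)
```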
52 Svnt -- Selected Variant
When translation begins, a list of selected variants is made from resource
Adat#3008. If the translation has not chosen to display all variants, then
whenever a 0.310 Variant Interpretation node is encountered, the selection
is chosen from the list (if it can be found), or else the first subtree
is chosen, and pushed onto the stack.
53 LastN -- Last N Characters
Like Subst, the text string at the top of the stack
is popped off and a substring extracted from it; the length is the number
second on the stack, which is the length from the end of the string to
preserve. Both values are replaced by the resulting substring.
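A small Python model of LastN follows. The function name last_n is hypothetical, and clamping a length that is negative or longer than the text is an assumption, since that case is not described here.

```python
def last_n(stack: list) -> None:
    """Pop text (top) and a length (second); push the last 'length' characters."""
    text = stack.pop()
    n = stack.pop()
    n = max(0, min(n, len(text)))  # assumption: clamp to the text length
    stack.append(text[len(text) - n:])
```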
54 Sto0 -- Store Replacing Zero
This is essentially the same as Sto, except that it
will not replace a non-zero value. It is used for initializing global variables
used in rules.
55 CmpTree -- Compare Tree (CmpTr)
The tree node at the top of the stack (which should be one of the propositions
in a compare relation) is popped and replaced by the one role or adposition
subtree from that proposition whose Thing differs from the other proposition
in the relation (in predefined variable "Node#"), or null if there is
more than one difference. A different adposition over the same or different
noun number counts as a single difference.
56 Nshape -- Node Shape
The tree node at the top of the stack is popped and replaced by the
node shape bits from the Tree. There will be irrelevant bits in the upper
positions of the number (see Tree Nodes
definition), which can be removed by the AND operator.
58 RelProTr -- Relative Propositions List (RelPrs)
The tree node at the top of the stack is popped and replaced by a list
of the relative clause subtrees under that node.
59 Trecur -- Tree Recursion Check
A badly written grammar might attempt to call the same rule from within
itself recursively, with no way out of the loop. This operation marks each
tree node on entry to its lexical rule, then unmarks it on the way out.
If it comes to a node already marked, the translation is aborted.
5A UpLvl -- Uplevel Variable Reference
This is used for accessing a variable deeper in the stack than
its first reference. It's being reworked...
2012 October 1