The SCHEMESTATION cross-compiler and assembler are used to produce binary byte code objects to be run in the SCHEMESTATION operating system simulator.
This document specifies the implementation aspects of the compiler and the assembler.
The compiler is implemented in Scheme and the assembler in C. The assembler is written in C because the library implementing SCHEMESTATION heap handling is written in C. Were the assembler written in Scheme, heap creation should be re-implemented in Scheme because byte code objects produced by the assembler are essentially linearized heaps. On the other hand, it is preferable to keep the amount of non-Scheme code in minimum in order to make later porting of the compiler to the SCHEMESTATION platform as easy as possible.
This document specifies the compiler first and the assembler afterwards.
The overall structure of the compiler is presented in Fig 1.
The lexical structure of legal source code accepted by the compiler is a sequence of tokens, where the structure of a token is defined by the following grammar (borrowed modified from [1]).
{token} --> {identifier} | {boolean} | {number} | {character} | {string} | {bit-string} | ( | ) | #( | ' | ` | , | ,@ | . {delimiter} --> {whitespace} | ( | ) | " | ; {whitespace} --> {space or newline} {comment} --> ; {all subsequent characters up to a line break} {atmosphere} --> {whitespace} | {comment} {intertoken space} --> {atmosphere}*{identifier} --> {initial} {subsequent} * | {peculiar identifier} {initial} --> {letter} | {special initial} {letter} --> a | b | c | ... | z | A | B | C | ... | Z {special initial} --> ! | $ | % | & | * | / | : < | = | > | ? | ~ | _ | ^ {subsequent} --> {initial} | {digit} | {special subsequent} {digit} --> 0 | 1 | 2 | ... | 9 {special subsequent} --> . | + | - {peculiar identifier} --> + | - | ...
{boolean} --> #t | #f {character} --> #\ {any character} | #\ {character name} {character name} --> space | newline
{string} --> " {string element}* " {string-element} --> {any character except " or \} | {backslashed} {backslashed} --> \ {any character}
{bit-string} --> {hex-bit-string} | | {binary-bit-string} {hex-bit-string} --> #H {hex-digit}* {binary-bit-string} --> #B {binary-digit}* {hex-digit} --> 0 | 1 | 2 | ... | 9 | a | b | ... | f | A | B | ... | F {binary-digit} --> 0 | 1
{number} --> {integer} | {real}
{integer} --> {sign} {digit}+ {sign} --> + | - | {real} --> {sign} {digit}+ {decimal-part} {exponent} {decimal-part} --> . {digit}* {exponent} --> e {sign} {digit}+
The external representations of Scheme objects, datums, are formed according to the following grammar:
{datum} --> {simple datum} | {compound datum} {simple datum} --> {boolean} | {number} | {character} | {string} | {bit-string} | {symbol} {symbol} --> {identifier} {compound datum} --> {list} | {vector} {list} --> ( {datum}* ) | ( {datum}+ . {datum} ) | {abbreviation} {abbreviation} --> {abbrev prefix} {datum} {abbrev prefix} --> ' | ` | , | ,@ {vector} --> #( {datum}* )
The expression syntax for the compiler is specified in the grammar below. As opposed to that in [1], we do not give syntax for most of the derived expressions due to the fact that they are implemented using the macro system and thus their grammar is not a part of the compiler per se. Concerning quasiquotations, see the note in [1] about the grammar formalism.
{expression} --> {variable} | {literal} | {procedure call} | {lambda expression} | {conditional} | {assignment} | {sequencing} | {compiler directive} | {syntax}{literal} --> {quotation} | {self-evaluating} {self-evaluating} --> {boolean} | {number} | {character} | {string} | {bit-string} {quotation} --> '{datum} | (quote {datum}) {procedure call} --> ({operator} {operand}*) {operator} --> {expression} {operand} --> {expression}
{lambda expression} --> (lambda {formals} {body}) {formals} --> ({variable}*) | {variable} | | ({variable}+ . {variable}) {body} --> {sequence} {sequence} --> {command}* {expression} {command} --> {expression}
{conditional} --> (if {test} {consequent}) | (if {test} {consequent} {alternate}) {test} --> {expression} {consequent} --> {expression} {alternate} --> {expression}
{assignment} --> (set! {variable} {expression})
{compiler directive} --> (include {string})
{sequencing} --> (begin {sequence})
{quasiquotation} --> {qq 1} {template 0} --> {expression} {qq D} --> `{template D} | (quasiquote {template D}) {template D} --> {simple datum} | {list template D} | {vector template D} | {unquotation D} {list template D} --> ({template or splice D}*) | ({template or splice D}+ . {template D}) | '{template D} | {qq D+1} {vector template D} --> #({template or splice D}*) {unquotation D} --> ,{template D-1} | (unquote {template D-1}) {template or splice D} --> {template D} | {splicing unquotation D} {splicing unquotation D} --> ,@{template D-1} | (unquote-splicing {template D-1})
{syntax} --> (let-syntax ({syntax-binding}*) {body}) | (letrec-syntax ({syntax-binding}*) {body}) {syntax-binding} --> ({macro-name} {literals} {syntax-pattern}+) {macro-name} --> {variable} {literals} --> ({variable}*) {syntax-pattern} --> ({pattern} {template}) {pattern} --> {datum} {template} --> {datum}
Finally, programs compiled by the compiler are defined by the following grammar:
{program} --> {command or definition}* {command or definition} --> {command} | {definition} {definition} --> (_define {variable} {expression}) | (begin {definition}*) | (define-syntax {macro-name} {literals} {syntax-pattern}+)
The semantics of the source language is that of the Scheme programming language with extensions. The following noteworthy extensions exist:
The following primitive procedures are supported by the compiler:
The macro facility provided by the compiler is reminiscent of that proposed in [1]. With every macro, a set of literals and a set of syntax patterns is associated. The set of literals is a set of keywords that are recognized as literal identifiers as instead of pattern variables in a pattern. Syntax patterns are used to define actual macro transformations. The syntax of macro definitions is seen from the grammars presented previously.
A syntax pattern consists of a pattern and a template. When macro is expanded, the macro invocation is matched against the patterns in the order they appear in the corresponding macro definition. When a first matching pattern is found, the corresponding template is expanded and the expansion replaces the original invocation during the macro expansion process.
A pattern is a Scheme expression, where variables denote pattern variables that can be matched by arbitrary sub-expressions in the invocation. Ellipsis (...) can be used to denote repetition and works as in the appendix of [1]. Pattern variables appearing in the template get their bindings from the matching. Ellipsis can naturally appear in the templates too.
The compiler outputs an assembler source file. The structure of an assembler source file is defined in the grammar below. Low-level lexical conventions are those of Scheme; semi-colons start comments and the structure of white space is not important.
{asm file} --> {preamble} @instructions {instructions} @constants {constant}{preamble} --> {register spec}* {register spec} --> {asm identifier} {integer}
{instructions} --> {instruction or label}* {instruction} --> ({operator} {operand}*) {asm identifier} --> {start} {subsequent}* {start} --> {lower-case letter or hyphen} {subsequent} --> {start or digit} {label} --> {asm identifier} : {operator} --> {asm identifier} {operand} --> {register} | {index} | {label} {register} --> {asm identifier} {index} --> {integer}
{constant} --> {vector spec} | {boolean spec} | {bit-string spec} | {integer spec} | {pair spec}
{vector spec} --> vector {length} {constant} {boolean spec} --> boolean {0 or 1} {bit-string spec} --> bit-string {length} {byte}* {length} --> {integer} {byte} --> {integer} {integer spec} --> integer {integer} {pair spec} --> pair {constant} {constant}
See the section on assembler for details.
The compiler can be executed from the UNIX command line, and the following options are recognized:
Non-option arguments on the command line are files to compile. If a file name ends with the suffix ``.ssscm'' or the suffix ``.scm'', the suffix is stripped of from the file name. A corresponding assembler file is always written and has the suffix ``.ssasm''. The byte code file created by the assembler has the suffix ``.ssbc''.
The [Scheme] implementation of the compiler is divided into the following parts:
COMPILER -- Main interface -- command line processing | | | - compiler invocation | | | - assembler invocation | - Scheme reader -- lexical parser | | | - datum building | - Macro expansion -- semantic tree building | | | - macro transformations | | | - lexical address resolving | | | - semantic optimizations | - Code generation -- linearization phase | | | - peephole optimizer | - Support | - General auxiliary procedures
Short descriptions of the components presented above follows.
The main interfaces for the modules and implementation highlights will be specified and discussed next.
Description: | Set the current build date to be DATE, the builder identifier to be BUILDER, and the building host to be HOST. This information is set for informational purposes only. If this function is not called, then all the variables (build date, builder identifier and building host) assume the default value "?UNDEFINED?". |
Description: | Set the current version identifier to be VERSION-STRING and the date of the current release to be RELEASE-DATE. This information is set for informational purposes only. If this function is not called, then both variables (current version and release date) assume the default value "?UNDEFINED?". |
Description: | Run the compiler. Read an argument list (a Scheme list of strings) from standard input, i.e. the current input port. After reading succesfully a list of strings, pass the list to run-non-interactive-compiler. |
Description: | Parse the command line arguments given in COMMAND-LINE-ARGS and execute compiler functionality according to them. |
Description: | Read a file with a given name and build a semantic tree from the source code it (hopefully!) represents. The return value is the semantic tree built or #f if an error was encountered but the procedure returned however. |
Description: | Compile the file BASENAME.ssscm and output the resulting assembly language to the file BASENAME.ssasm. If assembler invocation is enabled, finally rut he assembler using run-assembler. Return value is unspecified. Errors are returned using the normal comerr procedure.
This function is responsible for controlling the overall compilation process of a program. |
Description: | Linearize a semantic tree, with optimizations. This procedure assumes that certain compiler data structures contain data, corresponding to the tree, that are created in compile-transform-tree!. Thus this procedure should be called only for trees that have been created using compile-transform-tree! just previously. Returns an instructions sequence or #f if something went wrong. |
Description: | Do certain transformations for a semantic tree, in particular inline primitives and compute lexical addresses. The transformations are made destructively, i.e. the tree given as input is mutated (although to a better one!) The return value is the tree after mutations or #f is something went fatally wrong. It is possible that the root of the tree changes, so the return value must be used instead of TREE after he transformations have been made. |
Description: | Write the symbolic assembler file corresponding to the byte code instruction sequence SEQUENCE to the file named BASENAME.ssasm. Also run the assembler if it is requested (not inhibited) by the options set. |
Description: | Output a Scheme object OBJECT to a certain output port. The object is outputted in a format that is understood by the external SchemeStation byte code assembler. |
Description: | Initialize compiler data structures for a new compilation project. |
Description: | Invoke the assembler. BASENAME is a file name without suffix. The file BASENAME.ssasm must exist. The external assembler is invoked using the assembler executable path name and assembler options possibly specified in the compiler command line. The return value is the return status of the assembler executable. If verbosity level is at least 1, output the assembler command line just before executing. |
Description: | Read in the file FILE. Returns an annotated expressions. Implemented in the terms of read-from-port. |
Description: | Read, using the compiler's Scheme reader, all expressions readable from the port PORT. STREAM-NAME is a descriptive name of the stream and is used if errors are encountered. The result value is an annotated expression, constructed such that if E1 ... EN are the N expressions read, the value returned contains (begin E1 ... EN). |
Description: | Read in the include file named FILE, searching for it from the include path. Returns the expression found from the include file. If the include file cannot be found an error is signalled. |
Description: | Try to open the file FILENAME and return a reading port if succeeded. If the file cannot be opened, return #f instead of the port. |
Description: | Create the initial lexical environment that will be in effect when the compilation of a program starts. |
Description: | Signal an error during [command line] options processing. First do (apply printf args), i.e. interpret the first argument as a format string and the rest as objects to format, and output the result to the current output port (usually standard output). Then call the function output-options-help-and-exit, which is not defined in the options module but must be provided externally. |
Description: | Process the list of command line options OPTIONS-LIST, which may contain both switches and non-switch arguments (i.e. file names). For every argument that is not a switch nor an argument for a switch, NON-SWITCH-CALLBACK! is called as (NON-SWITCH-CALLBACK! COMMAND-LINE-ITEM) and the result of this procedure call is ignored. Recognized switches are processed. If an unrecognized switch is encountered or some other error happens during the processing, the SPROC(options-error) procedure is called. The set of recognized switches is exactly the set of registered switches. Switches are registered using procedures below. |
Description: | This is as register-option-callback, but registers a boolean switch that takes no arguments. CALLBACK! will be called with one argument that is either #t or #f. If a boolean switch has the name NAME, then the switch is recognized in four forms: -NAME, --NAME, -noNAME, --noNAME. The first two forms cause the callback to be called with the value #t and the later two with the value #f. The return value of the callback is always ignored. |
Description: | Register a switch with possibly multiple names. NAMES is a list of strings that are the names of the switch. NUM-ARGS is the number of arguments excepted by the switch, and CALLBACK! is a procedure that will be called to process the switch when it is found from the command line. CALLBACK! should be a procedure that accepts a call with NUM-ARGS arguments. When the callback is called any return value of the call is ignored.
A switch with name NAME is recognized in two forms in the command line: -NAME and --NAME. Unambiguous [prefix] abbreviations are also recognized. Arguments may not be melted to the switch but must be separate command line items (i.e. --NAMEx nor --NAME=x will work, --NAME x must be used). |
The Scheme reader used in the compiler reads in expressions but does not store them in the native way, but instead as annotated expressions. Annotated expressions are Scheme expressions that contain information about the source of the expressions. For example, the name of the file where the expression is originally read is stored as an annotation associated with the expression.
The lexical parser is generated using Danny Dube's SILex lexical scanners generator. See SILex documentation for interfaces and other details.
Description: | Return the annotation part of an annotated expression. |
Description: | Return the expression part of an annotated expression. |
Description: | Create an annotated expression with the expression part EXPR and the annotation part ANNOTATION. |
Description: | Make a reader procedure that reads annotated expressions from the port PORT. Give the stream the name NAME, which will be added as annotation to the expressions read. FAILURE-CONT is a one-argument procedure that will be called if a read error happens. FAILURE-CONT must not return. FAILURE-CONT gets as its argument, when called, a list of length four, with the following items: (1) NAME, (2) current line number in the stream, (3) current column number in the stream, (4) current number of bytes read from the stream (i.e. total offset).
The reader returned itself is a zero-argument procedure that either returns an annotated expression, returns the symbol 'eof, or calls FAILURE-CONT --- which must not return.") |
Description: | Return a newly allocated copy of the given string STR, parsing backslash sequences. \n is mapped to the newline character. For all other characters X, \X is mapped to X. |
Description: | Create a semantic tree denoting a procedure application. |
Description: | Build a semantic tree denoting an inlined sequence of byte code instructions. |
Description: | Build a semantic tree denoting an inlined sequence of byte code instructions and handled as a full closure. |
Description: | Create a semantic tree denoting an assignment. |
Description: | Create a semantic tree denoting an error during the compilation process. |
Description: | Signal, via comerror, an error of severity SEVERITY in ANNOTATED-EXPRESSION, with additional information provided by FORMAT-STRING and ARGS (see comerror), and then return a semantic tree denoting an error, containing ANNOTATED-EXPRESSION. |
Description: | Create a semantic tree denoting a (lambda) closure. |
Description: | Create a semantic tree denoting a literal value. |
Description: | Create a semantic tree denoting a primtive procedure (the procedure itself, not its invocation.) |
Description: | Create a semantic tree denoting a sequence. BODY-LIST is a list of annotated Scheme expressions. However, the list itself must not be annotated (use remove-list-annotations). |
Description: | Create a semantic tree denoting the application of a macro defined via define-syntax. |
Description: | Create a semantic tree denoting a conditional. |
Description: | Create a semantic tree denoting a reference to a variable that does not exist in the current lexical environment. (Later a binding for the variable is searched for from the global environment, but the global environment does not exist, naturally, yet when the semantic trees are created |
Description: | Create a semantic tree denoting a reference to a variable. |
Description: | Build a semantic tree corresponding to a given (annotated) expression. The semantic tree is built in the lexical environment LEX-ENV. The flag UNCOND? denotes whether or not the place of the current expression in the program is such that it will be evaluated in any case, not depending on conditionals or function applications. It is used to decide whether or not a global definition can take place.
SHADOW contains information about lexical bindings that come to effect via macro invocations. |
Description: | Build the semantic tree corresponding to the annotated expression EXPR, which denotes a whole top-level program. |
Description: | Parse an inlined assembler expression. |
Description: | Parse an inlined assembler closure expression. |
Description: | Parse a begin-expression. |
Description: | Parse a _define-expression (_define is the "rudimentary" define, provided directly by the compiler. The actual define form is implemented as a macro on the top of _define). |
Description: | Parse an if-expression. |
Description: | Parse an include compiler directive, reading in the included file recursively. |
Description: | Parse a lambda-expression. |
Description: | Parse a quote expression. |
Description: | Parse an assignment (set!) expression. |
Description: | Create a syntax frame where the core syntax constructs are bound to the procedures parsing them. |
Description: | Parse a define-syntax expression, defining new global syntax. |
Description: | Parse a let-syntax expression, defining a new local syntax and then expanding the body in the new syntactic environment. |
Description: | Parse a letrec-syntax expression. |
Semantic optimization is not specified yet.
Description: | Add to the semantic tree SEMANTIC-TREE static lexical addresses for variable references and assigned variables. The function mutates the tree, and returns the tree after succesful execution. Errors are reported in a standard way using comerr. |
Description: | Examine the semantic tree TREE, denoting a whole program, and return a byte code instructions sequence that is the linearization of the tree. |
Description: | Make peephole-style optimizations on the given instruction sequence INSTRUCTION-SEQUENCE. The instruction sequence is possibly mutated during the process. An optimized instruction sequence is returned. The behaviour of the peephole optimizer can be affected by global options set e.g. in the command line arguments. |
Description: | compiler-exit-on-error-function is called when a fatal error has been signalled. It can be redefined to produce whatever behaviour needed. The default is to signal error to the underlying Scheme. When a binary dump of the compiler is made, this function is redefined to exit immediately from the program. |
Description: | Report an internal compiler error. This causes a fatal compiler error to be reported. ARGS are outputted verbatim in the error message and need not to be very user friendly. After all, internal compiler bugs should not exist in the final product. |
Description: | Define a new global variable NAME in the global environment of the currently compiled program. |
Description: | Return the index of the defined global variable VAR-NAME. |
Description: | Require the primitive procedure SPECS to be implemented as a real closure, which is then bound to the global variable NAME. |
Description: | A predicate to test whether the name VAR-NAME is already defined in the global environment or not. |
Description: | Return the number of currently defined global variables. |
Description: | Output the contents of the global variables table in a form suitable to be inserted to an assembler file as an informative comment. |
Description: | Find the lexical binding for the variable SCHEME-EXPR in environment LEX-ENV. Return the binding found or a binding that denotes an unbound value, if no binding was found. |
Description: | Return the type field of the lexical binding BINDING. |
Description: | Return the value field of the lexical binding BINDING. |
Description: | Create a lexical binding object that denotes binding the variable NAME to an object of type TYPE and value VALUE. |
Description: | Output a graphic representation of the semantic tree TREE to the port PORT. |
Description: | Create a new tree node without subtrees, i.e. a leaf node. |
Description: | Create a new tree node with properties PROPERTIES and subtrees SUBTREES, which should be a list. |
Description: | Return the properties of the tree NODE. |
Description: | Return the value of the property PROPERTY in the node NODE. If the property is not set, return the default value #f. |
Description: | Set the subtrees of the tree NODE to SUBTREES, destructively. |
Description: | Return the subtrees of the tree NODE. |
The pretty printer used in the SCHEMESTATION compiler is written originally by Marc Feeley in 1991. It is modified slightly to fit our purposes.
[The printf family is again home-made.]
Description: | As printf, put print to PORT instead of (current-output-port). |
Description: | Interpret STR as a format string, and output to the current output port STR verbatim, except for certain control sequences that start with the percent sign (%) and are replaced with renderings of the rest arguments ARGS in order. The general form of a control sequence is %[justification-flag][width]{format-char}. justification-flag and width are optional, format-char is mandatory. format-char is one of the following characters: (1) 'd', denoting that the corresponding argument is rendered as if outputted using the procedure display, (2) 'w', denoting that the corresponding argument is rendered as if outputted using the procedure write (i.e. in a form understood by Scheme's read), (3) 'p', denoting that the corresponding argument should be pretty-printed, and (4) 'a', denoting that the corresponding argument is an annotated expression and should be written as if 'w' had given but with the annotations data removed.
The width field is used to control the exact length of rendered string. If no width field is present, then the whole string representing the corresponding object is outputted in its natural length. If a width field is present, then the representation will be modified to use exactly the amount of characters specified by the width field. The width field controls the width of pretty printing if the format 'p' is used, not the length of the renderation (this is an exception). The justification-flag field has no effect if width has not been specified. If width has been specified, then the justification-flag controls the alignment of text in the rendered string if the text is shorter than the requested width. Three flags are understood: (1) 'l', meaning to align to left, (2) 'r', meaning to align to right, and (3) 'c', denoting centering. |
Description: | As printf, put return a string instead of outputting to a port. |
Description: | Enable the tracing mode TRACING-MODE. By default, all modes are disabled. Tracing modes are enabled to get useful information about the internal workings of the compiler, mainly for debugging purposes.") |
Description: | Do (apply printf ARGS) if the tracing mode TRACING-MODE has been enabled, otherwise do nothing. The return value is unspecified.") |
Description: | Do (apply printf ARGS) if the compiler verbosity option is at least VERBOSITY-LEVEL, otherwise do nothing. The return value is unspecified.") |
The assembler, written in C, reads in assembly files and outputs byte code objects.
The general input syntax of the assembler is specified previously [see above] but is elaborated on here.
Every instruction is presented with an expression of form
(
Registers are specified using symbolic names. The mapping of symbolic names to register numbers can be changed in the preamble. If no mapping is given, the following default mapping is used:
Symbolic name | Register number |
---|---|
env | 0 |
arg | 1 |
cont | 2 |
genv | 3 |
t1 | 4 |
val | 5 |
t2 | 6 |
t3 | 7 |
t4 | 8 |
t5 | 9 |
t6 | 10 |
t7 | 11 |
mq | 12 |
exc | 13 |
stack | 14 |
pc | 15 |
If at least one register identifier specification appears in the preamble then the default mapping is not used at all.
The constants part specifies one Scheme object, that must be a vector. The constants part must present a loopless Scheme object that is read then in in a depth-first manner. Vectors are specified by giving the vector keyword, then the vector length, and then as many constants as specified in the length field; similarly for other types (see the syntax).
The assembler needs to perform the following operations:
Thus the implementation can be divided into the following parts:
ASSEMBLER -- Main interface -- command line processing | - Parser -- general lexical scanner | | | - instructions reading | | | - constants table reading | | | - preamble parsing | - Labels handling -- constants table entry allocation | | | - offset computation | - Binary data generation -- instruction generation | | | - constants table writing | - Aux & support
Short descriptions of the components presented above follows.
The main interfaces for the modules and implementation highlights will be specified and discussed next.
This is the main entry point of the program. It parses the command line using the SCHEMESTATION utilities library, and then assembles the files specified at the end of the command line.
This function is called before actual assembling starts. It initializes dynamically allocated memory structures required by the assembling procedure. This procedure should be called only once.
The string FILENAME is a file name without any suffix. The file FILENAME.ssasm is tried to assemble to the file FILENAME.ssbc. Returns STATUS_FAILED if the files cannot be opened, or if assembling failed. Otherwise outputs a byte code object to FILENAME.ssbc and returns STATUS_OK. This function may be called multiple times.
Run the main assembler, reading data from INPUT and writing the byte code file to OUTPUT, if no errors occur. This is the core of the assembler, responsible of reading in the preamble, the instructions, the constants table, and then producing the byte code file.
The general algorithm is as follows: The generated lexical scanner is used first to read the preamble. When the preamble/instructions separator @instructions has been read, switch to an instructions reading mode. This mode is implemented as a simple state machine that adds instructions and labels read to dynamically growing tables using procedures defined below. When the instructions/constants separator @constants has been read, first label offsets are computed as now all labels should be defined. Then a dedicated procedure is used to read in the constants table. When the constants table has been read, the byte code object file is outputted and the assembling ends in success.
Return the instruction number corresponding to the symbolic instruction name INSTR_NAME. Return -1, if the instruction name is unknown.
Return the register number corresponding to the symbolic register name REG_NAME. Return -1, if the register name is unknown.
The lexical scanner used is generated using flex(1). See flex documentation for more information.
The following procedures are used to create labels and instructions, and to handle them:
Define a new label with the name LABEL_NAME. Return an index number for the label in NUM. This function always succeeds (unless the main memory is exhausted, but this is not taken into account).
Return the index number of the label named LABEL_NAME, or -1 if the label name is unknown.
Set the label number LABEL_NUM to denote the instruction INSTR_NUM, from the beginning of the program. Return STATUS_OK if the label had no previous denotation. If the denotation tried to set twice for the same label, the second (and subsequent) time(s) return STATUS_FAILED and do nothing.
Compute the real label offsets into the byte code sequence. These are not, in general, the same as the instruction numbers given in set_label_denotation, as the length of binary representations of byte code instructions vary depending on the complexity of the instructions.
Build the byte code object (in essence, a linearized heap vector). The byte code object is allocated dynamically, and a pointer to it is written *PTR. The length of the object is written to *LENPTR. Return STATUS_OK, if everything went well. If something went wrong, return STATUS_FAILED. If the return value is not STATUS_OK, then PTR and LENPTR might contain garbage.
At the start of the procedure, a heap is initialized using the SCHEMESTATION heap library. This heap is utilized by the procedures below.
The algorithm used to implement build_linearized_object_vector is the following: first, initialize a heap. Allocate the binary object top-level vector from the heap. Then, read in the constants table and build the corresponding Scheme object, storing it in one slot of the binary object vector. Then build the binary representation of the instruction sequence and make a heap bit string from it. Store the bit string in one slot of the binary object vector. Then linearize the binary object vector, deinitialize the used heap and return the linearized heap, which is the binary code object.
Read in the constants table and return the read constant object in D1. Return STATUS_FAILED if there was a syntactic or semantic error in the constants table.
The constants table is read in recursively. First, topmost object specification is read and it is ensured that it is a vector (the constants table must be a vector). Then read_constant() is called to read the items of the vector; read_constant() then calls itself recursively when reading composite data (pairs or vectors). In addition to the items readed from the file, also the label offsets are written to the actual constants table created. Thus the actual length of the constants table created is at least that of that in file, plus the number of distinct labels in the program.
Recursively read constants in, return result in *D. This is done according to the constants table syntax.
Create the binary representation of the byte code sequence and return a pointer to a piece of dynamically allocated memory containing the binary representation in *BUFPTR. Return the length of the representation in *LEN. Return STATUS_OK if everything went ok, otherwise STATUS_FAILED, in which case BUFPTR and LEN might contain garbage.
Write the binary representation of the instruction number NUMBER in the sequence to *BUF, if BUF is not NULL. Return the length of the representation in any case in LEN. This procedure utilizes the SCHEMESTATION virtual machine library. It is called twice, first to compute the label offsets but not to create any byte code, and then to create the actual binary representation.
Some auxiliary procedures will be needed, but they are not specified here.