Compiler technical specification

Compiler technical specification
SchemeStation documentation

1 Introduction

The SCHEMESTATION cross-compiler and assembler are used to produce binary byte code objects to be run in the SCHEMESTATION operating system simulator.

This document specifies the implementation aspects of the compiler and the assembler.

2 General

The compiler is implemented in Scheme and the assembler in C. The assembler is written in C because the library implementing SCHEMESTATION heap handling is written in C. Were the assembler written in Scheme, heap creation should be re-implemented in Scheme because byte code objects produced by the assembler are essentially linearized heaps. On the other hand, it is preferable to keep the amount of non-Scheme code in minimum in order to make later porting of the compiler to the SCHEMESTATION platform as easy as possible.

This document specifies the compiler first and the assembler afterwards.

The overall structure of the compiler is presented in Fig 1.


Fig 1. The overall structure of the compiler

3 Compiler

3.1 Source code syntax

3.1.1 Lexical structure

The lexical structure of legal source code accepted by the compiler is a sequence of tokens, where the structure of a token is defined by the following grammar (borrowed modified from [1]).

{token} -->   {identifier}
            | {boolean}
            | {number}
	    | {character}
	    | {string}
            | {bit-string}
	    | ( | ) | #( | ' | ` | , | ,@ | .
{delimiter} --> {whitespace} | ( | ) | " | ;
{whitespace} --> {space or newline}
{comment} --> ; {all subsequent characters up to a line break}
{atmosphere} --> {whitespace} | {comment}
{intertoken space} --> {atmosphere}*

{identifier} --> {initial} {subsequent} * | {peculiar identifier} {initial} --> {letter} | {special initial} {letter} --> a | b | c | ... | z | A | B | C | ... | Z {special initial} --> ! | $ | % | & | * | / | : < | = | > | ? | ~ | _ | ^ {subsequent} --> {initial} | {digit} | {special subsequent} {digit} --> 0 | 1 | 2 | ... | 9 {special subsequent} --> . | + | - {peculiar identifier} --> + | - | ...

{boolean} --> #t | #f {character} --> #\ {any character} | #\ {character name} {character name} --> space | newline

{string} --> " {string element}* " {string-element} --> {any character except " or \} | {backslashed} {backslashed} --> \ {any character}

{bit-string} --> {hex-bit-string} | | {binary-bit-string} {hex-bit-string} --> #H {hex-digit}* {binary-bit-string} --> #B {binary-digit}* {hex-digit} --> 0 | 1 | 2 | ... | 9 | a | b | ... | f | A | B | ... | F {binary-digit} --> 0 | 1

{number} --> {integer} | {real}

{integer} --> {sign} {digit}+ {sign} --> + | - | {real} --> {sign} {digit}+ {decimal-part} {exponent} {decimal-part} --> . {digit}* {exponent} --> e {sign} {digit}+

3.1.2 Representation of datums

The external representations of Scheme objects, datums, are formed according to the following grammar:

{datum} --> {simple datum} | {compound datum}
{simple datum} -->   {boolean}
                   | {number}
                   | {character}
                   | {string}
                   | {bit-string}
                   | {symbol}
{symbol} --> {identifier}
{compound datum} --> {list} | {vector}
{list} -->   ( {datum}* ) | ( {datum}+ . {datum} )
           | {abbreviation}
{abbreviation} --> {abbrev prefix} {datum}
{abbrev prefix} --> ' | ` | , | ,@
{vector} --> #( {datum}* )

3.1.3 Expressions and programs

The expression syntax for the compiler is specified in the grammar below. As opposed to that in [1], we do not give syntax for most of the derived expressions due to the fact that they are implemented using the macro system and thus their grammar is not a part of the compiler per se. Concerning quasiquotations, see the note in [1] about the grammar formalism.

{expression} -->   {variable}
                 | {literal}
                 | {procedure call}
                 | {lambda expression}
                 | {conditional}
                 | {assignment}
                 | {sequencing}
                 | {compiler directive}
                 | {syntax}

{literal} --> {quotation} | {self-evaluating} {self-evaluating} --> {boolean} | {number} | {character} | {string} | {bit-string} {quotation} --> '{datum} | (quote {datum}) {procedure call} --> ({operator} {operand}*) {operator} --> {expression} {operand} --> {expression}

{lambda expression} --> (lambda {formals} {body}) {formals} --> ({variable}*) | {variable} | | ({variable}+ . {variable}) {body} --> {sequence} {sequence} --> {command}* {expression} {command} --> {expression}

{conditional} --> (if {test} {consequent}) | (if {test} {consequent} {alternate}) {test} --> {expression} {consequent} --> {expression} {alternate} --> {expression}

{assignment} --> (set! {variable} {expression})

{compiler directive} --> (include {string})

{sequencing} --> (begin {sequence})

{quasiquotation} --> {qq 1} {template 0} --> {expression} {qq D} --> `{template D} | (quasiquote {template D}) {template D} --> {simple datum} | {list template D} | {vector template D} | {unquotation D} {list template D} --> ({template or splice D}*) | ({template or splice D}+ . {template D}) | '{template D} | {qq D+1} {vector template D} --> #({template or splice D}*) {unquotation D} --> ,{template D-1} | (unquote {template D-1}) {template or splice D} --> {template D} | {splicing unquotation D} {splicing unquotation D} --> ,@{template D-1} | (unquote-splicing {template D-1})

{syntax} --> (let-syntax ({syntax-binding}*) {body}) | (letrec-syntax ({syntax-binding}*) {body}) {syntax-binding} --> ({macro-name} {literals} {syntax-pattern}+) {macro-name} --> {variable} {literals} --> ({variable}*) {syntax-pattern} --> ({pattern} {template}) {pattern} --> {datum} {template} --> {datum}

Finally, programs compiled by the compiler are defined by the following grammar:

{program} --> {command or definition}*
{command or definition} -->   {command} | {definition}
{definition} -->   (_define {variable} {expression})
                 | (begin {definition}*)
                 | (define-syntax {macro-name}
                                  {literals}
                                  {syntax-pattern}+)

3.2 Source code semantics

The semantics of the source language is that of the Scheme programming language with extensions. The following noteworthy extensions exist:

  1. There is a global variable exception-handler that can contain a procedure that will be called when running programs receive messages.
  2. There is a procedure send-message that can be used to send messages.
  3. There are primitive procedures push-to-queue and get-from-queue that can be used to handle an internal message queue, so that get-from-queue will block if the queue is empty until an exception handler pushes more to the queue using push-to-queue.

3.3 Primitive procedures

The following primitive procedures are supported by the compiler:

3.3.1 Pairs


Core procedure: (car pair)
Return the first item of a given pair.


Core procedure: (cdr pair)
Return the second item of a given pair.


Core procedure: (cons value value)
Create a new pair.


Core procedure: (set-car! pair value)
Set the first item of a given pair.


Core procedure: (set-cdr! pair value)
Set the second item of a given pair.


Core procedure: (make-tagged-pair tag)
Create a new tagged pair with the given tag.

3.3.2 Vectors


Core procedure: (make-vector length)
Create a new vector of the given length.


Core procedure: (make-tagged-vector length tag)
Create a new tagged vector with the given tag and of the given length.


Core procedure: (vector-set! vector index value)
Set the indexth value of vector to value.


Core procedure: (vector-get vector index)
Return the indexth value of the given vector.


Core procedure: (vector-ref vector index)
A synonym to vector-get.

3.3.3 Bit strings


Core procedure: (make-shared-bit-string length)
Core procedure: (make-private-bit-string length)
Core procedure: (make-shared-bit-string length tag)
Core procedure: (make-private-bit-string length tag)
Create a new bit string whose length is LENGTH bits (not bytes).


Core procedure: (bit-string-length string)
Return the length of string in bits (not bytes).


Core procedure: (bit-string-shr string amount)
Core procedure: (bit-string-shl string amount)
Core procedure: (bit-string-rol string amount)
Core procedure: (bit-string-ror string amount)
Shift (shr/shl) or rotate (rol/ror) bit strings left or right; return newly allocated bit strings.


Core procedure: (bit-string-and string1 string2)
Core procedure: (bit-string-or string1 string2)
Core procedure: (bit-string-xor string1 string2)
Core procedure: (bit-string-neg string1)
Bit-wise bit string operations; return newly allocated bit strings.


Core procedure: (set-bit-substring! string1 string2 offset)
Copy the contents of string2 over string1, starting at the position offset.


Core procedure: (get-bit-substring string1 start length)
Return a newly allocated bit string that contains the region of string1 that starts at the startth bit and spans length bits.

3.3.4 Messages


Core procedure: (send-message address message)
Send a given message to a given address.

3.3.5 Miscallenous


Core procedure: (not value)
Return #f if value is not #f, otherwise #t.


Core procedure: (eq? value value)
Return #t if two values are exactly the same object, otherwise #f.


Core procedure: (halt)
Terminate; commit a suicide; exit; quit; halt for ever.


Core procedure: (exit)
A synonym to halt.


Core procedure: (quit)
A synonym to halt.

3.3.6 Arithmetics


Core procedure: (int+ integer integer)
Core procedure: (int/ integer integer)
Core procedure: (int- integer integer)
Core procedure: (int* integer integer)
Core procedure: (int= integer integer)
Core procedure: (int>= float float)
Integer arithmetics and comparison.


Core procedure: (float+ float float)
Core procedure: (float/ float float)
Core procedure: (float- float float)
Core procedure: (float* float float)
Core procedure: (float= integer integer)
Core procedure: (float>= float float)
Floating-point arithmetics.


Core procedure: (modulo integer integer)
Core procedure: (divide integer integer)
Core procedure: (remainder integer integer)
Number-theoretical division and remainders for integers.

3.3.7 Type predicates


Core procedure: (boolean? value)
Return #t if value is a boolean object, otherwise #f.


Core procedure: (integer? value)
Return #t if value is an integer object, otherwise #f.


Core procedure: (float? value)
Return #t if value is a floating-point number object, otherwise #f.


Core procedure: (pair? value)
Return #t if value is a pair, otherwise #f.


Core procedure: (vector? value)
Return #t if value is a vector of any length, otherwise #f.


Core procedure: (shared-bit-string? value)
Return #t if value is a shared bit string, otherwise #f.


Core procedure: (private-bit-string? value)
Return #t if value is a private bit string, otherwise #f.


Core procedure: (tagged-pair? value tag)
Return #t if value is a tagged pair and has the tag tag, otherwise #f.


Core procedure: (tagged-vector? value tag)
Return #t if value is a tagged vector of any length and has the tag tag, otherwise #f.


Core procedure: (tagged-shared-bit-string? value tag)
Return #t if value is a tagged shared bit string and has the tag tag, otherwise #f.


Core procedure: (tagged-private-bit-string? value tag)
Return #t if value is a tagged private bit string and has the tag tag, otherwise #f.


Core procedure: (procedure? value)
Return #t if value is a procedure object, otherwise #f.

3.4 Macro expansion

The macro facility provided by the compiler is reminiscent of that proposed in [1]. With every macro, a set of literals and a set of syntax patterns is associated. The set of literals is a set of keywords that are recognized as literal identifiers as instead of pattern variables in a pattern. Syntax patterns are used to define actual macro transformations. The syntax of macro definitions is seen from the grammars presented previously.

A syntax pattern consists of a pattern and a template. When macro is expanded, the macro invocation is matched against the patterns in the order they appear in the corresponding macro definition. When a first matching pattern is found, the corresponding template is expanded and the expansion replaces the original invocation during the macro expansion process.

A pattern is a Scheme expression, where variables denote pattern variables that can be matched by arbitrary sub-expressions in the invocation. Ellipsis (...) can be used to denote repetition and works as in the appendix of [1]. Pattern variables appearing in the template get their bindings from the matching. Ellipsis can naturally appear in the templates too.

3.5 Output of the compiler

The compiler outputs an assembler source file. The structure of an assembler source file is defined in the grammar below. Low-level lexical conventions are those of Scheme; semi-colons start comments and the structure of white space is not important.

{asm file} --> {preamble} @instructions {instructions} @constants {constant}

{preamble} --> {register spec}* {register spec} --> {asm identifier} {integer}

{instructions} --> {instruction or label}* {instruction} --> ({operator} {operand}*) {asm identifier} --> {start} {subsequent}* {start} --> {lower-case letter or hyphen} {subsequent} --> {start or digit} {label} --> {asm identifier} : {operator} --> {asm identifier} {operand} --> {register} | {index} | {label} {register} --> {asm identifier} {index} --> {integer}

{constant} --> {vector spec} | {boolean spec} | {bit-string spec} | {integer spec} | {pair spec}

{vector spec} --> vector {length} {constant} {boolean spec} --> boolean {0 or 1} {bit-string spec} --> bit-string {length} {byte}* {length} --> {integer} {byte} --> {integer} {integer spec} --> integer {integer} {pair spec} --> pair {constant} {constant}

See the section on assembler for details.

3.6 Compiler command line syntax

The compiler can be executed from the UNIX command line, and the following options are recognized:

--asmopts STRING
Pass STRING to the assmbler when invoked as a sub-process of the compiler. Default is the empty string.
--assemble, --noassemble
Choose whether or not to run assembler. Default is to run.
--assembler NAME
Set the assembler executable name as given to a sub-shell starting the assembler (default is "ssasm").
--help
Output a help message and exit.
--includedir DIR
Add the directory DIR as the first to the list of directories to be searched for included files.
--optimize LEVEL
Set the optimization level, zero for no optimization at all and larger integers for more optimization.
--verbose
Make the compiler to print information about its progress. Can be given multiple times to get more verbosity.
--trace ITEM
Choose to trace a specific part of the compiler identifier by ITEM. This is for debugging only.
--std, --nostd
Choose to include or to not include the standard include files that define e.g. the derived forms of the Scheme language. The default is to include.

Non-option arguments on the command line are files to compile. If a file name ends with the suffix ``.ssscm'' or the suffix ``.scm'', the suffix is stripped of from the file name. A corresponding assembler file is always written and has the suffix ``.ssasm''. The byte code file created by the assembler has the suffix ``.ssbc''.

3.7 Compiler implementation

The [Scheme] implementation of the compiler is divided into the following parts:

COMPILER -- Main interface -- command line processing
          |                 |
          |                 - compiler invocation
          |                 |
          |                 - assembler invocation
 	  |
 	  - Scheme reader -- lexical parser
          |                | 
          |                - datum building
 	  |
 	  - Macro expansion -- semantic tree building
          |                  | 
          |                  - macro transformations
          |                  |
          |                  - lexical address resolving
          |                  |
	  |                  - semantic optimizations
 	  |
 	  - Code generation -- linearization phase
          |                  | 
          |                  - peephole optimizer
	  |
	  - Support
          |
          - General auxiliary procedures

Short descriptions of the components presented above follows.

Main interface
The Main interface module is responsible for interacting with the user and the assembler. It consists of
  • a command line processor, that interprets the command line given by the user and runs the compiler with the functionality requested in the line;
  • a compiler invoker, that is used by the command line processor to invoke the real compiler; and
  • an assembler invoker, that is used to interact with the external assembler when creating byte code object files.

Scheme reader
The Scheme reader module is responsible for reading and parsing Scheme expression from input streams. It consists of two parts: first,
  • a lexical parser, that tokenizes an input stream, and
  • a datum builder, that interprets a stream of tokens and builds structured data objects from the stream.

Macro expansion
The Macro expansion module is responsible for transforming a Scheme program into a semantically equivalent semantic tree, expanding user-defined macros during the process. The Macro expansion module consists of
  • procedures to create different semantic trees and to control the expansion process;
  • procedures to define new macro transformers and to apply them;
  • functionality to compute the static lexical addresses of variable references; and
  • a sub-module that implements certain semantic optimizations after the mapping to a semantic tree has finished.

Code generation
The Code generation module transforms a semantic tree to a byte code sequence. The code generation module consists of
  • a linearization sub-module that implements the actual linearization, and
  • a peephole optimizer that does final optimizations on the linearized code.

Support
The Support module contains procedures and sub-modules that are specific to the compiler but that cannot be easily fit into the modules above. These sub-modules include e.g. modules for keeping track of the lexical environments during compilation.
General auxiliary procedures
The Generaul auxiliary procedures include compiler non-specific procedures that are handy in implementing the compiler but that are not present in some standard Scheme implementations.

The main interfaces for the modules and implementation highlights will be specified and discussed next.

3.7.1.5.1 The functions of module Compiler main interface

File compiler.scm

Scheme-functions:


File compile-file.scm

Scheme-functions:


File file-input.scm

Scheme-functions:


File initial-lex-env.scm

Scheme-functions:

3.7.2.2.4 The functions of module Compiler command line processor

File options.scm

Scheme-functions:

3.8 Scheme reader

The Scheme reader used in the compiler reads in expressions but does not store them in the native way, but instead as annotated expressions. Annotated expressions are Scheme expressions that contain information about the source of the expressions. For example, the name of the file where the expression is originally read is stored as an annotation associated with the expression.

3.8.1 Lexical parser

The lexical parser is generated using Danny Dube's SILex lexical scanners generator. See SILex documentation for interfaces and other details.

3.8.2.2.1 The functions of module Scheme reader

File reader.scm

Scheme-functions:


File lexer.scm

Scheme-functions:

3.9 Syntax, macro expansion and semantic tree creation

3.9.1.22.3 The functions of module Syntax, macro expansion and semantic tree creation

File st-application.scm

Scheme-functions:


File st-asm.scm

Scheme-functions:


File st-assign.scm

Scheme-functions:


File st-aux.scm

Scheme-functions:


File st-closure.scm

Scheme-functions:


File st-literal.scm

Scheme-functions:


File st-primitive.scm

Scheme-functions:


File st-sequence.scm

Scheme-functions:


File st-syntax.scm

Scheme-functions:


File st-test.scm

Scheme-functions:


File st-variable.scm

Scheme-functions:


File st-main-build.scm

Scheme-functions:


File syntax-asm.scm

Scheme-functions:


File syntax-begin.scm

Scheme-functions:


File syntax-define.scm

Scheme-functions:


File syntax-if.scm

Scheme-functions:


File syntax-include.scm

Scheme-functions:


File syntax-lambda.scm

Scheme-functions:


File syntax-quote.scm

Scheme-functions:


File syntax-set.scm

Scheme-functions:


File syntax-procedures.scm

Scheme-functions:


File macros.scm

Scheme-functions:

3.10 Code generation

3.10.1 Semantic optimization

Semantic optimization is not specified yet.

3.10.2.5 The functions of module Code generation

File lexaddrs.scm

Scheme-functions:


File newlinearizer.scm

Scheme-functions:

3.10.3.1.1 The functions of module Peephole optimizer

File peephole.scm

Scheme-functions:

3.11 Support

3.11.1.1.2 The functions of module Error reporting

File comerror.scm

Scheme-functions:

3.11.2.2.4 The functions of module Binding environments

File global-vars.scm

Scheme-functions:


File lexical-bindings.scm

Scheme-functions:

3.11.3.2.7 The functions of module Semantic trees support

File trees.scm

Scheme-functions:

3.12 Auxiliary

The pretty printer used in the SCHEMESTATION compiler is written originally by Marc Feeley in 1991. It is modified slightly to fit our purposes.

[The printf family is again home-made.]

3.12.1.1.3 The functions of module Formatted printing

File printf.scm

Scheme-functions:

3.12.2.1.3 The functions of module Tracing support

File trace.scm

Scheme-functions:

4 Assembler

The assembler, written in C, reads in assembly files and outputs byte code objects.

4.1 Assembler syntax

The general input syntax of the assembler is specified previously [see above] but is elaborated on here.

4.1.1 Instruction identifiers

Every instruction is presented with an expression of form (identifier operand ...). The instruction identifier is derived from the corresponding byte code instruction name in a straightforward way; upper case characters are converted to lover case and underscores are replaced by hyphens. So, for example, the identifier corresponding to the instruction LOAD_NULL is load-null. The full list of virtual machine instructions is found from SchemeStation VM instruction numbers and argument types and Virtual machine definition.

4.1.2 Register identifiers

Registers are specified using symbolic names. The mapping of symbolic names to register numbers can be changed in the preamble. If no mapping is given, the following default mapping is used:

Symbolic nameRegister number
env0
arg1
cont2
genv3
t14
val5
t26
t37
t48
t59
t610
t711
mq12
exc13
stack14
pc15

If at least one register identifier specification appears in the preamble then the default mapping is not used at all.

4.1.3 Constants table

The constants part specifies one Scheme object, that must be a vector. The constants part must present a loopless Scheme object that is read then in in a depth-first manner. Vectors are specified by giving the vector keyword, then the vector length, and then as many constants as specified in the length field; similarly for other types (see the syntax).

4.2 Implementation

The assembler needs to perform the following operations:

Thus the implementation can be divided into the following parts:

ASSEMBLER -- Main interface -- command line processing
           |
           - Parser -- general lexical scanner
           |         |
           |         - instructions reading
           |         |
           |         - constants table reading
           |         |
           |         - preamble parsing
           |
           - Labels handling -- constants table entry allocation
           |                  | 
           |                  - offset computation
           |
           - Binary data generation -- instruction generation
           |                         |
           |                         - constants table writing 
           |
           - Aux & support

Short descriptions of the components presented above follows.

Main interface
The Main interface module is responsible of communicating with the user and controlling the assembling process. It consists mainly of
  • a unit for parsing the command line options [this is got from the SCHEMESTATION utilities library]
Parser
The Parser module is responsible for interpreting the assembler files presented to the assmbler. It consists of
  • a lexical scanner that tokenizes the input;
  • a module for reading the instructions part;
  • a module for reading the constants table; and
  • a module for parsing the preamble.
Labels handling
The Labels handling module is responsible for allocating constant table entries for labels and computing their offsets.
Binary data generation
The Binary data generation module is responsible for creating the actual binary representation of the source program and storing it into a UNIX filesystem file. This module is implemented by using the SCHEMESTATION virtual machine and heap libraries, specified elsewhere.
Aux & support
Any other required [auxiliary] functionality is implemented in the Aux & support module.

The main interfaces for the modules and implementation highlights will be specified and discussed next.

4.3 Main interface


Interface: int main(int argc, char **argv)

This is the main entry point of the program. It parses the command line using the SCHEMESTATION utilities library, and then assembles the files specified at the end of the command line.


Interface: void init_assembler(void)

This function is called before actual assembling starts. It initializes dynamically allocated memory structures required by the assembling procedure. This procedure should be called only once.


Interface: Status assemble(char *filename)

The string FILENAME is a file name without any suffix. The file FILENAME.ssasm is tried to assemble to the file FILENAME.ssbc. Returns STATUS_FAILED if the files cannot be opened, or if assembling failed. Otherwise outputs a byte code object to FILENAME.ssbc and returns STATUS_OK. This function may be called multiple times.

4.4 Parser


Interface: Status run_assembler(FILE *input, FILE *output)

Run the main assembler, reading data from INPUT and writing the byte code file to OUTPUT, if no errors occur. This is the core of the assembler, responsible of reading in the preamble, the instructions, the constants table, and then producing the byte code file.

The general algorithm is as follows: The generated lexical scanner is used first to read the preamble. When the preamble/instructions separator @instructions has been read, switch to an instructions reading mode. This mode is implemented as a simple state machine that adds instructions and labels read to dynamically growing tables using procedures defined below. When the instructions/constants separator @constants has been read, first label offsets are computed as now all labels should be defined. Then a dedicated procedure is used to read in the constants table. When the constants table has been read, the byte code object file is outputted and the assembling ends in success.


Interface: int get_instr_num(char *instr_name)

Return the instruction number corresponding to the symbolic instruction name INSTR_NAME. Return -1, if the instruction name is unknown.


Interface: int get_reg_num(char *reg_name)

Return the register number corresponding to the symbolic register name REG_NAME. Return -1, if the register name is unknown.

The lexical scanner used is generated using flex(1). See flex documentation for more information.

4.5 Labels handling

The following procedures are used to create labels and instructions, and to handle them:


Interface: void add_label(char *label_name, int *num)

Define a new label with the name LABEL_NAME. Return an index number for the label in NUM. This function always succeeds (unless the main memory is exhausted, but this is not taken into account).


Interface: int get_label_num(char *label_name)

Return the index number of the label named LABEL_NAME, or -1 if the label name is unknown.


Interface: Status set_label_denotation(int label_num, int instr_num)

Set the label number LABEL_NUM to denote the instruction INSTR_NUM, from the beginning of the program. Return STATUS_OK if the label had no previous denotation. If the denotation tried to set twice for the same label, the second (and subsequent) time(s) return STATUS_FAILED and do nothing.


Interface: Status compute_label_offsets(void)

Compute the real label offsets into the byte code sequence. These are not, in general, the same as the instruction numbers given in set_label_denotation, as the length of binary representations of byte code instructions vary depending on the complexity of the instructions.

4.6 Binary data generation


Interface: Status build_linearized_object_vector(uchar **ptr, int *lenptr)

Build the byte code object (in essence, a linearized heap vector). The byte code object is allocated dynamically, and a pointer to it is written *PTR. The length of the object is written to *LENPTR. Return STATUS_OK, if everything went well. If something went wrong, return STATUS_FAILED. If the return value is not STATUS_OK, then PTR and LENPTR might contain garbage.

At the start of the procedure, a heap is initialized using the SCHEMESTATION heap library. This heap is utilized by the procedures below.

The algorithm used to implement build_linearized_object_vector is the following: first, initialize a heap. Allocate the binary object top-level vector from the heap. Then, read in the constants table and build the corresponding Scheme object, storing it in one slot of the binary object vector. Then build the binary representation of the instruction sequence and make a heap bit string from it. Store the bit string in one slot of the binary object vector. Then linearize the binary object vector, deinitialize the used heap and return the linearized heap, which is the binary code object.


Interface: Status build_constants_table(HeapTypedData *d1)

Read in the constants table and return the read constant object in D1. Return STATUS_FAILED if there was a syntactic or semantic error in the constants table.

The constants table is read in recursively. First, topmost object specification is read and it is ensured that it is a vector (the constants table must be a vector). Then read_constant() is called to read the items of the vector; read_constant() then calls itself recursively when reading composite data (pairs or vectors). In addition to the items readed from the file, also the label offsets are written to the actual constants table created. Thus the actual length of the constants table created is at least that of that in file, plus the number of distinct labels in the program.


Interface: Status read_constant(HeapTypedData *d)

Recursively read constants in, return result in *D. This is done according to the constants table syntax.


Interface: create_byte_code(unsigned char **bufptr, int *len)

Create the binary representation of the byte code sequence and return a pointer to a piece of dynamically allocated memory containing the binary representation in *BUFPTR. Return the length of the representation in *LEN. Return STATUS_OK if everything went ok, otherwise STATUS_FAILED, in which case BUFPTR and LEN might contain garbage.


Interface: build_instruction(int number, int *len, unsigned char *buf)

Write the binary representation of the instruction number NUMBER in the sequence to *BUF, if BUF is not NULL. Return the length of the representation in any case in LEN. This procedure utilizes the SCHEMESTATION virtual machine library. It is called twice, first to compute the label offsets but not to create any byte code, and then to create the actual binary representation.

4.7 Aux & support

Some auxiliary procedures will be needed, but they are not specified here.

5 References

[1]
William Clinger and Jonathan Rees (editors). Revised4 Report on the Algorithmic Language Scheme.