Heap layout

Heap layout
SchemeStation documentation

1 Introduction

Almost all data in SCHEMESTATION is list-structured and resides in heaps, which are pieces of list-structured memory. More-over, all data items are typed. This is a feature that "normal" hardware solutions do not provide (memory words are untyped and in a linear array). The pieces of active list-structured memory are garbage collected to make explicit reclamation of used memory unnecessary.

We use the term "heap" widely to mean any list-structured amount of data.

2 Data types

SCHEMESTATION supports the following data types:

2.1 Pairs

Pairs are collections of two data items. The two data items are called the car and the cdr of the pair. Pairs can be mutated if they are not constant. Pairs are always passed by reference inside a program, i.e. they are shared objects.

The null pointer is represented just as a pair that points to the location zero (it can [and should] never contain the contents of a pair).

2.2 Vectors

Vectors are collections of zero or more data items. Vectors can be mutated on a per-item basis if they are not constant. Vectors are always passed by reference inside a program, i.e. they are shared objects.

2.3 Shared bit arrays

Bit arrays are collections of bits. Bit arrays can be mutated if they are not constant. Shared bit arrays are always passed by reference.

2.4 Private bit arrays

Private bit arrays are passed "by value", i.e. they are copy-on-write.

2.5 Booleans

Booleans are items from the finite set of two values, traditionally called #f and #t [after their written representation in Scheme].

2.6 64-bit signed integers

Similar to booleans, but with 64-bit ordered integer value space.

2.7 64-bit double-precision IEEE floating point numbers

Set of floating point numbers.

2.8 Procedures

Procedures are combinations of byte code addresses and arbitrary (singular) objects. (Actually the address is a procedure entry point and the "arbitrary" object points to an execution environment.)

2.9 Extended-typed objects

Extended-typed objects are as their "normal" counterparts but they include an additional [64-bit] field that is used to tag them with application-defined type tags.

3 Heap layout

Two issues must be considered with respect to the actual heap layout: (1) how an active heap is presented in the computer memory; and (2) how heaps are transferred from a place to another. Thus we must have (1) an efficient representation for heaps during computation and (2) an unambigous and well-defined way to linearize heaps when necessary.

3.1 Memory layout

An active heap is divided into two areas: (1) general area; and (2) bit array storage. The general area is conceptually divided into 64-bit cells and additional 8-bit type tags attached to each cell in a way that is discussed below.

Every 64-bit cell has an additional 8-bit type tag related to it. The cells and type tags are laid out in a sequence of 8 cells, 8 type tags, 8 cells, 8 type tags etc, i.e.

<-> one byte                                 type tags
| . . . . . . . | . . . . . . . | ...       | | | | | | | | | ...
  cell 1          cell 2                     1 2 3 4 5 6 7 8  cell 9

Different values are represented as follows.

3.1.1 Pairs

A pair is represented as a cell with type HEAP_TYPE_PAIR that contains the address of two consequtive cells somewhere in the general area that contain the car and cdr of the pair. Extended-type pairs are pointers to three consequtive cells; the first cell containing the additional type tag; the basic type of the value is then HEAP_TYPE_TAGGED.

3.1.2 Vectors

A vector of length K is represented as a cell with type HEAP_TYPE_VECTOR that contains the address of K+1 consequtive cells somewhere in the general area. The first cell contains the length of the vector (K) and the rest contain the values in the vectors. Extended-type vectors contain an additional type tag cell before the length field; the basic type of the value is then HEAP_TYPE_VECTOR_TAGGED.

3.1.3 Bit arrays

Bit arrays are cells with type HEAP_TYPE_SHARED_BSTRING, HEAP_TYPE_PRIVATE_BSTRING, HEAP_TYPE_SHARED_BSTRING_TAGGED or HEAP_TYPE_PRIVATE_BSTRING_TAGGED. They point to the bit array storage area. There first the length of the bit array is recorded, then a reference counter, and finally the bits of the array itself.

3.1.4 Booleans

Booleans are represented as 64 zero bits (#f) or 63 zero bits and a 1-bit (#t). They have type HEAP_TYPE_BOOLEAN.

3.1.5 Procedures

Procedures are pointers to two consequtive cells, the first of which containing a byte code address and the second an arbitrary value. Procedures have type HEAP_TYPE_PROCEDURE.

3.2 Linearized heaps

Linearized heaps are represented in the following format:

  1. The total length of the linearisation (in bytes)
  2. The number of eight-cell items described in layout section.
  3. The number of bit string items in linearisation.
  4. The cells of the common part. Addresses are integers starting from zero (the first cell) and advancing by one for each cell. Storage structure is the one described in the layout section.
  5. The bit array length table.
  6. The bit array tag table. Zero if untagged.
  7. The bit array refrence count table.
  8. The actual bit arrays, extended to next byte-divisible length and paddded with zero bits. The addresses to the bit arrays start from zero (the first bit string) and are interpreted as indices to an array whose elements are 64-bit each.