Test Report

SchemeStation documentation

1 Testing target

The targets of this test report are defined in the SCHEMESTATION Testing Plan [Project Test Plan] and in related documents that define the behavior and semantics of the SCHEMESTATION system.

The system version covered by this report is the SCHEMESTATION snapshot dated 240498.

2 Differences from the testing plan

The differences from the testing plan are:

3 Test coverage

The implemented test coverage is the same as the coverage defined in the SCHEMESTATION Testing Plan.

4 Test environment

To generate test run output in a uniform way, a special Test Report Utility was written. The raw test run outputs are not included in this report because of their low signal-to-noise ratio for non-implementors.
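The shape of such a utility can be pictured as follows. This is a hypothetical sketch only; the real Test Report Utility's format and API are not documented here, so every name below is illustrative.

```python
# Illustrative sketch of a uniform test-output harness (not the real
# SchemeStation Test Report Utility; names and format are invented).

def run_suite(name, tests):
    """Run (label, thunk) pairs and return one uniform line per test."""
    passed = 0
    lines = []
    for label, thunk in tests:
        try:
            thunk()
            ok = True
        except AssertionError:
            ok = False
        passed += ok
        lines.append(f"{name}: {label}: {'PASS' if ok else 'FAIL'}")
    lines.append(f"{name}: {passed}/{len(tests)} passed")
    return lines

def failing_case():
    raise AssertionError("deliberately failing example case")

# Two trivial cases: one that passes, one that fails.
report = run_suite("heap", [
    ("allocation", lambda: None),
    ("bad linearisation", failing_case),
])
print("\n".join(report))
```

Emitting one fixed-format line per test is what makes the outputs machine-collatable across modules, which is the point of having a single utility.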

Tests are run on a Linux machine equipped with an Intel CPU unless stated otherwise. In general, a Unix-compatible system is assumed.

5 Testing results

Name               Done        Successful  Total
Unit tests         669 (99%)   668 (99%)   674
Integration tests  16 (100%)   15 (94%)    16
System tests       22 (100%)   22 (100%)   22

The numbers in the tables should not be interpreted blindly; the number, categorisation and importance of tests vary per target. For more accurate information, see the related documents. The percentage values represent the number of successful tests out of the total tests planned. They give a rough measure of progress, but not much more.
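As a check on how the table cells are derived, the percentages are simply counts over planned totals, here rounded to the nearest whole percent (a sketch; the table's own rounding differs by one percent in a couple of rows):

```python
# Table percentages: a count divided by the planned total, as a
# whole-percent figure.

def pct(part, total):
    return round(100 * part / total)

# Unit-test row: 669 done and 668 successful out of 674 planned.
print(pct(669, 674), pct(668, 674))   # both 99
# Integration-test row: 15 successful out of 16.
print(pct(15, 16))                    # 94
```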

Unit test summaries are given below. The Integration Testing Plan, the System test specification and the Acceptance testing specification describe the results of the other testing phases.

5.0.1 Unit testing

Name                       Done        Successful  Total
The heap                   12 (92%)    11 (84%)    13
The addressing system      15 (93%)    15 (93%)    16
The networking system      38 (97%)    38 (97%)    39
The messaging system       10 (91%)    10 (91%)    11
The external agent system  26 (96%)    26 (96%)    27
The virtual machine        228 (100%)  228 (100%)  228
The scheduler              18 (100%)   18 (100%)   18
Compiler                   322 (100%)  322 (100%)  322

The results are described in the Heap Unit Testing Plan, the Addressing System Unit Testing Plan, the Networking Testing Plan, the Messaging System Unit Testing Plan, the External Agent Interface Testing Plan, VM Unit Testing and Scheduler Unit Testing.

The failing test in the heap module can be triggered only by externally, maliciously generated linearisations. It can be corrected relatively easily, but is considered low priority, as the goals of the project are conceptual research rather than developing a rock-solid OS.

5.0.2 The compiler

Compiler testing is discussed in the Compiler test report.

5.1 Integration tests

Integration tests are described in the Integration Testing Plan. All integration tests have been performed, and all but one pass. The failing test requires malicious input (impossible to create with a normal SS system) and has been tracked down to the heap module.

5.2 System and acceptance tests

System and acceptance tests are defined in the System test specification and the Acceptance testing specification. The system and acceptance tests have been performed and passed.

5.3 Interoperability testing

Interoperability tests have been performed, most visibly on the heap [Heap Unit Testing Plan], which provides the basis for the success of integration testing [Integration Testing Plan]. Interoperability tests pass on all port target systems and between them.

5.4 Performance testing

Performance flaws were detected in the heap and VM tests. The easily correctable flaws that caused essential problems in running the implemented system have been fixed. Improving performance further would require drastic changes to the data structures and algorithms, and possibly even to the basic ideas. Some of these ideas have been tested in practice, but are not included in the default version.

Standard virtual machine performance on the development Linux system is shown below.

Run type                                    VM instructions/s
Real-world performance (messaging+heap+vm)  250,000
Nonmessaging task (heap+vm)                 400,000
Maximum performance (vm)                    1,000,000
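From these rates one can estimate the per-instruction cost that each layer adds. A back-of-the-envelope sketch, using only the figures from the table:

```python
# Per-instruction time implied by each measured rate, in microseconds.
rates = {"vm": 1_000_000, "heap+vm": 400_000, "messaging+heap+vm": 250_000}
us = {k: 1e6 / v for k, v in rates.items()}

heap_cost = us["heap+vm"] - us["vm"]                       # 2.5 - 1.0 = 1.5 us
messaging_cost = us["messaging+heap+vm"] - us["heap+vm"]   # 4.0 - 2.5 = 1.5 us
print(heap_cost, messaging_cost)
```

So on this machine the heap layer and the messaging layer each add roughly 1.5 microseconds to every executed VM instruction.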

These results meet the original requirement specification values.

Messaging performance is a slight disappointment. Typical values on a 433 MHz Alpha system are 20 messages per second, regardless of whether a message is a couple of bytes or several megabytes. Some serious misfeature or specification flaw is presumed to reside in the heap or messaging module (probably the heap).

Migration performance is limited mostly by messaging performance, and by the messaging load of the agent.
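The size-independence of the ~20 messages/s figure points at a fixed per-message cost rather than a bandwidth limit; the arithmetic, sketched:

```python
# 20 messages/s independent of payload size means each message pays a
# fixed latency of about 50 ms; raw bandwidth is not the bottleneck,
# since even a 1 MB payload still sustains 20 MB/s at that rate.
msgs_per_s = 20
latency_ms = 1000 / msgs_per_s       # fixed cost per message, in ms
mb_per_s_at_1mb = 1 * msgs_per_s     # throughput with 1 MB payloads
print(latency_ms, mb_per_s_at_1mb)
```

A fixed ~50 ms per message is consistent with an overhead in the heap or messaging module, rather than in data transfer itself.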

All of the above performance measurements can vary greatly with slight changes in configuration. For instance, the frequent use of 64-bit operations favours 64-bit architectures, and on 32-bit architectures it favours compilers that inline these operations (which yielded a 60% performance increase!). Compiling the system with debugging disabled can double the speed of several modules.

5.5 System monitoring

System monitoring is performed by turning on debug flags in the software and reading the debug output. No explicit measurements are made from this data.

One or more system monitoring areas can be activated selectively: for instance, every opcode run on the VM can be displayed, or messaging activity can be shown.
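Selective activation of monitoring areas can be pictured as a set of independent debug flags. A minimal sketch in Python, with hypothetical area names (the real SchemeStation flag mechanism is not documented here):

```python
from enum import Flag, auto

class Debug(Flag):
    """Hypothetical monitoring areas; names are illustrative only."""
    NONE = 0
    VM_OPCODES = auto()   # display every opcode run on the VM
    MESSAGING = auto()    # show messaging activity
    HEAP = auto()

active = Debug.VM_OPCODES | Debug.MESSAGING  # areas switched on

def trace(area, text):
    """Emit a debug line only when its area is active; else stay silent."""
    if area & active:
        line = f"[{area.name}] {text}"
        print(line)
        return line
    return None

trace(Debug.VM_OPCODES, "PUSH 42")   # emitted
trace(Debug.HEAP, "gc pass")         # suppressed
```

Keeping the areas orthogonal is what lets one subsystem be traced without drowning in output from the others.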

5.6 Redefined usability testing and the results

A sort of usability testing was performed by giving a late version of the system to a group of CS students with a common interest in the Scheme programming language. The aim of this testing was to get comments and ideas about the product from outside the developer team. Because the system lacks one "true" interface (SS is, after all, a whole OS with an unlimited set of possible interaction methods), the results reflect outsider opinions of the system as a whole, not just of the UI.

The results were as follows:

6 Test evaluation

Problems encountered in testing are:

Practically all tests have now been performed (240498). The couple of tests that have not been performed, or that fail, are such that they cause no harm in the actual running system; they are not considered critical for the current project. If the project is continued with altered requirements, these tests must naturally be re-evaluated.

Bugs have been found in all units tested, but probably most of the harm was caused by the heap, which had fatal bugs that went undetected, and by the messaging system, which had a memory leak and some other hard-to-see bugs. Considering the project size and schedule, such bugs can pose a major threat. Along with misunderstood nuances of subsystem semantics, these must be considered first-priority targets.

The lack of practically flawless test suites caused the project moderate pain. It made tracking bugs harder, since not all bugs could be located straight away, but it also saved the project from the burden of writing a massive testing system; even with the current tests, the test cases can exceed ten thousand lines of code.