                                                                Craig Kadziolka
                                                                     March 2000

                        Runtime Engine Implementation


Overview

This document describes the implementation of the runtime engine, including
the parts that had to be redesigned to accommodate an actual
implementation.

1. Abstract

The runtime engine consists of four parts, namely a threading mechanism, 
an event mechanism, a network manager, and a garbage collector.

These elements are part of a single runtime, rather than separate objects,
so that they can work together more efficiently than if they were forced to
communicate through the common interfaces provided for each module.

The goal of the implementation of the runtime engine is for it to be less
than twenty kilobytes.  To achieve this end the runtime must be implemented
in C and assembly.

The initial implementation is for the Linux operating system running on an
Intel CPU, although another goal is for the engine to be as easy as possible
to port to other platforms.


2. Table Of Contents

	Overview
	1. Abstract
	2. Table Of Contents
	3. Need for this runtime engine

	Currently Implemented Features

	Expected Timeline

	Multi-Threading

	Event Manager

	Network Manager

	Garbage Collector

	Implementation Testing


3. Need for this runtime engine

This runtime engine is envisaged to be small enough to be embeddable, and
very fast and efficient.  Most object-oriented languages currently carry
high runtime overheads, and Sather is no exception.  Although the runtime
engine is not specifically designed for any one language, it will initially
be a replacement for the current Sather runtime engine, which consists of
several different parts and is quite large.


Currently Implemented Features

Currently the multi-threading mechanism and the event handling mechanism
are implemented under Linux on the Intel architecture.  The NASM assembler
is required to compile the assembly routines.

The network manager is partially implemented and should be completed within
the next few weeks.


Expected Timeline


Multi-Threading				Late January 2000

Event Manager				Early March 2000

Network Manager				Mid April 2000

Garbage Collector			Mid July 2000

Testing					Mid August 2000


Currently all deliverables have met their associated deadlines, and, all
things going to plan, I am headed for the planned completion date of
mid-August.


Multi-Threading

A thread is a lightweight process.  It maintains its own stack and a small
amount of space for storing the context of the processor.  The difference
between a runtime engine thread and an operating system process is that
processes have their own address space, whereas threads share the address
space of their parent.  Another difference between the two is that a context
switch between processes requires an entry into the operating system, while
a switch between threads does not.  This gives threads a speed advantage
over operating system processes.

The runtime engine offers a co-operative multitasking environment in which
the client code may create, destroy and suspend threads.  The reason for it
being co-operative instead of preemptive is that properly designed programs
can make better use of threading in a co-operative environment, mainly
because only they know when they don't need the CPU.  Preemptive timeslices
may interrupt a program just when it needs the CPU most, degrading
performance in some circumstances with unnecessary context switching and
scheduling.

Some of the first routines to be written were the save context and load
context functions.  As their names suggest, they save and load the context
of the CPU to and from their respective save areas.  The scheduler followed,
written in C and called from the end of save context; at the end of the
scheduler it passes the save area of the thread to be loaded to the load
context function.  Both context switching functions were written in Intel
assembly language, and both required various interfaces to the C language.

The scheduler implements a priority-based system, with a queue of threads
for each priority level.  The lowest priority queue contains the idle
thread, and a "default" priority level is where the boot thread is placed
after booting into the multi-threading environment.

The method for initializing the multi-threading environment requires some
discussion.  One of the first things the initialization process must do is
set up the thread queues, which involves nulling all the head and tail
pointers for each priority level.  Then the current thread of execution must
be modelled in our threading system as a thread just like any other.  This
is handled by allocating a node at the default priority level and then
running a save context.  By its nature, save context saves the context of
the processor into the allocated save area and then schedules the next
thread for execution.  Since there is only one thread at this stage, the
scheduler schedules the main thread as the next one to execute; its context
is then loaded, which simply returns from the save context function.  Once
the boot thread has been represented in the threading system, the next task
is to create the idle thread.  At this stage our threading queues comply
with the constraints required by our threading functions, so it is safe to
call create_thread with the idle function as the parameter.

This raises another question, that of finding the currently active
thread.  The way I handle this is to insert a double word entry in the
stack, immediately above the null frame, which points to the save area for
that thread.  This way I know which thread is executing when it is
interrupted or when it wants to suspend to let another thread use the
processor.  This approach is fine for threads that our multi-threading
system creates, but some work is required for the boot thread, which has a
system-allocated stack.  To be able to treat the boot thread just like any
other thread, which is essential, I needed to modify the boot thread's
stack to match the way I find a thread's save area.

According to the C calling convention, when a C function is called the frame
pointer is pushed to the stack and then set to the current stack pointer.
On leaving a function the frame pointer is reloaded with its old value and
the stack pointer is reset, before control returns to the return address
located on the stack.  This effectively forms a linked list up the stack,
which ends when the "null frame" is reached.  The "null frame" is merely a
double word value which is set to zero.  To locate the save area of the
currently executing thread you merely trace up the linked list, stopping
when you hit the zero value.  This function was originally a loop of four
instructions, but I found that by unrolling the first iteration of the loop
I could cut it back to a loop of three instructions.  This function was also
written in assembly, as it needed to be fast.

Based on this method of finding the save area of the executing thread,
I then needed a way of inserting the pointer to the save area on the stack
immediately above the null frame.  The simplest answer would be to trace
back to the null frame and then overwrite the value before it with the
pointer.  This is not a feasible option, as the memory before the null frame
could be in use by another process, or possibly for another purpose by the
operating system.  The solution I chose was to allocate a data structure
two double words long.  The double word at the lower address is set to zero
to replicate a null frame, and the double word at the higher address is set
to point to the save area for the boot thread.  Once this is done, the
initialization code scans up the linked list of frame pointers until it
reaches the null frame, and then replaces the null frame with a pointer to
the lower address of the data structure.  This allows us to treat the main
thread like any other thread we have created, which is highly desirable in
terms of not requiring multiple special cases in the multi-threading
functions.

The creation of threads is handled by the create thread function.  It takes
the priority of the thread and the starting point of the thread as its
parameters.  It also takes a third parameter, which processor the thread
will be required to run on, although this is redundant on the platform this
implementation targets.

Creation of a thread is a fairly straightforward process.  To be as
efficient as possible, only one block of memory is allocated when the thread
is created.  This block of memory holds the thread's stack, its save area,
and its information node.  The information node is linked into one of the
priority lists, as specified by the priority argument passed to the create
call.  The save area is then set up with the default values.  The thread's
stack is then set up with a null frame, a return address which is simply the
address of the destructor, and the save area pointer above the return
address.  Once the stack is set up and the links to the stack are recorded
in the save area, the thread has been created.

Destruction of a thread occurs when the function that is the thread returns.
Because of the way we set up the stack, the return address that sits on the
stack is the destructor routine.  This routine finds the currently executing
thread, removes it from the thread queue, and places it in the destruction
queue.  Then the destructor returns, making it a fast routine that does not
slow down to do housekeeping.  Later, when we are idle, or when we have
already slowed down to create new threads, we perform the true destruction
of the threads, which involves freeing the memory that was allocated for
their stacks, once we have checked that there are no handlers registered to
events from within those threads.

Multi-Threading Interface Functions

The following functions are offered to client code wishing to use
the multi-threading system.


  Initialize_Threads
  
  This function must be called before any of the other threading
  functions are called. It 'boots' the CPU up to the threading
  mechanism. This function will eventually be removed from the list
  of interface functions and become part of the runtime initialization.
  
void Initialize_Threads();


  Create_Thread

  This function creates a new thread in the address space of the current
  system. The priority defines the order in which threads are run, e.g.
  higher priority threads are always run before lower priority threads.
  The start point is the address of the function at which the thread
  will start.

Handle Create_Thread(int priority, int processor, 
		     void (*StartPoint) (void));


  Suspend

  This function suspends the current thread, allowing other threads
  to use the CPU. If there are no other more important threads then it
  will simply return to the currently executing thread.

void Suspend();


Event Manager

The event manager is a system by which real operating system / hardware
generated events are translated into virtual runtime events for handling by
whatever handler the program wants to use.

The event system uses a vector-based approach allowing multiple handlers to
be attached to any given event.  When the event is signalled, the list of
handlers for that event is traversed until one of the handlers successfully
handles it, which is indicated by its return value.

If no handler wishes to handle the event then it falls through to the
default handler, which displays an appropriate message to the user and then,
depending on the severity of the event, performs one of a range of actions,
such as exiting or continuing execution.

The event manager allows client code to register handlers for events, and
to remove them again.  It also includes a method for registering handlers
for operating system / hardware caused events, which takes the form of a
mapping from an enumeration of the real events to their virtual
counterparts.

The event system allows software to generate events, in which case they
are handled in much the same manner as hardware events.  This is a very
useful feature, especially within the runtime engine.

The first thing the event handler must do when an event occurs is save the
context that was previously executing into its appropriate save area.  It
just so happens that Linux does this already when a signal occurs, so it
would be inefficient to do it twice, but on any system which does not save
context on an event for you, it is important that a save context occurs
as the first thing after the event.

After the event manager has saved context it proceeds to invoke the
handlers as stated above.  Each one is given a chance to handle the event
until one of them does.  After this is done, a load context is required to
return to the state the program was in before the event occurred.  Once
again, if the operating system does this for you, as is the case in Linux,
then there is no need to repeat this action.

With this comes the idea of atomic actions, with regard to context switching
and important code.  It is very important for context switching routines to
be atomic, and failing that, if you are working within the boundaries of a
protected mode operating system such as Linux, you need to make them as
atomic as possible.  This involves disabling all signals for the duration of
the context switches.  If you are running on a bare processor without an
operating system then it is important to make the functions truly atomic.

Event Manager Interface Functions


  initialise_events

  This does any initialisation needed by the cross platform
  part of the event manager.

void initialise_events();


  add_event

  This adds an event to the event list.

  The routine checks the EventID * to make sure it is not NULL
  and returns a NullPtrErr if it is.

  On return if the error code is NoErr then the EventID value
  will hold a valid ID for the new event which can be used to
  delete, cause or allocate handlers for events of this kind. 

enum SysErr add_event(EventID *);

  
  remove_event

  This removes an event from the event list.

  The routine checks that the EventID is an existing event in the
  list and if it isn't, it will return a NotAnEvent error. If for
  some reason it can't delete an Event it will return a CannotDelete
  error. If a NoErr is returned then the event was successfully
  deleted.

enum SysErr remove_event(EventID);


  cause_event

  This function causes an event.

  The routine checks that the Event given is a valid event in
  the event list and returns a NotAnEvent error if it isn't.

  If a NoErr is returned then the event was valid and any
  handlers for the event will have been executed.

enum SysErr cause_event(struct Event, int, void **);


  add_handler

  This sets a specific handler to be executed when a certain
  event occurs.

  The routine checks to see if the Event given by the EventID is
  valid and returns NotAnEvent if it isn't. If the Handler is NULL,
  a NullPtrErr is returned. If NoErr is returned then the handler
  was successfully added to the list of the specified event.

enum SysErr add_handler(Object, EventID, Handler);


  remove_handler

  This removes a Handler.

  If the event of type EventID doesn't exist then a NotAnEvent
  error is returned. If the Handler * is NULL, a NullPtrErr is
  returned. If the Handler * doesn't point to an existing
  handler then a NotAHandler error is returned.
  If NoErr is returned then the handler was successfully removed.

enum SysErr remove_handler(EventID, Handler);


  get_handler

  This routine lets you get a pointer to a Handler.

  If the event specified by the EventID doesn't exist then a
  NotAnEvent error is returned and the Handler * will be
  untouched. Please note that if NoErr is returned then the
  Handler * will hold a pointer to the first handler in
  the handler list for the specified event or *NULL* if
  there are no handlers.

enum SysErr get_handler(Handler *, EventID);


Network Manager

The network manager is in charge of sending messages between the nodes of a
distributed system.  It runs on the UDP protocol and uses two sockets, one
for sending packets and another for receiving packets.  By using two
sockets the connection is full duplex, able to send and receive at the same
time.

I selected UDP as the communication protocol because it is much faster than
TCP, and in today's networking environment the overhead of TCP is not
necessary in this instance.  If large packet loss and corruption were
expected then a switch to TCP would be required, but currently UDP is the
best choice of protocol.

A result of using UDP as the communications protocol is that all messages
must be limited in size to the size of a packet, less the room occupied by
the packet's header.

The biggest difficulty with the network manager is the idea that each
computer in the network may have any number of clients running on it, each
using the networking system.  There are several mappings required, the first
of which is the map of computers to computer handles which are used in the
sending of messages.

The approach I decided to take is that each machine keeps a table of the
local client numbers currently in use.  If a client wishes to be able to
receive messages, it must first register its intention to do so with the
runtime engine; the runtime engine can then give it the event for which it
should register a handler, to be signalled when a message is received.
 
This means that when you register a remote destination to send a message to,
you specify the machine, which goes into a table of remote systems, and you
specify the client on that machine at send time.

This is similar to the idea of ports on any networked machine.  There will
exist familiar ports on which familiar clients are run, such as the garbage
collector, multi-threading system, event mechanism, and so on.  This
requires that a local client, when registering to use the messaging system,
has the ability to specify its own client id, and the ability to call a
function to retrieve the next available local id.

Network Manager Interface Functions

  register_computer

    This function converts a system name to a processor
    handle, which speeds up use in the network interface.

  Arguments:   
    name - the system name for the computer.

  Returns:
    Computer_Handle - A usable handle 

Computer_Handle register_computer(char * name);


  send_message

    This function sends the specified message to the specified
    processing node's client.

  Arguments:
    dest - the processing node that the message is destined for
    client - the client part of the runtime operating on the destination node
    sendme - the message to send, including its length
    
  Returns:
    Void.

void send_message(Computer_Handle dest, enum ClientID Client,
	          struct Message *sendme);

  register_local_client

    This function registers a local client as a participant in the
    networking system.

  Arguments:
    desired_id - the desired client id.
    msg_ready - this argument is filled with the event id of the 
                event that will occur when data is ready for this
                client.

  Returns:
    If there are too many clients then a MaxClientsReached error
    will be returned, if the current client id is in use then 
    a ClientIDInUse error will be returned, otherwise a NoErr is
    returned.
    
enum SysErr register_local_client(ClientID desired_id, 
				  EventID * msg_ready);


Garbage Collector

A garbage collector is an entity which manages the memory system of a
program; by identifying dead objects and freeing them back to the system it
eliminates memory leaks in programs.  The definition of a dead object is one
which cannot be reached from a collection of "root sets", usually defined
as consisting of the global variables, local variables, the activation
stack, and registers used by active procedures.

As this section of the runtime engine has not yet been considered fully, all
that can be said about it is that it will be based on Baker's treadmill,
with a treadmill for each size of object, and that it will collect garbage
asynchronously from within a runtime engine thread while the program is
running.

It uses a technique called incremental garbage collection, which means that
it collects garbage in chunks, rather than collecting all the garbage in
one go.

Implementation Testing

In the field of systems programming, one of the most important requirements
of software is that it does not, under any circumstances, fall over.

The initial testing of the runtime engine was performed with a simple test
program to detect the first ninety percent of program faults.  After the
engine appeared to be functioning properly, I used stress tests to attempt
to force it to fall over.

There were several stress tests for the multi-threading mechanism.  The
first tested the creation and destruction of threads: when run it created
30000 threads, switched context between them all five times, destroyed
them all, and then finished.  Once the runtime passed this test I tested
the threading system for memory leaks.  This was done by a program which,
when run, created 1000 threads and then destroyed them, repeating forever.
I used top to check the memory usage immediately after starting the
program, then left it running for three hours before returning.  On
checking the memory usage after three hours it had remained constant,
indicating no signs of a memory leak.

Once the event mechanism was ready to be integrated into the
multi-threading mechanism, I modified the first threading stress test
slightly to test the event system.  I did this by increasing the amount of
context switching so that it would execute for long enough to test whether
the event system was thread-safe.  I then created a shell script under
Linux which caused every different type of catchable event within the test
program.  This resulted in the program being bombarded with 1200 signals at
the same time.

This test of the event mechanism caused strange errors to occur within the
test program.  Eventually I discovered that in Linux the context of the
process is stored on the current stack when a signal occurs, and so after
several events occurred, the threads' stacks, which were only 2k at the
time, were overflowing.

