[net.lang.ada] Compiler Optimizations of Ada Tasking

sdl@MITRE-BEDFORD.ARPA (Litvintchouk) (08/10/86)

Some time ago, I sent a query to this mailing list regarding available
compiler optimizations of Ada tasking.  Following is a summary of the
replies I received.


[From Ed Falis, speaking in "unofficial" capacity re: Alsys compilers]
For rendezvous, neither the 68K- nor the 8086-based compiler performs
any context switching when an accept statement has an empty body.
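For illustration (the task and entry names here are made up), the
construct in question is an accept used purely as a synchronization
point:

    task Sync is
      entry Go;
    end Sync;

    task body Sync is
    begin
      accept Go;  -- empty accept body: the rendezvous completes
                  -- without a context switch in these compilers
      -- work following the rendezvous proceeds here
    end Sync;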

Additionally, the 68K-based compiler (Sun, Apollo) uses the "order of
arrival" scheme that Dave Stevenson (at Rational?) developed several
years ago.  In this scheme, the later of the two tasks to request the
rendezvous executes the accept body code in its own stack.  Assuming an
even distribution of caller/acceptor arrival at the request point,
there's a 25% reduction in context switches under this scheme.  (Note
that the Habermann-Nassi optimization does not reduce the number of
context switches, since it is exactly symmetrical to the "naive"
implementation.)  However, HN does facilitate more sophisticated
optimizations like task thread elimination (i.e., turning a task into a
monitor package), because the default mechanism is to have the caller
execute the accept body code.  It can also simplify the implementation
of interrupt entries for the same reason.

Another issue of interest is related to storage reclamation of task
stacks and heap objects whose lifetime is associated with a frame
internal to a task body.  Rather than scheduling a terminating task to
perform its deallocations, we allow the task which triggers passage from
completion to termination to perform the cleanups for other tasks.  This
reduces context switching at termination, and becomes particularly
noticeable when there are cascades of tasks selecting the terminate
alternative of selective wait statements.
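For illustration (names are made up), a server of the kind involved in
such a cascade looks like:

    task body Server is
    begin
      loop
        select
          accept Request do
            null;  -- serve the call
          end Request;
        or
          terminate;  -- selected once the master has completed and
                      -- all sibling dependents are also terminable
        end select;
      end loop;
    end Server;

When the master completes, whole chains of such servers pass from
completion to termination at once, so having the triggering task
perform all their cleanups avoids scheduling each one in turn.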


[From Tom Quiggle, re: Telesoft compilers]
At TeleSoft we are implementing a number of tasking optimizations.  Some
of them are supported by our current Second-Generation Ada compiler, and
some will be implemented in future releases.

The optimizations we currently have in place are all interrupt related:

  * If the accept for an interrupt entry (LRM 13.5.1:1) contains no 
    sequence of statements, we will defer the execution of the interrupt 
    task to the next synchronization point, thus eliminating two context
    shifts.  This optimization is applicable to situations where data is
    arriving at a rate less than the longest expected sequential code
    sequence in a task (e.g. a keyboard driver or a monitoring device 
    generating samples a few times a second).  This optimization is 
    automatically applied.
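As a hedged sketch (the task name, entry name, and address are
illustrative, and the address clause form is implementation dependent),
an interrupt entry with an empty accept looks like:

    task Keyboard is
      entry Key_Pressed;
      for Key_Pressed use at 16#40#;  -- illustrative interrupt location
    end Keyboard;

    task body Keyboard is
    begin
      loop
        accept Key_Pressed;  -- no statement sequence: execution of
                             -- this task can be deferred to the next
                             -- synchronization point
        -- process the keystroke here, outside the accept
      end loop;
    end Keyboard;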

  * If the accept for an interrupt entry contains no scheduling events
    (nested accepts, entry calls, delays), the body of the interrupt 
    entry may be executed in the context of the task which was executing 
    when the interrupt occurred.  The compiler will generate code for 
    such an interrupt entry as a subprogram which will be called at the
    point of interruption.  The interrupt task (i.e. the task containing 
    the interrupt entry) must still be scheduled to allow it to execute the 
    sequence of statements following the accept, but the interrupt latency 
    is greatly reduced.  With this optimization, the overhead of getting to 
    the user's interrupt handler is reduced to a few instructions.  The 
    expensive full context shift is deferred to after executing the 
    (presumably time critical) accept body for the interrupt entry.  This
    optimization is triggered by a pragma.

  * For accept statements for interrupt entries of the form:

    while <boolean-expression> loop
      accept interrupt_entry do
	sequence_of_statements; -- must not contain scheduling events.
      end interrupt_entry;
    end loop;
    more_statements;

    we can apply the same optimization as mentioned above, but avoid the
    expense of a context shift after the interrupt accept body as long as 
    the precondition holds true.  In this case, the compiler will again
    generate the code for the interrupt entry's accept body as a subprogram,
    but will fold in the evaluation of the precondition, and return this 
    information to the run-time system.  The run-time will only effect a 
    context shift to the interrupt task when the precondition is no longer 
    met.

    This construct occurs most frequently in the form:

    loop
      while not buffer_full loop
	accept interrupt_entry do
	  buffer(interrupt_related_data);
	end interrupt_entry;
      end loop;
      accept flush_buffer(consumer_buff : out buffer_type) do
	consumer_buff := local_buffer;
      end flush_buffer;
    end loop;

    in which case there is only a context shift to the interrupt task when
    the buffer fills.  When the buffer does fill, the task executing at
    the point of the interrupt will be preempted, and the interrupt task
    will be allowed to execute down to the accept to flush the buffer.
    In the absence of a user-specified priority, the implementation can
    also raise the priority of this task to ensure that it gets back to
    the accept for the interrupt entry without scheduling intervening tasks
    (excluding another interrupt at a higher priority).  This optimization
    is triggered by a pragma.

Optimizations we intend to implement over the next year or so (my marketing
guys will kill me if I suggest any specific dates) include:

  * Monitor tasks: Tasks intended to provide exclusive access to shared
    data need not generate separate schedulable entities.  Such tasks
    may be replaced with a simple semaphore-like monitor, and entry calls
    to these tasks become P/V operations surrounding subprogram calls.
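    A task of the kind eligible for this transformation might look like
    the following sketch (names are illustrative):

      task Counter is
        entry Increment;
        entry Value (V : out Integer);
      end Counter;

      task body Counter is
        Count : Integer := 0;
      begin
        loop
          select
            accept Increment do
              Count := Count + 1;
            end Increment;
          or
            accept Value (V : out Integer) do
              V := Count;
            end Value;
          or
            terminate;
          end select;
        end loop;
      end Counter;

    Since neither accept body contains a scheduling event, each entry
    call could become a P operation, a subprogram call, and a V
    operation executed on the caller's own thread.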

  * Messenger tasks: Dynamically allocated tasks whose sole purpose is
    to pass data from a producer to a consumer in a non-blocking fashion
    can also be optimized away.  Again, no separate schedulable entity
    need be created.  Instead, a "special" task control block containing
    a pointer to a data block is placed on the consumer's entry queue.
    When the consumer accepts the entry call, it merely dequeues the
    special TCB, dereferences the data pointer, and returns the TCB
    to a pool of available messenger tasks.
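    The pattern being optimized away is the classic relay task; as a
    sketch (the type Data, the consumer task, and all names here are
    illustrative):

      type Data is new Integer;  -- stands in for the real message type

      task Consumer is
        entry Take (D : in Data);
      end Consumer;

      task type Messenger is
        entry Deposit (D : in Data);
      end Messenger;

      task body Messenger is
        Local : Data;
      begin
        accept Deposit (D : in Data) do
          Local := D;
        end Deposit;
        Consumer.Take (Local);  -- the messenger, not the producer,
                                -- blocks on the consumer's entry
      end Messenger;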

Both of these optimizations will probably require a pragma, and may 
place some restrictions on tasks using these optimizations.  



Finally, I know of at least one compiler that will implement the
Habermann-Nassi optimization:  the Intermetrics compiler.



Steven Litvintchouk
MITRE Corporation
Burlington Road
Bedford, MA  01730
(617)271-7753

ARPA:  sdl@mitre-bedford
UUCP:  ...{cbosgd,decvax,genrad,ll-xn,philabs,security,utzoo}!linus!sdl

(all the usual disclaimers about all this not representing MITRE
policy go here.)