[comp.lang.ada] Instantiation of a generic with a p

stt@inmet.inmet.com (12/27/89)

With regard to instantiations inside a procedure,
with an actual subprogram parameter being a local procedure as well...

The Ada Reference Manual very properly avoids descriptions
of implementation approach, since there are many ways to implement
any feature, with tradeoffs of compile-time complexity versus
run-time speed, space versus time, etc.

Nevertheless, I can describe the strategy employed by various
Ada compilers with which I am familiar:

The Intermetrics compiler passes a "static link" when calling
a nested subprogram.  The static link is passed in a particular
register (depends on the target, obviously), and points to
the stack frame for the lexically enclosing subprogram.
The static link may be used by the called procedure to 
gain access to the enclosing subprogram's local variables
and parameters, as well as its static link pointing to the
next enclosing subprogram (if any).  I am sure other compilers
use this method as well.  A possible optimization is to
determine whether the called subprogram makes any use of
the static link (only possible if it is not a separately compiled
subunit), and suppress passing the link if appropriate.
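
To make the static-link idea concrete, here is a rough model in C of
what the generated code does.  This is only a sketch: the frame layout
and the names outer_frame, nested_bump and outer are invented for
illustration, and a real compiler would pass the link in a register
rather than as an ordinary parameter.

```c
#include <assert.h>

/* Hypothetical frame layout for an outer subprogram with one local
   variable.  A real compiler chooses the layout; this is illustrative. */
struct outer_frame {
    void *static_link;  /* frame of the next enclosing subprogram, if any */
    int counter;        /* a local variable of the outer subprogram */
};

/* The nested subprogram receives the static link as a hidden first
   parameter and uses it for every up-level reference. */
static void nested_bump(struct outer_frame *static_link, int amount)
{
    assert(static_link != 0);
    static_link->counter += amount;  /* up-level reference via the link */
}

/* The outer subprogram passes the address of its own frame as the
   static link at each call to the nested subprogram. */
int outer(int start)
{
    struct outer_frame frame = { 0, start };
    nested_bump(&frame, 1);
    nested_bump(&frame, 2);
    return frame.counter;
}
```

If nested_bump never touched its enclosing frame, the extra parameter
could simply be dropped, which is the optimization mentioned above.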

The Verdix compiler typically uses a "local display" for handling
nested subprograms.  Each subprogram maintains a table of pointers
to lexically enclosing stack frames.  The table is at a known
offset within the stack frame, and the nested subprogram
builds its own by copying its caller's local display and augmenting
it with a pointer to its own stack frame.  This moves the run-time cost
to the start of a subprogram with nesting, rather than incurring it at
the call point and at each up-level reference.
I think some versions of the Verdix compiler also support static
links.
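
A local display can be modeled the same way.  The following C sketch
is my own illustration, not Verdix's actual layout: MAX_DEPTH, the
frame structure and the three subprograms are all assumed names.  Each
prologue copies the relevant part of the caller's display and appends
its own frame.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 4   /* hypothetical maximum nesting depth */

/* Every frame starts with its local display at a known offset (here 0).
   display[k] points to the active frame at nesting level k. */
struct frame {
    struct frame *display[MAX_DEPTH];
    int local;   /* one local variable per frame, for illustration */
};

/* Prologue work: copy the first `level` entries of the caller's
   display, then add a pointer to the new frame itself. */
static void build_display(struct frame *self, const struct frame *caller,
                          int level)
{
    assert(level >= 0 && level < MAX_DEPTH);
    memset(self->display, 0, sizeof self->display);
    if (caller != 0)
        memcpy(self->display, caller->display,
               (size_t)level * sizeof self->display[0]);
    self->display[level] = self;
}

/* Level-2 subprogram: reads the locals of both enclosing levels with
   one indexed load each, with no chain of links to follow. */
static int inner(const struct frame *caller)
{
    struct frame f;
    build_display(&f, caller, 2);
    f.local = 0;
    return f.display[0]->local * 100 + f.display[1]->local;
}

/* Level-1 subprogram, nested in outermost. */
static int middle(const struct frame *caller, int x)
{
    struct frame f;
    build_display(&f, caller, 1);
    f.local = x;
    return inner(&f);
}

/* Level-0 subprogram. */
int outermost(int a, int b)
{
    struct frame f;
    build_display(&f, 0, 0);
    f.local = a;
    return middle(&f, b);
}
```

The copy in build_display is the cost paid at the start of each
subprogram; in exchange, an up-level reference is a single indexed
load no matter how many levels it crosses.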

Another dimension of difference between compilers has to do
with the implementation of generics.  For the Intermetrics compiler,
generics are implemented strictly as a macro expansion.  The body
of the generic is expanded at the point of the instantiation, with
the actuals substituted for the uses of the formals.
Certain other compilers will share the code between generic
instantiations, though generally only under restricted circumstances
where the instantiations are similar "enough."  
Both the newer versions of the Verdix compiler and the DEC VAX compiler
provide some support for "generic sharing."
The Rational compiler shares generics universally, as far as I know,
taking advantage of their descriptor-oriented/object-oriented
hardware architecture.
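
As a loose analogy for the macro-expansion approach, consider a C
macro that stamps out a complete copy of a body for each
"instantiation."  The macro INSTANTIATE_APPLY_TWICE and the function
names are hypothetical, and C macros are of course far cruder than Ada
generics; the point is only that each expansion is separate code with
the actuals substituted directly.

```c
/* "Generic" body written once as a macro; each use expands a complete,
   separate copy with the actuals substituted for the formals. */
#define INSTANTIATE_APPLY_TWICE(NAME, ELEM, ACTION) \
    ELEM NAME(ELEM x) { return ACTION(ACTION(x)); }

/* Two actual subprograms to instantiate with. */
static int  add_one(int x) { return x + 1; }
static long square(long x) { return x * x; }

/* Two instantiations: two independent function bodies in the object
   code, each calling its actual directly (and so open to inlining). */
INSTANTIATE_APPLY_TWICE(add_one_twice, int, add_one)
INSTANTIATE_APPLY_TWICE(square_twice, long, square)
```

The calls to add_one and square are direct, so there is no per-use
overhead, but the code size grows with the number of instantiations.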

When generics are shared, there is generally an instantiation descriptor
created as part of the instantiation, which can involve some amount
of overhead both at its creation and at each use of the actual
generic parameters (since they are fetched from the instantiation
descriptor rather than being "inlined" in the macro expansion).
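
By contrast, a shared generic body might look something like the
following C sketch.  The descriptor layout and the names
inst_descriptor, shared_apply_twice, add_step and enclosing are my
assumptions for illustration, not any particular compiler's format.
The actual here is a "local procedure," so the descriptor carries its
static link as well as its address.

```c
#include <assert.h>

/* Hypothetical instantiation descriptor: one is built per instantiation
   and holds the actuals that the shared body cannot know statically. */
struct inst_descriptor {
    int (*action)(void *static_link, int x); /* actual subprogram */
    void *static_link;                       /* context for a nested actual */
};

/* Single shared body.  Every use of the formal subprogram becomes an
   indirect call through the descriptor. */
int shared_apply_twice(const struct inst_descriptor *d, int x)
{
    assert(d != 0 && d->action != 0);
    return d->action(d->static_link, d->action(d->static_link, x));
}

/* A "local procedure" actual: it uses a variable of its enclosing
   subprogram, reached through the static link. */
static int add_step(void *static_link, int x)
{
    const int *step = static_link;
    return x + *step;
}

/* Enclosing subprogram: building the descriptor is the instantiation-time
   cost; fetching from it inside the shared body is the per-use cost. */
int enclosing(int step, int start)
{
    struct inst_descriptor d = { add_step, &step };
    return shared_apply_twice(&d, start);
}
```

Only one copy of shared_apply_twice exists however many instantiations
there are, at the price of an indirect call for each use of the formal.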

Another source of overhead associated with nested subprograms
involves the access-before-elaboration check.  Generally,
each subprogram spec will have an associated elaboration bit
which will be cleared when the spec is encountered, and set
when the body is encountered.  This bit will be checked
at each call to determine whether the body is being referenced
before it has been elaborated.  If there is no separate
spec for the subprogram then the elaboration check
can be eliminated (though some compilers still do it
for simplicity/uniformity's sake).  Even if there is a separate
spec, if there is no "interesting" code between the
spec and the body, the check can still be eliminated.
Your nested subprogram didn't have a separate spec, so there should be
no need for any elaboration-check overhead.  The global subprogram would
only need an elaboration check if the spec were separately compiled.
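
The elaboration bit and its check can be pictured roughly as follows.
This is a C sketch with invented names (compute_elaborated,
elaborate_compute_body, call_compute); a real compiler emits the test
inline at the call and raises Program_Error on failure, which is
modeled here by returning false.

```c
#include <stdbool.h>

/* Hypothetical elaboration bit, one per subprogram spec.  It is
   cleared when the spec is elaborated. */
static bool compute_elaborated = false;

/* The subprogram body itself. */
static int compute_body(int x) { return 2 * x; }

/* Models elaboration of the body: from here on, calls are legal. */
void elaborate_compute_body(void) { compute_elaborated = true; }

/* Call-site sequence: test the bit, then call.  Returns false where
   Ada would raise Program_Error. */
bool call_compute(int x, int *result)
{
    if (!compute_elaborated)
        return false;
    *result = compute_body(x);
    return true;
}
```

The cost is one test and branch per call, which is why it is worth
eliminating whenever the compiler can prove the body has already been
elaborated.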

Furthermore, there is an elaboration check associated with
a generic instantiation.  This may or may not be present, again
depending on separate compilation and compiler optimization details.

Anyway, the possibilities go on and on.  If you are really concerned
about performance, you may have to request an assembly listing
of the result and do the instruction cycle counts by hand.
Applying pragma Suppress(Elaboration_Check) may be useful (there are
some who feel it ought to be the default!-)).

I hope this helps (at least it should make you feel sorry
for Ada compiler implementors)...

S. Tucker Taft
Intermetrics, Inc.
Cambridge, MA  02138