[comp.lang.modula3] Closures in m3

nr@elan.Princeton.EDU (Norman Ramsey) (02/19/91)

Many of the table, list, et cetera data structure packages make it
possible to enumerate elements of the table.  Typically the procedures
implementing the enumeration accept a REFANY as arguments, and this
REFANY is used as a closure argument.  An example from List:

TYPE
    WalkProc = PROCEDURE (arg, element: REFANY) RAISES {};
        (* The type of procedures for walking over lists. *)

PROCEDURE Walk( l: T; p: WalkProc; arg: REFANY := NIL );
    (*
    Applies p to each of the elements of l in turn.  The only exceptions raised
    are those possibly raised by p. *)

I suggest that it would be more elegant for enumeration to use
closures (as in Thread).

TYPE
  WalkClosure = OBJECT METHODS apply(element:REFANY) END;

PROCEDURE Walk( l: T; p: WalkClosure );


I would like to hear people's comments on this proposal.  If you won't
buy ``more elegant,'' perhaps you'll buy ``typechecked at compile time.''
--
Norman Ramsey
nr@princeton.edu

gnelson (Greg Nelson) (02/19/91)

I would like to second Norman Ramsey's proposal.  Stamp out REFANYs!

Mike_Spreitzer.PARC@xerox.com (02/20/91)

An even simpler approach would be to use procedures without the REFANY
argument, relying on the ability to make local procedures as the mechanism for
binding to other data.  For example, List could be changed to say:

TYPE
	WalkProc = PROCEDURE (element: REFANY) RAISES {};
		(* The type of procedures for walking over lists. *)

PROCEDURE Walk( l: T; p: WalkProc );
    (*
    Applies p to each of the elements of l in turn.  The only exceptions raised
    are those possibly raised by p. *)

A client that sums the element of a list might then look like this:

PROCEDURE SumIt( it: List.T ) : INTEGER =
	VAR sum: INTEGER;
	PROCEDURE AddElt ( element: REFANY )
		= BEGIN (* code to add the element goes here *) END AddElt;
	BEGIN
	sum := 0;
	List.Walk(it, AddElt);
	RETURN sum;
	END SumIt;

Since the calls on enumerations are from procedure bodies already, there is
always a plausible place to put the local procedure declaration.

Designing enumeration interfaces raises the issue of concurrency.  The above
clearly works if the enumeration is sequential and List.Walk does not return
untill all the "callbacks" have been made.  If enumeration is not sequential,
the interface ought to say so (in M3, this can only be done with English
comments), and the client has to prepare for it by doing the necessary
synchronization, if any, in the local procedure.  If List.Walk might return
before all the callbacks have been done, List.Walk will have to also return a
cookie that the client can use to synchronize with the completion of the
enumeration.  All three schemes (extra REFANY parameter, closure objects, local
procedures) could be made to work if such considerations are made [but the
local procedure scheme needs an additional change: the ability to assign local
procedures to procedure variables; this is currently forbidden, but no more
unsafe than dereferencing UNTRACED references --- so I don't see why it
shouldn't be equally possible.]

Proceeding further along the local procedure direction, we get the elegance of
CLU iterators.  Imagine that M3 has a syntax for procedure literals.  For
example, suppose that
	LAMBDA sig = Block
is the syntax for procedure literals.  Then the SumIt could look like this:

PROCEDURE SumIt( it: List.T ) : INTEGER =
	VAR sum: INTEGER;
	BEGIN
	sum := 0;
	List.Walk(it, LAMBDA (element: REFANY) = BEGIN (* add element *) END);
	RETURN sum;
	END SumIt;

Thus, programmers can create their own kinds of looping constructs, and use
only requires a little syntactic boilerplate.  That boilerplate could be
further minimized if the syntax
	Block
were allowed for procedure literals when the context of the literal provides
the type (ie, signature) for the procedure.  With that proviso, SumIt could
look like this:

PROCEDURE SumIt( it: List.T ) : INTEGER =
	VAR sum: INTEGER;
	BEGIN
	sum := 0;
	List.Walk(it, BEGIN (* code to add the element goes here *) END);
	RETURN sum;
	END SumIt;

The latter gets too syntactically fragile for my taste when the Block has
declarations.  But then, declarations before the BEGIN rather than after tastes
inferior to me.  Adding a new kind of bracket to make blocks into procedure
literals is possible, but that's more complex than I'd like to see.

Mike

nr@Princeton.EDU (Norman Ramsey) (02/20/91)

> An even simpler approach would be to use procedures without the REFANY
> argument, relying on the ability to make local procedures as the mechanism for
> binding to other data.

The only problem I have with this proposal is that Modula-3 is a
functional language, so these local procedures can't escape (be
assigned or returned).  If I were to use local procedures for
enumerations, and if I wanted to perform the same enumeration in two
different places, I would have to duplicate the implementation of the
local procedure (or write two different local procedures that call a
third, fetching the arguments from the closure).  Returning an
enumeration procedure would be impossible.

The advantage of using local procedures is that the closure is
allocated on the stack, not the heap, and the language mechanisms do
it automatically.  Using my proposal, the programmer has to declare
the type of the closure and allocate it himself from the heap.

> ...[but the
> local procedure scheme needs an additional change: the ability to assign local
> procedures to procedure variables; this is currently forbidden, but no more
> unsafe than dereferencing UNTRACED references --- so I don't see why it
> shouldn't be equally possible.]

Because the local procedures are allocate on the stack, so their
closures are allocated on the stack.  Assignment to global variables
is like returning procedure values: the function ``escapes,'' which
means that the closure (free variables) must be allocated on the heap,
because a stack-allocated closure would refer to stack locations that
have been reused for something else.  Imagine

> TYPE
> 	WalkProc = PROCEDURE (element: REFANY) RAISES {};
> 		(* The type of procedures for walking over lists. *)
> 
> PROCEDURE Walk( l: T; p: WalkProc );

PROCEDURE NewSum(start:REFANY; add:PROCEDURE(x,y:REFANY):REFANY): WalkProc =
  VAR sum := start;
  PROCEDURE Walk(element:REFANY) RAISES {} =
    BEGIN sum := add(sum,element); END Walk;
  BEGIN
    RETURN Walk;	(* or ``walker := walk; RETURN; '' *)
  END NewSum;
  
Now `Walk' has free variables `sum' and add, but these were allocated
to a stack frame that is reclaimed when NewSum returns.

> PROCEDURE SumIt( it: List.T ) : INTEGER =
> 	VAR sum: INTEGER;
> 	BEGIN
> 	sum := 0;
> 	List.Walk(it, LAMBDA (element: REFANY) = BEGIN (* add element *) END);
> 	RETURN sum;
> 	END SumIt;

This example is OK because the local procedure is passed down the stack.

mjordan@src.dec.com (Mick Jordan) (02/20/91)

> I would like to hear people's comments on this proposal.  If you won't
> buy ``more elegant,'' perhaps you'll buy ``typechecked at compile time.''

'List' is an example of an interface that was simply copied from the M2+
world, hence the REFANY. It is certainly not the best way to do callback
iterators in Modula-3, which are neatly handled by 'object closures' as
you suggest, particularly given that state needed by the WalkProc can be
bundled up as a subtype of WalkClosure. Actually, for simple data structures
I prefer the following style of iterator:

VAR
  someContainerType: SomeContainerType.T;
  containedType: ContainedType.T;
  iter := SomeContainerType.NewIter(someContainerType);
BEGIN
  WHILE SomeType.Next(iter, containedType) DO
    (* process 'containedType' *)
  END;
END;

This has the virtue that you do not need to bundle up state into the
object closure, nor do you need to code a new procedure.  

Finally, note that, because Modula-3 permits nested procedures 
to be passed as arguments, the closure REFANY ('arg') isnt really necessary
since local state can be accessed directly from the nested procedure.
However, not everybody likes nested procedures.

Mick Jordan

Mike_Spreitzer.PARC@xerox.com (02/22/91)

[A mailer claimed this didn't make it to M3 the first try...]

You are right about the first point: local procedures can't escape (given M3 as
it is).  So the closure scheme is more convenient in some cases.  It's less
convenient in others (where the local procedure would work, making the local
procedure involves less overhead than making a whole closure object).  Wouldn't
the local procedure scheme suffice in the vast majority of actual uses?  Note
that if there
were a syntax for objection creation that allowed method literals, the closure
scheme can also approach the elegance of CLU iterators.

In clarification of my comment about assigning local procedures, I meant that I
don't see why it isn't equally as possible as dereferencing UNTRACED references
--- that is, something that you can do in an unsafe module when you know the
referenced locations (whether on the stack or in the heap) are still being used
as you expect.  I was not proposing this as a way to change the fact that local
procedures can't escape.  To do that would require changes that I expect the
language designers will say are too pervasive, expensive, difficult to
implement, or some combination thereof.

Mike

craig@Neon.Stanford.EDU (Craig D. Chambers) (02/23/91)

In article <1991Feb22.150701.15020@uicbert.eecs.uic.edu> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes:
>A moderately smart compiler can detect most cases where a procedure can't
>escape, and compile it the cheap way.  The T and Gambit compilers do this
>for Scheme.  I don't see why it wouldn't work for M3.  (One really obvious
>case is to avoid making a closure for any procedure which is only called,
>without ever creating a pointer to it.  That gets rid of most closures 
>right there.
>
>...
>
>Full closures aren't terribly expensive if you optimize away the obvious
>ones.  Being careless about compiling control structures like generators
>can cause a lot of closure-creation, though, so that heap allocation
>is significantly increased.  Depending on how good a garbage collector
>you've got, that may be a factor.  If you've got a generational gc and
>do the obvious optimizations on closures, though, you should be okay.
>(One thing David Kranz' thesis from Yale showed was that closures don't
>have to cost anything to programs that don't use them.  So you may not
>want to make this stuff unsafe, because the cost of doing it right
>may not be all that much.)

Closures can be optimized away as you describe only if the control
structure procedures that eventually call the closure all get inlined
into the function creating the closure.  Kranz' work on eliminating
closures that are only called applies only to closures defined and
invoked locally (within a single procedure).  This is presumably not
the case with the list iterator example being discussed.

Another optimization described in Kranz' work requires interprocedural
analysis to detect when a procedure never stores one of its arguments
into the heap or passes the argument to a function that might do such
a store, i.e. detected which parameters to functions are only ever
passed down or invoked but never "escape" as Kranz calls it.  For
closures passed in non-escaping argument positions to non-inlined
functions, the compiler can use a cheaper (e.g. stack-allocated)
representation of the closure since the compiler has proven that the
block can never escape upwards.  I don't know how often this analysis
succeeds in being effective.  I'd expect the job to be much harder in
an object-oriented language than a procedural language like Scheme,
since the interprocedural analysis is more complicated.

-- Craig Chambers

craig@Neon.Stanford.EDU (Craig D. Chambers) (02/23/91)

In article <91Feb21.152416pst.16910@alpha.xerox.com> Mike_Spreitzer.PARC@xerox.com writes:
>You are right about the first point: local procedures can't escape (given M3 as
>it is).  So the closure scheme is more convenient in some cases.  It's less
>convenient in others (where the local procedure would work, making the local
>procedure involves less overhead than making a whole closure object).  Wouldn't
>the local procedure scheme suffice in the vast majority of actual uses?  Note
>that if there
>were a syntax for objection creation that allowed method literals, the closure
>scheme can also approach the elegance of CLU iterators.

In fact, full closures (or even downwards-only closures) are *more*
powerful than CLU iterators (if this is what you meant by "elegant").
CLU iterators effectively take a single closure argument (the body of
the for loop).  If closures are first-class, then iterators can take
multiple closure arguments.  In our Self system we have many methods
that take multiple closure arguments.  For example, one iterator
method called "doFirst:Middle:Last:" takes a closure to execute for
the first element of the receiver collection, a different closure for
the interior elements, and a third closure for the last element of the
collection.  This is useful for instance when constructing a string
representation of the collection and needing to insert commas in
between elements; the code can insert a comma before all the elements
except the first by using a different closure for the first element
and the other elements.

One thing that makes closures usable in practice is a concise closure
literal syntax.  Since Self and Smalltalk both rely on closures and
messages for all control structures, even ifs and whiles, this concise
syntax had to be developed just to allow the language to be used at
all.  Without it, I wouldn't expect closures to receive much use.
This lack of concise syntax in Pascal-like languages has probably been
a major hurdle to widespread use of closures.

-- Craig Chambers