nr@elan.Princeton.EDU (Norman Ramsey) (02/19/91)
Many of the table, list, et cetera data structure packages make it possible to enumerate elements of the table. Typically the procedures implementing the enumeration accept a REFANY as arguments, and this REFANY is used as a closure argument. An example from List: TYPE WalkProc = PROCEDURE (arg, element: REFANY) RAISES {}; (* The type of procedures for walking over lists. *) PROCEDURE Walk( l: T; p: WalkProc; arg: REFANY := NIL ); (* Applies p to each of the elements of l in turn. The only exceptions raised are those possibly raised by p. *) I suggest that it would be more elegant for enumeration to use closures (as in Thread). TYPE WalkClosure = OBJECT METHODS apply(element:REFANY) END; PROCEDURE Walk( l: T; p: WalkClosure ); I would like to hear people's comments on this proposal. If you won't buy ``more elegant,'' perhaps you'll buy ``typechecked at compile time.'' -- Norman Ramsey nr@princeton.edu
gnelson (Greg Nelson) (02/19/91)
I would like to second Norman Ramsey's proposal. Stamp out REFANYs!
Mike_Spreitzer.PARC@xerox.com (02/20/91)
An even simpler approach would be to use procedures without the REFANY argument, relying on the ability to make local procedures as the mechanism for binding to other data. For example, List could be changed to say: TYPE WalkProc = PROCEDURE (element: REFANY) RAISES {}; (* The type of procedures for walking over lists. *) PROCEDURE Walk( l: T; p: WalkProc ); (* Applies p to each of the elements of l in turn. The only exceptions raised are those possibly raised by p. *) A client that sums the element of a list might then look like this: PROCEDURE SumIt( it: List.T ) : INTEGER = VAR sum: INTEGER; PROCEDURE AddElt ( element: REFANY ) = BEGIN (* code to add the element goes here *) END AddElt; BEGIN sum := 0; List.Walk(it, AddElt); RETURN sum; END SumIt; Since the calls on enumerations are from procedure bodies already, there is always a plausible place to put the local procedure declaration. Designing enumeration interfaces raises the issue of concurrency. The above clearly works if the enumeration is sequential and List.Walk does not return untill all the "callbacks" have been made. If enumeration is not sequential, the interface ought to say so (in M3, this can only be done with English comments), and the client has to prepare for it by doing the necessary synchronization, if any, in the local procedure. If List.Walk might return before all the callbacks have been done, List.Walk will have to also return a cookie that the client can use to synchronize with the completion of the enumeration. All three schemes (extra REFANY parameter, closure objects, local procedures) could be made to work if such considerations are made [but the local procedure scheme needs an additional change: the ability to assign local procedures to procedure variables; this is currently forbidden, but no more unsafe than dereferencing UNTRACED references --- so I don't see why it shouldn't be equally possible.] Proceeding further along the local procedure direction, we get the elegance of CLU iterators. Imagine that M3 has a syntax for procedure literals. For example, suppose that LAMBDA sig = Block is the syntax for procedure literals. Then the SumIt could look like this: PROCEDURE SumIt( it: List.T ) : INTEGER = VAR sum: INTEGER; BEGIN sum := 0; List.Walk(it, LAMBDA (element: REFANY) = BEGIN (* add element *) END); RETURN sum; END SumIt; Thus, programmers can create their own kinds of looping constructs, and use only requires a little syntactic boilerplate. That boilerplate could be further minimized if the syntax Block were allowed for procedure literals when the context of the literal provides the type (ie, signature) for the procedure. With that proviso, SumIt could look like this: PROCEDURE SumIt( it: List.T ) : INTEGER = VAR sum: INTEGER; BEGIN sum := 0; List.Walk(it, BEGIN (* code to add the element goes here *) END); RETURN sum; END SumIt; The latter gets too syntactically fragile for my taste when the Block has declarations. But then, declarations before the BEGIN rather than after tastes inferior to me. Adding a new kind of bracket to make blocks into procedure literals is possible, but that's more complex than I'd like to see. Mike
nr@Princeton.EDU (Norman Ramsey) (02/20/91)
> An even simpler approach would be to use procedures without the REFANY > argument, relying on the ability to make local procedures as the mechanism for > binding to other data. The only problem I have with this proposal is that Modula-3 is a functional language, so these local procedures can't escape (be assigned or returned). If I were to use local procedures for enumerations, and if I wanted to perform the same enumeration in two different places, I would have to duplicate the implementation of the local procedure (or write two different local procedures that call a third, fetching the arguments from the closure). Returning an enumeration procedure would be impossible. The advantage of using local procedures is that the closure is allocated on the stack, not the heap, and the language mechanisms do it automatically. Using my proposal, the programmer has to declare the type of the closure and allocate it himself from the heap. > ...[but the > local procedure scheme needs an additional change: the ability to assign local > procedures to procedure variables; this is currently forbidden, but no more > unsafe than dereferencing UNTRACED references --- so I don't see why it > shouldn't be equally possible.] Because the local procedures are allocate on the stack, so their closures are allocated on the stack. Assignment to global variables is like returning procedure values: the function ``escapes,'' which means that the closure (free variables) must be allocated on the heap, because a stack-allocated closure would refer to stack locations that have been reused for something else. Imagine > TYPE > WalkProc = PROCEDURE (element: REFANY) RAISES {}; > (* The type of procedures for walking over lists. *) > > PROCEDURE Walk( l: T; p: WalkProc ); PROCEDURE NewSum(start:REFANY; add:PROCEDURE(x,y:REFANY):REFANY): WalkProc = VAR sum := start; PROCEDURE Walk(element:REFANY) RAISES {} = BEGIN sum := add(sum,element); END Walk; BEGIN RETURN Walk; (* or ``walker := walk; RETURN; '' *) END NewSum; Now `Walk' has free variables `sum' and add, but these were allocated to a stack frame that is reclaimed when NewSum returns. > PROCEDURE SumIt( it: List.T ) : INTEGER = > VAR sum: INTEGER; > BEGIN > sum := 0; > List.Walk(it, LAMBDA (element: REFANY) = BEGIN (* add element *) END); > RETURN sum; > END SumIt; This example is OK because the local procedure is passed down the stack.
mjordan@src.dec.com (Mick Jordan) (02/20/91)
> I would like to hear people's comments on this proposal. If you won't > buy ``more elegant,'' perhaps you'll buy ``typechecked at compile time.'' 'List' is an example of an interface that was simply copied from the M2+ world, hence the REFANY. It is certainly not the best way to do callback iterators in Modula-3, which are neatly handled by 'object closures' as you suggest, particularly given that state needed by the WalkProc can be bundled up as a subtype of WalkClosure. Actually, for simple data structures I prefer the following style of iterator: VAR someContainerType: SomeContainerType.T; containedType: ContainedType.T; iter := SomeContainerType.NewIter(someContainerType); BEGIN WHILE SomeType.Next(iter, containedType) DO (* process 'containedType' *) END; END; This has the virtue that you do not need to bundle up state into the object closure, nor do you need to code a new procedure. Finally, note that, because Modula-3 permits nested procedures to be passed as arguments, the closure REFANY ('arg') isnt really necessary since local state can be accessed directly from the nested procedure. However, not everybody likes nested procedures. Mick Jordan
Mike_Spreitzer.PARC@xerox.com (02/22/91)
[A mailer claimed this didn't make it to M3 the first try...] You are right about the first point: local procedures can't escape (given M3 as it is). So the closure scheme is more convenient in some cases. It's less convenient in others (where the local procedure would work, making the local procedure involves less overhead than making a whole closure object). Wouldn't the local procedure scheme suffice in the vast majority of actual uses? Note that if there were a syntax for objection creation that allowed method literals, the closure scheme can also approach the elegance of CLU iterators. In clarification of my comment about assigning local procedures, I meant that I don't see why it isn't equally as possible as dereferencing UNTRACED references --- that is, something that you can do in an unsafe module when you know the referenced locations (whether on the stack or in the heap) are still being used as you expect. I was not proposing this as a way to change the fact that local procedures can't escape. To do that would require changes that I expect the language designers will say are too pervasive, expensive, difficult to implement, or some combination thereof. Mike
craig@Neon.Stanford.EDU (Craig D. Chambers) (02/23/91)
In article <1991Feb22.150701.15020@uicbert.eecs.uic.edu> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes: >A moderately smart compiler can detect most cases where a procedure can't >escape, and compile it the cheap way. The T and Gambit compilers do this >for Scheme. I don't see why it wouldn't work for M3. (One really obvious >case is to avoid making a closure for any procedure which is only called, >without ever creating a pointer to it. That gets rid of most closures >right there. > >... > >Full closures aren't terribly expensive if you optimize away the obvious >ones. Being careless about compiling control structures like generators >can cause a lot of closure-creation, though, so that heap allocation >is significantly increased. Depending on how good a garbage collector >you've got, that may be a factor. If you've got a generational gc and >do the obvious optimizations on closures, though, you should be okay. >(One thing David Kranz' thesis from Yale showed was that closures don't >have to cost anything to programs that don't use them. So you may not >want to make this stuff unsafe, because the cost of doing it right >may not be all that much.) Closures can be optimized away as you describe only if the control structure procedures that eventually call the closure all get inlined into the function creating the closure. Kranz' work on eliminating closures that are only called applies only to closures defined and invoked locally (within a single procedure). This is presumably not the case with the list iterator example being discussed. Another optimization described in Kranz' work requires interprocedural analysis to detect when a procedure never stores one of its arguments into the heap or passes the argument to a function that might do such a store, i.e. detected which parameters to functions are only ever passed down or invoked but never "escape" as Kranz calls it. For closures passed in non-escaping argument positions to non-inlined functions, the compiler can use a cheaper (e.g. stack-allocated) representation of the closure since the compiler has proven that the block can never escape upwards. I don't know how often this analysis succeeds in being effective. I'd expect the job to be much harder in an object-oriented language than a procedural language like Scheme, since the interprocedural analysis is more complicated. -- Craig Chambers
craig@Neon.Stanford.EDU (Craig D. Chambers) (02/23/91)
In article <91Feb21.152416pst.16910@alpha.xerox.com> Mike_Spreitzer.PARC@xerox.com writes: >You are right about the first point: local procedures can't escape (given M3 as >it is). So the closure scheme is more convenient in some cases. It's less >convenient in others (where the local procedure would work, making the local >procedure involves less overhead than making a whole closure object). Wouldn't >the local procedure scheme suffice in the vast majority of actual uses? Note >that if there >were a syntax for objection creation that allowed method literals, the closure >scheme can also approach the elegance of CLU iterators. In fact, full closures (or even downwards-only closures) are *more* powerful than CLU iterators (if this is what you meant by "elegant"). CLU iterators effectively take a single closure argument (the body of the for loop). If closures are first-class, then iterators can take multiple closure arguments. In our Self system we have many methods that take multiple closure arguments. For example, one iterator method called "doFirst:Middle:Last:" takes a closure to execute for the first element of the receiver collection, a different closure for the interior elements, and a third closure for the last element of the collection. This is useful for instance when constructing a string representation of the collection and needing to insert commas in between elements; the code can insert a comma before all the elements except the first by using a different closure for the first element and the other elements. One thing that makes closures usable in practice is a concise closure literal syntax. Since Self and Smalltalk both rely on closures and messages for all control structures, even ifs and whiles, this concise syntax had to be developed just to allow the language to be used at all. Without it, I wouldn't expect closures to receive much use. This lack of concise syntax in Pascal-like languages has probably been a major hurdle to widespread use of closures. -- Craig Chambers