rich@rexago1.UUCP (K. Richard ) (02/05/86)
I often find myself creating highly localized register variables in an attempt to synthesize common sub expression optimization: { register char _q = *((x.y)->z); if (_q == 'a' || _q == 'b' || _q == 'c') { ... I also want to return values from arbitrary expressions like a for loop. Lisp has something called a lambda definition which is something like a local temporary function definition. Sort of like: if ( lambda int f(register char _q = *((x.y)->z)) { return(_q == 'a' || _q == 'b' || _q == 'c'); } /* end of lambda def */ ) { ... or x = lambda int f() { for (...) { ... return(some expression); ... } /* for */ } /* lambda */ So, academically, if we were to add a simlar feature to a new dialect of C, or to construct a pre-preprocessor for ansii C (a la C++), what problems do you see? K. Richard Magill
rose@think.ARPA (John Rose) (02/07/86)
(Pls. forgive/correct errs of etiquette; 1st time poster.) In article <187@rexago1.UUCP> rich@rexago1.UUCP (K. Richard) writes: >Lisp has something called a lambda definition which is something like >a local temporary function definition. >So, academically, if we were to add a simlar feature to a new dialect of C, >... what problems do you see? > >K. Richard Magill Yes, I've been wishing for this kind of thing too. We probably all feel that C macrology is greatly complicated by the "statement/expression" distinction; extreme cleverness is sometimes required to turn control flow into commas and question marks. Finding temporaries to use can be a nightmare. Here's another reason for wanting lambda's: Something called a "lexical closure" can provide an extremely clean way to create special a purpose "message-handling object". (Why are they always called "object" or "instance" or some equally nondescript name?) First, a digression on closures, then discussion of their relation to "objects", finally syntax and implementation notes. Functions may have free variables; in C, a block's free variables are just those which are declared outside that block but used inside it. Most languages allow you to take a block of code, declare some of its free variables to be formal arguments, and obtain some value which you can store and later function-call on some actual arguments to execute the original block. The block is usually allowed to have free variables other than the arguments (there are often restrictions on their nature). At the moment the function-callable value is obtained, we say that the block is "closed", and the "bindings" of the free variables are included in the value obtained. This is most interesting when the free variables are automatics declared by enclosing blocks, because this implies that the variable must exist as long as EITHER the enclosing block OR the value of the closed block exist. Note that the bindings, not just the values, are captured, in a lexical closure. For example, if the code in the closed block ever changes the value of a closed variable, the code from the enclosing block can see the change, and vice versa; also, closures made in the same activation of the enclosing block share bindings. In many Lisps, including that of our 3600's, one passes a message to an object by function-calling it with a special message selector as the first argument. Thus, objects can be declared and created in a single, simple block of code: (***LISP ALERT***) (defun make-char-deletion-stream (instream chars-to-delete) ;; This program is buggy, but illustrative. #'(lambda (op &rest args) (loop for res = (common-lisp:apply instream op args) unless (and (eq op :tyi) (memq res chars-to-delete)) return res))) You don't need to understand Lisp to understand the following: The "(LOOP ...)" is the block which is closed; the value obtained is returned as the value of the function MAKE-CHAR-DELETION-STREAM. That block has four free variables: INSTREAM, CHARS-TO-DELETE, OP, and ARGS. The last two are argument variables. The first two are automatics which are defined in an enclosing block. When the LOOP block is closed (by the "#'(LAMBDA...)" construct) the bindings of INSTREAM and CHARS-TO-DELETE are captured; they would ordinarily disappear on exit from MAKE-CHAR-DELETION-STREAM, but now they persist as long as the returned closure value exists. They are, in fact, "instance variables" in O.O.P. sense. But what a difference from most object oriented languages! There is no complicated declaration of object contents or operations; everything is in one place. This is not always desirable: in C it is often necessary to have a declarations file, and perhaps several files of implementation, for some data structure. But for a simple specialization of an already-defined data structure, such a syntax cannot be beat. So, that's why I like closures. Now I'll say why they are not always appropriate in plain C, but might find a place in C++. In C, block-closings are only allowed at file top level, the closing is done once as a static "initialization", and the "value obtained" is permanently bound to the function symbol. Note that all free variables in such top-level blocks CANNOT be automatic, and have static (indefinitely long) lifetime. Therefore there is no reason to add the following restriction, which I commend to your attention: When a block is closed, all non-argument free variables MUST have static lifetime. In the presence of this restriction, the value of a closed block can be implemented simply as a function pointer (to some anonymous static function, generated by a hypothetical preprocessor). This is because the addresses of static-lifetime variables can be compiled into code. If auto variables are to be closed over (something like this is necessary to get "instance variables"), then a simple function pointer cannot represent the closure; there must be additional information carried around which says where to find the bindings. Moreover, the compiler must arrange to put captured automatic variables in the heap. There is a lot of hair in a good implementation of this. C probably shouldn't allow this, because the simple operation of closing (taking the address of) a block implies heap-allocation, a task not regarded as primitive. Also, captured auto variables are really not auto, a misrepresentation which is foreign to C (reference to a captured auto, even in the enclosing block, would require an extra indirection to reference the heap, a hidden inefficiency). One could also allow free automatic variables in a closure, and say that use of them is equivalent to indirecting through a pointer taken to the auto variable in question. Returning such a closure (passing it "up" the stack) is just as erroneous as returning the address of an automatic variable. (A compiler could catch common cases of both.) Passing a closure "down" the stack is fine; this would be done when calling library functions like "qsort" or Lisp "MAPCAR", which is probably a common usage. By the way, if a variable being closed over is declared "const", the value of the variable itself, not the binding (address of the variable), can be put into the closure. This can remove a restriction about returning things "up" the stack. It might be reasonable to do this *always*, and require bindings, when required, to be simulated with addresses (or C++ references). This latter rule produces only approximate lexical closures; it may be called "capture by value". It eases (implementation of) "upward" closures (e.g., objects), but "downward" closures often want to "capture by reference", say to accumulate a count into a captured variable from inside MAPCAR. There is a need to worry about the *type* of the closure value. We'd like to pass it to such utilities as "qsort", but you can't store BOTH a static function pointer AND pointer(s) to variable bindings in a single word, and then expect it to be function callable by naive software. A quick solution would be to allow only one closure from any given block to exist at a time, and have static variables for it point to its current bindings. This would work for "qsort", for example, but fail if the same sort were called recursively. (There are further solutions here, but I'm uncomfortable with allowing only one closure. Perhaps others aren't?) Another solution is to introduce a new type, which is a struct of a function pointer and some bindings, which is function-callable in the obvious manner. Since this is a special case of a C++ class (with function-calling overloaded), class-declaration syntax could be extended (I'm still thinking about this one). Naive code cannot accept these pseudo-function-pointers. The best solution would be to in fact "compile code onto the stack", to use a hoary phrase. The struct (or class) of the previous paragraph would have some bytes inside it which would cause a call to some appropriate static function (which would be the closed block). Magic code in the function prologue of the static function would examine the return address and deduce the location of the struct, hence the bindings. If pointers to such structs are to be returned "up" the stack, they must be put in the heap using malloc (or "new" in C++). I've sometimes felt frustrated that in C programs the number of functions is fixed before run time. Function-generating functions are useful. I'd be willing to manage the heap storage; just give me a way to make such things. I will return to this below, after some syntax. Note that if the address of a closure isn't taken, but it is just immediately function-called, then we needn't worry about types, and inline expansion can be done. This would be the case with many macros. To give an example, we must define some syntax. I suggest *not* using a new keyword like "lambda". Consider the cast syntax. Inside the parens goes a declaration, sans declared identifier and terminating semicolon. Well, extend this to allow the top-level function definition syntax. The function body is syntactically like an initializer, which is not allowed in a cast, so we have our choice of where to put it. Some possibilities (Yacc allows any, I think): (int /*absent_id*/(arg1, arg2) char *arg2;) { /*body*/ } (int /*absent_id*/(int arg1, char *arg2)) { /*body*/ } (int /*absent_id*/(int arg1, char *arg2) { /*body*/ }) This is an expression syntax. The semantics are those of a function identifier. Syntactic precedence should be that of a parenthesized subexpression. I favor the last, because it is more obviously a syntactic unit, and the contents look most like a file-level form, but the second makes sense too, because it looks like a cast-application. Also, the following rewrite might be considered: ({ /*body*/ }) ==> (appropriate_type /*absent_id*/(void) { /*body*/ })() I.e., the closure is immediately invoked with no arguments. The type is deduced from all of the return statements, as with "?:". Finally! An example! Here is a variant of "qsort" which is passed a key-extraction function instead of a comparison function. qsort_key(vec, vec_len, vec_size, key_extract) void *vec; int key_extract(void *); { /* Sort the vector, using the given key. */ qsort(vec, vec_len, vec_size, (int (void* p, void* q) /* upward */ { return key_extract(p) - key_extract(q); })); } Also, one can "Curry" a function, say exponentiation: double (*make_myexp(const double base))(double exp) { extern double pow(double,double); return (double (double exp) /* downward */ { return pow(base,exp); }); } /* pow(x,y) == make_myexp(x)(y) */ Here's a "combinator": int (*curry_2_1(int (*f)(int,int), int arg1)) (int arg2) { return (int (int arg2) { return f(arg1, arg2); }); } Back to implementation. We want to be able to create function objects. Since C doesn't specify primitives for creating functions, a hypothetical pre-processor must generate implementation-dependent code. One can write a mixture of C and assembler code which will initialize a piece of memory to look like a function; this function will presumably put the arguments and the closure's data-pointer into standard places and then call the actual code for the block. A simple example, for the Vax, follows the signature below. The callable piece of memory might be (as it is below) the initial part of the closure's structure; subsequent parts of the closure struct might contain pointers to the various bindings, and the whole thing could be declared as a struct (or class). So, that fourth argument to "sort" is simply the address of a local structure. In order to return such a thing, it should be saved in the heap (or copied into a waiting buffer). In order to be moved around, it must be typed, sized, etc. It may be that the thing would even object to being moved (non-PIC, and all that). Well, at that point it's time to start thinking about C++, with its greater control of initialization and assignment. One amusing possibility to think about, regarding C++ extensions, is to make all this happen automatically under certain conditions. For example, if operator() is virtual (and not overloaded), implement it as above, so that the class is REALLY function-callable. Also, allow classes to be declared inside blocks, and allow the operator() member function to contain free variables, and interpret such free variables to generate implicit member definitions (which are copies of the original value, or references to the original binding). The syntaxes for lambda and expression blocks given above could then be viewed as "sugar" for such class declarations. Or, if C++ is not desired, one could use small syntactic variations on the lambda construct to control whether the closure was allocated on stack ("upward") or heap ("downward"), or data segment ("downward, but 1 at a time"). One might use the storage class specifiers "auto", "extern", and "static", respectively, distinguish those cases. At this point I'll make my exit. Cheers! ---------------------------------------------------------- John R. Rose, Thinking Machines Corporation, Cambridge, MA 245 First St., Cambridge, MA 02142 (617) 876-1111 X270 rose@think.arpa ihnp4!think!rose /* Implementation example: Creation of function objects. */ #include <sys/types.h> #include <frame.h> extern _tmpl(), _tmplSize, _tmplOff, _tmplRet; #define _TMPL_SIZE 12 /*>= ((int)&_tmplSize)*/ struct _closure { char cl_bits[_TMPL_SIZE]; }; /* _TMPL is a real function, which serves as a template for our objs. */ asm(" .text"); asm("__tmpl: .word 0"); asm(" callg (ap), __tmpl_blank"); asm("tmplRet: ret"); asm("tmplEnd:"); /* Set up some constants: */ asm(" .set __tmplSize, tmplEnd - __tmpl"); /* sizeof(_tmpl) */ asm(" .set __tmplRet, tmplRet - __tmpl"); /* offset of ret */ asm(" .set __tmplOff, __tmplRet - 4"); /* offset of func */ static _tmpl_blank() { abort(); } /* This stuff goes into any function which is called by a _TMPL. * It sets up a self-pointer to point to the _TMPL. */ #define _closure_decls(type,this) \ register type* this #define _closure_inits(type,this) \ asm(" movl fp,r11"); \ this = (type*)(((struct frame *)this)->fr_savpc - (int)&_tmplRet) void _init_closure(func, cl) void (*func)(); struct _closure *cl; { *cl = *(struct _closure *)_tmpl; *(char **)(&cl->cl_bits[(int)&_tmplOff]) += (char *)func - (char *)_tmpl_blank - /* pc-relative: */ ((char *)cl - (char *)_tmpl); } /* Example of a closure: double (*make_myexp(const double base))(double exp) { return (static double (double exp){ return pow(base,exp); }) } *** (*make_myexp(2.71828*))(x) should be equivalent to exp(x), etc. *** *** Curried exponentiation: pow(x,y) == make_myexp(x)(y) *** */ /* Preprocessor yields: */ struct make_myexp_cl { struct _closure cl; double base; }; static double make_myexp_1(exp) double exp; { _closure_decls(struct make_myexp_cl, this); extern double pow(); _closure_inits(struct make_myexp_cl, this); return pow(this->base, exp); } double (*make_myexp(base))() double base; { struct make_myexp_cl *r; r = (struct make_myexp_cl *)malloc(sizeof(struct make_myexp_cl)); _init_closure((void (*)())make_myexp_1, &r->cl); r->base = base; return (double (*)()) r; } -- ---------------------------------------------------------- John R. Rose, Thinking Machines Corporation, Cambridge, MA 245 First St., Cambridge, MA 02142 (617) 876-1111 X270 rose@think.arpa ihnp4!think!rose
dmiruke@isis.UUCP (Dataram Miruke) (02/07/86)
> I often find myself creating highly localized register variables in an > attempt to synthesize common sub expression optimization: > > { > register char _q = *((x.y)->z); > > if (_q == 'a' || _q == 'b' || _q == 'c') { ... > > I also want to return values from arbitrary expressions like a for loop. > Lisp has something called a lambda definition which is something like > a local temporary function definition. Sort of like: > > if ( lambda int f(register char _q = *((x.y)->z)) { > return(_q == 'a' || _q == 'b' || _q == 'c'); > } /* end of lambda def */ ) { ... > > or > > x = lambda int f() { > for (...) { > ... > return(some expression); > ... > } /* for */ > } /* lambda */ > > So, academically, if we were to add a simlar feature to a new dialect of C, > or to construct a pre-preprocessor for ansii C (a la C++), what problems > do you see? > > K. Richard Magill Don't the inline functions in C++ provide a similar feature? - Datta Miruke
bs@alice.UucP (Bjarne Stroustrup) (02/09/86)
> From allegra!mit-eddie!think!harvard!seismo!hao!nbires!isis!dmiruke Wed Dec 31 19:00:00 1969 > From: dmiruke@isis.UUCP (Dataram Miruke) > Subject: Re: lambda defs in C > Organization: University of Denver Math and Computer Science > >> I often find myself creating highly localized register variables in an >> attempt to synthesize common sub expression optimization: >> >> { >> register char _q = *((x.y)->z); >> >> if (_q == 'a' || _q == 'b' || _q == 'c') { ... >> >> I also want to return values from arbitrary expressions like a for loop. >> Lisp has something called a lambda definition which is something like >> a local temporary function definition. Sort of like: >> >> if ( lambda int f(register char _q = *((x.y)->z)) { >> return(_q == 'a' || _q == 'b' || _q == 'c'); >> } /* end of lambda def */ ) { ... >> >> or >> >> x = lambda int f() { >> for (...) { >> ... >> return(some expression); >> ... >> } /* for */ >> } /* lambda */ >> >> So, academically, if we were to add a simlar feature to a new dialect of C, >> or to construct a pre-preprocessor for ansii C (a la C++), what problems >> do you see? >> >> K. Richard Magill > Don't the inline functions in C++ provide a similar feature? > - Datta Miruke Yes, the following is legal C++ inline int f() { for (...) { ... return(some expression); ... } } However, the current implementation cannot handle loops or switches in an inline, so you get a ``sorry, not implemented'' message. Sorry, we will have to wait a bit longer. The non-looping example can be handled quite nicely using an inline. Inline functions differ from lambdas in that they must be defined (outside any function) before they can be used. - Bjarne Stroustrup (AT&T Bell Labs, Murray Hill). PS Calling the current C++ implementation a pre-processor for C is as accurate as calling C a pre-processor for the assembler. The C++ front-end does a complete typecheck and performs non-trivial code transformations (like inline expansion). It relies on the underlying C compiler for the part of the code generation process (only). ANY message from the C compiler is considered a C++ compiler bug.