[comp.lang.misc] H.O. functions in imperative languages

jokinen@tucos.UUCP (Matti Jokinen) (07/13/87)
There are many languages that support both higher order functions and
imperative programming style.  Lisp is the best-known example.  However,
efficient implementation of such languages is not quite trivial.

In a computer a higher order function is usually represented as a pair
(C,E), where C is a block of executable code and E is the evaluation
environment.  E is a data structure that contains values of nonlocal
identifiers.  The pair is called a closure.  In a functional language
all objects in E are immutable.  Thus when E is created, the values of
nonlocal identifiers can be copied into E if that is found convenient -
the copy and the original object are indistinguishable.  Variables (and
other mutable objects) cannot be copied; instead, the memory address of
the original object must be stored in E.  Problems arise when the object
is in the stack.

There are several potential solutions:

1. All variables can be allocate from the heap.

2. The compiler could select the allocation method for each variable on
the basis of a global flow analysis.  The problem is undecidable in the
general case, so some variables will be allocate from the heap even if
they could be safely allocated from the stack.  It is also rather
difficult to combine global analysis with separate compilation.

3. Importion of nonlocal variables into routines can be restricted so
that the contents (r-value) of a stack-allocated variable can be
imported but the address (l-value) cannot.  Thus the values of nonlocal
stack-allocated variables are stored in E when the closure is created,
and further assignations have no effect on E.  Pointers to heap can be
safely imported and stored into E even if they point mutable objects.
This idea has been used at least in Russell.

4. Alternatively, exportion of functions out of blocks can be
restricted.  Each object has a scope that determines its maximal
lifetime.  The scope of a function can be defined as the intersection of
the scopes of the values of its nonlocal identifiers.  This idea was
used in Algol 68 (unfortunately in a more restricted form).

5. Yet another solution can be found in Simula and its successors,
e.g. Smalltalk.  Procedures are not first class citizens but
environments are, and a procedure P can be a component of an environment
E.  P is then always evaluated in E.  Environments are allocated from
the heap.


If first order functions are first-class citizens (as in C), higher order
functions of type

    function(x1: t1, ..., xn: tn): resulttype

can be represented as structures:

    funtype = record [code: function(x1: t1,
                                     ...
                                     xn: tn,
                                     e: envtype): resulttype;
                     env: envtype]

If f is of type funtype, it can be called by writing

    f.code (y1, ..., yn, f.env)

This technique can be used in C, but there are two problems:

1. The env component can be of any type.  If its type is fixed, the set
of representable functions will be restricted.

2. The programmer must take care that the proper environment is supplied
as the last argument.

Problem 1 is solved if the language has a builtin type that is the union
of all other types.  Problem 2 can be solved if the language supports
data abstractions.  Funtype is then defined as an abstract data type
with at least two operations, `create' and `call'.  CLU is one of the
imperative languages that support h.o. functions this way.


Matti Jokinen				domain: jokinen@cs.utu.fi
Department of Computer Science		uucp:   ...mcvax!tucos!jokinen
University of Turku
Finland