[comp.lang.functional] Open Letter To Chin About HO=>FO

nelan@enuxha.eas.asu.edu (George Nelan) (05/08/90)

Chin, congratulations on a superb exposition of your HO=>FO transformation
algorithms! (but I hardly think 'Belated Reply' is appropriate terminology! :-)

This open letter should address most of the issues of your last posting. 
I shall address your algorithms #2 & #3 (#1 *seems* nearly identical).
BTW, I am going to use what I call "flat-nearHaskell" since this more closely
approximates the (Hope) style of your examples.  Also, my examples should help 
to clarify just what is meant by "flattened = Johnsson-style lambda-lifted".

The first example (alg. #2) illustrates your 'instantiated' HO arguments: 
| Given a higher-order program:
| 	f(x)		= alt_map(x,lambda a=>a*a end,lambda a=>a*a*a end)
| 	alt_map([],F,G)	= []
| 	alt_map(x:xs,F,G)= F(x):alt_map(xs,G,F)
| The second transformation algorithm will remove the 'instantiated'
| higher-order function call in f by transforming it to:
| 	f([])	= []
| 	f(x:xs)	= x*x:f'(xs)
| 	f'([])	= []
| 	f'(x:xs)= x*x*x:f(xs)
| This transformation is a form of function specialisation (or partial
| evaluation) which specialises function calls with instantiated 
| higher-order arguments. We have to be careful which of the higher-order
| arguments we are allowed to specialise for recursive functions. If we are not
| careful, we may get a non-terminating transformation algorithm. The parameter
| criterion I used to determine if an argument can be specialised or not
| is called, the 'variable-only' criterion. For example, the above
| alt_map function contains two higher-order parameters, F and G, which are
| both 'variable-only'. This is so because the corresponding recursive call, 
| alt_map(xs,G,F), in the 2nd defining equation of alt_map contains variables 
| (constants allowed as well) in those parameter positions.

A corresponding "flat-nearHaskell" example is:
	p where p = (f [1 2])
		(f x) = (alt_map x t1 t2)
		(t1 a) = {a * a}
		(t2 a) = {a * {a * a}}
		(alt_map [] F G) = []
		(alt_map [x:xs] F G) = [(F x) : (alt_map xs G F)]
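For readers without Hope or my notation handy, here is the same example as a
runnable Haskell sketch (the names altMap, f0, f1, f1' are mine); the f1/f1'
pair is the first-order program that alg. #2 is expected to produce:

```haskell
-- Higher-order original: alt_map applies F and G to alternate elements
-- by swapping them on the recursive call.
altMap :: [a] -> (a -> b) -> (a -> b) -> [b]
altMap []     _ _ = []
altMap (x:xs) f g = f x : altMap xs g f

f0 :: [Int] -> [Int]
f0 x = altMap x (\a -> a * a) (\a -> a * a * a)

-- First-order specialisation: one new function per permutation of the
-- 'variable-only' higher-order arguments (here just F,G and G,F).
f1, f1' :: [Int] -> [Int]
f1  []     = []
f1  (x:xs) = x * x     : f1' xs
f1' []     = []
f1' (x:xs) = x * x * x : f1  xs
```

Note that the specialisation terminates precisely because the recursive call
only permutes the two variable-only parameters, so a finite set of
specialised functions suffices.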

My algorithm fails for this! (however, it appears that it would be feasible
to modify the alg. to account for this special case).  This begs the question:
where in the heck do these special cases come from anyway?  Chin, how long did
it take you to find all these cases? :-)
NOTE TO TP (Term Paper) READERS: this appears to be a failure of the 'fixed' 
function in 'new_callee' when weakening alt_map. 

The second example (alg. #2) illustrates your 'variable-only' rule: 
| In contrast, another higher-order function:
| 	acc_map([],F)	= []
| 	acc_map(x:xs,F) = F(x):acc_map(xs,lambda a=>a+F(x) end)
| does not have any 'variable-only' higher-order parameter. This is because 
| its corresponding recursive function call, acc_map(xs,lambda a=>a+F(x) end),
| contains an accumulating term, lambda a=>a+F(x) end, in its ho-parameter
| position. Such parameters are more difficult to specialise/remove and
| may require laws to do so. I have not offered any solutions towards 
| the removal of ho-arguments without the variable-only criterion.

A corresponding "flat-nearHaskell" example is:
	p where p = (acc_map [1 2] g)
		(g x) = {x * 2}
		(acc_map [] F) = []
		(acc_map [x:xs] F) = [(F x) : (acc_map xs (t1 F x))]
		(t1 F x a) = {a + (F x)}
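To see why acc_map resists specialisation, here is a Haskell sketch (accMap
is my name): each recursive call packages the old F inside a new lambda, so
unfolding builds an ever-larger closure rather than cycling through a finite
set of instantiations:

```haskell
-- acc_map: the functional argument accumulates -- the lambda passed to
-- the recursive call closes over the previous f, so no finite family of
-- specialised first-order functions covers all the calls.
accMap :: [Int] -> (Int -> Int) -> [Int]
accMap []     _ = []
accMap (x:xs) f = f x : accMap xs (\a -> a + f x)
```

Each element of the result adds in the value the previous functional
argument produced, which is exactly the accumulating behaviour that the
'variable-only' criterion rules out.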

It appears that my algorithm cannot (presently) handle this either! 
NOTE TO TP READERS: this appears to be yet another failure of the 'fixed'
function when weakening acc_map.

The third example (alg. #2) illustrates your "full-laziness rule".
In this case, my alg. DOES produce a correct transformation, albeit not
in fully lazy form (I treat these kinds of optimizations as post-processes):  
| Higher-order parameters which are non-linear may, at times, result in 
| loss of efficiency when they are specialised. For example, the map function
| below:
| 	map([],F) 	= []
| 	map(x:xs,F)	= F(x):map(xs,F)
| contains a non-linear higher-order parameter, F (it occurs twice in the RHS
| of the second equation). As a result, if we were to directly specialise the 
| map function call in:
| 	g(xs,y)		= map(xs,lambda a=>sqr(y)+a end)
| we get a less efficient program (note the re-computed sub-term
| sqr(y)):
|	g([],y)		= []
|	g(x:xs,y)	= (sqr(y)+x):g(xs,y)
| This issue is related to Wadsworth's
| full laziness graph reduction technique and Hughes's fully lazy lambda
| lifting technique. To avoid this loss of laziness, I formulated a
| transformation algorithm which would extract out 'ground-type' maximal
| free expressions (MFEs - see [Hughes82]). A ground-type MFE for the 
| above example is sqr(y). This could be extracted out to give:
| 	g(xs,y)		= g'(xs,sqr(y))
| 	g'(xs,Y)	= map(xs,lambda a=>Y+a end)
| before g' is specialised to:
| 	g'([],Y)	= []
| 	g'(x:xs,Y)	= (Y+x):g'(xs,Y)
| We are interested in extracting only ground-type MFEs and not function-type 
| MFEs. This is because the extraction of function-type MFEs, e.g. + sqr(y),
| actually introduces (rather than removes) ho-functions - contradicting 
| the goal of the ho-removal transformation. 
| 
| The laziness preserving transformation is quite complex because we may need to
| extract out deeply 'buried' ground-type free expressions. I shall not go into
| details here. The transformation algorithm can be found in my thesis and 
| in a paper I am currently revising (e-mail me for details).

A corresponding "flat-nearHaskell" example is:
	p where p = (g [1 2] 16)
		(g xs y) = (map (t1 y) xs)
		(t1 y a) = {(sqr y) + a}
		(map f []) = []
		(map f [x:xs]) = [(f x) : (map f xs)]
    =>
	p where p = (g [1 2])
		(g xs) = (map_1 xs)
		(t1 y a) = {(sqr y) + a}
		(map_1 []) = []
		(map_1 [x:xs]) = [(t1 16 x) : (map_1 xs)]
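The same point in runnable Haskell (the names sqr, gNaive, gMFE, gSpec are
mine): direct specialisation re-evaluates sqr y at every cons cell, while
extracting the ground-type MFE first passes the computed value just once:

```haskell
sqr :: Int -> Int
sqr y = y * y

-- Direct specialisation of map: sqr y is recomputed per element.
gNaive :: [Int] -> Int -> [Int]
gNaive []     _ = []
gNaive (x:xs) y = (sqr y + x) : gNaive xs y

-- Full-laziness-preserving version: extract the MFE sqr y first...
gMFE :: [Int] -> Int -> [Int]
gMFE xs y = gSpec xs (sqr y)

-- ...then specialise map with the extracted value as an ordinary
-- first-order parameter, computed once.
gSpec :: [Int] -> Int -> [Int]
gSpec []     _ = []
gSpec (x:xs) s = (s + x) : gSpec xs s
```

Both versions compute the same list; only the number of times sqr y is
evaluated differs.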

| Related Works
| -------------
| 
| [Goguen88] and [Wadler88] both proposed a macro expansion (unfolding) 
| technique to remove a restricted class of higher-order functions. More
| specifically, their technique could be applied to higher-order functions 
| whose function-type parameters are 'fixed' (e.g. map function above 
| but not alt_map nor acc_map).

| [Nielson&Nielson89] proposed the higher-order function removal 
| transformation as a special form of partial evaluation 
| with the help of BTA (Binding-time analysis).
| I think their proposed technique is presently only able to handle
| the removal of function-type arguments without free variables.
| 
| References
| ----------
| [Burstall&Darlington77] A Transformation System for Developing 
| 		Recursive Programs, JACM 1977
| [Goguen88] Higher-Order Functions considered Unnecessary for Higher-order
| 		Programming, SRI-CSL-88-1
| [Lester89] Stacklessness: Compiling Recursion for a Distributed Architecture
| 		FPCA '89
| [Hughes82] Supercombinators: A new implementation technique for Applicative
| 		Languages ACM LFP 1982
| [Nielson&Nielson89] Transformation on Higher-order Functions FPCA'89.
| [Wadler88] Deforestation - Transforming Programs to Eliminate Trees,
| 		ESOP '88

As far as I can tell (correct me!), it appears that we (currently) have 
(in terms of algorithmic generality):

	{Goguen88, Wadler88} < mine < yours

(BTW, I HAVE read Goguen88 - very nice).
I don't know how [Nielson & Nielson89] (or anyone else, for that matter!)
fits into this "ordering".  Of course, this is subject to change depending
on how many more special cases are thrown at me!!  It seems that my alg.
and Chin's alg. differ primarily in that I assume that optimization (e.g.,
for full-laziness) is a post-process and also in that Chin's can deal with
the "alt_map" example whereas mine cannot (without modification).

These examples expose holes in my alg. (one should be fixable; the other,
it seems, neither of us can handle yet).  Both of these holes involve what I call a
'fixed' function (which, BTW, has (apparently) the same connotation as does
your usage of that word!).  My 'fixed' function (supposedly) computes a
new fixed-point for a function (supercombinator) that is macro-expanded
by what I call "weakening".  I think that further research of that function
needs to be done: this *may* yield a single general algorithm that can handle 
all of the special cases you brought up. One of the big questions I have is:
how can one be sure all the special cases are covered?

Finally Chin, I hardly think it would be fair to stop now without adding
some more fuel to the fire :-)

I have a question: how do your algorithms deal with indirect mutual recursion?
For example (under my transformation alg.):
	
	p where p = (f h 3)
		(f g x) = (IF (EQ x 0) 1 {x * (g {x - 1})})
		(h x) = (f h x)
    =>
	p where p = (f_1 3)
		(h x) = (f_2 x)
		(f_1 x) = (IF (EQ x 0) 1 {x * (h {x - 1})})
		(f_2 x) = (IF (EQ x 0) 1 {x * (h {x - 1})})
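In Haskell the example looks like this (the names fHO and hFO are mine);
hFO is the first-order function the transformation should be able to reach,
since f receives itself only indirectly via h:

```haskell
-- Higher-order original: factorial written with an explicit functional
-- argument, tied back on itself through h (indirect mutual recursion).
fHO :: (Int -> Int) -> Int -> Int
fHO g x = if x == 0 then 1 else x * g (x - 1)

h :: Int -> Int
h x = fHO h x

-- First-order result: f_1 and f_2 in my transformed program have
-- identical bodies, so both collapse to this single definition.
hFO :: Int -> Int
hFO x = if x == 0 then 1 else x * hFO (x - 1)
```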
	

Ok, that's it for now! Whew! My head hurts! Can I go home now? 
(I just wanna play my guitar for awhile, really)
-- 
George Nelan, ERC 252, Arizona State University, Tempe, Arizona, USA, 85287
INET: nelan@enuxha.eas.asu.edu
UUCP: ...{allegra,{ames,husc6,rutgers}!ncar}!noao!asuvax!nelan
QOTD: We don't need no stinkin' "Quote Of The Day" this far out, you know.

wnc@ivax.doc.ic.ac.uk (Wei N Chin) (05/09/90)

Open Reply to:

| >From: nelan@enuxha.eas.asu.edu (George Nelan)
| Newsgroups: comp.lang.functional
| Subject: Open Letter To Chin About HO=>FO
| Date: 7 May 90 18:20:13 GMT
| Organization: Arizona State Univ, Tempe
| 
|      :
| My algorithm fails for this! (however, it appears that it would be feasible
| to modify the alg. to account for this special case).  This begs the question:
| where in the heck do these special cases come from anyway?  Chin, how long did
| it take you to find all these cases? :-)

It took quite some time. The 'variable/constant-only' parameter criterion was
discovered (or re-discovered?) while I was working on an extension to
Wadler's deforestation algorithm. I find that this criterion is extremely
suitable for the specialisation of 'symbolic' rather than just 'grounded/known'
arguments (which have been the predominant concern of Partial Evaluation
techniques).

|        :
| by what I call "weakening".  I think that further research of that function
| needs to be done: this *may* yield a single general algorithm that can handle 
| all of the special cases you brought up. One of the big questions I have is:
| how can one be sure all the special cases are covered?

There is a large class of recursive functions with 
accumulating parameter arguments, similar to the 'acc_map' example, 
which will provide you with *plenty* of special cases
for extending the ho-removal transformations! These would most likely have to
rely on more sophisticated (and correspondingly harder, but not impossible,
to automate) techniques (needing laws/lemmas), like those illustrated 
in [Wand80].

| Finally Chin, I hardly think it would be fair to stop now without adding
| some more fuel to the fire :-)
|
| I have a question: how do your algorithms deal with indirect mutual recursion?
| For example (under my transformation alg.):
|
|	p where p = (f h 3)
|		(f g x) = (IF (EQ x 0) 1 {x * (g {x - 1})})
|		(h x) = (f h x)
|    =>
|	p where p = (f_1 3)
|		(h x) = (f_2 x)
|		(f_1 x) = (IF (EQ x 0) 1 {x * (h {x - 1})})
|		(f_2 x) = (IF (EQ x 0) 1 {x * (h {x - 1})})
	
The above example can be handled in a straightforward way
by my ho-removal transformation algorithms. I haven't mentioned this
previously, but the way to apply the transformation is to
use a bottom-up transformation strategy ([Feather82] used this
strategy very successfully for his semi-automatic transformations).
The principal idea behind the bottom-up transformation strategy is to transform
functions which are 'lower' in the defining hierarchy before those above them.
(Mutually recursive functions are considered to be at the same level).

In the above example, the three functions, 'p', 'f' and 'h', are in 
the following hierarchy: 'p' > 'h' > 'f'. Applying the transformation
algorithms to the three functions would result in the following:
	
	f g x   = (IF (EQ x 0) 1 {x * (g {x - 1})})  (* no changes to this *)

	h x 	= f h x   			(* unfold f h x *)
		= IF (EQ x 0) 1 (x * (h (x-1)))

	p	= f h 3				(* unfold f h 3 *)
		= IF (EQ 3 0) 1 (3 * (h (3-1)))

Notice that since f is non-recursive, some occurrences of its function 
calls, e.g. `f h x` and `f h 3`, can be *directly* unfolded without the need
to introduce intermediate functions. This is allowed as long as there is
no inefficient duplication of arguments. 

Hence, my transformed first-order program (dropping 'f', which is not
referred to, directly or indirectly, by the main function 'p') would be:

	p where p   = IF (EQ 3 0) 1 (3 * (h (3-1)))
		h x = IF (EQ x 0) 1 (x * (h (x-1)))

Incidentally, a partial evaluation technique may be used to simplify the RHS of
'p'.

References
==========
[Feather82] A System for Assisting Program Transformation, ACM TOPLAS 1982
[Wand80] Continuation-Based Prog Transformation, JACM 1980

==============================================================================
Wei N Chin
Dept of Computing,Imperial College, London SW7 2AZ
e-mail wnc@doc.ic.ac.uk
NB Will be moving, after 15June1990, to 
	Dept of IS/CS, Natl Univ of Spore, Kent Ridge,Singapore 0511
=============================================================================