[comp.lang.scheme] combinator code

jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/18/88)

In article <8811161428.AA13775@toucan.LCS.MIT.EDU>, bard@THEORY.LCS.MIT.EDU writes:
> 
> I thought that the reason for using supercombinators rather than S and K is
> space.  Ordinary \-calculus expressions can be translated into SK-combinatory
> expressions, but the combinatory version is exponentially longer than the SK
> term.  Some sets of supercombinators give polynomial-length expansions; I
> think some give linear expansions.

Here's a bunch of references for the complexity of various combinator
translations.  I didnt see all of this discussion, so my apologies if
I'm repeating things that have already been said here.

Yes, SK is exponential, which is why (as far as I know) noone has ever
used it for a machine.

Turner gave a translation into S, K, B, C, and I (Bxyz = x(yz), Cxyz =
xzy, Ix = x), (Software P&E, vol9, 1979), which is cubic in the worst
case.  He then added three more combinators S'wxyz = w(xz)(yz), B'wxyz
= wx(yz), and C'wxyz = wxzy, (J.Symb.Logic, vol.44, 1979) which is
quadratic in the worst case.

Counting arguments show that the best you can do with any fixed set of
combinators is nlogn.  This can be achieved (Noshita, Inf.Proc.Letters,
vol.20, 1985, and - ahem - Kennaway, Inf.Proc.Letters, vol.24, 1987)
though first run-length encoding the runs of identical combinators that
Turner's second translation tends to give, then programming the
run-lengths in combinators.  The last step is purely of theoretical
interest, in practice you might as well directly implement the
run-length encoded version, or the closely related "director strings"
(Kennaway, op.cit, and Kennaway&Sleep, ACM Toplas, October 1988), as
has been done on the SKIM2 machine (Stoye, Proc. Functional Programming
Workshop, Goteborg, 1985), to obtain a modest improvement in speed of
evaluation of expressions, compared with Turner's second translation.
(This is probably more important than getting the code size down - in
practice I doubt the quadratic effects are significant.)

Burton (Inf.Proc.Letters, vol.14, 1982) claims a linear translation,
but this depends on a different measure of the size, and when measured
on the same basis as the other references, it's also nlogn.

The basic idea embodied in Turner's second translation, and in director
strings, can also be found in an unpublished note which Dijkstra wrote
on seeing Turner's first translation (EWD735, 1980), but which no-one,
other than one anonymous referee, seems to have heard of.

There's also the related BC-chain translation, which is a linear-space
representation of Turner's second translation (Noshita & Hikita, 1984,
RIMS Symp. on Sofware Science & Engineering, Kyoto, 1984 - I'm sure
there's a more accessible reference, but I cant find it).  Evaluation
of BC-chain expressions takes linear time relative to evaluation of
Turner (extended set) combinators.  I dont know what the constant
factors look like.

Of course, given an nlogn translation into some fixed finite set of
combinators, you immediately have an nlogn translation into any finite
set capable of expressing a translation, even pure SK (compose the
first translation with a translation of the first set into the second -
only a linear expansion), though of course that's another purely
theoretical result - a bit like producing a RISC compiler by combining
a CISC compiler with a RISC emulator...

Supercombinators (Hughes, thesis, Oxford, 1984, and umpteen papers by
various people since then) are nlogn in the average and worst case.
Their advantage is not in the size of the translation but in speed of
execution, due to the fact that each supercombinator embodies a larger
chunk of computation, and that the supercombinators you get are
tailored to the program being compiled, providing more opportunities for
clever implementation.  (Or so I assume - has anyone compared the actual
performance of something like SKIM2 running director strings with a
supercombinator machine?)

>   Now, code size isn't usually much of a problem, in that we don't
> usually care whether a program is 50K or 250K.

:-) Unless you're running from floppies...(sorry, was thinking of comp.sys.mac)

>                                                 The difference between 50K
> and  2^50K is significant.  I don't know if the translation has a decent
> expected case or not.  

It's difficult to define the "expected case" - depends on the probability
distribution of the programs you compile.  What is a "typical" program?
For the nlogn translations it doesnt matter - once your worst case is down
to nlogn, the counting argument shows that must be the average case as well,
but for Turner's quadratic translation, you have to look at what sort of
programs produce the worst case.  Worst-case lambda-expressions for this
translation have deeply nested subexpressions and functions of many arguments
- the nesting has to be proportional to the size of the program to get
the quadratic effects, e.g. (backslash = lambda):

	\x1.\x2.\x3.\x4.\x5.(x5 ((x2 (x1 x4)) x3))

Does this ever happen?

>   In any event, the SK compiler has a lot of work to do before it can match
> even a fairly stupid supercombinator compiler, simply because it can be
> forced to produce incredibly long code.

Is this a particular SK compiler you're talking about?  Has someone actually
implemented the naive translation into just S and K?  I'm not surprised it's
no good.

> -- Bard Bloom

-- 
Richard Kennaway
School of Information Systems, University of East Anglia, Norwich, U.K.
uucp:	...mcvax!ukc!uea-sys!jrk	Janet:	kennaway@uk.ac.uea.sys

jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/19/88)

Correction: Kennaway, Inf.Proc.Letters, vol.24, 1987, should have been
Kennaway&Sleep, Inf.Proc.Letters, vol.24, 1987.
-- 
Richard Kennaway
School of Information Systems, University of East Anglia, Norwich, U.K.
uucp:	...mcvax!ukc!uea-sys!jrk	Janet:	kennaway@uk.ac.uea.sys