[comp.lang.lisp] Size of Common Lisp

ram+@cs.cmu.edu (Rob MacLachlan) (02/16/91)

So what about the size of Common Lisp?  Is Lisp a VAX?  Consider the
similarities:
 -- Big (too big to run on a VAX)
 -- Old (older than a VAX)
 -- Slow
 -- Sold by DEC

What lessons (if any) can language designers draw from the victory of RISC?
 -- Choose primitive primitives and migrate complexity upward.
 -- At any given level, implement only those features that are cost-effective.
 -- Optimize operations with a high dynamic frequency at the expense of rare
    operations.

Note that the lesson of RISC is *not* "choose simple systems."  Considering
the additional complexity of the compiler (due to the optimization a RISC
processor demands), a RISC-based programming system is more complicated
overall.  A solution should be as simple as possible, and no simpler.

Consider the minimalist-maximalist dimension of the system design space:
 -- A minimalist design is as simple as possible (in hopes of low production
    costs.) 
 -- A maximalist design is as complex as is useful (in hopes of maximizing
    flexibility and ease of use.)

RISC was initially publicized using largely minimalist arguments:
  "Our chip is faster because it has fewer instructions."

Although the "RISC philosophy" has won over almost all hardware architects,
the minimalist argument has been largely rejected as simplistic.  This is
why you see RISC machines with hundreds of instructions.  The key point is
that although a simple CPU design is inherently cheaper to design and
manufacture, it is not true that the simplest CPU chip invariably produces
the most cost-effective computer *system*.

A minimalist programming system is based on a small set of cheap operations.
Whenever you need another operation, you implement it using these cheap
operations.  At least for simple problems, this results in programs that use
about as little hardware as possible.  There are two major problems with
minimalist programming systems:
 -- The minimalist design goals totally ignore the cost of programming.
 -- The hardware efficiency advantage of minimalist systems is only clear-cut
    with minimalist programming problems.

Today there is no such thing as a minimalist programming system.  All
programming systems (assembler, C, whatever) have accepted some reduction in
hardware efficiency in exchange for an increase in ease of programming.  This
tradeoff is widely accepted as a "good thing".  [But programming systems
still do differ in their position on the minimalist-maximalist continuum.
More later...]

It is less widely acknowledged that minimalist systems aren't that much more
efficient for solving complex problems.  Could you write X much more
efficiently in assembler than its current C implementation?  Before you answer
"Hell yes!", I point out that I mean X11, xlib, resources, widgets, xtk and
motif, with total software pin-for-pin compatibility with the C version.

Of course, nobody would attempt to do such a thing.  If someone wanted a
maximally efficient window system, they would start by throwing out huge
amounts of "unnecessary" functionality.  Only after they have finished
crafting a minimal window system for their application would they consider
anything drastic like assembly coding.

Everybody agrees that X contains lots of unnecessary stuff.  But given the
evident truth of this, it is amazingly difficult to find features that almost
everyone agrees are unnecessary.  This is "design by committee" in a nutshell.
If you were writing a window system for just you to use, then you really could
throw out all the junk.  But if you want your office-mate to help you, then
you may find he has his own ideas.  Pretty soon you have keymaps and
environment variables.  And he may want to draw splines.

The bottom line is that today's programming systems are used by many people.
This means that a programming system must please many people's tastes and
cater to many people's needs.  This means there will be some mismatch between
your needs and the available offerings, but there is *no better way*.  If you
want to buy a car, but you hate arm-rests, it is cheapest to buy a Ford off
the show-room floor and cut off the arm-rests.

Any system over a certain small complexity has to be a multi-person effort.
At this point, the minimalists are no longer offering an alternative -- they
are merely bitching that systems that complex *should not be*.  There are some
things man was not meant to know...  God's will, Dijkstra's will, whatever.

But suppose you did decide to reimplement on top of a minimalist system.  If
so, there is a real risk that reimplementing "just what you need" will result
in a less efficient program.  To be sure, if you were as wizardly as the X
implementors and spent as much time tuning as they did, then you could get the
same or better efficiency for your operation subset.  But if you say, "2-d
trees are too hairy, I'll use a linear search", then you may be in for a
surprise.
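
To see the trap concretely, here is a sketch (my own illustration;
NEAREST-POINT and its argument convention are made up) of the "simple"
replacement.  A linear scan is trivially correct, but it costs O(n) per
query where a 2-d tree costs roughly O(log n), and a window system may run
this kind of query on every pointer motion:

    ;; Hypothetical replacement for a 2-d tree lookup: find the point
    ;; nearest (X, Y) by scanning every point.  Fine for 20 points,
    ;; painful for 20,000.
    (defun nearest-point (x y points)
      "POINTS is a list of (PX . PY) conses."
      (let ((best nil)
            (best-d2 most-positive-single-float))
        (dolist (p points best)
          (let* ((dx (- x (car p)))
                 (dy (- y (cdr p)))
                 (d2 (+ (* dx dx) (* dy dy))))
            (when (< d2 best-d2)
              (setq best-d2 d2
                    best p))))))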

I think that in reality, the biggest efficiency advantage of minimalist
programming systems is not any inherent advantage of reimplementing all the
time, but rather that:
    **** Minimalist operations provide a good Efficiency Model ****

An efficiency model is a tool the programmer uses to tell how expensive a
particular code fragment will be.  In the current state of the art,
application of efficiency models is purely manual: you stare at the code.
If you see a lot of operations or a lot of repetitions, then you know the
code is expensive.  

In C, the efficiency model is so straightforward that Jon Bentley can
publish an article telling you how to predict the performance of your C
program with 80% accuracy.  In contrast, most of today's Lisp programmers
are working with internal efficiency models having perhaps 5% accuracy.  And
every time somebody writes a Lisp program 20x slower than a C program,
people conclude "Lisp is terribly inefficient."
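
To make the gap concrete, here is a sketch (my illustration, not Bentley's;
the function names are made up) of why the Lisp model is so much harder.
Both versions below look like the C loop "for (i = 0; i < n; i++) s += v[i];",
but without declarations most Lisp compilers must fall back on generic
arithmetic and boxed floats, which alone can account for an order of
magnitude:

    ;; NAIVE-SUM: looks as cheap as the C loop, but the addition is
    ;; generic arithmetic and each intermediate sum is a boxed float.
    (defun naive-sum (v)
      (let ((s 0.0))
        (dotimes (i (length v) s)
          (incf s (aref v i)))))

    ;; DECLARED-SUM: same loop, but the declarations let a good
    ;; compiler keep S unboxed and open-code the array access.
    (defun declared-sum (v)
      (declare (type (simple-array single-float (*)) v)
               (optimize (speed 3)))
      (let ((s 0.0))
        (declare (type single-float s))
        (dotimes (i (length v) s)
          (incf s (aref v i)))))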

O.k., now I've stopped baffling you with bullshit about window systems, and
we're back to the Common Lisp vs. Scheme vs. C wars.  And RISC vs. CISC
too!  Way to go...

I'll restate my RISC lessons here so that you don't have to page back:
 -- Choose primitive primitives and migrate complexity upward.
 -- At any given level, implement only those features that are cost-effective.
 -- Optimize operations with a high dynamic frequency at the expense of rare
    operations.

Remember those 100-instruction, multi-million-transistor RISCs?  Well, the
reason that new instructions creep into RISC processors is one of the reasons
that new functions creep into Lisp.  Integer multiplication was added to the
SPARC because some people use it a lot, and it is a pain for each of them to
design their own hardware multiplier and hook it into the motherboard.
Similarly, Common Lisp has hash-tables because some people use them a lot, and
it is a pain for everyone to have to design their own and hook them into the
garbage collector.

With either integer multiplication or hash-tables it is possible to come up
with high-level software solutions that don't require a soldering iron, but
you will pay a big efficiency hit.
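
Here is the shape of that tradeoff for hash-tables (a sketch; MY-PUT and
MY-GET are hypothetical names).  The portable software fallback is an
association list: correct, no GC cooperation required, but O(n) per lookup
instead of the hash-table's amortized O(1):

    ;; The built-in: the implementation handles storage, rehashing,
    ;; and cooperation with the garbage collector.
    (defvar *table* (make-hash-table :test #'equal))
    (setf (gethash "red" *table*) 42)       ; store
    (gethash "red" *table*)                 ; => 42, T

    ;; The roll-your-own fallback: an association list.  Every lookup
    ;; walks the list from the front -- the "big efficiency hit".
    (defvar *alist* '())
    (defun my-put (key value)
      (setq *alist* (acons key value *alist*)))
    (defun my-get (key)
      (cdr (assoc key *alist* :test #'equal)))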

The *other* reason why functions creep into Common Lisp is a strictly
maximalist one: to get economies of scale by standardizing the interfaces to
useful functionality.  In a word, reuse.  In fact, this is a lot of the reason
that CISCs got so complicated: not because the hairy instructions were much
faster, but because they were easier to use in assembly language.

RISC has to some degree punted on the interface standardization issue by
observing that only compilers and assemblers need to know the interface to the
hardware.  In other words, they pushed the interface up a level.  Programming
systems can do this too, but at some point, there has to be a top level that
programmers can recognize.  This strongly encourages standardization of the
intermediate layers as well (to avoid duplication of effort).

Common Lisp is definitely a maximalist programming system.  Does this mean
that it is a total loss for efficiency?  I think not, but there are some
major efficiency challenges:
 1] Make better efficiency models for Common Lisp.  Add automatic compiler
    efficiency notes that flag simple problems, and document advanced compiler
    capabilities so that programmers can exploit them.
 2] Eliminate unnecessary inefficiencies in Common Lisp by adding complexity 
    to the implementation:
     - Compiler optimizations provide graceful performance degradation for
       less-tuned programs.
     - Generational GC makes consing cheap and eliminates offensive GC pauses.
 3] Make simple Common Lisp programs comparably efficient to simple C
    programs.  This is the "delivery" problem.
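
As an illustration of 1] (hedged: this shows the flavor of an efficiency
note, not the exact wording of any particular release), consider compiling
a function like this with Python:

    ;; COUNT-ODD is a made-up example.  N starts at 0 and is never
    ;; declared, so the compiler cannot prove it stays a fixnum.
    (defun count-odd (list)
      (let ((n 0))
        (dolist (x list n)
          (when (oddp x)
            (incf n)))))

    ;; Python brackets the (INCF N) with a note to the effect of
    ;; "doing GENERIC-+ because the result type is not known to be a
    ;; FIXNUM".  Adding (declare (type fixnum n)) lets it use inline
    ;; tagged arithmetic, and the note goes away.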

The Python compiler for CMU Common Lisp has already addressed 1 and 2.  A
case study is Scott Fahlman's "export" connectionist AI simulator.  This is
a medium-size float-intensive application written by someone with a good
Lisp efficiency model.  It was written well before Python's number-crunching
capabilities were designed, and was tuned to perform well in the
then-available Lisp systems.

When first compiled with Python, there were more than 100 efficiency notes.
Even with all these problems, the program ran 2x faster than the commercial
Lisp on the same machine (I won't say whether it was Franz or Lucid.)
After fixing type declaration problems revealed by these notes, it was 5x
faster.  Interestingly, a version of this program had been recoded in C for
speed.  The Python version took only 1.6 times as long as the C version
compiled with the MIPS C compiler.

Floating point?  Who cares?  That's "unnecessary"...

The point is that with sufficient effort and complexity in the Lisp
implementation, users can fairly easily come within a factor of 2 of C, even
for problems that seem "un-Lispy".  In other words, the efficiency model has
been improved from 5% to 50% accuracy.

As for 3 above (the delivery problem), there is still a lot of work to be done
there.  Both Lucid and Franz are working on it, and I have some ideas too.
The problem is how to get a relatively small binary for a delivery system.
The difficulty is in figuring out how to do this without throwing the baby out
with the bath-water.  People don't use Lisp instead of C because they like its
funky syntax.  They use Lisp instead of C because it has lots of useful stuff
built in, and is easy to extend.  Unfortunately, those are the exact same
reasons that "ld" doesn't work well on Lisp programs.

What I find more interesting than the prospect of making FORTRAN-in-Lisp as
fast as FORTRAN or C-in-Lisp as small as C, is the prospect of making
Lisp-in-Lisp as fast as possible.  Now we have FORTRAN efficiency in Lisp, but
how about making efficient use of all the stuff that's in Common Lisp but not
FORTRAN?  Although that's lots of stuff, I assure you that there are few
useless features.  I recently read through the index of functions, macros and
special forms in CLtL, and it was rare to find an operation I had never used
(and I found none whose purpose I didn't know.)

The real answer to "Why is Common Lisp so big?" is "Compared to what?"
If your system is very small compared to Common Lisp, then maybe you should
attack a more complex problem.  Big systems easily exceed the size of
Common Lisp, making the penalty for having all that stuff around relatively
small.

As systems become more complex, the original CLtL I Common Lisp will come to
seem less and less complex.  Remember ALGOL with no I/O?  Fortunately, Common
Lisp is growing to keep up with progress in computer science.

  Robert A. MacLachlan (ram@cs.cmu.edu)