[comp.lang.lisp] CLOS speed

faustus@yew.Berkeley.EDU (Wayne Christopher) (10/25/89)

The CLOS book says in a few places that in a good implementation,
method calls can be almost as fast as function calls.  I've used the
Xerox version of CLOS (PCL), and it seems rather slow, but I haven't
made any real comparisons.  My questions are:

	Are there any really good implementations of CLOS yet, and will
	they run under Allegro?

	Does anybody have any real numbers for the speed of PCL?

	Is it possible to compile CLOS code as efficiently as Lisp?
	Are the same optimizations that can be done in C++ (inlining)
	possible?  It doesn't seem so.

Thanks,

	Wayne

barmar@kulla (Barry Margolin) (10/25/89)

In article <18769@pasteur.Berkeley.EDU> faustus@yew.Berkeley.EDU (Wayne Christopher) writes:
>The CLOS book says in a few places that in a good implementation,
>method calls can be almost as fast as function calls.  I've used the
>Xerox version of CLOS (PCL), and it seems rather slow, but I haven't
>made any real comparisons.  My questions are:

No one ever said that PCL was a good implementation.  Its primary objective
was portability (that's what the "P" stands for).  Since there are no
portable ways to implement compiler optimizations, it would be very
difficult for it to include any of these.  And on systems with
architectural support for objects and generic functions (e.g. Symbolics
machines) PCL can't take advantage of them.

>	Are there any really good implementations of CLOS yet, and will
>	they run under Allegro?

Symbolics announced in the Summer that their next release, which should be
released this winter, will include CLOS.  I suspect Lucid and Franz will be
including CLOS by the summer.

There was an attempt last year to organize a consortium of Lisp vendors to
develop a high-performance, portable implementation of CLOS.  It was
aborted because not enough people bothered to attend the organizing meeting
(held one evening during an X3J13 meeting).

>	Is it possible to compile CLOS code as efficiently as Lisp?

That's the theory.  Of course, it depends on the programmer including lots
of declarations so that the compiler can determine the class of objects and
optimize away most of the type dispatching.
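
As a rough sketch of the kind of thing I mean (the class and function
names here are made up for illustration):

(defclass point ()
  ((x :initarg :x :accessor point-x)
   (y :initarg :y :accessor point-y)))

(defun sum-coordinates (p)
  (declare (type point p))        ; the compiler now knows P's class,
  (+ (point-x p) (point-y p)))    ; so the accessor dispatch can be
                                  ; compiled into a direct reference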

Note that you shouldn't expect generic function calls to be as efficient as
normal function calls.  Remember that CLOS provides extra functionality per
line of code.  Naturally, there are some tradeoffs.

Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

mujica@ra.cs.ucla.edu (S. Mujica) (10/26/89)

on 25 Oct 89 06:41:04 GMT,
barmar@kulla (Barry Margolin) said:

> Note that you shouldn't expect generic function calls to be as efficient as
> normal function calls.  Remember that CLOS provides extra functionality per
> line of code.  Naturally, there are some tradeoffs.

What about setting the value of a slot?  I have noticed that running
PCL on Lucid 3.0, it takes about 30 times longer to set the value of
a slot than to set the value of a local variable.

Sergio Mujica		mujica@cs.ucla.edu
Computer Science Department, UCLA

barmar@kulla (Barry Margolin) (10/26/89)

In article <MUJICA.89Oct25161848@ra.cs.ucla.edu> mujica@ra.cs.ucla.edu (S. Mujica) writes:
>What about setting the value of a slot?  I have noticed that running
>PCL on Lucid 3.0, it takes about 30 times longer to set the value of
>a slot than to set the value of a local variable.

Accessing a local variable is probably the fastest single operation in any
Lisp implementation.  CLOS slots should be compared with DEFSTRUCT slots,
not with local variables.

I'm sure it will still be much slower in PCL, but that's because it does no
optimization.  PCL's simplistic implementation of SLOT-VALUE makes a number
of function calls; (SLOT-VALUE object 'slot) calls (SLOT-VALUE-USING-CLASS
(CLASS-OF object) object 'slot), which has to do some table lookup to
figure out where that slot is in that class.
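
In outline, the unoptimized path amounts to something like this (a toy
sketch, not the actual PCL source; INSTANCE-SLOTS and CLASS-SLOT-NAMES
are invented stand-ins for PCL's real internals):

(defun toy-slot-value (object slot-name)
  (toy-slot-value-using-class (class-of object) object slot-name))

(defun toy-slot-value-using-class (class object slot-name)
  ;; run-time table lookup to find where this slot lives in this class
  (let ((index (position slot-name (class-slot-names class))))
    (svref (instance-slots object) index)))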

In a high performance implementation, with type declarations and block
compilation, the offset can be determined at compile time, and most of the
above can be compiled away or inlined.

Note, by the way, that compilers that can provide the kind of performance
I've been talking about probably won't be around for a while.  CLOS is
designed to permit this kind of optimization, but no one ever said that
implementing such optimizers would be easy.  This is very new stuff, so
there's probably still some compiler research necessary.

Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

pf@islington-terrace.csc.ti.com (Paul Fuqua) (10/28/89)

    Date: Wednesday, October 25, 1989  1:41am (CDT)
    From: barmar at kulla (Barry Margolin)
    Subject: Re: CLOS speed
    Newsgroups: comp.lang.lisp
    
    Symbolics announced in the Summer that their next release, which should be
    released this winter, will include CLOS.  I suspect Lucid and Franz will be
    including CLOS by the summer.

TI includes CLOS in their (our) Release 6.0, which I believe started
shipping last week.

    Note that you shouldn't expect generic function calls to be as efficient as
    normal function calls.  Remember that CLOS provides extra functionality per
    line of code.  Naturally, there are some tradeoffs.

Since the sequence of events will be something like (1) call the generic
function, (2) look up the method, (3) call the method, a generic
function call will take about 2x the time for a regular function call,
plus method-lookup overhead.  On my Explorer, I'm seeing 3x or 4x for
simple one-argument dispatches.

Method-lookup overhead can be reduced by caching methods, and some of
the function-call overhead can be reduced by avoiding duplicate work for
the second call.  A really hot compiler might do the method-lookup at
compile time if it had enough type information, but I don't think anyone
does that yet.
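
For what it's worth, the caching idea is roughly this (a toy sketch
with made-up names, nothing like the real Explorer implementation;
FIND-APPLICABLE-METHOD stands in for the expensive lookup):

(defvar *dispatch-cache* (make-hash-table))  ; class -> method function

(defun call-cached (methods argument)
  (let* ((class (class-of argument))
         (method (or (gethash class *dispatch-cache*)
                     (setf (gethash class *dispatch-cache*)
                           (find-applicable-method methods class)))))
    (funcall method argument)))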
    
    Date: Wednesday, October 25, 1989  10:03pm (CDT)
    From: barmar at kulla (Barry Margolin)
    Subject: Re: CLOS speed
    Newsgroups: comp.lang.lisp
    
    Accessing a local variable is probably the fastest single operation in any
    Lisp implementation.  CLOS slots should be compared with DEFSTRUCT slots,
    not with local variables.
    
    In a high performance implementation, with type declarations and block
    compilation, the offset can be determined at compile time, and most of the
    above can be compiled away or inlined.

Another off-the-cuff Explorer example:  SLOT-VALUE with no type
information takes about 50x a DEFSTRUCT slot reference, SLOT-VALUE with
type information takes 25x, an accessor method takes 13x, and WITH-SLOTS
within a method takes 4x or 5x.  Sometimes I've found it useful to make
a function into a method just to improve the slot-access performance.
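
For concreteness, the two extremes in that comparison look like this
(class and slot names made up):

(defun slow-x (obj)
  (slot-value obj 'x))      ; no type information: full lookup each call

(defmethod fast-x ((obj point))
  (with-slots (x) obj       ; slot access specialized to class POINT
    x))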

Paul Fuqua                     pf@csc.ti.com
                               {smu,texsun,cs.utexas.edu,rice}!ti-csl!pf
Texas Instruments Computer Science Center
PO Box 655474 MS 238, Dallas, Texas 75265

harrisr@turing.cs.rpi.edu (Richard Harris) (10/28/89)

In article <31074@news.Think.COM> barmar@kulla (Barry Margolin) writes:
>In article <MUJICA.89Oct25161848@ra.cs.ucla.edu> mujica@ra.cs.ucla.edu (S. Mujica) writes:
>>What about setting the value of a slot?  I have noticed that running
>>PCL on Lucid 3.0, it takes about 30 times longer to set the value of
>>a slot than to set the value of a local variable.
>
>Accessing a local variable is probably the fastest single operation in any
>Lisp implementation.  CLOS slots should be compared with DEFSTRUCT slots,
>not with local variables.
>
>I'm sure it will still be much slower in PCL, but that's because it does no
>optimization.  PCL's simplistic implementation of SLOT-VALUE makes a number
>of function calls; (SLOT-VALUE object 'slot) calls (SLOT-VALUE-USING-CLASS
>(CLASS-OF object) object 'slot), which has to do some table lookup to
>figure out where that slot is in that class.
This is wrong.  Calls to SLOT-VALUE within the body of a DEFMETHOD 
are optimized.  Look at EXPAND-DEFMETHOD-INTERNAL (in boot.lisp).
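
For example (made-up class), both SLOT-VALUE calls below get rewritten
by that code because R is a specialized parameter:

(defmethod area ((r rectangle))
  ;; R is known to be a RECTANGLE, so these become fast slot accesses
  ;; rather than full SLOT-VALUE-USING-CLASS lookups.
  (* (slot-value r 'width) (slot-value r 'height)))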

>
>In a high performance implementation, with type declarations and block
>compilation, the offset can be determined at compile time, and most of the
>above can be compiled away or inlined.
The offset can be determined at compile time if:
 1) the class of the object is known at compile time, and
 2) the metaclass and superclass chains of the object's class
    can also be determined at compile time, and
 3) the class structure uses single inheritance, and
 4) the instance variables of the class and its superclasses
    will not change.
PCL generates (reasonably) good code when the class of the object 
is known at compile time, but PCL cannot assume that conditions
(3) or (4) are true (when the object's metaclass is standard-class).

There may be better slot accessing techniques that could be used
for CLOS:  for instance, when the Flavors system can determine
the class of an object, it arranges to compute at load time
an offset into a vector (which is stored in each instance) of offsets 
into the instance.  This is probably the fastest method of
slot access that CLOS could use (2 vector accesses per slot-value form),
but it has a significant drawback: it is hard (or impossible)
to do load-time evaluation of constants (in code created by macros)
in some implementations of CL, because CL does not support load-time 
constants very well (#, is too limited).
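
In outline, that technique amounts to something like this (the names
and representations are invented for illustration):

(defun mapped-slot-value (instance map-index)
  ;; first vector access: the per-instance map of offsets (filled in
  ;; at load time); second vector access: the slot storage itself
  (svref (instance-slot-storage instance)
         (svref (instance-slot-map instance) map-index)))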

Rick Harris

barmar@kulla (Barry Margolin) (10/28/89)

In article <1989Oct27.224758.13427@rpi.edu> harrisr@turing.cs.rpi.edu (Richard Harris) writes:
>it is hard (or impossible) 
>to do load-time evaluation of constants (in code created by macros) 
>in some implementations of CL, because CL does not support load-time 
>constants very well (#, is too limited).

ANSI CL will support load-time constants much better.  We've gotten rid of
"#," and added a LOAD-TIME-VALUE special form.
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar