[net.lang.lisp] Using LISP for scientific programming?

davidson@sdcsvax.UUCP (Greg Davidson) (08/30/85)

In Charley Wingate's reply <1418@umcp-cs.UUCP> to my earlier article
<1057@sdcsvax.UUCP> he makes several points I agree with, but also
spreads some typical myths and misunderstandings about LISP.  Let
me set the record straight on two points.  The first has to do with
language syntax, the second with language efficiency.

1. SYNTAX

> Obviously you've [I've] never used an HP calculator, which are
> very popular among engineers and other physical science types.

Indeed I have used them, and I prefer their RPN (Reverse Polish
Notation) for arithmetic calculation.  However, RPN is not the
same as the CPN (Cambridge Prefix Notation) used by LISP.  In RPN,
the expression
		 _____________
[1]		|   2
	 - b + \| b   - 4 a c
	________________________
		2 a

might be entered as

[2]	b neg b sqr 4 a * c * - sqrt + 2 a * /

with perhaps 29 content keystrokes (depending on lexical syntax).
I find this notation easy to enter but hard to read.  I also find
it error prone, since it lacks clear syntactic grouping.  I consider
it a poor general purpose notation, and criticise its use for such
in the language FORTH.
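
To make the contrast concrete, here is a minimal RPN evaluator, sketched
in Python purely for illustration (the function name rpn_eval and the
operator set are mine, not any calculator's).  Note how nothing in the
token stream itself marks where one operand ends and the next begins:

```python
import math

def rpn_eval(tokens, env):
    # Each token is a variable, a number, or an operator; operators
    # pop their arguments off the stack and push the result.
    stack = []
    ops1 = {"neg": lambda x: -x, "sqr": lambda x: x * x, "sqrt": math.sqrt}
    ops2 = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
            "*": lambda x, y: x * y, "/": lambda x, y: x / y}
    for tok in tokens.split():
        if tok in ops1:
            stack.append(ops1[tok](stack.pop()))
        elif tok in ops2:
            y = stack.pop()
            x = stack.pop()
            stack.append(ops2[tok](x, y))
        else:
            stack.append(env[tok] if tok in env else float(tok))
    return stack.pop()

# Expression [2] with a=1, b=-3, c=2 (a root of x^2 - 3x + 2):
root = rpn_eval("b neg b sqr 4 a * c * - sqrt + 2 a * /",
                {"a": 1.0, "b": -3.0, "c": 2.0})
print(root)  # 2.0
```

The evaluator is trivial to write, but a reader of form [2] must replay
that stack in his head to see the grouping.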

Cambridge prefix (the *default* input syntax in LISP) would enter
expression [1] as

[3]	(/ (+ (- b)
	      (sqrt (- (sqr b)
		       (* 4 a c))))
	   (* 2 a))

using perhaps 42 keystrokes.  This is more readable (when you're
used to it) and much less error prone (due to explicit grouping
and checkable redundancy).
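
The explicit grouping is also what makes CPN so easy to process
mechanically.  A toy evaluator, sketched here in Python with nested
lists standing in for Lisp lists (the names and operator set are my
own invention), is one short recursion:

```python
import math

OPS = {
    "+": lambda *xs: sum(xs),
    "-": lambda x, *ys: x - sum(ys) if ys else -x,  # unary or n-ary minus
    "*": lambda *xs: math.prod(xs),
    "/": lambda x, y: x / y,
    "sqrt": math.sqrt,
    "sqr": lambda x: x * x,
}

def cpn_eval(form, env):
    # An atom is a variable or a number; a list is (operator arg...),
    # so the grouping of form [3] drives the recursion directly.
    if isinstance(form, str):
        return env[form]
    if not isinstance(form, list):
        return form
    op, *args = form
    return OPS[op](*(cpn_eval(a, env) for a in args))

# Expression [3] as data:
quadratic = ["/", ["+", ["-", "b"],
                   ["sqrt", ["-", ["sqr", "b"], ["*", 4, "a", "c"]]]],
             ["*", 2, "a"]]
print(cpn_eval(quadratic, {"a": 1, "b": -3, "c": 2}))  # 2.0
```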

The typical algebraic language would enter this expression as

[4]	( - b + sqrt( b**2 - 4 * a * c ) ) / ( 2 * a )

using 27 keystrokes.  Thus, Charley correctly notes:

> Lisp also has the disadvantage of requiring more keystrokes, on
> the average, and this is not an unimportant consideration in a
> world where typing skills are generally poor.

However, CPN is only LISP's *default* syntax.  Any LISP system using
lots of algebraic expressions would define a domain specific syntax.
For example, the Macsyma front end (which is just LISP with some syntax
macros) allows you to enter expressions in form [4], and prints them back
at you in form [1], which I think you'll agree is the most readable;
yet Macsyma stores expressions internally in form [3] (as lists).

In fact, algebraic languages like FORTRAN, C and Ada do have a general
purpose notation which is used in expressing datatypes not built into
the language.  It's called functional notation.  Writing expression [1]
in functional notation yields

[5]	divide( plus( minus(b),
		      sqrt( diff( sqr(b),
				  times(4, times(a, c))))),
		times(2, a))

which is somewhat more cumbersome than CPN in reading and writing.
It is to the credit of Ada and LISP that they are not stuck with this
notation for programmer defined datatypes.  FORTRAN and C make
programmers reluctant to define their own datatypes because of the
resulting syntactic clumsiness.  This has a very bad influence on the
efficiency and overall quality of programs in those languages.

Of the languages I've mentioned, only LISP gives programmers the ability
to build a notation tailored to any job.  And only LISP lets programs
compute with expressions *as* data as well as evaluate them.  Yet the
design of LISP is very different from the similar designs of FORTRAN,
C and Ada.  Learning LISP is particularly hard for FORTRAN programmers.
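
To illustrate what computing with expressions *as* data buys you, here
is a toy simplifier, again sketched in Python with nested lists standing
in for Lisp lists (a hypothetical example, not any real system's code).
It rewrites (* 1 x) to x and (+ 0 x) to x by inspecting the expression
as a structure rather than evaluating it:

```python
def simplify(form):
    # Atoms pass through; lists are simplified bottom-up, then any
    # multiplicative 1 or additive 0 is dropped from the argument list.
    if not isinstance(form, list):
        return form
    form = [simplify(f) for f in form]
    op, args = form[0], form[1:]
    identity = {"*": 1, "+": 0}.get(op)
    if identity is not None:
        args = [a for a in args if a != identity]
        if len(args) == 1:
            return args[0]
    return [op] + args

print(simplify(["*", 1, ["+", 0, "x"]]))  # x
```

This is exactly the kind of manipulation Macsyma performs on its
internal list form, and it is awkward to express in a language whose
programs cannot take apart their own notation.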

2. EFFICIENCY

> Well, the problem comes down to this: most really high speed
> computers are much more like Fortran than Lisp.

Actually, really high speed computers are parallel (usually vector)
machines, which are more naturally programmed in pure-LISP (LISP without
side effects) than in Fortran.  Typical FORTRAN compilers first turn
FORTRAN into a LISP-like tree form, then analyse it to find pieces
which obey pure-LISP-like semantics and can be done in parallel.
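
The point about pure semantics can be made concrete: in a side-effect-free
expression tree, sibling subtrees never interfere, so they can be evaluated
in any order, or simultaneously, with no dependence analysis at all.  A
minimal sketch (in Python, purely for illustration; the two-operator tree
format is my own):

```python
from concurrent.futures import ThreadPoolExecutor

def eval_pure(form):
    # A pure expression tree ("op", left, right): no side effects, so
    # sibling subtrees never interfere and may run in any order.
    if not isinstance(form, tuple):
        return form
    op, lhs, rhs = form
    l, r = eval_pure(lhs), eval_pure(rhs)
    return l + r if op == "+" else l * r

# Because evaluation is side-effect free, the two operand subtrees can
# be handed to separate workers without proving anything first -- the
# property a vectorizing Fortran compiler must work hard to establish:
expr = ("+", ("*", 2, 3), ("*", 4, 5))
with ThreadPoolExecutor(max_workers=2) as pool:
    l, r = pool.map(eval_pure, expr[1:])
print(l + r)  # 26
```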
	
> Fortran tends to be faster, even in the face of compiled,
> optimized Lisp.

Agreed.  This is because more effort is put into the efficiency of Fortran
compilers on numerical code than any other language, including C and
Ada as well as LISP.  This is despite the poor correspondence between
Fortran and modern computer architectures, not because of it.

> the obvious retort is "well, why shouldn't I pick the language
> which is at least as fast as Lisp, and often much faster?"
	
Indeed.  FORTRAN is the most used language, hence gets the most attention
from compiler writers, hence is usually the most efficient language, hence
continues to be the most used.  How can we break this vicious cycle of
mediocrity?

> In an interactive debugging mode, Lisp is almost always much slower.
	
In interactive debugging, LISP is almost always faster, since Fortran
programmers are trapped in the slow and expensive edit/compile/run cycle.
A typical Lisp system's mixture of interpretation or compilation (often
incremental compilation as a third alternative) is much more efficient of
both machine and human time.

> Save Lisp for the applications in which it really shines:
> symbolic manipulations (like MACSYMA).

Indeed it does shine.  Let people know about this.  Use tools like Macsyma
when you can get them, and use LISP whenever suitable.  As LISP has been
getting more attention lately, better compilers have appeared.  As good
compilers become available and LISP becomes better understood, it will
displace FORTRAN, C and Ada; unless and until something even better comes
along.

_Greg Davidson			Virtual Infinity Systems, San Diego

davidson@sdcsvax.UUCP (Greg Davidson) (09/13/85)

Scott Anderson is correct; LISP programmers can use any algebraic input
notation they like.  Many lisps, e.g., InterLisp, have standard macros
for infix algebraic notation, but it's trivial to write such.  It's an
elementary exercise in many LISP books.  Real LISP programmers only use
CPN (Cambridge prefix notation) when they want to (which is often, since
it seems natural once you're used to it).

LISP allows complete freedom in creating appropriate notations for new
datatypes and new domains.  Used sparingly, it's very nice.  This is one
of the reasons LISP can be viewed as a more general purpose language
than FORTRAN/Pascal/C/etc.

_Greg Davidson			Virtual Infinity Systems, San Diego

greg@hwcs.UUCP (Greg Michaelson) (10/10/85)

> > Fortran tends to be faster, even in the face of compiled,
> > optimized Lisp.
> 
> Agreed.  This is because more effort is put into the efficiency of Fortran
> compilers on numerical code than any other language, including C and
> Ada as well as LISP.  This is despite the poor correspondence between
> Fortran and modern computer architectures, not because of it.
> ...  FORTRAN is the most used language, hence gets the most attention
> from compiler writers, hence is usually the most efficient language, hence
> continues to be the most used.  How can we break this vicious cycle of
> mediocrity?

LISP is slow because its memory model is non-linear and dynamic. FORTRAN
has linear, static memory which corresponds 100% to von Neumann architecture.
Most speed loss in LISP implementations, compiled or interpreted, is due to
dynamic memory allocation and garbage collection. To 'break the vicious
cycle of mediocrity' involves realising that LISP is now nearly 30 years old
and that times have changed. Pro-LISP arguments are awfully like pro-COBOL:
they all come down to familiarity & investment. Have a look at ML for a
clean, modern functional language with fast implementations.

> > In an interactive debugging mode, Lisp is almost always much slower.
> 	
> In interactive debugging, LISP is almost always faster, since Fortran
> programmers are trapped in the slow and expensive edit/compile/run cycle.
> A typical Lisp system's mixture of interpretation or compilation (often
> incremental compilation as a third alternative) is much more efficient of
> both machine and human time.

This is a confusion between a language, its implementation and its development
environment. Compilation and interpretation are implementation techniques which
can be used with any language. Similarly, any language can be implemented in
an interactive or 'batch' environment. For example, does using Pascal with
EMACS make Pascal interactive? Does typing straight into 'cc' make C
interactive? Does running LISP from punched cards make it a batch language?
Incidentally, there was an interactive FORTRAN developed around 1975 though
I don't have the details.

barmar@mit-eddie.UUCP (Barry Margolin) (10/15/85)

In article <3785@garfield.UUCP> greg@hwcs.UUCP (Greg Michaelson) writes:
>LISP is slow because its memory model is non-linear and dynamic. FORTRAN
>has linear, static memory which corresponds 100% to VonN architecture.
>Most speed loss in LISP implementations, compiled or interpreted, is due to
>dynamic memory allocation and garbage collection.

I am a Lisp fan, but I mostly agree.  Lisp is inherently slow for some
things because some of its primitives are inherently expensive.

However, numeric, Fortranish applications often do not require much list
allocation.  And many Lisp implementations represent fixnum and flonum
objects as immediate data, rather than as objects allocated in the heap.
In these implementations, numeric applications will do mostly stack
allocations, not consing.  PDP-10 Maclisp, the one whose compiled code
was rated as good as Fortran, has special fixnum and flonum stacks which
are used for this optimization.  Numeric arrays are also very efficient
in most production Lisp implementations, especially if the arrays are
only allocated at the beginning of the run (as in Fortran, not
coincidentally).  The test which produced this comparison presumably
consisted of a Fortran program translated into Lisp, so it seems likely
that it didn't do much expensive consing.

My point is that both the inherent efficiency of a language (if this is
even a reasonable thing to discuss) and the efficiency of an
implementation depend heavily on the types of programs written using it.
Lisp programs are often slow because Lisp promotes a type of programming
that requires the use of dynamic memory allocation and garbage
collection.
-- 
    Barry Margolin
    ARPA: barmar@MIT-Multics
    UUCP: ..!genrad!mit-eddie!barmar

petera@hcrvax.UUCP (Smith) (10/17/85)

	One thing that is inefficient when using LISP for numeric computations
is that on many machines a new cell is created for a new value of any numeric
type. This means that any loops etc that increment values or do any kind of
numeric computations will result in a trail of old values waiting to be 
collected. I suspect that unless you specifically request that the cell 
itself be modified any LISP implementation of a lengthy numerical operation
will have to do a fair bit of garbage collection.  Bottom Line? A simply
coded numerical algorithm in LISP will require as much or more garbage 
collection than any other LISP program and certainly more garbage collection
than most FORTRAN programs do (ie none).

		Peter Ashwood-Smith
		Human Computing Resources,
		Toronto, Ontario.

dove@mit-bug.UUCP (Web Dove) (10/21/85)

In article <2018@hcrvax.UUCP> petera@hcrvax.UUCP (Smith) writes:
>
>	One thing that is inefficient when using LISP for numeric computations
>is that on many machines a new cell is created for a new value of any numeric
>type. This means that any loops etc that increment values or do any kind of
>numeric computations will result in a trail of old values waiting to be 
>collected. I suspect that unless you specifically request that the cell 
>itself be modified any LISP implementation of a lengthy numerical operation
>will have to do a fair bit of garbage collection.  Bottom Line? A simply
>coded numerical algorithm in LISP will require as much or more garbage 
>collection than any other LISP program and certainly more garbage collection
>than most FORTRAN programs do (ie none).
>
>		Peter Ashwood-Smith
>		Human Computing Resources,
>		Toronto, Ontario.


You should be aware that the SYMBOLICS 3600 fits 32bit floats directly in
the pointer field of objects, so for single precision there is no 
garbage.  Such a capability does seem to be essential for fast numerical
analysis in lisp.

For your info, I have seen a 1024 complex fft on a 3600 take 120 ms
versus 90 ms for an equivalent C program on a vax 785.

shebs@utah-cs.UUCP (Stanley Shebs) (10/22/85)

In article <2018@hcrvax.UUCP> petera@hcrvax.UUCP (Smith) writes:
>
>	One thing that is inefficient when using LISP for numeric computations
>is that on many machines a new cell is created for a new value of any numeric
>type.
>		Peter Ashwood-Smith

This is *entirely* a property of the Lisp implementation and therefore
depends only on the intelligence of the implementors.  For instance, PSL
does *not* do any allocation for numbers (except bignums, for which
there's no choice), but Franz allocates storage for any number outside
the range +-1023.  This is partly responsible for the fact that PSL
programs are up to an order of magnitude faster than the same programs
in Franz.  If I wanted to start arguments, I could comment on the
relative merits of BBOP vs tagging....

							stan shebs

petera@hcrvax.UUCP (Smith) (10/24/85)

>   You should be aware that the SYMBOLICS 3600 fits 32bit floats directly in
>   the pointer field of objects, so for single precision there is no 
>   garbage.  Such a capability does seem to be essential for fast numerical
>   analysis in lisp.

    I don't understand? Using the car or cdr field of a CONS cell as a 
floating point number does not stop you from getting garbage because you
still cannot operate on that location. If you do you will change its
value everywhere that it is referenced (LISP avoids copying lists if possible).
You can use a specific in-place function to do the operation but you 
must know what you are doing. ie:

   (eq 4 (+ 2 2)) is nil because they are not the same floatnum atom.

   If it is nil then we have created garbage by introducing a new value 
for (+ 2 2). If we use the standard (setq) function such as:

   (setq 'n (+ n 1)) then we have created a new value for 'n  and the
old one is still hanging around. This is regardless of how we store the
pointer/float as a union somewhere. As far as I know the only way to avoid
a construct like this causing garbage is to actually alter the floatnum cell
that is bound to 'n at that moment in the alist. Since this 'n is going to
also be referenced by all other copies of the partially evaluated lambda
expression we will incorrectly alter many other function's intermediate
value of 'n. It's like changing the actual value of a parameter when we were
only passed its value.

    Anyway the point I wanted to make was that using a pure LISP or something 
like it you have to watch what you are doing to avoid massive garbage build
up. 
 		Peter Ashwood-Smith
 		Human Computing Resources,
 		Toronto, Ontario.

unilog@g.cs.cmu.edu (Unilogic) (10/24/85)

Close, but not quite.  Limiting the discussion to Vaxen, which seems
reasonable given the comparisons to Franz:

PSL has 3 different representations for integers: INUMs, FIXNUMs, and BIGNUMs.
I will ignore BIGNUMs in this discussion as they are a separate issue and are
a loadable option in PSL anyway.  FIXNUMs are indeed stored on the heap and
have a tag of xxx (PSL tags are 5 bits).  INUMs are not stored on the heap and
have two distinct tags, xxx (POSINT) and xxx (NEGINT).  Using this
representation, so long as you keep your integers in the range -2^27 to 2^27-1
there will be no use of the heap when performing arithmetic.  In fact, there
is a package which causes the compiler to use in-line machine instructions
(ADDL, SUBL, etc.) if you are "sure" that you will not need any integers >
27 bits.  Every math function must check its result and if it is outside of
INUM range, allocate a longword on the heap and return a tagged address.
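
A sketch of that range check, in Python purely for illustration (the
names Boxed and inum_add are mine, and the real PSL machinery is of
course machine-level tag manipulation, not Python classes):

```python
INUM_MIN, INUM_MAX = -2**27, 2**27 - 1   # the INUM range quoted above

class Boxed:
    """Stand-in for a heap-allocated FIXNUM cell (a tagged longword)."""
    def __init__(self, value):
        self.value = value

def inum_add(x, y):
    # The check every math primitive must make: keep the result as an
    # immediate INUM if it fits, otherwise allocate a cell on the heap
    # and return a tagged address (here, a Boxed object).
    a = x.value if isinstance(x, Boxed) else x
    b = y.value if isinstance(y, Boxed) else y
    r = a + b
    return r if INUM_MIN <= r <= INUM_MAX else Boxed(r)

print(isinstance(inum_add(1, 2), Boxed))          # False: stays immediate
print(isinstance(inum_add(2**27 - 1, 1), Boxed))  # True: spills to the heap
```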

						David Gentzel
----------------
ARPA:	unilog@cmu-cs-g
UUCP:	seismo!cmu-cs-g!unilog

munyer@harvard.ARPA (Robert Munyer) (10/26/85)

This is in response to article <2034@hcrvax.UUCP>, from Peter Ashwood-Smith:

>>		You should be aware that the SYMBOLICS 3600 fits 32bit
>>		floats directly in the pointer field of objects, so for
>>		single precision there is no garbage.  Such a
>>		capability does seem to be essential for fast numerical
>>		analysis in lisp.

>	I don't understand? Using the car or cdr field of a CONS cell
>	as a floating point number does not stop you from getting
>	garbage because you still cannot operate on that location...
>
>	(eq 4 (+ 2 2)) is nil because they are not the same floatnum atom...
>
>	If we use the standard (setq) function such as:
>	(setq 'n (+ n 1)) then we have created a new value for 'n and the
>	old one is still hanging around....

You seem to be missing the point.  The idea is that it is possible (and
necessary, if you want to be able to crunch numbers) to implement Lisp
in such a way that adding two numbers does NOT require allocating
storage to represent the new number.  A simplified explanation follows:

The idea is based on the fact that nobody really NEEDS to be able to
use an entire 4 gigabyte address space for pointers to list cells.  So
you can set aside some fraction of this address space, say 128
megabytes, and arrange that no list cells will ever be stored at these
addresses.  You now have about 128 million "unused" addresses which you
can use to represent about 128 million integers.  You simply treat a
pointer which contains one of these addresses as if it were a pointer
to a "numerical atom", although in reality it does not actually "point"
to anything because there is nothing stored at that address.

Now, when you need to represent the number 4, instead of allocating a
block of memory, copying 4 into it, and returning a pointer to that
block, you can instead simply return a pointer to the 4th "unused"
address.  To increment the number represented by one of these pointers,
you simply return a pointer to the following address.  This has the
desirable quality that the Lisp form

	(setq n (add1 n))

can be compiled to the machine language instruction

	increment register 2

presuming that the variable n is used frequently and that the
compiler is intelligent enough to use registers for frequently used
variables.  If you put your unused addresses at the very bottom and top
of the address space, then you can compile addition, subtraction &c. in
a similarly straightforward manner.

So far we have the Lisp compiler doing basically the same thing the
FORTRAN compiler would do, for "small" integers.  The difference comes
when your numbers get very large.  If you are using FORTRAN and your
variable gets larger than 2,147,483,647, you are out of luck.  Your
program will crash or, worse yet, give incorrect results.  If you are
using one of the newer implementations of Lisp and your variable gets
larger than, say, 67,108,863, it will no longer be represented by one
of these "bogus" pointers.  Instead, a block of memory of sufficient
size to store the bits of the number will be allocated, and the
register will contain a real pointer to this block.

As a result, the Lisp user sees no limitation on the size of numbers
the system can handle, except for a decrease in speed and an increase
in garbage collection when the numbers become "large".
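
This scheme is easy to simulate.  The sketch below (Python, purely
illustrative; the 2^27 cutoff and all the names are hypothetical)
reserves the low "addresses" for immediate integers and touches the
heap only for large ones:

```python
IMMEDIATE_LIMIT = 2**27  # reserved low addresses encode small integers

heap = {}                # fake heap: address -> stored bignum
next_addr = IMMEDIATE_LIMIT

def make_number(n):
    # A small integer IS its own address; a large one gets a real cell.
    global next_addr
    if 0 <= n < IMMEDIATE_LIMIT:
        return n          # "bogus" pointer: nothing is stored there
    addr = next_addr
    next_addr += 1
    heap[addr] = n        # real pointer to an allocated cell
    return addr

def value_of(ptr):
    return ptr if ptr < IMMEDIATE_LIMIT else heap[ptr]

def add1(ptr):
    # For immediates this is literally "increment register": ptr + 1.
    return make_number(value_of(ptr) + 1)

p = make_number(4)
assert p == 4 and add1(p) == 5             # no allocation at all
q = make_number(2**40)
assert q >= IMMEDIATE_LIMIT and value_of(add1(q)) == 2**40 + 1
print(len(heap))  # 2  (only the two large numbers were allocated)
```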

In my opinion, jobs for which Lisp is a better language than FORTRAN
vastly outnumber those for which FORTRAN is better, even if you only
consider those "numerical" jobs which have traditionally been done in
FORTRAN.

		((name "Robert Munyer")
		 (address (uucp  "...allegra!harvard!munyer")
			  (arpa  "munyer@harvard")
			  (Snail "18 Morton St. #2, Somerville MA 02145-4206")
			  (phone "(617)628-8083")))

levy@ttrdc.UUCP (Daniel R. Levy) (10/28/85)

In article <454@harvard.ARPA>, munyer@harvard.ARPA (Robert Munyer) writes:
>This is in response to article <2034@hcrvax.UUCP>, from Peter Ashwood-Smith:
>a similarly straightforward manner.
>
>If you are using FORTRAN and your
>variable gets larger than 2,147,483,647, you are out of luck.  Your
>program will crash or, worse yet, give incorrect results.  If you are
>using one of the newer implementations of Lisp and your variable gets
>larger than, say, 67,108,863, it will no longer be represented by one
>of these "bogus" pointers.  Instead, a block of memory of sufficient
>size to store the bits of the number will be allocated, and the
>register will contain a real pointer to this block.
>As a result, the Lisp user sees no limitation on the size of numbers
>the system can handle, except for a decrease in speed and an increase
>in garbage collection when the numbers become "large".
>In my opinion, jobs for which Lisp is a better language than FORTRAN
>vastly outnumber those for which FORTRAN is better, even if you only
>consider those "numerical" jobs which have traditionally been done in
>FORTRAN.
>		((name "Robert Munyer")
>		 (address (uucp  "...allegra!harvard!munyer")

If your variables are getting that big, and you are programming on a
conventional machine in a conventional language like C or Fortran, you should use
the double precision data type.  You will still get accuracy equivalent to that
of using Lisp's oversize integers up to a magnitude of 2^58 or so (depends on
how many bits the exponent takes up on the particular machine).   That's on
the order of 10^17.  I mean how much accuracy do you want???  Also, many
(conventional, caveat here) machines have instructions for automatically
handling the double precision numbers, unlike the juggling which would have to take
place in Lisp simulating this with integers.  And if that doesn't quite meet 
your absolute accuracy requirements, some conventional machines (e.g. VAX)
have a "quadruple precision" data type that is handled atomically by machine
instructions.  I fail to see, in the light of this, where "jobs for which
Lisp is a better language than FORTRAN [or C, or whatever] vastly outnumber
those 'numerical' jobs which have traditionally been done in FORTRAN [etc.]"
--at least on conventional machines.  Of course if you have a so-called Lisp
machine which is designed to atomically handle this kind of stuff, that
throws the whole comparison out the window, you're comparing apples with
oranges.  When you get right down to it, whether you be programming in Lisp,
C, Fortran, Occam, Pascal, or Swahili for that matter :-) what MATTERS is
the resultant machine-level activity.  I doubt that an assembly-language
programmer would choose a Lisp-like approach any more than a Fortran or C-
like approach, on a conventional machine.  Anything higher level than that
is a convenience in interface for the programmer and nothing more sacred
than that.  If you have a marvelous Fortran compiler/optimizer that can
make up for a lot of sins in the language style, same goes for other languages.
I begin to think I am expounding on the obvious so I had better stop here.
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy