[comp.lang.lisp] cdr encoding

jack@cca.CCA.COM (Jack Orenstein) (07/17/87)

In article <19747@ucbvax.BERKELEY.EDU> larus@paris.Berkeley.EDU(James Larus) writes:
>Luddy Harrison ... has done some very nice work on
>using vector-like structures to store lists. ...
>Certain common operations, such as APPEND, become very fast and efficient.
>It's main disadvantage is that RPLACA and RPLACD are outlawed because they
>screw up the structure-sharing.  ...

I can see why RPLACD causes trouble, but why RPLACA? Suppose that P
is a pointer to a "cell" in the middle of some list represented by
a vector. P actually points to the location of the car (i.e. one
of the vector components). (RPLACD P ...) won't work because the
cdr field is at the next location in physical memory. (RPLACA P ...)
just updates the contents of the vector component.

If C is an ordinary cell, then I don't see why (RPLACA C P) or
(RPLACD C P) would cause any trouble.

Am I missing something?

Jack Orenstein

larus@paris.Berkeley.EDU.berkeley.edu (James Larus) (07/18/87)

In article <17815@cca.CCA.COM> jack@CCA.CCA.COM.UUCP (Jack Orenstein) writes:
>In article <19747@ucbvax.BERKELEY.EDU> larus@paris.Berkeley.EDU(James Larus) writes:
>>Luddy Harrison ... has done some very nice work on
>>using vector-like structures to store lists. ...
>>Certain common operations, such as APPEND, become very fast and efficient.
>>It's main disadvantage is that RPLACA and RPLACD are outlawed because they
>>screw up the structure-sharing.  ...
>
>I can see why RPLACD causes trouble, but why RPLACA? Suppose that P
>is a pointer to a "cell" in the middle of some list represented by
>a vector. P actually points to the location of the car (i.e. one
>of the vector components). (RPLACD P ...) won't work because the
>cdr field is at the next location in physical memory. (RPLACA P ...)
>just updates the contents of the vector component.
>
>If C is an ordinary cell, then I don't see why (RPLACA C P) or
>(RPLACD C P) would cause any trouble.
>
>Am I missing something?

Yes, but it is not your fault.  I didn't explain Luddy's idea fully
enough to do it justice.  His scheme stores the number of elements in
a list along with a pointer to the list (the number is important for
scheduling multiprocessors tasks).  Suppose I have two lists:

	(setq A (list 1 2 3))

	(setq B (list 4 5 6))

The are roughly represented:

A:  list, 3  -->  |1|2|3|

B:  list, 3  -->  |4|5|6|

Now, suppose I do (setq C (append A B)), I end up with the following 2
(not 3!) structures:

A:  list, 3 --|
	      |--> |1|2|3|--|
C:  list, 6 --|             |
                            |
                            |
B:  list, 3  ---------------|-->  |4|5|6|


I can't really do it just without better drawings.  Needless to say,
you can see that because of the large amount of sharing that results,
any RPLACA or RPLACD can affect other, unrelated lists.  The sharing
has many advantages for the intended uses of these structures and
side-effect-producing operations like RPLACA just produce headaches on
multiprocessors, so the best course is to outlaw them.

/Jim

weening@Gang-of-Four.Stanford.EDU (Joe Weening) (07/19/87)

Actually, in [1, p. 38] Harrison says

    "While the operation RPLACA is no different to perform using our
    representation than when using a conventional one, the increased
    level of sublist sharing causes a concomitant increase in the
    likelihood of RPLACAing a shared cell, and makes it practically
    impossible to give a precise semantic definition of the construct.
    ... Nevertheless, because it is so straightforward to perform the
    operation itself using our representation, further investigation
    is needed to decide if RPLACA can be incorporated in an orderly
    way."

I tend to disagree with the first part of this, at least in the case
of a single-process program (or a list structure known to be accessed
by only a single process).  In this case, I think techniques such as
developed in [2] can be used to adequately define the semantics, and
there may be cases where RPLACA is of some use.

Of course, if you want RPLACA to have the exact same semantics as with a
conventional pointer-based list representation, there will be problems.
But I suspect that this is rarely needed.

RPLACD is another story.  Here again I think it could be implemented with
a well-defined semantics, but the resulting operation would be much less
useful than the conventional RPLACD, as Harrison points out, and to do the
standard RPLACD is quite difficult with this representation.

Since this representation was designed for a parallel Lisp, and the parallel
semantics of RPLACA and RPLACD will make them much less useful than the
serial forms, I can see why he chose to remove them from the language.

						Joe Weening

[1] W. L. Harrison, "Compiling Lisp for Evaluation of a Tightly Coupled
Multiprocessor", Center for Supercomputing Research and Development Report
No. 565, Urbana, Ill., March 1986.

[2] I. A. Mason, "The Semantics of Destructive Lisp", Center for the Study
of Language and Information Lecture Notes No. 5, Stanford, Ca., 1986.

harrison@uicsrd.csrd.uiuc.edu (07/20/87)

	Jim has hit the nail on the head.  The storage scheme does not
make RPLACA difficult to perform, but the increase in sublist sharing makes
it difficult to give it a precise meaning.  Whereas with conventional list
representations, the sharing that results from an APPEND operation is
apparent, using the representation I've proposed the sharing that results
is in part a function of previous APPEND operations.  For instance,
in the code fragment

        ...
        (IF CONDITION1 (SET! A (APPEND B C)))
        (IF CONDITION2 (SET! D (APPEND B E)))
        ...

D and B may wind up sharing all of the cells in B's top level, if
CONDITION1 is false; that is, if (APPEND B C) does not occur.  If
(APPEND B C) occurs, then A and B will wind up sharing all of the cells
in B's top level.  It seemed unreasonable to expect a user to be able
to sort out who was sharing what with whom, and so we ended by declaring the
representation unfit for use with RPLACA as well as RPLACD.

Another way of looking at it, as Jim pointed out, is that if RPLACA and RPLACD
are making the life of the parallelizer (human or automatic) miserable, then
why not lop them off, and take full advantage of the immutability of cons cells
by increasing storage efficiency, providing for fast (parallel) access to top
level cells, allowing recurrences over lists to be solved in parallel,
etc?  (In the lisp system we're currently developing, there are lots of
other mutable objects: structures, variables, hashtables, etc., so the loss
of expressive power is relatively minor.)

--
Luddy Harrison,  Center for Supercomputing R&D, U of IL at Champaign-Urbana

UUCP:	 {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!harrison
ARPANET: harrison%uicsrd@a.cs.uiuc.edu
CSNET:	 harrison%uicsrd@uiuc.csnet
BITNET:	 harrison@uicsrd.csrd.uiuc.edu or harrison@uicsrd

mac@uvacs.CS.VIRGINIA.EDU (Alex Colvin) (07/20/87)

There are uses for infinite (cyclic) list structures.  Many of these
can be constructed by something like a set of simultaneous recursive
equations (see [1,2]), in much the same way that LABELS defines a set
of mutually recursive functions.

For example, an infinite list of 'T is

(LABELS ( (lotsa-t (cons T lotsa-t))
          )
	...don't try to print it
        )


             +------+
lotsa-t --+->| T | ----+
          |  +------+  |
          +------------+

a somewhat more tangled version
(LABELS ( (a (list b c))
          (b (list c a))
          (c (list a b))
          )
        ...
        )


    +------------------+
    |                  |
    |  +---+    +---+  |
a --+->|+|+---->|+|/|  |
       +|--+    +|--+  |
        |        |     |
    +---|--------+     |
    |   v              |
    |  +---+    +---+  |
b +-|->|+|+---->|+|/|  |
  | |  +|--+    +|--+  |
  +-|---|----+   +-----+
    |   |    |         |
    |   v    +---+     |
    |  +---+    +|--+  |
c --+->|+|+---->|+|/|  |
       +|--+    +---+  |
        |              |
        +--------------+

You get the idea that these definitions are easier to follow than any
chain of RPLACs.
In particular, what you expect of (labels ((c (list a b)) remains true.

[1]
P. J. Landin.
"A Correspondence between ALGOL 60 and Church's Lambda-Notation: Part I"
Comm. ACM 8(2), February 1965.
Defines LET REC as a mutually recursive LET.

[2]
Morris & Schwarz.  "Computing Cyclic List Structures". Proc. 1980 LISP
Conference.
Shows how to apply this to mutually recursive lists.

What's nice is that the same implementation will work for both data and
function recursion.

mac@uvacs.CS.VIRGINIA.EDU (Alex Colvin) (07/20/87)

Someone once (1980 LISP conference) used the term "Heraclitean"
constructor for CONS.  Heraclitus, you recall, was the pre-Socratic who
"you can't step in the same * twice!" Two CONS calls with EQ
arguments, are not EQ.  I suppose the other (mathematically functional)
version would be a Platonic constructor.

harrison@uicsrd.csrd.uiuc.edu (07/21/87)

>/* Written  4:13 pm  Jul 18, 1987 by weening@Gang-of-Four.Stanford.EDU in uicsrd:comp.lang.lisp */
>
>I tend to disagree with the first part of this, at least in the case
>of a single-process program (or a list structure known to be accessed
>by only a single process).  In this case, I think techniques such as
>developed in [2] can be used to adequately define the semantics, and
>there may be cases where RPLACA is of some use.
>
>Of course, if you want RPLACA to have the exact same semantics as with a
>conventional pointer-based list representation, there will be problems.
>But I suspect that this is rarely needed.
>
>						Joe Weening

It feels odd to argue *against* possible uses for this representation;
nevertheless, here is a pathological use of RPLACA, to illustrate the
problem:

        ...
        (SET! X '(A B C D E))
        (IF CONDITION1 (SET! S (APPEND X R))
        (SET! Y (APPEND X X))
        (RPLACA X 'OOPS)
        ...

If CONDITION1 is true (and R non-null), then after this sequence,
Y = (A B C D E OOPS B C D E); else, Y = (OOPS B C D E OOPS B C D E).
(Y will consist of 5 cells, and will have a length of 10.)
In short, in the presence of RPLACA and RPLACD, APPEND appears to have
side-effects, and RPLACA appears to affect two elements of a 
non-circular list.  Note also that the garbage collector affects sharing:
after S and Y are reclaimed, APPENDing to X may again result in sharing with X.

Probably one could make a rule such as "(RPLACA X Y) is undefined
if the cell X has ever been involved in an APPEND operation;" this may
be the sort of alteration to the semantics that Joe has in mind.
As was mentioned before, omitting the RPLAC operations is, for us, part of
a strategy of parallelization and compilation.

-Luddy Harrison