[comp.sw.components] List ADT example

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/24/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
>> OK, first let me clarify my position.  The primary use for pointers
>> is simply to implement abstract data types -- lists, trees, graphs, etc.;
>> by shielding the user from the details of implementation, we can keep
>> the user from having to worry about manipulating pointers.
> 
> Frankly, I can't see how this is possible.  Suppose I have an ADT 'A'
> that has a generator operation NEW and a destructor function DISPOSE.
> Suppose I want a generic ADT LIST[t: type] that can be used in the
> following way.

    -- create X and Y, two instances of type A;
    -- create a list L, containing objects of type A

    -- loop
    --    append X to the list L
    --    append a constant object of type A to the list L
    --    append X to the list L again
    --    assign the head of list L to the variable Y
    -- end loop;

> The generation of instances of A must be under the control of the 
> application and assignment of A must be handled.  I must be able to
> put an A instance into a list any number of times.
> 
> How do I implement the LIST ADT so that it invokes the A$DISPOSE
> operation at the right time?  When is the right time?  What happens
> if I want to put an instance of A into 2 LIST's at the same time?

   When you make your instantiation of the list type for objects of
   type A, you must fulfill the instantiation requirements.  This will
   normally include supplying assignment and destruction procedures for 
   objects of type A.  If you send an object into the list, the assignment
   procedure will be used to create a resident copy.  If you direct the
   removal of an object from the list, the destruction procedure will be
   used to destroy it.  (See Booch, "Software Components with Ada") 

   If the type A was an object, then the objects will be created and
   destroyed as appropriate.  If the type A was a pointer, then the 
   pointers will be created and destroyed as appropriate.  It all depends
   on what the user supplies for type A and how the user directs that the
   assignment and destruction operations for type A are to be performed.

   The "generic" part means that the implementor proceeds with respect to
   an arbitrary, user-supplied type which will have the specific operations
   required of it, without worrying about what that type is or how the
   operations over it are to be implemented; these are decisions which 
   the user has agreed to make in exchange for the ADT's services. 
 

   Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/25/89)

Bill Wolfe writes:
>
>  When you make your instantiation of the list type for objects of
>  type A, you must fulfill the instantiation requirements.  This will
>  normally include supplying assignment and destruction procedures for 
>  objects of type A.  If you send an object into the list, the assignment
>  procedure will be used to create a resident copy.  If you direct the
>  removal of an object from the list, the destruction procedure will be
>  used to destroy it.  (See Booch, "Software Components with Ada")
>

First objection: I don't want to put a copy of the instance of A into the list
twice.  I want to put the SAME instance of A into the list twice.  If
A is a mutable type (i.e. has operations that can change a value) this
is a significant semantic difference.  

Actually, though you SAY "make a resident copy [of the object]" I think 
you MEAN something slightly different; "make a resident copy of the ref
to the object".  Unfortunately, Ada makes a distinction between pointers
and non-pointers and this makes such things hard to describe concisely.  
I'll assume you meant this, and also that the object has an internal 
reference count that is incremented by the ADT's "assign" and decremented 
by its "destroy".  (The ref count is essential if an object is to be appear 
more than once in a list)

Second objection: You do not say how the assign and destruct operations 
get invoked when an instance of A returned by the LIST ADT to the 
application.  In general this requires some cooperatation on the part
of the application.  

In ADA it seems that by declaring A in the ADT as a "limited private" type 
the ADT can force the application to cooperate as far as assignment is
concerned.  (Since limited private types do not support ANY operations except
for those declare by the PACKAGE, the application programmer can be forced 
to apply the ASSIGN or DESTRUCT operation to the result of every A operation 
that returns an A instance in order to get his program to type check.)

However, I can't see how the application is forced to call DESTROY on an A
variable when its declaration scope exits.  [Please correct me if I am wrong 
.. I am NOT an ADA expert.]

If this is the case, then it is a trivial matter to generate an example
where correct storage management depends on the application programmer 
not making a mistake.  In other words, this is not a correct solution.

---

The ADA partial solution to storage management for lists strikes me as 
ugly.  For every ADT A, the corresponding PACKAGE needs to declare two 
types and a bunch of extra operators (ASSIGN, DESTROY and others that
are forced on us by exporting the type of A as a LIMITED PRIVATE)  Finally, 
since := can't be overloaded, the programmer has to use an awkward idiom
to assign an A instance to a variable. 

Then there is the issue of efficiency.  [After all, efficiency IS the 
reason we are not using GC'ed storage...]  Every time an instance of
the ADT A is assigned, we execute the ASSIGN operator which would be
coded something like

TYPE
  A_obj = RECORD [
    ref_count:	int,
    ...
    ]

  A = POINTER TO A_obj

-- Assign: updates the A variable given by 'dest'
-- to hold the A object given by 'source'
assign = PROCEDURE (source: IN A, dest: IN OUT A)
  IF dest ~= 'uninitialised' THEN
    -- If there is already a A value in 'dest', decrement its ref count.
    dest^.ref_count := dest^.ref_count - 1
    IF dest^.ref_count = 0 THEN
      -- deallocate dest^ and all of its fields.
      END
    END
  source^.ref_count := source^.ref_count + 1 
  dest := source
  END assign

That is a LOT of work for a simple assignment!

Contrast it with the GC'ed case where assignment is probably 1 machine
instruction unless you are using some useless reference counting garbage 
collection scheme.

If the ratio of list access to list building operations is large enough 
(I'd guess maybe 10 to 1 if ASSIGN isn't expanded inline), it will be 
MORE EFFICIENT TO USE GARBAGE COLLECTION.

-- Steve

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/27/89)

From article <915@scaup.cl.cam.ac.uk>, by scc@cl.cam.ac.uk (Stephen Crawley):
> Bill Wolfe writes:
>>
>>  When you make your instantiation of the list type for objects of
>>  type A, you must fulfill the instantiation requirements.  This will
>>  normally include supplying assignment and destruction procedures for 
>>  objects of type A.  If you send an object into the list, the assignment
>>  procedure will be used to create a resident copy.  If you direct the
>>  removal of an object from the list, the destruction procedure will be
>>  used to destroy it.  (See Booch, "Software Components with Ada")
>>
> 
> First objection: I don't want to put a copy of the instance of A into 
> the list twice.  I want to put the SAME instance of A into the list 
> twice.  If A is a mutable type (i.e. has operations that can change 
> a value) this is a significant semantic difference.  

   Then what you want is a list of pointers, not a list of objects.

> Second objection: You do not say how the assign and destruct operations 
> get invoked when an instance of A returned by the LIST ADT to the 
> application.  

   The ADT requires the user to supply the operations, and will invoke
   them when appropriate.  When the user requests that something be
   inserted, the ADT will invoke the assignment procedure to create
   a copy.  When the user requests that something be removed, the
   ADT will invoke the Destroy procedure to destroy it.  

   If the thing being stored is an object, then the ADT makes and 
   subsequently destroys its copy of that object.  If the thing
   being stored is a pointer, then the ADT makes and subsequently
   destroys its copy of that pointer.  

> I can't see how the application is forced to call DESTROY on an A
> variable when its declaration scope exits.  

   As mentioned above, the ADT creates and destroys on its own with
   respect to the stored object.  As far as the ADT's destruction
   goes, the automatic destruction upon scope exit is not a part of
   Ada 83; it is presently being considered as an Ada 9X revision. 
   This revision will greatly improve the ability to specify and
   use components which manage their own space.

> [Example of an assignment procedure using reference counting]

   Try writing the assignment procedure without reference counting,
   since nobody ever does reference counting inside assignment anyway...

   Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/27/89)

I wrote:
>> First objection: I don't want to put a copy of the instance of A into 
>> the list twice.  I want to put the SAME instance of A into the list 
>> twice.  If A is a mutable type (i.e. has operations that can change 
>> a value) this is a significant semantic difference.  

Bill Wolfe replies:
> Then what you want is a list of pointers, not a list of objects.

Huh???  Since when has it been a property of an object that I can only
have one reference to it?  You see to have a particularly restricted
idea of what an object is!  

No Bill.  Given normal use of the term object == instance of an ADT, I 
want what I said I want ... the ability to put the same object into 
a list twice.

Alternatively, if you insist that I use a different generic list ADT 
(say POINTER_LIST), please explain how it does storage management 
without pushing the tricky part (in particular the decision about when
reference counts should be incremented or decremented) back to application
code.  

I wrote:
>> I can't see how the application is forced to call DESTROY on an A
>> variable when its declaration scope exits.  

Bill replies:
> As mentioned above, the ADT creates and destroys on its own with
> respect to the stored object.

Yes Bill ... I understood all of that ... and I understood the mechanism
that ADA uses to force the application to cooperate in most cases.

> As far as the ADT's destruction
> goes, the automatic destruction upon scope exit is not a part of
> Ada 83; it is presently being considered as an Ada 9X revision. 
> This revision will greatly improve the ability to specify and
> use components which manage their own space.

But my point is that the mechanism doesn't exist NOW.  Given that it
is being CONSIDERED for Ada 9X, it is a safe bet that it won't be 
available in production Ada systems for at least 3-5 years.  What
do we do to till then?  Use inferior software components that can't
get space management right for us?

---

I sketched the code for a LIST ASSIGN operation and pointed out that
the reference counting overheads would in many situations be higher 
that the overheads of GC.

Bill Wolfe replies:
> Try writing the assignment procedure without reference counting,
> since nobody ever does reference counting inside assignment anyway...

The generic list ADT problem that I outline requires that SOMETHING 
does reference counting.  It is a direct consequence of the stipulation 
that the generic list must allow instances of an ADT to appear in a list 
more than once.

Now you have been insisting all along that our generic list ADT and our 
list member ADT (say A) can between them deal with all of the storage 
management decisions associated with instances of A.  Clearly we cannot
expect the application to be explicitly involved in reference counting.
In ADA, this means that the ASSIGN operation must do it.  There is no
other option as far as I can see.

Bill if you are going to contradict me on this, please post the complete
ADA source code of the generic list ADT and another ADT that can be used
to instantiate the list.  Be sure it solves the problem as I've specified 
it ... not some trivialisation.

-- Steve

billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe, 2847 ) (09/29/89)

From scc@cl.cam.ac.uk (Stephen Crawley):
> if you insist that I use a different generic list ADT 
> (say POINTER_LIST), please explain how it does storage management 
> without pushing the tricky part (in particular the decision about when
> reference counts should be incremented or decremented) back to application
> code.  

    This is the price you pay for wanting to store references instead 
    of objects.  An object can only be in one place at a time.

> But my point is that [auto-destruct] doesn't exist NOW.  Given that it
> is being CONSIDERED for Ada 9X, it is a safe bet that it won't be 
> available in production Ada systems for at least 3-5 years.  What
> do we do to till then?  

    Ada is the result of a language ENGINEERING process.  There was
    a deliberate decision that the definition of the language would
    be held stable for ten years at a time.  This permits certain
    economic assumptions to be made, which was judged to be more
    important than satisfying those who think that new versions of
    a language should be generated every six months.  This is a
    PRODUCTION language, not a RESEARCH language.  It offers a lot
    of advantages (portability, compiler validation, infrastructure)
    not present with other languages, in addition to a very high level
    of support for good software engineering practices.  

    If you really need auto-destruct right away, use a preprocessor.
    This same technique is already being used for inheritance and
    dynamic binding.  And while you're at it, here's an idea which
    was described by Paul Hilfinger (a member of the Ada design team)
    in his ACM Distinguished Dissertation: literal recognition should
    also be specifiable such that a compiler can be made to recognize
    the literals of any type for which a literal representation 
    recognition procedure has been defined during the compilation
    process.  This is another idea which did not emerge until the
    Ada design had already been frozen.

    We are nearing the end of the ten-year period of definitional
    stability, and obviously this is when the number of new ideas
    which have not yet been incorporated is at its largest.  
    

    Bill Wolfe, wtwolfe@hubcap.clemson.edu

scc@cl.cam.ac.uk (Stephen Crawley) (09/29/89)

I wrote:
>> if you insist that I use a different generic list ADT 
>> (say POINTER_LIST), please explain how it does storage management 
>> without pushing the tricky part (in particular the decision about when
>> reference counts should be incremented or decremented) back to application
>> code.  

Bill Wolfe replied:
> This is the price you pay for wanting to store references instead 
> of objects.  An object can only be in one place at a time.

I read this as a tacit admission that you can't come up with
such a list ADT! 

If so, all your arguments about applications not using pointers,
and the hard problems storage management being safely and efficiently
handled by generic ADTs fall apart like a pack of cards.  Sure generic 
ADT's may work for simple cases, but for data structures with multiple 
references the generic ADT can be no more efficient than a modern GC.

If the application programmer wants more efficiency, he could try to
do the storage management himself, using special knowledge about the
application in hand.  But this relies on the special knowledge being 
true knowledge (no mistakes, no mythology), and on the application 
programmer not writing incorrect code that leaks store, or leaves 
dangling pointers, or uses heap nodes after they have been freed, etc.

The GC versus non-GC argument boils down to a tradeoff between correctness 
and reliability on the one hand versus potential for efficiency improvement
on the other.  The correctness and reliability of GC'ed storage management 
is ASSURED.  The potential increased efficiency of non-GC'ed storage 
management is vanishingly small in many cases, and in all but simple cases
depends on non-trivial development effort by applications programmers.

-- Steve