[comp.lang.c++] Common data structures

budd@fog.cs.orst.edu (Tim Budd) (06/11/91)

I was surprized that Jean-Ren Bouvier's request for a simple ``List'' data
structure generated so little discussion.   Surely this must be one of the
most common programs written in C++.  But that is not to say it is easy to
do.  In my book ``An Introduction to Object-Oriented Programming'' I tried
to create just such classes (Chapter 17).  I was somewhat dissatisfied with
the result, and seemingly a few readers were as well.  So recently I
returned to the problem.

My objectives were as follows:

1) The data structures created should be small and easy to understand.

2) Use of the data structure classes should not imply nor prohibit use of 
any other classes.  That is, using the data structure classes should not
force you into using a particular memory management package, or whatever.

3) They should not make any restriction on the type of values they contain.
For example they should not require that all values held in the data
structures be subclasses of a specific class.  [ This was the constraint I
voilated in my original Chapter 17, and one violated in many other examples
I have seen].

4) The solution should *not* use templates, but should be *guaranteed*
type-safe.  That is, it should be possible to guarantee that any casts used
(and clearly some casts will be necessary) are type-safe.
[ Avoiding templates was not a philosophical decision - I merely wanted to
show it could be done, and easily done at that].

5) It should be possible to manipulate the data structures, for example 
iterating over the elements of a list, without any knowledge of the
internal structure of the data structure.

Anyway.  To make a long story short, you can obtain the fruits of my labors
via an automatic mail server.  I've written classes for lists (ordered and
unordered), sets, stacks, queues, trees and tables.  I will probably be
adding to this as times goes on.
Send a note to ``oopintro@cs.orst.edu'' with
subject line ``Generic'' to obtain the code and a latex description.
I would be interested in feedback on either code or document.
Note that mail to ``oopintro'' is not read by human beings, if you want to
reach me personally my address is below.

--tim budd
budd@cs.orst.edu

dag@control.lth.se (Dag Bruck) (06/11/91)

I agree: writing a list package is a non-trivial task, and we should
encourage anyone who's willing to share his/her interpretation of what
it should look like.  Here are three additional suggestions:

1.  Member of multiple lists

In most list packages an element can be the member of only one list
at a time.  For me and my colleagues this is quite unacceptable.  We
often keep a large group objects in a "master" list and then create
temporary sub-lists for some particular processing task.

The straigh-forward use of a base class "Link" which all other classes
are derived from won't work; some sort of indirection is needed.

2.  Constant objects

It should be possible to declare a list of `const' objects.  This may
not be easy depending on how you use the macro processor.  My own list
package does not handle this case gracefully.  A `typedef' may help:

	typedef const T cT;
	List(cT) clist;

(Note that I assume that a real template implementation handles this
case without any additional typedef.)

3.  Const compatibility

If I have a list of Foo objects, I would like to pass it to a function
than takes a list of `const Foo' objects:

	typedef const Foo cFoo;

	void f(const List(Foo) x) { .... }

	void g(List(cFoo) x) { .... }

	void h() {
		List(Foo) fl;
		f(fl);		// no problem
		g(fl);		// hmmm
	}

In function h(), `fl' is a mutable list of mutable Foo objects.  There
is no problem calling f(), which takes a constant list of mutable
objects.  The difficulty is to explain to the compiler that a list of
non-mutable objects is compatible with a list of mutable objects.

Comments would be appreciated.

Dag Bruck
--
Department of Automatic Control		E-mail: dag@control.lth.se
Lund Institute of Technology
P. O. Box 118				Phone:	+46 46-104287
S-221 00 Lund, SWEDEN			Fax:    +46 46-138118

ttoupin@diana.cair.du.edu (Aerin) (06/12/91)

In article <1991Jun11.061402.21934@lth.se> dag@control.lth.se (Dag Bruck) writes:
>1.  Member of multiple lists
>
>In most list packages an element can be the member of only one list
>at a time.  For me and my colleagues this is quite unacceptable.  We
>often keep a large group objects in a "master" list and then create
>temporary sub-lists for some particular processing task.
>
>The straigh-forward use of a base class "Link" which all other classes
>are derived from won't work; some sort of indirection is needed.

I have done something similar in the past.  First, we must have

	class Object {
		private:
			unsigned long int _users;
		public:
			Object() { _users=0; }
			Object(const Object &) { _users=0; }
			~Object() { --_users; }

			void operator delete(void *__mem) {
				if(!(((Object *)__mem)->_users))
					::operator delete(__mem);
			}

			unsigned long int operator++() { return _users++; }
			unsigned long int operator--() { return --_users; }
			operator unsigned long int() { return _users; }

			virtual Object *copy(int =0) =0;
			virtual enum ClassType typeOf() const =0;
			virtual void *delta(Object *__ptr) const { return __ptr; }
			virtual void *delta() { return this; }
	} ;

The copy member should, when given a non-zero value, make a legitimate copy
of the Object (i.e.: in a new spot in memory), or, otherwise, increment the
user count and just return a pointer to the object itself.  The delta functions
are used for casting up from a base to a derived class.

To keep list, use a structure

	class ListUser : virtual public Object {
		private:
			Object *_node;
			ListUser *_next;
		public:
			ListUser(Object *__node,ListUser *__next=NULL) {
				_node=__node;_next=__next; }
			ListUser(const ListUser &__list) {
				if(_node=__list._node) _node=__list.node->copy(-1); // (1)
				if(_next) _next=new ListUser(*_next);
			}
			~ListUser() { if(_node) delete _node; if (_next) delete _next; }

				// Put append, insert, search, etc. operations here

	/* 
	 * The typing information should make the ListUser object act like what-
	 * ever it contains...
	 */
			virtual enum ClassType typeOf() const
			{
				if(_node) return _node->typeOf();
				else return ListUserType;
			}
			virtual void *delta() const
			{
				if(_node) return _node;
				else return this;
			}
		} ;

Finally, a List class should have a ListUser which is the list that the List
is representing.  A copy of a master list in this way could be manipulated
as desired, w/o affecting the order of the master; however, modifications of
the copies affect the master as well.  (There is a lot missing here--if you
want the working source code, send me email.)

>Dag Bruck
>--
>Department of Automatic Control		E-mail: dag@control.lth.se
>Lund Institute of Technology
>P. O. Box 118				Phone:	+46 46-104287
>S-221 00 Lund, SWEDEN			Fax:    +46 46-138118
--
Tory S. Toupin                         |
ttoupin@diana.cair.du.edu              |  Existence toward perfection...
Unversity of Denver                    |      Life of mediocrity!
Undergraduate: Math & Computer Sciences|
Denver, CO  80208                      |         - M. E.

-----
Ceci n'est pas une signature.

bill@robots.ox.ac.uk (& Triggs) (06/12/91)

I would agree with Dag Bruck that any real list package must
let objects be members of multiple lists - there are too
many toy C++ examples and not enough real implementations
about at present!

However if you have a parametrised list class, you can
always use it to store pointers or references to objects
rather than the objects themselves, whereas if you have a
'list-of-pointer' class you can't use it to store real
objects. So any decent list package should allow you to
store an arbitrary data type in the nodes for flexibility,
even though most people will only use it to store pointers.

He also mentions the problem that 'list-of-foo' and
'list-of-const-foo' are completely independent types as far
as the compiler is concerned. This is a bit of a nuisance,
but can be worked around either by (i) deriving a
'list-of-foo' from 'list-of-const-foo' which casts away the
constness of the returned foo's, or (ii) providing a
list-of-foo -> list-of-const-foo cast operator. (i) is
cleaner & reuses code more, but (ii) is likely to be easier
to implement if you have a generic list template class,
since you don't need to figure out which const-foo -> foo
wrappers to build.

Do other people share my feeling that C++ is not quite
making the grade as far as code reuse goes? - All of the
'solutions' I see seem to be convoluted, bulky, and
difficult to use. In particular, too many little 'helper'
and 'wrapper' functions seem to be required for hygiene and
sanity... Maybe we should all be using sm*llt*lk ::-))

Sorry, I'm probably just dumb and I'll wash my mouth out,
but I find smalltalk collections incomprehensible when
they're implemented in C++ - ie NIHCL. Too many tangled
pages of macros and headers for me!

Libg++ is OK I guess, but it would be nicer if you didn't
need to include an entire separate list implementation for
each type of list you have - a better solution would
probably be a really tight & nasty void*-chasing core
implementation with a fashionable replicatable C++/template
wrapper class to interface with it.

The list code would require comparison operations from the
wrapper and would provide raw bytes for the wrapper to
construct list node data objects into (disgusting!), so we
might have to sort out this silly 'constructors are not real
functions' business [ARM 12.1 p 265]. An explicit
constructor call 'bytes->Foo(...)' or
'((Foo*)bytes)->Foo(...)' OUGHT to be a legal construct
creating a Foo in the given (void*) bytes, just as an
explicit destructor call is legal [ARM 12.4].  Having to
overload operator new just to get this behaviour seems silly
to say the least.

I guess the advantages of this approach are not so clear
when you only have lists, but when your application starts
needing multiple copies of B+ or splay trees you begin to
notice the size difference...

--
Bill Triggs
Oxford University Computing Laboratory
11 Keble Rd
Oxford OX1 3QD
U.K.
tel +44-865-273157
fax +44-865-273839

schwartz@karl.cs.psu.edu (Scott Schwartz) (06/13/91)

bill@robots.ox.ac.uk (& Triggs) writes:
   Do other people share my feeling that C++ is not quite
   making the grade as far as code reuse goes? 

I wonder why users of other languages aren't publicly complaining
about having these problems.  There are substantial libraries for Ada
and Eiffel, and their newsgroups are quiet.  ML and Scheme users don't
seem to be complaining.  The Oberon and Modula-3 crowds seem
optimisic.  Why all the noise here?

pcg@aber.ac.uk (Piercarlo Grandi) (06/15/91)

On 12 Jun 91 13:34:50 GMT, bill@robots.ox.ac.uk (& Triggs) said:

bill> I would agree with Dag Bruck that any real list package must
bill> let objects be members of multiple lists

I believe that this particular example is the core of the difficulties
of making a good reuse oriented language. The issues are very subtle,
and only using pointers solves them, if by gross oversimplification, in
current languages. It is true but not sufficient to say that:

bill> However if you have a parametrised list class, you can always use
bill> it to store pointers or references to objects rather than the
bill> objects themselves, whereas if you have a 'list-of-pointer' class
bill> you can't use it to store real objects.

Yet as to:

bill> - there are too many toy C++ examples and not enough real
bill> implementations about at present!

This is inaccurate; there are whole operating systems and major applications
(in the hudnreds of thousands of line each range) in C++. It is true
though that all these systems have been possible because tha authors,
when in tight spots, have solved them by cheating, like in C (casts and
other horrible things :->).

bill> Do other people share my feeling that C++ is not quite making the
bill> grade as far as code reuse goes? - All of the 'solutions' I see
bill> seem to be convoluted, bulky, and difficult to use.

bill> In particular, too many little 'helper' and 'wrapper' functions
bill> seem to be required for hygiene and sanity...

I agree with you; I think that there is little understanding of typing
and modularization issues inc omputer science. I understand them better
than most, IMNHO, and I am amused by all the searching in the dark that
goes on.

bill> Maybe we should all be using sm*llt*lk ::-))

Smalltalk, and other similar languages, resolve many issues by simply
using references and late binding everywhere, which is bit overkill. The
challenge is not to build a language that works in the general case, but
one that also can be tailored to special cases efficiently. I reckon
that this can only be done with symbolic evaluation techniques
(supercompiling, and other earlier concepts).

In any case C++ is unique in being, bey design, an OO systems
implementation language. This means that it must be lean enough to allow
you to write a production Smalltalk implementation in it. This is a big
constraint, and a big challenge. Too bad that just like with C, C++ is
being used also in application contexts where a less systems oriented
language, for example Objective C, could be used.

bill> Libg++ is OK I guess, but it would be nicer if you didn't need to
bill> include an entire separate list implementation for each type of
bill> list you have - a better solution would probably be a really tight
bill> & nasty void*-chasing core implementation with a fashionable
bill> replicatable C++/template wrapper class to interface with it.

How do you think are template classes implemented in most C++ compilers
that have them? Just like LIBG++ does, a simple preprocessor, just like
generics in Ada. A sumbolic reduction compiler could do much better.
Some similar technology has been described, but I am not sure it has
been implemented.

bill> [ ... ] so we might have to sort out this silly 'constructors are
bill> not real functions' business [ARM 12.1 p 265]. An explicit
bill> constructor call 'bytes->Foo(...)' or '((Foo*)bytes)->Foo(...)'
bill> OUGHT to be a legal construct creating a Foo in the given (void*)
bill> bytes, just as an explicit destructor call is legal [ARM 12.4].

It is legal indeed. C++ here is very unelegant; a difference is made
between the constructor *syntax* and the constructor *function*, and
between the new operator and 'operator new'. In particular by using the
so called placement syntax for 'operator new' one can invoke just the
constructor. In other words while

	new Foo(...)

is an invocation of the new operator that is equivalent to (if it were
legal):

	(Foo::operator new(sizeof (Foo)))->Foo::Foo(...)

this invocation of the new operator (where 'bytes' is some suitable
pointer):

	new(bytes) Foo(...)

does not imply any call to 'operator new', and is equivalent to (if it
were legal):

	((Foo *) bytes)->Foo::Foo(...)

as you require. The placement syntax is used to construct object for
which space has already been allocated. This is typically used for
shared memory or otherwise persistent objects which are to be created at
a specific address.

Admittedly the distinction between the new operator and 'operator new'
is a bit subtle and confusing... The ice is particularly thin here :-).
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk

tmb@ai.mit.edu (Thomas M. Breuel) (06/19/91)

In article <PCG.91Jun14183952@aberdb.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
   bill> I would agree with Dag Bruck that any real list package must
   bill> let objects be members of multiple lists

   I believe that this particular example is the core of the difficulties
   of making a good reuse oriented language.

The problem is that C++'s version of polymorphism is rather inconvenient
to use.

In C++, you must introduce type names in templates and specify them at
the point of use even if the compiler has enough information to infer
this information (I don't have a C++ compiler with templates, so the
syntax that your C++ implementation uses for templates may differ
slightly from what I am using in this example):

	template<base> list<base> map(base (*f)(base),list<base> l) ... ;
	...
	map<float>(f,l);

By contrast, in ML, you can simply write:

	fun map(f,l) = ...
	...
	map(f,l)

or
	fun map(f:'base -> 'base,l:'base list): 'base list = ...
	...
	map(f,l)

Note that when using the function "map" in ML, you never have to
specify what the value of the type variable "'base" actually is;
the type checker figures it out by itself.

This means that in ML you only need to specify types if you want a
function to be monomorphic rather than polymorphic, or to resolve
ambiguities in operator overloading.

In fact, Scheme and Lisp go one step further: "generic" arithmetic and
other "generic" operations are provided via dynamic typing rather than
via overloading, so you don't ever need to declare types to resolve
overloaded functions.

   The issues are very subtle,
   and only using pointers solves them, if by gross oversimplification, in
   current languages. It is true but not sufficient to say that:

ML certainly does not force the programmer to use pointers in order to
achieve polymorphism (in fact, ML does not have "pointers" in the
usual sense).  However, many implementations of ML use pointers
internally to implement polymorphism.

But, implementations that are concerned with optimizing data space and
execution time at the cost of program size and compilation time do not
need to be based on pointers; my understanding is that SISAL 2.0 is an
example of a polymorphic language that does not use pointers for
implementing polymorphic functions (it probably generates different
versions of functions for different argument type patterns).

In summary, statically typed languages with efficient implementations
of polymorphism are possible. The C++ programming language, however,
was not designed with this in mind, and I doubt if this is easily
fixable. In particular, polymorphism and overloading interact in
unpleasant ways.

If you want the convenience of a fully polymorphic type system, you
are probably better off using a language that was designed with such a
type system in mind from the ground up.

					Thomas.

hitz@csi.uottawa.ca (Martin Hitz) (06/19/91)

In article <TMB.91Jun18235004@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:
>The problem is that C++'s version of polymorphism is rather inconvenient
>to use.
>
>In C++, you must introduce type names in templates and specify them at
>the point of use even if the compiler has enough information to infer
>this information (I don't have a C++ compiler with templates, so the
>syntax that your C++ implementation uses for templates may differ
>slightly from what I am using in this example):
>
>	template<base> list<base> map(base (*f)(base),list<base> l) ... ;
>	...
>	map<float>(f,l);
>
According to the ARM, the example should read:

	template<class base> list<base> map(base (*f)(base),list<base> l);
	
	float f (float);
	list<float> l;
	...
	map(f,l);

Note that at the call to map(), the compiler DOES infer the correct type.

Martin Hitz@csi.uottawa.ca