[comp.lang.c++] About Lists and things...

chip@tct.com (Chip Salzenberg) (06/24/91)

According to djones@megatest.UUCP (Dave Jones):
>From article <285E75EC.37E7@tct.com>, by chip@tct.com (Chip Salzenberg):
>> According to djones@megatest.UUCP (Dave Jones):
>>> while( member = (SomeType*)List_iter_next(&iter))
>> 
>> This is exactly the downcast hackery that I avoid at all costs.
>
>And that is exactly the kind of derisive pedantry that I avoid at all
>costs.

Better pedantic than unsafe.

>Even though the above technique has proved perfectly adequate in C-code
>for six years, my posting asked for a better way to do it in C++.

That technique was never "perfectly adequate," but in C there's little
alternative.

>Do you have any suggestions, or only insults?

My suggestions had been enumerated before.  But, to repeat:

   Never design code that requires downcasts.
      Or, in other words: if the static type is important, why worry
      about regaining it?  Don't lose it in the first place.
   
   Use templates.
      If real templates are unavailable, use the preprocessor.

   Never, ever create "isa" or "typeof" or "classname" functions for
      any purpose other than debugging.  Any time you would depend on
      such a function, create a virtual function that does what you
      really wanted done in the first place.
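
For illustration, a minimal sketch of the "use templates" suggestion. The names TypedList, Iter and sum_values are hypothetical and do not come from any library mentioned in this thread; SomeType is borrowed from the loop quoted above. The element type is a template parameter, so the iteration loop needs no cast:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical typed list: the element type is a template parameter,
// so the iterator hands back T* and the caller never casts.
template <class T>
class TypedList {
public:
    void append(T* item) { items_.push_back(item); }

    class Iter {
    public:
        explicit Iter(const TypedList& list) : list_(list), pos_(0) {}
        // Returns the next element, or a null pointer once the list is exhausted.
        T* next() {
            return pos_ < list_.items_.size() ? list_.items_[pos_++] : nullptr;
        }
    private:
        const TypedList& list_;
        std::size_t pos_;
    };

private:
    std::vector<T*> items_;
};

struct SomeType { int value; };

// Same loop shape as the quoted C code, but without the (SomeType*) cast.
int sum_values(const TypedList<SomeType>& list) {
    int total = 0;
    TypedList<SomeType>::Iter iter(list);
    while (SomeType* member = iter.next())
        total += member->value;
    return total;
}
```

Appending anything other than a SomeType* to a TypedList<SomeType> is rejected at compile time, which is the point of keeping the static type.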

Comments?
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
 "You can call Usenet a democracy if you want to.  You can call it a
  totalitarian dictatorship run by space aliens and the ghost of Elvis.
  It doesn't matter either way."  -- Dave Mack

pjg@daedalus.osf.org (Paulo Guedes) (06/25/91)

In article <2865E7A8.179A@tct.com> chip@tct.com (Chip Salzenberg) writes:
[...]
>   My suggestions had been enumerated before.  But, to repeat:
>
>      Never design code that requires downcasts.
>	 Or, in other words: if the static type is important, why worry
>	 about regaining it?  Don't lose it in the first place.
>
>      Use templates.
>	 If real templates are unavailable, use the preprocessor.
>
>      Never, ever create "isa" or "typeof" or "classname" functions for
>	 any purpose other than debugging.  Any time you would depend on
>	 such a function, create a virtual function that does what you
>	 really wanted done in the first place.

While I agree with your suggestions and think they should be part of
normal programming practice, I think there are some kinds of problems
where they don't apply. Let me give you an example derived from my
present work:

I have a system composed of several servers. Each server provides the
implementation for a number of objects. Clients communicate with
servers by RPC. The interfaces to the servers are defined by C++
(abstract) classes. As the result of an operation on a server, an
object may be returned to the client.

One of the servers is the file server, which implements files,
directories, symbolic links, mount points, etc. Without going into
much detail about their interface, all these objects share a common
interface (e.g. all of them can be stat'ed), but some provide
operations that others don't (e.g. a file can be read and written, but
a mount point can't). The basic interface is defined in class Base.
Class File derives from Base and adds the operations specific only to
files, class Directory derives from Base and adds the operations
specific only to directories, etc.

Now, the problem: when I lookup an object by name
obj = open ("someName")
what type should open return ? It can only return type Base, because
at compile time we cannot make any stronger guarantees on the type of
the object that will be returned. However, the real type of the object
will be File, Directory, etc, and *it will be known only at run-time*.
I can use the object as a Base, but not as a File or Directory. In
some cases this is ok (e.g. if I'm just querying its creation time)
but not in others (e.g. read, write). Hence, I need some way of
converting the reference from Base to File, Directory, etc.[*] 

It would be great if C++ offered me this (as Eiffel and Modula-3 do,
as far as I understand), otherwise I just have to create my own (which
I did, based on Keith Gorlen's NIHCL). 
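
As a rough sketch of the kind of checked conversion described here: later standard C++ provides dynamic_cast, which was not available in the compilers of this thread's era, and it is used below only as a stand-in for an NIHCL-style mechanism. The function open_object, its trailing-slash naming convention, and the member functions are assumptions made up for the demo:

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
    virtual ~Base() {}
    virtual long creation_time() const { return 0; }  // common to all objects
};

class File : public Base {
public:
    int read() { return 42; }  // placeholder for a real read operation
};

class Directory : public Base {
public:
    int entry_count() const { return 0; }
};

// Hypothetical lookup: the concrete type is decided only at run time.
// Here a trailing '/' means "directory", purely for the sake of the demo.
std::unique_ptr<Base> open_object(const std::string& name) {
    if (!name.empty() && name.back() == '/')
        return std::unique_ptr<Base>(new Directory);
    return std::unique_ptr<Base>(new File);
}

// Checked downcast: yields a null pointer instead of undefined behaviour
// when the object is not actually a File.
bool try_read(Base* obj) {
    if (File* f = dynamic_cast<File*>(obj)) {
        f->read();
        return true;
    }
    return false;
}
```

The caller still has to handle the failure case, but the failure is detected rather than silently misinterpreting the object.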

Note that templates don't help here, because this is one case where a
"list" (directory) contains objects of many different types (files,
mount points, symbolic links, etc).

Virtuals could solve the problem, but then what would happen? All
the methods would end up in the base class, with dummy implementations
in the derived classes that would simply return
"MethodNotImplemented". If this solution were adopted, static type
checking would almost disappear because all methods would be valid!
The only errors detected would be calls to non-existing methods and
methods called with wrong parameters, but errors like performing
operations on a file that are only valid on directories would be
detected only at run-time.
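
A small sketch of that all-virtual design, to make the cost concrete. The Status enum and the member names are hypothetical, and the dummy bodies are placed in the base class as defaults rather than repeated in each derived class; the effect on type checking is the same:

```cpp
#include <cassert>

enum Status { Ok, MethodNotImplemented };

class Base {
public:
    virtual ~Base() {}
    virtual Status stat() { return Ok; }  // meaningful for every object
    // Operations that only some objects support must still be declared here,
    // with a dummy body that reports failure.
    virtual Status read() { return MethodNotImplemented; }
    virtual Status list_entries() { return MethodNotImplemented; }
};

class File : public Base {
public:
    Status read() override { return Ok; }
};

class Directory : public Base {
public:
    Status list_entries() override { return Ok; }
};

// Compiles for any Base, including a Directory; the mistake shows up
// only when the call is made.
Status read_anything(Base& obj) { return obj.read(); }
```

Calling read_anything on a Directory is accepted by the compiler and fails only at run time, which is the loss of static checking described above.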

Even worse, each time someone invented a new type of object at the
server, the class hierarchy would have to be changed to include its new
methods. In client-server programming, you don't want to change your
clients to cope with changes in the servers.

In summary, avoiding downcasts is good programming practice, but we
cannot ignore that some problems just don't fit in this model
(especially when they are *my* problem! :-)

Paulo Guedes

[*] A similar problem and conclusion were presented in "Adding new code
to a running C++ program" by S. Dorward, R. Sethi and J. Shopiro in
the 1990 C++ Usenix Conference.
--
---
Paulo Guedes                 Email: pjg@osf.osf.org    Phone: (617) 621-8878
OSF Research Institute       11 Cambridge Center, Cambridge, MA 02142

djones@megatest.UUCP (Dave Jones) (06/25/91)

From article <2865E7A8.179A@tct.com>, by chip@tct.com (Chip Salzenberg):

> Better pedantic than unsafe.

... and I would rather live in a house than stick beans in my ears. But it's not
an exclusive choice, is it?

I'm really not desperately worried about the pointer-casting business.
Casting a void* to the wrong type is very unlikely to cause disasters
in released code -- not if one does even rudimentary testing. I've been in
this business for more years than I like to admit, and I've not yet seen
this kind of mistake make it into a release. What it can do is get you into
a difficult debugging session if you don't spot-test your new code frequently
enough while you're developing it. I have found that such problems can be
almost completely avoided by simply declaring in a comment, at the point
where the object is declared, what type of pointers are stored in the
container. When in doubt, you just find the declaration and read the
comment. Still, one would prefer to have the process automated by the
compiler, which is what prompted this discussion.

> ...   Never design code that requires downcasts.
>       Or, in other words: if the static type is important, why worry
>       about regaining it?  Don't lose it in the first place.

Something tells me this has all been hashed over previously. The "static
type" was not "lost" in the first place. It did not exist in the first place.
The generic container-class may have been written years before the
application's type was defined.

>    Use templates.
>       If real templates are unavailable, use the preprocessor.

That's what I have ended up doing. I've written macros. But to make the
derived classes "type-safe" -- (happier now?) -- the macros had to redeclare
every member-function with a new inline function. That was not too much
trouble with the simple "list" class, but for big ones, that would be
tedious. I hadn't been following the discussion here before the last
couple of weeks, but my initial impression is that I would probably favor
an extension to the language that recognizes the concept of generics.
I'm not yet convinced that "templates" will do the job as
well as could be wished, but my mind is open.
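
A sketch of what such a macro might look like, under the assumption that the existing container stores void*. VoidList, DECLARE_TYPED_LIST and Widget are hypothetical names, and std::vector is used only to keep the stand-in container short. Every member function has to be redeclared as an inline forwarder, which is the tedium mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a pre-existing generic container of untyped pointers.
class VoidList {
public:
    void append(void* p) { items_.push_back(p); }
    void* at(std::size_t i) const { return items_[i]; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Hypothetical macro: declares NAME as a typed wrapper around VoidList.
// Private inheritance hides the void* interface; each operation is
// re-exposed through an inline function, and the one cast lives here.
#define DECLARE_TYPED_LIST(NAME, TYPE)                          \
    class NAME : private VoidList {                             \
    public:                                                     \
        void append(TYPE* p) { VoidList::append(p); }           \
        TYPE* at(std::size_t i) const {                         \
            return static_cast<TYPE*>(VoidList::at(i));         \
        }                                                       \
        using VoidList::size;                                   \
    };

struct Widget { int id; };
DECLARE_TYPED_LIST(WidgetList, Widget)
```

The cast still exists, but it is written once inside the macro, and the wrapper only ever stores TYPE* values, so it cannot be applied to the wrong type through this interface.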

I will take this opportunity to repeat my request: I sure wish someone
would be good enough to send me the "summary" article that I've seen
allusions to. In fact, any interesting articles on this topic would be
appreciated. It is good to know that, "This has all been discussed here
at length before," but I would actually like to hear the arguments that were
put forward. My mail address is sun!megatest!djones. Thanks.


	Dave

chip@tct.com (Chip Salzenberg) (06/27/91)

According to djones@megatest.UUCP (Dave Jones):
>> ...   Never design code that requires downcasts.
>>         Or, in other words: if the static type is important, why worry
>>         about regaining it?  Don't lose it in the first place.
>
>Something tells me this has all been hashed over previously. The "static
>type" was not "lost" in the first place. It did not exist in the first place.

By "static type" I mean "compile-time type".  Perhaps an example will
clarify my point.

If you put a Circle into a ShapeList, you lose the Circle's
compile-time type, because ShapeList.first() returns |Shape*|.
Suppose that you decide to call a Circle-specific function on a Circle
stored in a ShapeList.  Some would cast the |Shape*| to a |Circle*|
and carry on.  My coding practice would require you to replace the
ShapeList with a CircleList.

Of course, that change may not be practical, perhaps because there are
other non-Circle objects in the ShapeList.  If so, congratulations!
You have just discovered that the function in question is actually a
Shape function in disguise.  You should therefore add it to the Shape
interface, presumably as a new virtual function.
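
To make the first half of this concrete, a minimal sketch assuming a hypothetical ListOf template; only the names Shape, Circle, ShapeList, CircleList, first() and getRadius() come from the discussion, and their bodies are invented for the demo:

```cpp
#include <cassert>
#include <vector>

class Shape {
public:
    virtual ~Shape() {}
};

class Circle : public Shape {
public:
    explicit Circle(double r) : radius_(r) {}
    double getRadius() const { return radius_; }
private:
    double radius_;
};

// Hypothetical list template; first() returns a pointer of the element type.
template <class T>
class ListOf {
public:
    void add(T* p) { items_.push_back(p); }
    T* first() const { return items_.empty() ? nullptr : items_.front(); }
private:
    std::vector<T*> items_;
};

typedef ListOf<Shape>  ShapeList;   // first() yields Shape*
typedef ListOf<Circle> CircleList;  // first() yields Circle*

// Circle-specific code takes a CircleList, so no cast is needed.
double first_radius(const CircleList& circles) {
    Circle* c = circles.first();
    return c ? c->getRadius() : 0.0;
}
```

Passing a ShapeList to first_radius does not compile, so the Circle-specific call can never reach a non-Circle.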

Clearer now?
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz

fmhv@minerva.inesc.pt (Fernando Manuel Vasconcelos) (06/27/91)

In article <28692A4A.59B7@tct.com> chip@tct.com (Chip Salzenberg) writes:

>By "static type" I mean "compile-time type".  Perhaps an example will
>clarify my point.
>
>If you put a Circle into a ShapeList, you lose the Circle's
>compile-time type, because ShapeList.first() returns |Shape*|.
>Suppose that you decide to call a Circle-specific function on a Circle
>stored in a ShapeList.  Some would cast the |Shape*| to a |Circle*|
>and carry on.  My coding practice would require you to replace the
>ShapeList with a CircleList.
>
>Of course, that change may not be practical, perhaps because there are
>other non-Circle objects in the ShapeList.  If so, congratulations!
>You have just discovered that the function in question is actually a
>Shape function in disguise.  You should therefore add it to the Shape
>interface, presumably as a new virtual function.
>
>Clearer now?

I may be missing the point, however consider two objections to your proposal:

1. A practical one: You may not be able to change shape.h because it belongs
to a library which you have only in binary (of course you have the .h's but
you can't change them...)

2. A conceptual one: That means the interface of a base class depends on the
interface of the derived classes. Using your example, it is normal that a
circle should know how to report its radius. If I keep all my graphical
objects in a list of Shape*, I'll have to add a getRadius message to
the Shape class, only because a circle knows what that means... But a GENERAL
shape doesn't.

Only my two cents ...

--
Fernando Manuel Hourtiguet de Vasconcelos  INESC - Instituto de Engenharia de
fmhv@inesc.inesc.pt                                Sistemas e Computadores
mcsun!inesc!fmhv@uunet.uu.net          Rua Alves Redol No 9, sala 208
Tel: +351(1)545150   Ext. 216          Apartado 10105

vinoski@apollo.hp.com (Stephen Vinoski) (06/27/91)

In article <1991Jun27.095856.2@minerva.inesc.pt> fmhv@minerva.inesc.pt (Fernando Manuel Vasconcelos) writes:
>In article <28692A4A.59B7@tct.com> chip@tct.com (Chip Salzenberg) writes:
>>If you put a Circle into a ShapeList, you lose the Circle's
>>compile-time type, because ShapeList.first() returns |Shape*|.
>>Suppose that you decide to call a Circle-specific function on a Circle
>>stored in a ShapeList.  Some would cast the |Shape*| to a |Circle*|
>>and carry on.  My coding practice would require you to replace the
>>ShapeList with a CircleList.
>
>I may be missing the point, however consider two objections to your proposal:
>
>1. A practical one: You may not be able to change shape.h because it belongs
>to a library which you have only in binary (of course you have the .h's but
>you can't change them...)
>
>2. A conceptual one: That means the interface of a base class depends on the
>interface of the derived classes. Using your example, it is normal that a
>circle should know how to report its radius. If I keep all my graphical
>objects in a list of Shape*, I'll have to add a getRadius message to
>the Shape class, only because a circle knows what that means... But a GENERAL
>shape doesn't.

The answer to your objections is that you shouldn't be designing your software
so that you lose the "compile-time type" as Chip calls it.

If you're putting a Circle onto a ShapeList, it effectively becomes a Shape.  In
a sense, it is no longer a Circle (though the use of virtual functions allows it
to keep some of its "circle-ness").  Its interface becomes a Shape interface,
not a Circle interface.

If you need a Circle interface, design your software so that you always put your
Circle onto a CircleList.  Only put your Circles onto ShapeLists when you're
dealing with Shapes.

I agree with Chip 300%.


-steve

| Steve Vinoski  (508)256-0176 x5904       | Internet: vinoski@apollo.hp.com  |
| HP Apollo Division, Chelmsford, MA 01824 | UUCP: ...!apollo!vinoski         |

pat@bnrmtl.bnr.ca (Patrick Smith) (06/28/91)

In article <1991Jun27.095856.2@minerva.inesc.pt>, fmhv@minerva.inesc.pt (Fernando Manuel Vasconcelos) writes:
|> In article <28692A4A.59B7@tct.com> chip@tct.com (Chip Salzenberg) writes:
|> 
|> >If you put a Circle into a ShapeList, you lose the Circle's
|> >compile-time type, because ShapeList.first() returns |Shape*|.
|> >Suppose that you decide to call a Circle-specific function on a Circle
|> >stored in a ShapeList.  Some would cast the |Shape*| to a |Circle*|
|> >and carry on.  My coding practice would require you to replace the
|> >ShapeList with a CircleList.
|> >
|> >Of course, that change may not be practical, perhaps because there are
|> >other non-Circle objects in the ShapeList.  If so, congratulations!
|> >You have just discovered that the function in question is actually a
|> >Shape function in disguise.  You should therefore add it to the Shape
|> >interface, presumably as a new virtual function.
|> >
|> >Clearer now?
|> 
|> I may be missing the point, however consider two objections to your proposal:
|> 
|> 1. A practical one: You may not be able to change shape.h because it belongs
|> to a library which you have only in binary (of course you have the .h's but
|> you can't change them...)
|> 
|> 2. A conceptual one: That means the interface of a base class depends on the
|> interface of the derived classes. Using your example, it is normal that a
|> circle should know how to report its radius. If I keep all my graphical
|> objects in a list of Shape*, I'll have to add a getRadius message to
|> the Shape class, only because a circle knows what that means... But a GENERAL
|> shape doesn't.


I tend to agree with Chip here.  But that doesn't mean that I would
add a getRadius() method to the Shape class.  If getRadius() only
makes sense for Circles, then the only way you can use it on
an arbitrary Shape is something like this:

		if ( /* *this is a Circle */ )
			// do something with this->getRadius()
		else
			// do something else with *this
			// may be several cases - Squares, Triangles, Blobs, etc.

Even if you define getRadius() for any Shape, if it only makes
sense for Circles, you're going to get code like the above.

The style I prefer is to replace the entire if block with
a single call to a virtual function:

		this->doSomething();

Presumably, "do something" makes sense for every Shape, since
you're doing it for an arbitrary Shape you got from a ShapeList.
And then you can add appropriate member functions for each type
of Shape:

	void Circle::doSomething() { /* uses this->getRadius() */ }
	void Square::doSomething() { /* uses this->sideLength() */ }

etc.


If you can't change the definition of Shape (or don't want to),
you can put another class of your own in between Shape and the specific
classes:

class MyShape : public Shape { /*...*/ };
class Circle : public MyShape { /*...*/ };
class Square : public MyShape { /*...*/ };

Now you can make doSomething() a method of MyShape.
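
Put together, the idea might look like the sketch below. The Shape class here is only a stand-in for the library class that cannot be edited, and doSomething() is given a made-up meaning (it returns the shape's width) so that the example has something to compute; the total() helper is likewise hypothetical:

```cpp
#include <cassert>

// Stand-in for the library class whose definition cannot be changed.
class Shape {
public:
    virtual ~Shape() {}
};

// Intermediate class under our control: the new virtual function goes here.
class MyShape : public Shape {
public:
    virtual double doSomething() const = 0;  // here: "report your width"
};

class Circle : public MyShape {
public:
    explicit Circle(double r) : radius_(r) {}
    double getRadius() const { return radius_; }
    double doSomething() const override { return 2.0 * getRadius(); }
private:
    double radius_;
};

class Square : public MyShape {
public:
    explicit Square(double s) : side_(s) {}
    double sideLength() const { return side_; }
    double doSomething() const override { return sideLength(); }
private:
    double side_;
};

// Works on arbitrary MyShapes with no type test and no cast.
double total(const MyShape* const* shapes, int count) {
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
        sum += shapes[i]->doSomething();
    return sum;
}
```

The client code holds MyShape pointers rather than Shape pointers, so this only helps for objects the application itself creates.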

-- 
Patrick Smith      Bell-Northern Research, Montreal, Canada
(514) 765-7914   bnrmtl!pat@larry.mcrcim.mcgill.edu   patrick@bnr.ca

... Any resemblance between the above views and those of my employer,
my terminal, or the view out my window is purely coincidental.

gary@neptune.uucp (Gary Bisaga x4219) (06/29/91)

This is pretty long; you may want to hit 'n' now.

In article <1991Jun27.175256.3224@scrumpy@.bnr.ca> bnrmtl!pat@larry.mcrcim.mcgill.edu writes:
>In article <1991Jun27.095856.2@minerva.inesc.pt>, fmhv@minerva.inesc.pt (Fernando Manuel Vasconcelos) writes:
>|> In article <28692A4A.59B7@tct.com> chip@tct.com (Chip Salzenberg) writes:
>|> 
>|> >If you put a Circle into a ShapeList, you lose the Circle's
>|> >compile-time type, because ShapeList.first() returns |Shape*|.
>|> I may be missing the point, however consider two objections to your proposal:
>|> 
>|> 1. A practical one: You may not be able to change shape.h because it belongs
>|> ...
>|> 
>|> 2. A conceptual one: That means the interface of a base class depends on the
> ...
>Even if you define getRadius() for any Shape, if it only makes
>sense for Circles, you're going to get code like the above.
>
>The style I prefer is to replace the entire if block with
>a single call to a virtual function:
>
>		this->doSomething();

I agree, but with the caveat that saying this is often easier than doing it.
The problem with most of these messages (although a couple alluded to it)
is that they don't consider what these "objects" really are.  They
represent real-world things and should be treated accordingly.  One
needs to approach the problem from an analysis point of view first, not a
coding point of view.

What is the real problem here?  Well, in trying to make a Shape type
with various sub-shapes under it, we are saying that each of those
sub-shapes is a kind of Shape.  So the interface to Shape must be in
terms that the sub-shapes also understand.  The problem here really is
trying to extract an attribute ("radius") of a Shape, something that
does not in the general case apply.  I have seen the same kinds of
problems come up when trying to start your development effort by
saying: "Ok, I want to have a class called 'Object' at the top ..."
You're starting with design rather than analysis.

So, hopping down off the soapbox, what probably needs to be done is
two things.  First, in the case where you need to know how big the
Shape is, return something that makes sense for a Shape in general.
What is this?  Well, not radius ... unless you define the "radius" of a
Shape to be "The maximum distance taken up by the Shape in any
direction" or something like that.  Perhaps you need a new object type
called Area or Region?  In most cases, where you don't need an
attribute returned, but you need some Real Work done, you should use
member functions as much as possible, as indicated above.

Of course, the situation where you're using a predefined library which
defines a Shape in this manner is a slightly, but not necessarily
hugely, different animal.  First, you could always define a MyShape
class that derives from Shape and just adds a virtual (pure?) GetRegion
function.  In some situations, that would make the most sense.

In others, particularly the list example (my, we've strayed a long way
from the original question, haven't we?) you could make List a member
object of Shape (or MyShape).  Heresy?  No, not if you consider it from
the analysis point of view.  After all, inheritance describes an "is a"
relationship.  A question: Is a Shape a kind of List?  No.  But members
describe a "has a" relationship.  Does a Shape have a List?  Well, sort
of.  Maybe a better question is: Does a newly-defined ListedShape object 
have a List?  Does it have a Shape?  Probably (hopefully) the answer is
yes.

Some cases may not fit any of these "ivory tower" situations.  In those
cases, almost anything seems to me preferable to putting virtual functions
in the base class that don't make sense for some or most of the base's
possible sub-classes.  Just because it is good-looking C++ code (yeah,
lots of virtual functions!) doesn't mean it's necessarily a good solution
to the problem.

chip@tct.com (Chip Salzenberg) (06/29/91)

According to pjg@daedalus.osf.org (Paulo Guedes):
>In article <2865E7A8.179A@tct.com> chip@tct.com (Chip Salzenberg) writes:
>>
>>      Never design code that requires downcasts.
>>	 Or, in other words: if the static type is important, why worry
>>	 about regaining it?  Don't lose it in the first place.
>>
>
>[...]
>
>Now, the problem: when I lookup an object by name
>obj = open ("someName")
>what type should open return ?

Yes, I can see how an NIHCL-like safe-downcast trick would be
appealing in this case.  However, there are alternatives.

One alternative is to allow for the generic open, but also to have
filetype-specific open() calls:

      FsObject  *FsDir::open(const char *);
      FsFile    *FsDir::open_file(const char *);
      FsDir     *FsDir::open_dir(const char *);
      FsSem     *FsDir::open_sem(const char *);

A pitfall of this approach is the race condition between finding the
type of a directory entry and the call to the appropriate open_xxx()
function: the target object could disappear or be replaced, perhaps
with an object of another type.

On the other hand, a locking feature (something like FsDir::lock() and
FsDir::unlock()) would solve this problem, and would also prove useful
for other situations.
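
A rough in-process sketch of the typed open calls combined with locking. In the system being discussed the lock would live in the server behind RPC; here a std::mutex stands in for it, FsSem is left out, and add_file, add_dir and open_file_locked are hypothetical helpers added for the demo:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

class FsObject {
public:
    virtual ~FsObject() {}
};

class FsFile : public FsObject {};

class FsDir : public FsObject {
public:
    // lock()/unlock() let a caller hold the directory steady between
    // checking an entry and opening it.
    void lock() { mutex_.lock(); }
    void unlock() { mutex_.unlock(); }

    void add_file(const std::string& name, FsFile* f) { files_[name] = f; }
    void add_dir(const std::string& name, FsDir* d) { dirs_[name] = d; }

    // Generic open: returns whatever is stored under the name, if anything.
    FsObject* open(const char* name) const {
        if (FsFile* f = open_file(name)) return f;
        return open_dir(name);
    }
    // Type-specific opens: a null pointer means "no entry of that type".
    FsFile* open_file(const char* name) const {
        std::map<std::string, FsFile*>::const_iterator it = files_.find(name);
        return it == files_.end() ? nullptr : it->second;
    }
    FsDir* open_dir(const char* name) const {
        std::map<std::string, FsDir*>::const_iterator it = dirs_.find(name);
        return it == dirs_.end() ? nullptr : it->second;
    }

private:
    std::mutex mutex_;
    std::map<std::string, FsFile*> files_;
    std::map<std::string, FsDir*> dirs_;
};

// Check-then-open under one lock, so the entry cannot change in between.
FsFile* open_file_locked(FsDir& dir, const char* name) {
    std::lock_guard<FsDir> guard(dir);  // FsDir provides lock() and unlock()
    if (!dir.open(name)) return nullptr;
    return dir.open_file(name);
}
```

Each caller gets back a pointer of the type it asked for, so no downcast appears anywhere in client code.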
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz

chip@tct.com (Chip Salzenberg) (06/29/91)

According to fmhv@minerva.inesc.pt (Fernando Manuel Vasconcelos):
>In article <28692A4A.59B7@tct.com> chip@tct.com (Chip Salzenberg) writes:
>
>>Of course, [a CircleList] may not be practical, perhaps because there are
>>other non-Circle objects in the ShapeList.  If so, congratulations!
>>You have just discovered that the function in question is actually a
>>Shape function in disguise.  You should therefore add it to the Shape
>>interface, presumably as a new virtual function.
>
>1. A practical one: You may not be able to change shape.h because it belongs
>to a library which you have only in binary (of course you have the .h's but
>you can't change them...)

The best way to solve this problem is to avoid it: get source code.
My personal choice is never to derive from a class unless I have
source code for it.  Instead, I compose (create new classes with the
binary-only classes as members).
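
A minimal sketch of composing rather than deriving. VendorCircle stands in for a binary-only library class and LabeledCircle for the new class built around it; both names and the label field are hypothetical:

```cpp
#include <cassert>

// Stand-in for a class shipped as header plus binary; assume its source
// is unavailable and its definition must not be touched.
class VendorCircle {
public:
    explicit VendorCircle(double r) : r_(r) {}
    double radius() const { return r_; }
private:
    double r_;
};

// The new class holds a VendorCircle as a member instead of deriving
// from it, and forwards only the operations it wants to expose.
class LabeledCircle {
public:
    LabeledCircle(double r, int label) : circle_(r), label_(label) {}
    double radius() const { return circle_.radius(); }  // forwarded call
    int label() const { return label_; }                // new behaviour
private:
    VendorCircle circle_;
    int label_;
};
```

The price is that a LabeledCircle cannot be passed where a VendorCircle is expected, which is the trade-off between the two approaches.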

>2. A conceptual one: That means the interface of a base class depends on the
>interface of the derived classes.

My policy simply acknowledges this obvious fact: as a programmer gains
experience in deriving from class X, her insight into the nature of
class X deepens.  Such insight can lead her to change class X in
accord with her newly gained experience.

To return to Shapes and Circles: A ShapeList can contain anything
derived from Shape, including classes not yet invented at compile
time.  Any code that works with a ShapeList of necessity deals only
with the Shape interface.

Therefore, for you to perform a particular operation on the elements
of a ShapeList, it is apparent that the operation in question must (or
should) be a part of the Shape interface.  If such is not the case,
then the code needs to be recast so the operation in question is not
applied to a ShapeList, but rather to a more specialized collection
such as a CircleList.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
 "I want to mention that my opinions whether real or not are MY opinions."
             -- the inevitable William "Billy" Steinmetz

schwartz@groucho.cs.psu.edu (Scott Schwartz) (06/29/91)

chip@tct.com (Chip Salzenberg) writes:
   The best way to solve this problem is to avoid it: get source code.

But that completely begs the question.  If you can change all the
sources, why do you need inheritance?  The whole point is to be able
to extend existing immutable types, without having to recompile the
world in the process.

thomasw@hpcupt1.cup.hp.com (Thomas Wang) (06/29/91)

> If you need a Circle interface, design your software so that you always put
> your Circle onto a CircleList.  Only put your Circles onto ShapeLists when
> you're dealing with Shapes.

This restriction may be acceptable to a large class of programmers, but I
still see it as a significant restriction.

It is best when a description of the physical world problem can be translated
directly into object oriented pseudo code.  Consider how real world containers
work:  If I put cookies into a box that can contain everything, I still know
there are cookies inside the box.  They do not suddenly become generic
objects.  Their type information is not lost.

> I agree with Chip 300%.

It's a trade off between type safety, and information loss.

> -steve

> | Steve Vinoski  (508)256-0176 x5904       | Internet: vinoski@apollo.hp.com

 -Thomas Wang                                wang@hpdmsjlm.cup.hp.com
              (Everything is an object.)     thomasw@hpcupt1.cup.hp.com