[comp.lang.c++] Do class libraries have to be in source form

mike@taumet.UUCP (Michael S. Ball) (11/20/89)

The question to debate:  Are class libraries distributed in object-only
form useful in encouraging software reuse?

The obvious answer to the question is yes:  object-only libraries are
in wide use already, and if we are willing to restrict reuse to that
level I won't argue too much.  I would like to point out, though,
that there are a large number of successful libraries which are
licensed in source form, and the users of such libraries are often
happy to have source available.

Object oriented programming claims to expand reuse by providing:

1.  Objects with (possibly elaborate) internal state

2.  Objects which we can modify as necessary by inheritance.

We have far less experience with these forms of reuse, and essentially
no experience with reusing such objects in object-only form.

First let's consider specification and documentation.

1.  Pure functions with no state - Relatively easy to document as
    the output is defined strictly in terms of the input.  Formal
    methods are usually simple and are commonly used.  Mathematical
    functions are simple examples.

2.  Functions with state - Much harder to document, and though
    formal methods are available they are seldom used.  The most
    common examples are I/O functions.  The documentation is
    typically incomplete and often confusing.  They are easy
    to use in common cases, but difficult in unusual cases.  They
    tend to accumulate a lot of folklore.

3.  Abstract data types -  Similar to functions with state, but
    frequently simpler and intended to be used as components in a
    program design.  There are formal methods available, but they
    are difficult to write and read and are seldom used.  ADTs are
    frequently simple enough that informal methods are adequate to
    give a good understanding.  Complicated types (like windows)
    can require books.

4.  Base classes for inheritance -  In C++ this would be called the
    "protected" interface.  Since this tends to get into implementation
    it is usually much more complex than the external interface, which
    could be described as an abstract data type.  I've seen no use of
    any formal methods, and I suspect they would be difficult to use.
    The description is typically informal and incomplete.
    Basically, we don't know how to do this very well.
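
To make item 4 concrete, here is a minimal sketch in present-day C++
(not the 1989 dialect).  Stack, TracingStack, beforePush() and depth()
are hypothetical names invented for illustration.  The public part can
be described as an abstract data type; the protected part only makes
sense once you know the order in which push() does its work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class; the names are illustrative only.
class Stack {
public:
    // External interface: describable as an abstract data type.
    virtual ~Stack() {}
    void push(int x) { beforePush(x); data_.push_back(x); }
    int pop() { int x = data_.back(); data_.pop_back(); return x; }  // caller must check empty()
    bool empty() const { return data_.empty(); }

protected:
    // Protected interface: leaks how the class works inside.
    // A deriver must know that beforePush() runs before the element is
    // stored, so depth() does not yet include the element being pushed.
    virtual void beforePush(int /*x*/) {}
    std::size_t depth() const { return data_.size(); }

private:
    std::vector<int> data_;
};

// A derived class that records the deepest the stack has ever been.
class TracingStack : public Stack {
public:
    TracingStack() : maxDepthSeen_(0) {}
    std::size_t maxDepthSeen() const { return maxDepthSeen_; }

protected:
    void beforePush(int) override {
        // Correct only because of the call-order detail noted above.
        std::size_t after = depth() + 1;
        if (after > maxDepthSeen_) maxDepthSeen_ = after;
    }

private:
    std::size_t maxDepthSeen_;
};

// Small exercise of the two classes.
void demo_protected_interface() {
    TracingStack s;
    s.push(1);
    s.push(2);
    s.pop();
    s.push(3);
    assert(s.maxDepthSeen() == 2);
}
```

If push() stored the element before calling the hook, the "+ 1" in
TracingStack would be wrong, and nothing in the public description of
a stack would tell you so.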

I claim that the limitations on specification alone are enough to
make source necessary for successful derivation from a class.  All
the successful examples I have seen have had source available.  An
article in the Sept/Oct 1989 issue of JOOP, "Object-Oriented Software
Reuse: The Yoyo Problem,"  discusses this issue at some length.  I
will refrain from further arguments on this subject since I want to
get others' opinions.

A second area of concern is debugging.  Since classes are meant to
be used as an integral part of a program it is reasonable to want
to look at the internal state of an object when debugging the
program.  For example, when using a container class, we would like to
be able to observe its contents with the debugger.  We might also
like to be able to set breakpoints when items are added to or removed
from the container.  The possibilities are larger than any set of
utility and debugging functions dreamed up by the class designer.
Current debuggers require source for this.  Debugging a derived
class could be even more interesting.  Somehow debugging a class
while being able to observe only one member function sounds hard.

One can come up with special cases where debugging won't require
source of the class, but I think that in general it will go much
better with the source.

There are some other arguments for source which are unique to C++.

1.  Sometimes base classes have to become virtual base classes
    to make a class useful for multiple inheritance.

2.  Sometimes a function should be made virtual when the original
    class designer didn't do so.

3.  Sometimes moving something from the private interface to the
    protected interface can simplify derived classes enormously.

4.  The proposed template design requires source (at some level).
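
Point 2 in the list above can be sketched as follows, in present-day
C++.  Shape, Circle and the library_report functions are hypothetical
names.  The sketch assumes a library routine that calls a member
through a base-class reference: if the member is not virtual, the
user's version is never reached, and only a change to the library
header (and a rebuild of the library) fixes that.

```cpp
#include <cassert>
#include <string>

// Hypothetical library class. The designer left describe() non-virtual;
// describe_v() shows what the user would have liked instead.
class Shape {
public:
    virtual ~Shape() {}
    std::string describe() const { return "shape"; }            // not virtual
    virtual std::string describe_v() const { return "shape"; }  // virtual
};

// User's derived class.
class Circle : public Shape {
public:
    std::string describe() const { return "circle"; }             // hides, does not override
    std::string describe_v() const override { return "circle"; }  // overrides
};

// Library code that knows only about Shape.
std::string library_report(const Shape& s)   { return s.describe(); }
std::string library_report_v(const Shape& s) { return s.describe_v(); }

void demo_virtual() {
    Circle c;
    assert(c.describe() == "circle");          // direct call sees the user's version
    assert(library_report(c) == "shape");      // library call does not
    assert(library_report_v(c) == "circle");   // virtual dispatch reaches the user's version
}
```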

In short, my claim is that achieving maximum reuse with a class
library virtually requires source code.  The need is especially
acute when the user plans to derive from the library classes, and
I have seen no evidence at all that object-only classes are
useful as base classes.  The object-oriented environments where
derivation is heavily used all include source for the library.

Comments? (he says as he ducks rapidly)

Michael S. Ball	     		       email: uunet!taumet!mike	
TauMetric Corporation		       Phone: (619)275-6381
1094 Cudahy Pl. Ste 302		       MCI: TauMetric
San Diego, CA 92110

mike@taumet.UUCP (Michael S. Ball) (11/20/89)

Why do potential vendors of class libraries resist distributing source
code?  Warning:  the following is filtered through my prejudices.

I heard the following reasons at "C++ At Work":

1.  They will steal our source code.  ("They" was left as a free
    variable.)

2.  We can't maintain our code if the user can modify it.

3.  Users shouldn't want to see the source code.  They wouldn't
    understand it anyway.

I can speculate on a couple of other unstated reasons:

4.  We'll lose our mystique.  Once people see how simple the code
    is they will wonder why we charge so much.

5.  Let someone else see how bad my code is?  Horrors!

And a reason that they are so insistent on stating that libraries
should be object only:

6.  If we can persuade the competition not to distribute source
    the users will have no choice.

Only the first two can be argued; the others are simple (though
possibly valid) slander.  Let's take fears of thievery first.

1.  Many companies distribute source libraries with no difficulty.
    Source code for libraries isn't high on the list of hacker-bait,
    and the professional will be constrained by copyright and licenses.

2.  If, as many of these same people claim, all the work is in the
    design, the important data will be in the class interfaces.  The
    implementation will be a minor part of the work, so why hide it?

The maintenance question seems like a complete red herring.  Professionals,
who will be the major users of such libraries, know enough to be
careful when changing code, and would certainly not expect maintenance
for modified code.  Breaking the seal usually voids the warranty.

So what do you think?

lisch@mentor.com (Ray Lischner) (11/23/89)

In <177@taumet.UUCP>, Michael Ball puts up a question for debate:
> Are class libraries distributed in object-only
> form useful in encouraging software reuse?

He goes on to make some arguments specific to C++:
> 1.  Sometimes base classes have to become virtual base classes
>     to make a class useful for multiple inheritance.
> 
> 2.  Sometimes a function should be made virtual when the original
>     class designer didn't do so.

Well, I agree completely.  Due to the nature of C++, there are a lot
of things the original designer must do differently when using virtual
base classes instead of non-virtual base classes.  It is much easier
to DESIGN a class library without using virtual base classes.  It is
often much easier to USE a class library if it uses virtual base classes.
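
A minimal sketch of the difference, in present-day C++.  Counter,
ReaderA/WriterA/BothA and ReaderB/WriterB/BothB are hypothetical names.
With a non-virtual base, a class that inherits along two paths gets two
copies of the base.  With a virtual base there is one shared copy, but
the most derived class becomes responsible for initializing it, which
is one of the things the designer has to plan for.

```cpp
#include <cassert>

// Hypothetical base class with no default constructor.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
};

// Non-virtual inheritance: each intermediate class carries its own Counter.
struct ReaderA : Counter { ReaderA() : Counter(1) {} };
struct WriterA : Counter { WriterA() : Counter(2) {} };
struct BothA : ReaderA, WriterA {};   // contains two Counter subobjects

// Virtual inheritance: one shared Counter for the whole object.
struct ReaderB : virtual Counter { ReaderB() : Counter(1) {} };
struct WriterB : virtual Counter { WriterB() : Counter(2) {} };
struct BothB : ReaderB, WriterB {
    // The most derived class must initialize the virtual base itself;
    // the Counter(1) and Counter(2) initializers above are not used here.
    BothB() : Counter(3) {}
};

int demo_two_copies() {
    BothA b;
    // Plain b.value would be ambiguous; each path has to be named.
    return b.ReaderA::value * 10 + b.WriterA::value;
}

int demo_shared_copy() {
    BothB b;
    return b.value;   // unambiguous: there is only one Counter
}

void demo_virtual_base() {
    assert(demo_two_copies() == 12);
    assert(demo_shared_copy() == 3);
}
```

Switching ReaderA and WriterA to virtual inheritance after the fact
means editing their declarations, which is exactly what an object-only
library does not allow.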

Without access to the sources, a class library can quickly become useless.
-- 
Ray Lischner        UUCP: {uunet,tektronix,decwrl}!sequent!mntgfx!lisch

bright@Data-IO.COM (Walter Bright) (11/23/89)

In article <178@taumet.UUCP> mike@taumet.UUCP (Michael S. Ball) writes:
<Why do potential vendors of class libraries resist distributing source
<code?  Warning:  the following is filtered through my prejudices.

At Zortech, we distribute full source to the libraries that come with the compiler.
The reasons are:
o	It really ain't that clever. Really, how complicated can strlen be?
o	Many users use the library source as examples of how to interface
	asm code to the compiler.
o	If there is a problem, the user can simply fix the code for his
	application, and continue, instead of waiting for an update.
o	Sometimes (though surprisingly rarely), a user will find a bug
	in the library, and will propose a correct fix.
o	The extra money for the library source helps pay the bills.
o	We encourage users to modify the library if they have special
	needs (such as romable applications).
o	Library source helps third party library vendors be compatible
	with us.
The source to the graphics library is excluded from this because of licensing
problems.

<I heard the following reasons at "C++ At Work":
<1.  They will steal our source code.
	Who cares? A capable programmer can easily disassemble your OBJ
	file anyway. There are several excellent programs that can do this.
	OBJ files are insufficient obfuscation to prevent it.

<2.  We can't maintain our code if the user can modify it.
	Programmers are well aware that if they modify it, they're on their
	own.

<3.  Users shouldn't want to see the source code.  They wouldn't
<    understand it anyway.
	Some of them want it so they *can* learn. Working, functional
	source code is great to teach people, as it is not a contrived
	example out of a book.

<I can speculate on a couple of other unstated reasons:
<4.  We'll lose our mystique.  Once people see how simple the code
<    is they will wonder why we charge so much.
	If it's so simple they would have done it themselves. Professional
	code just never turns out to be so simple. (I've not discovered
	a single customer who has figured out how my floating point code
	works! I don't even know anymore :-) and it's fully commented!).

<5.  Let someone else see how bad my code is?  Horrors!
	If you're that embarrassed by your code, maybe you're in the
	wrong profession!

<And a reason that they are so insistent on stating that libraries
<should be object only:
<6.  If we can persuade the competition not to distribute source
<    the users will have no choice.
	I decide for myself. If I see a competitive advantage to distributing
	source, I'll do it.

I can think of one pessimistic reason why some outfits won't distribute
source: the source was plagiarized from someone else, and distributing
source would lay them wide open for a lawsuit. I wonder if this really
happens, and how prevalent it is.

I once had someone try to sell me source to a utility for my
compiler. Something about it looked suspicious (the comments didn't match
what the code was doing), and upon further investigation I discovered that
it was a disassembled version of one of my competitors' products!
Needless to say, I was very upset about that.

Another incident happened a few years ago when somebody tried to get Zortech
to OEM a MASM-compatible assembler. They wanted a lot of up front money.
The closer we looked at it, the more compatible it seemed. In fact, it was
so compatible that the only difference was in the copyright message!

The moral is, there are (unfortunately) crooks out there.

jacob@gore.com (Jacob Gore) (11/23/89)

(I wish this was in comp.object instead of comp.lang.c++.)

I can give a very recent example of how lack of source can be an obstacle
to reusability of a class.  It is a case where the reuse would be in the
form of making a subclass in order to modify the behavior of the objects.

The NextStep (tm NeXT, I bet) Application Kit is a hierarchical class
library (in Objective-C (tm Stepstone)).  It contains class Matrix, which
is a two-dimensional array of graphics objects (specifically, Cells).  It
has one limitation which makes it unusable for what I needed: all Cells
must be of the same size (meaning X and Y size, not memory storage).  In
other words, you cannot use it to display a table, unless all columns are of
the same width and all rows are of the same height.

I considered making a Table class that would be a subclass of Matrix, but
would allow each row to be of its own height and each column of its own
width.  This would involve overriding creation methods (easy), row and
column manipulation methods (easy), display methods (I don't know), and
mouse tracking methods (I don't know), as well as some new instance
variables to support them.

Why couldn't I reuse Matrix this way?  Because of the "I don't know" parts.
Not having the source to Matrix, I would have to either re-invent or
reverse-engineer its display and mouse tracking methods.  That is NOT
reusability.
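
The problem can be sketched in C++ (the Application Kit itself is
Objective-C, and this is NOT its real interface).  Matrix, Table,
columnWidth(), columnAt() and columnByScanning() below are hypothetical
names invented for illustration.  The sketch assumes a base class whose
hit-testing quietly relies on uniform cell widths; a subclass that
overrides only the documented width query inherits wrong mouse
tracking, and without source there is no way to see why.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a grid class shipped without source.
class Matrix {
public:
    Matrix(int cols, int cellWidth) : cols_(cols), cellWidth_(cellWidth) {}
    virtual ~Matrix() {}

    int columns() const { return cols_; }

    // Documented and overridable.
    virtual int columnWidth(int /*col*/) const { return cellWidth_; }

    // Documented only as "returns the column under x, or -1".
    // Whether it consults columnWidth() is invisible without source.
    virtual int columnAt(int x) const {
        int col = x / cellWidth_;          // assumes uniform cells
        return col < cols_ ? col : -1;
    }

private:
    int cols_;
    int cellWidth_;
};

// User subclass with per-column widths; overrides only what looked relevant.
class Table : public Matrix {
public:
    explicit Table(const std::vector<int>& widths)
        : Matrix(static_cast<int>(widths.size()),
                 widths.empty() ? 1 : widths[0]),
          widths_(widths) {}

    int columnWidth(int col) const override { return widths_[col]; }

private:
    std::vector<int> widths_;
};

// What a correct Table would need columnAt() to do instead.
int columnByScanning(const Matrix& m, int x) {
    int right = 0;
    for (int c = 0; c < m.columns(); ++c) {
        right += m.columnWidth(c);
        if (x < right) return c;
    }
    return -1;
}

void demo_table() {
    std::vector<int> widths;
    widths.push_back(10);
    widths.push_back(50);
    widths.push_back(10);
    Table t(widths);
    assert(t.columnAt(25) == 2);            // inherited logic: wrong column
    assert(columnByScanning(t, 25) == 1);   // x = 25 lies in the 50-wide column
}
```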

Jacob
--
Jacob Gore		Jacob@Gore.Com			boulder!gore!jacob

sjb@cs.toronto.edu (Stephen Bellantoni) (11/23/89)

In article <1989Nov22.181203.16204@mentor.com> lisch@mentor.com (Ray Lischner) writes:
>
> Without access to the sources, a class library can quickly become useless.
> 

If this is true then it may be the death knell for object oriented
programming. For, the only reasons to have the source are (1) to look at
it and (2) to change it. 

In case (2) you are not re-using code, you are modifying it. If this is 
necessary for making the best use of object oriented code, then OO has
failed in its goal of making code re-usable: it merely makes it easier
to re-use by encouraging a more structured (i.e. easier to modify) style.

Case (1) you should be able to handle using proper (and I mean really 
complete) documentation. Thus an argument that case (1) is the reason for
having source code is an argument that, ultimately, source code is the only
complete form of documentation. Equivalently, it is an argument that, 
practically speaking, functionality cannot be separated from implementation.

:stephen bellantoni

schmidt@zola.ics.uci.edu (Doug Schmidt) (11/24/89)

In article <1989Nov23.105650.17030@jarvis.csri.toronto.edu>, sjb@cs (Stephen Bellantoni) writes:
>If this is true then it may be the death knell for object oriented
>programming. 

[Note: this has certainly been the week to predict the end of OOP...,
was there a fire-sale on crystal balls or something? ;-)]

>In case (2) you are not re-using code, you are modifying it. If this is 
>necessary for making the best use of object oriented code, then OO has
>failed in its goal of making code re-usable: it merely makes it easier
>to re-use by encouraging a more structured (i.e. easier to modify) style.

I believe this terminology conflates two distinct (but related) forms
of code reuse: Black-Box reuse and White-Box reuse.  Your statements
above pertain mostly to Black-Box reuse, i.e., `reusing a source code
component without modifying its internals or interface in any manner.'
Naturally, if you apply the synecdochical argument that:

                        Black-Box reuse == All reuse

your conclusion above is true by definition.

However, White-Box reuse, i.e., `selectively reusing source code
components by editing portions of their interface or implementation'
also facilitates increased programmer productivity and decreased module
fault-proneness (I'm speaking mostly from personal experience here,
but there are references in the software engineering literature that
address these points more empirically).

I'm sure most good C++ programmers maintain a library of reusable
classes and code templates that are recycled, revised, and refined for
each new project (check out the ./etc and ./gperf subdirectories in
the GNU libg++ release for some examples of mine).  Furthermore,
if/when parameterized types and exception handling become part of the
official C++ definition, class library designers and application
programmers will receive increased incentive and support to migrate
the results of their ad hoc, `White-Box reusable' hacks into more
formal and complete `Black-Box reusable' componentry.

I assert that at this stage of OOP's (and C++'s) evolution exploiting
both Black- *and* White-Box reuse is healthy and necessary.  After
all, the main point of software reuse is not to become obsessed with
`purity of essence,' but rather to produce higher quality and more
reliable software systems at lower cost.  Questing quixotically after
the holy grail of pure Black-Box reuse seems rather counterproductive
at this stage of development.

Before dispensing with OOP it is probably useful to understand exactly
what we are dealing with!

Doug
--
----------------------------------------
"The fundamental principle of science, the definition almost, is this:
the sole test of the validity of any idea is experiment." -- R. P. Feynman

bs@alice.UUCP (Bjarne Stroustrup) (11/24/89)

Could it be that there is a class of libraries for which source is essential,
a class of libraries for which source is merely useful, and a class of libraries
for which source is mainly trouble? If so, can we at this stage predict which
kinds of libraries will fall into which classes and give guidelines (to people
writing and distributing libraries) for when to ship source?

Two observations:

	Type-safe linkage increases the amount of information available in
	the object code; this ought to help where source is not available.

	Parameterized types (templates) are a source code technology.
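
A minimal sketch of the second observation, using template syntax as it
was later standardized (the design was still a proposal when this was
posted).  largest() and Version are hypothetical names.  The body of
the template has to be visible wherever it is used, because the
compiler generates code for each argument type only when it sees that
type.

```cpp
#include <cassert>

// Hypothetical library header. The definition must be readable here:
// no object file could contain code for a type the vendor has never seen.
template <typename T>
const T& largest(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// A user-defined type, unknown to the library vendor.
struct Version {
    int major_no;
    int minor_no;
};

bool operator<(const Version& l, const Version& r) {
    return l.major_no < r.major_no ||
           (l.major_no == r.major_no && l.minor_no < r.minor_no);
}

void demo_template() {
    Version a = {1, 2};
    Version b = {1, 10};
    assert(largest(a, b).minor_no == 10);   // instantiated for Version
    assert(largest(3, 7) == 7);             // instantiated for int
}
```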

shap@delrey.sgi.com (Jonathan Shapiro) (11/28/89)

In article <1989Nov23.105650.17030@jarvis.csri.toronto.edu> sjb@cs.toronto.edu (Stephen Bellantoni) writes:
>For, the only reasons to have the source are (1) to look at
>it and (2) to change it. 
>
>In case (2) you are not re-using code, you are modifying it...

Claptrap.  Modification is a form of reuse.  A *major* form of reuse.
One of the most often overlooked and significant advantages to
object-oriented programming languages is that the things one needs to
think about modifying are all collected together at the object
definition.

Jonathan Shapiro
Silicon Graphics, Inc.

bturner@hpcvlx.cv.hp.com (Bill Turner) (11/28/89)

><2.  We can't maintain our code if the user can modify it.
>	Programmers are well aware that if they modify it, they're on their
>	own.

I have been involved with a program (internal use) for which I distributed
source.  I don't agree with your assertion.  I got so many calls back asking
to have something fixed that was in the added or modified code...

Also, there is the problem of multiple indirection.  Yes, maybe the people
you gave the code to understand that "they touch, they own," but have they
given the code on to others who now don't realize that things have been
modified from the original author?

(BTW, I do think that having source is useful, and I try to give out source
when I give out the binaries, but I cringe about the support headaches.  I
just can't say "Tough!" when the bug reports come in...)

--Bill Turner (bturner@hp-pcd.hp.com)
HP Corvallis Information Systems

ksand@appleoz.oz.au (Kent Sandvik) (12/07/89)

bturner@hpcvlx.cv.hp.com (Bill Turner) writes in article <102090001@hpcvlx.cv.hp.com>:
      ><2.  We can't maintain our code if the user can modify it.
      >	Programmers are well aware that if they modify it, they're on their
      >	own.

Many times it sure helps, when overriding a method, to be able to see
how the original method was implemented. IMHO the lack of source code
sometimes leads to overriding methods that misbehave. Any comments
on that?
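
One way this can go wrong, sketched in present-day C++ with
hypothetical names (Bag, CountingBag).  The sketch assumes a base class
whose addAll() happens to be implemented by calling the virtual add().
A subclass author who cannot see that will count every element twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical library class; imagine only the declarations are published.
class Bag {
public:
    virtual ~Bag() {}
    virtual void add(int x) { items_.push_back(x); }
    // Hidden implementation detail: addAll() is built on add().
    virtual void addAll(const std::vector<int>& xs) {
        for (std::size_t i = 0; i < xs.size(); ++i) add(xs[i]);
    }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};

// User subclass that tries to count how many items were ever inserted.
class CountingBag : public Bag {
public:
    CountingBag() : added_(0) {}
    void add(int x) override { ++added_; Bag::add(x); }
    void addAll(const std::vector<int>& xs) override {
        added_ += xs.size();   // counted once here...
        Bag::addAll(xs);       // ...and again through the virtual add()
    }
    std::size_t added() const { return added_; }

private:
    std::size_t added_;
};

void demo_override() {
    std::vector<int> xs;
    xs.push_back(1);
    xs.push_back(2);
    xs.push_back(3);
    CountingBag b;
    b.addAll(xs);
    assert(b.size() == 3);    // the bag holds three items
    assert(b.added() == 6);   // but the override has counted six
}
```

Had addAll() pushed onto the vector directly, the same override would
have been correct, and the published declarations are identical in both
cases.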

Kent

-- 
Kent Sandvik  --  ksand@appleoz.oz.AU  | Apple Australia DTS  Ph: +61 2 452 82 93
{uunet,mcvax}!munnari!appleoz.oz!ksand | AppleLink: AUSTAUX, Discl: All comments mine
-- CyberSpace, the Final Frontier --