[net.lang] Treating Data Abstractly

Pavel.pa@PARC-MAXC.ARPA@cornell.UUCP (Pavel.pa@PARC-MAXC.ARPA) (09/01/83)

From: Pavel.pa@PARC-MAXC.ARPA
To: net-lang@CORNELL

Hal Perkins mentions that he knows of no clean solutions to the problem
of treating variables abstractly while maintaining independence from the
underlying implementation.

Hal, that seems funny, coming from you.  I think that I know of at least
four languages that seem to meet that criterion and I'm sure you know at
least two of them.  Please tell me what I'm missing, why you're
dissatisfied with these:

1. Smalltalk-80
	Since I started implementing my work here at PARC in Smalltalk, I have
really come to realise just how well it does this job.  For example, it
is interesting to watch the debate about I/O while sitting here and
using streams in Smalltalk.  I can write code which assumes a parameter
is a stream and make it work without knowing whether that stream is a
file, a byte-stream to another host on the network, or some sequenceable
collection, like a string, or array, or linked-list or whatever.
Furthermore, I \never/ need to make a decision about what is being used;
the same code could be used in the same application on different kinds
of streams at different times.  As long as there is some standard set of
messages which stream-like objects understand (such as nextPut: for a
single object, nextPutAll: for a collection, and perhaps cr and space as
abbreviations for certain nextPut: operations), I can use the protocol
in blissful ignorance of what I'm really talking to.
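	A rough sketch of the same idea, in Python rather than Smalltalk, so
it is only an analogy: write() stands in for nextPut:, and ListStream and
emit_report are hypothetical names invented for the illustration.  The
routine assumes nothing about its parameter beyond that one message.

```python
import io


class ListStream:
    """Hypothetical stream that collects what is written in a list."""

    def __init__(self):
        self.items = []

    def write(self, text):
        self.items.append(text)


def emit_report(stream, names):
    # Only assumes the stream understands write(); never asks what it is.
    for name in names:
        stream.write(name)
        stream.write("\n")


# The same routine, used on two unrelated kinds of stream.
string_stream = io.StringIO()
list_stream = ListStream()
emit_report(string_stream, ["alpha", "beta"])
emit_report(list_stream, ["alpha", "beta"])
```

Only the code that creates string_stream and list_stream knows what kind
of object each one is.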
	As an example, there are two messages (actually many more than two, but
I'm only interested in these two) which are understood by all objects:
		printOn: aStream
meaning, 'please print a representation of yourself on the given
stream', and
		printString
meaning, 'please return a string which is a representation of yourself'.
Almost all classes of objects have a specialised version of the printOn:
message (the default is to simply print the name of the object's class
preceded by 'a' or 'an', as in 'an Array'; not a very useful
representation), but there is only one implementation of printString.
It appears at the top of the class hierarchy, in class Object:
	printString
	"Answer a String whose characters are a description of the receiver."

	| aStream |
	aStream _ String newWriteStream: 16.
	self printOn: aStream.
	^aStream contents

This routine simply makes a stream on a new String object and sends the
printOn: message to print the representation on that stream.  It then
returns whatever was printed on the stream.  In this way, objects can
make exactly one routine to print a representation of themselves,
whether that representation is to go on a file, a string or across the
network.
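	The same arrangement can be sketched in Python, as an analogy only.
Thing plays the role of class Object, print_on and print_string mirror
printOn: and printString, and Point and Account are hypothetical
subclasses made up for the example.

```python
import io


class Thing:
    """Hypothetical root class, standing in for Smalltalk's Object."""

    def print_on(self, stream):
        # Default: the class name preceded by 'a' or 'an'.
        name = type(self).__name__
        article = "an" if name[0] in "AEIOU" else "a"
        stream.write(article + " " + name)

    def print_string(self):
        # The single implementation: print onto a fresh string stream
        # and return whatever ended up on it.
        stream = io.StringIO()
        self.print_on(stream)
        return stream.getvalue()


class Point(Thing):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def print_on(self, stream):
        # Specialised representation; print_string is inherited untouched.
        stream.write(f"{self.x}@{self.y}")


class Account(Thing):
    pass  # Falls back on the default print_on.
```

Here Point(3, 4).print_string() yields '3@4' and Account().print_string()
yields 'an Account', although neither class defines print_string.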
	Smalltalk gains a lot of advantage from this style of
information-hiding.  The only piece of code which needs to know what
kind of object is really being dealt with is the one that creates it, a
very reasonable point of view.


2. CLU/Mesa/Cedar
	These three languages, far more traditional in their philosophies than
Smalltalk, all take a similar approach to providing the ability to treat
data in an abstract manner, unconcerned with the implementation.  (It
should be pointed out that the main features of their approach also
appear in other languages, such as Alphard and, more recently and
better known, Modula-2.)
	In these languages, one creates a 'cluster' (in CLU terminology) or
'module' (in Mesa, Cedar and Modula-2) which is a collection of data and
functions that have complete access to one another but protection from
the outside world.  Part of the specification of a module is the
explicit 'exportation' of certain of the functions for use by the
outside world.  These constitute the so-called 'interface' to the
module.  Frequently these modules represent the implementation of a
data-type, with the exported functions comprising the set of operations
that make sense on that type.  Since programs can only use the
operations on the type that are exported, one can set up a reasonable
interface specification, write much code which uses it and \then/ decide
upon an implementation.  CLU, in fact, uses a library of
implementations, one of which is selected for use only at program
linking time.  I believe Mesa and Cedar have similar mechanisms.
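	A minimal sketch of that workflow, in Python and with loud caveats:
Python only hides the underscore-prefixed fields by convention, and the
implementation is chosen here by passing a class at run time, not by a
linker.  IntSet, ListIntSet, HashIntSet and count_hits are hypothetical
names; none of this is CLU or Mesa syntax.

```python
from abc import ABC, abstractmethod


class IntSet(ABC):
    """Hypothetical interface: the only operations clients may use."""

    @abstractmethod
    def insert(self, n): ...

    @abstractmethod
    def contains(self, n): ...


class ListIntSet(IntSet):
    """One implementation: an unordered list without duplicates."""

    def __init__(self):
        self._items = []

    def insert(self, n):
        if n not in self._items:
            self._items.append(n)

    def contains(self, n):
        return n in self._items


class HashIntSet(IntSet):
    """Another implementation: a hash table."""

    def __init__(self):
        self._items = set()

    def insert(self, n):
        self._items.add(n)

    def contains(self, n):
        return n in self._items


def count_hits(make_set, data, probes):
    # Client code: written against the interface before any
    # implementation has been settled on.
    s = make_set()
    for n in data:
        s.insert(n)
    return sum(1 for p in probes if s.contains(p))
```

count_hits(ListIntSet, ...) and count_hits(HashIntSet, ...) give the same
answer; the client never sees which representation is in use.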


3. Conclusions
	The obvious similarity between the mechanisms used in these languages
is that the operations are part of the data objects (or their
descriptions).  This is in contrast to the approaches taken in Pascal
and C (and many other languages) in which a description of a data-type
only talks about the components and structure of that type.  To my mind,
the components and structure (i.e. the implementation of the type) are
exactly what should \not/ be visible to clients of the type.  (This is
certainly not a new point of view, just less well-known than it
deserves.)

	So tell me, Hal, what are these languages missing in the way of data
abstraction that you'd like to see?  For their general approach (i.e. a
procedural specification of the program), they seem to do the job as
well as anything that springs to mind.

	Pavel Curtis
	Xerox PARC, Software Concepts Group
	{decvax | vax135 | allegra | ...}!cornell!pavel		(UUCP)
	Pavel@Cornell		(ARPA, CSNET)

hal@cornell.UUCP (Hal Perkins) (09/02/83)

Yes, I do know about Smalltalk, Mesa, CLU, Modula-2, and whatnot.  These
don't seem to do what I want, but I must tread lightly here.  I am not a
certified expert in any of these languages and still have some reading to
do, so I will speak in vague generalities and try to avoid putting my foot
too far down my throat.

All of these languages do a decent job of separating the specification of
a module from the implementation.  What they don't do (CLU might, but as
I said, I still need to look into it) is handle the problem of multiple
representations of the same abstract type simultaneously.  For example,
suppose I have two implementations of complex variables:  one using polar
coordinates, the other using rectangular coordinates.  And suppose I want
to have some of my data use one representation and other data use the
other one.  While the languages Pavel mentions make it easy for me to
write a program in terms of a single abstract type (complex) and later
specify an implementation (rectangular or polar), I don't think it is easy
or transparent to use several different implementations of the same abstract
type at once.  This is the same problem as having some arrays in main
storage and others in the Unix or OS/360 file system.
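To make the situation concrete, here is a rough sketch in Python (not one
of the languages under discussion) of two representations of complex
numbers living side by side in one program.  Rect, Polar, add and
multiply are hypothetical names, and the sketch relies on every operation
being looked up on the object at run time.

```python
import math


class Rect:
    """Complex number stored as rectangular coordinates."""

    def __init__(self, re, im):
        self._re, self._im = re, im

    def real(self):
        return self._re

    def imag(self):
        return self._im

    def magnitude(self):
        return math.hypot(self._re, self._im)

    def angle(self):
        return math.atan2(self._im, self._re)


class Polar:
    """Complex number stored as polar coordinates."""

    def __init__(self, r, theta):
        self._r, self._theta = r, theta

    def real(self):
        return self._r * math.cos(self._theta)

    def imag(self):
        return self._r * math.sin(self._theta)

    def magnitude(self):
        return self._r

    def angle(self):
        return self._theta


def add(a, b):
    # Works on any mix of representations; answers a Rect.
    return Rect(a.real() + b.real(), a.imag() + b.imag())


def multiply(a, b):
    # Works on any mix of representations; answers a Polar.
    return Polar(a.magnitude() * b.magnitude(), a.angle() + b.angle())
```

add(Rect(1, 0), Polar(1, math.pi / 2)) mixes the two freely, but every
access to a component goes through a method call on the object.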

As for Smalltalk, the paradigm of sending messages to objects does indeed
seem to solve the problem at a conceptual level.  But there are two things
I worry about here:  One is efficiency--doesn't it require a fair amount
of overhead to access an array element by sending a message to an array
object?  It would seem that no matter how cleverly this is implemented,
it wouldn't be as fast as executing a half-dozen instructions to compute
a memory location and load the desired array element.  I realize that it
is not fashionable to worry about efficiency (particularly when you have
a Dorado on your desk), but I don't want to exclude it from consideration.
Secondly, Smalltalk lives in a world of its own.  I'm hoping that the
ideas in it can be applied in other settings and other languages.


Anyway, I'm getting a little beyond my depth.  I think I'll avoid any
further comments on Smalltalk, Mesa, and CLU until I've learned more
about them.


Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET:  hal@crnlcs