[comp.lang.c] Structured design, goto's, and the holy grail

g-rh@cca.CCA.COM (Richard Harter) (01/29/88)

In article <11528@brl-adm.ARPA> dsill@NSWC-OAS.arpa (Dave Sill) writes:
>In article <22580001@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.EDU writes:
>>[Doug Gwyn, I believe, writes:]
>>>Actually, inter-module communication via global data is contrary to
>>>well-established structured design principles.  It should be used
>>>sparingly or not at all.
>>
>>Bullshit. Inter-module communication via global data is probably the most
>>potent method (to the programmer, not internal to a compiler) available
>>for both space and time optimization. Particularly space.

	Interesting issue.  There is a school that holds that inter-module
communication via global data is undesirable (strong, unstructured coupling.)
This school is wrong.  If procedure invocation is strictly heirarchical,
and all communication is via parameter sequences, then there are large
classes of programs which effectively cannot be written.  If procedure
invocation is non-heirarchical (i.e. you can call a routine from anywhere)
and you can have variables remembered from one invocation to the next,
then a simulation of global data is trivial.

	Heirarchical structure and data localization lead to clean
comprehensible software, when they are applicable.  Some have concluded
that all software should be written this way.  The position is an error;
there are (demonstratably) large classes of programs where this type of
structure is not effective.  

	The problem with global data, the goto, the procedure that can
be called from anywhere, et al, is that these are primitive constructs
that do not have structure associated with them.  The fundamental problem
is lay out principles of structuring that can be used in general, and not
merely in a restricted subclass of problems.

>Yeah, structured design has a habit of outlawing some of the most
>powerful techiques, e.g. the "goto".

	This particular statement is false in its implications, although
technically true.  There is a definite heirarchy of strength of control
structure constructs, running as follows:

(1)	if-then-else and while loops
(2)	forever loops with conditional escapes
(3)	conditional goto's
(4)	subroutine call/return
(5)	computed goto's (case statement)

Stength here means that there are programs which can be written using a
higher level construct which cannot be written using only lower level
constructs with paying either exponential penalties in space or logarithm
penalties in time.

	Forgoing the goto means, at most, loss of micro efficiencies as long
as one retains (4) and (5), which most structured programming advocates do.
The implication of the quoted sentence is that "power" is unstructured and
therefore "bad".  It is not the power of the goto that is "bad" -- there are
more powerful constructs -- it is its obscurity.

	It should be noted that procedure invocation has the same problem
as the goto -- you cannot tell, from the code within the procedure how
you got there.  Modularization is often cited as an important programming
principle, and it is.  But what it really does is push the problem up the
scale.  When you have hundreds or thousands of modules, you have the same
kinds of problems as when you have hundreds or thousands of statement numbers.


>For some odd reason, the
>proponents of structured design seem to feel that it's important to
>make code easy to understand, modify, and maintain.

	And have you stopped beating your wife?  One might well agree
that it is desirable to make code easy to understand, modify, and maintain,
and yet believe that the proponents of structured design offer a pseudo
solution that makes the objective harder to attain because  their 'solution'
is based on a fundamental misunderstanding of the problem.  Or, to put it
bluntly, replacing reason by religion may get you to Heaven, but it is not
the way to produce good software.  [Note: I do not necessarily endorse this
view nor oppose it.]
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

chris@mimsy.UUCP (Chris Torek) (01/29/88)

In article <23777@cca.CCA.COM> g-rh@cca.CCA.COM (Richard Harter) writes:
>... There is a school that holds that inter-module communication
>via global data is undesirable (strong, unstructured coupling.)
>This school is wrong.

I would not quite put it that way.  I think the greatest problem
with global data can be summed up in a single word: encapsulation.

>	The problem with global data, the goto, the procedure that can
>be called from anywhere, et al, is that these are primitive constructs
>that do not have structure associated with them.

Exactly.  The structure must be imposed externally:

>The fundamental problem is lay out principles of structuring that
>can be used in general, and not merely in a restricted subclass of
>problems.

and one way to impose that structure is to reduce the number of
such constructs to something manageable.

>Modularization is often cited as an important programming principle,
>and it is.  But what it really does is push the problem up the scale.
>When you have hundreds or thousands of modules, you have the same kinds
>of problems as when you have hundreds or thousands of statement numbers.

Sure.  Then the problem becomes one of heirarchically making clusters
of modules, with rules for interaction between clusters, and so
forth.  I think both Ada and Modula-2 have problems here; indeed,
I know of no language that provides a really decent solution.
(Look at the problems people have building pipelines on Unix
machines, for instance.  If each utility is considered a module,
we have hundreds of modules available.  Since a utility that
consists of a pipeline is simply another utility, what we did by
making one was to make the module space bigger!)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ok@quintus.UUCP (Richard A. O'Keefe) (01/30/88)

In article <23777@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
> 	Interesting issue.  There is a school that holds that inter-module
> communication via global data is undesirable (strong, unstructured coupling.)
> This school is wrong.  If procedure invocation is strictly heirarchical,
> and all communication is via parameter sequences, then there are large
> classes of programs which effectively cannot be written.  If procedure
> invocation is non-heirarchical (i.e. you can call a routine from anywhere)
> and you can have variables remembered from one invocation to the next,
> then a simulation of global data is trivial.

(1) It's hierarchical ("hieros" = a priest; has nothing to do with "heirs").
(2) May I recommend

	Structured Design: Fundamentals of a Discipline of Computer
	Program and Systems Design
	Edward Yourdon/Larry L. Constantine
	Prentice-Hall 1979
	ISBN 0-13-854471-9

    This book explains various sorts of intermodule coupling quite clearly,
and also some of the problems with them.  The material in it goes back a
few years, but it's none the worse for that.

    In any programming language you have to use the tools you've got.
C's modularity is better than Fortran's (two routines can share a
variable without having to making available to the whole world), but
that isn't saying much.

g-rh@cca.CCA.COM (Richard Harter) (01/30/88)

In article <10381@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <23777@cca.CCA.COM> g-rh@cca.CCA.COM (Richard Harter) writes:
>
>>	The problem with global data, the goto, the procedure that can
>>be called from anywhere, et al, is that these are primitive constructs
>>that do not have structure associated with them.

>Exactly.  The structure must be imposed externally:

>>The fundamental problem is lay out principles of structuring that
>>can be used in general, and not merely in a restricted subclass of
>>problems.

>and one way to impose that structure is to reduce the number of
>such constructs to something manageable.

	I'm not sure what you meant to say here.  If you mean number
as 'number of kinds of such constructs' then I would say no -- one
such construct is sufficient if it is used liberally.  If you mean to
say that the number of instances in a program of 'unstructured
constructs' should be small, then who can argue?

	However, I will contend that the 'numbers' issue is not the
point.  The nice thing about eliminating goto's is that one can eliminate
them without pain.  (In new code, at least.)  The constructs of structured
programming suffice for writing code quite nicely; I would say that they
make it rather more pleasant and easier.  However (assuming that one
breaks thing up into modules of decent size) the issue of flow control
constructs is less important than it is made out to be -- the problems
and issues don't get out of hand because their arena is so small.

	Large scale problems are fundamentally different.  Let me give
a typical example.  We have a program which accepts a file name as an
argument; this is the name of a file which to a report will be written.
In the program there is a package which handles the report; there is
also a routine which processes the arguments.  Who owns the data and
how do we store it?

	We would like the argument cracker to be pure; it shouldn't
know what the file name is about; we want it to send a message to the
report package, here is a file name that you expect.  So far, so good.
But now, we need to give the argument cracker the destination of this
file.  Somebody has to do this.  The problem created by the connection
between these two packages has been pushed back a level, but it remains.
Our program also has a file manager, which keeps track of the files
that the program is using.  Etc.

	So we have all these connections between different packages
that have to know something about this data item.  Maybe we make a
little package just to handle this data item.  Fine.  But then we
need protocols between the various packages that need the data and
the package managing the data.  Which implies that the various packages
have to know about a lot of protocols.

	The point of this dadaist excursion into programming is that
a large system is composed of multiple interconnected subsystems.
The problem of 'global data' is a manifestation of the problem of
organizing large systems.

>>Modularization is often cited as an important programming principle,
>>and it is.  But what it really does is push the problem up the scale.
>>When you have hundreds or thousands of modules, you have the same kinds
>>of problems as when you have hundreds or thousands of statement numbers.
>
>Sure.  Then the problem becomes one of heirarchically making clusters
>of modules, with rules for interaction between clusters, and so
>forth.  I think both Ada and Modula-2 have problems here; indeed,
>I know of no language that provides a really decent solution.
>(Look at the problems people have building pipelines on Unix
>machines, for instance.  If each utility is considered a module,
>we have hundreds of modules available.  Since a utility that
>consists of a pipeline is simply another utility, what we did by
>making one was to make the module space bigger!)

	I think the key line here is "I know of no language that
provides a really decent solution.".  I suspect that one has to
go beyond the concept of a programming language to get a decent
solution.  Actually, Unix has done rather well along these lines.
It does provide a common environment and set of interfaces that
works fairly well -- up to a certain size.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

g-rh@cca.CCA.COM (Richard Harter) (01/30/88)

In article <607@cresswell.quintus.UUCP] ok@quintus.UUCP (Richard A. O'Keefe) writes:
](1) It's hierarchical ("hieros" = a priest; has nothing to do with "heirs").

	Color me blushing.  Yes, I know how to spell hierarchy, et al.
I even know the derivation.  The fingers refuse to do what the mind dictates.
One of my maxims of programming is "Never use variable names that you
consistently misspell."

](2) May I recommend
]
]	Structured Design: Fundamentals of a Discipline of Computer
]	Program and Systems Design
]	Edward Yourdon/Larry L. Constantine
]	Prentice-Hall 1979
]	ISBN 0-13-854471-9
]
]    This book explains various sorts of intermodule coupling quite clearly,
]and also some of the problems with them.  The material in it goes back a
]few years, but it's none the worse for that.

	Not to me, you can't.  I have strong prejudices against Yourdon
and his operation.  To be fair, the book you mention is a good book.

]    In any programming language you have to use the tools you've got.
]C's modularity is better than Fortran's (two routines can share a
]variable without having to making available to the whole world), but
]that isn't saying much.

	In prinicple labelled common is Fortran's method for routines
sharing data; life is too short to discuss the, ah, merits of this 
approach.  C's approach is quite nice for writing packages stuffed in
a single file, when the shared data is truly local.  Much better than
nothing.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

chris@mimsy.UUCP (Chris Torek) (01/30/88)

>In article <10381@mimsy.UUCP> I said
>>and one way to impose that structure is to reduce the number of
>>such constructs to something manageable.

In article <23816@cca.CCA.COM> g-rh@cca.CCA.COM (Richard Harter) replies:
>	I'm not sure what you meant to say here.  If you mean number
>as 'number of kinds of such constructs' then I would say no -- one
>such construct is sufficient if it is used liberally.  If you mean to
>say that the number of instances in a program of 'unstructured
>constructs' should be small, then who can argue?

I meant the latter.  (Nice not to have to argue. :-) )

>However, I will contend that the 'numbers' issue is not the point.

But I think, ultimately, it is!  Borrowing your example:

>... the issue of flow control
>constructs is less important than it is made out to be -- the problems
>and issues don't get out of hand because their arena is so small.

(and, being small, they remain comprehensible.)

>Large scale problems are fundamentally different.

I think not:

>... We have a program which accepts a file name as an
>argument; this is the name of a file which to a report will be written.
>In the program there is a package which handles the report; there is
>also a routine which processes the arguments.  Who owns the data and
>how do we store it?
>
>...  The problem created by the connection
>between these two packages has been pushed back a level, but it remains.
>Our program also has a file manager, which keeps track of the files
>that the program is using.  Etc.
>
>	So we have all these connections between different packages
>that have to know something about this data item.  Maybe we make a
>little package just to handle this data item.  Fine.  But then we
>need protocols between the various packages that need the data and
>the package managing the data.  Which implies that the various packages
>have to know about a lot of protocols.

And when the number of protocols expands beyond some limit, it
becomes incomprensible as a whole---just as when the number of
labels in a Fortran program gets out of hand, the structure of a
module dissolves.  While loops and case statements and such provide
a way to keep the structure of modules, without having to break
them into sand-grains rather than building blocks.  Object oriented
message passing models provide a way to add structure to global
interactions, but someday, someone will wonder whether to use a
SemaphoredQueue or a MonitoredSparseArray, or was there a better
type...?  Time to root through the 15 GB index of possible datatypes!

>The problem of 'global data' is a manifestation of the problem of
>organizing large systems.

I think no one really knows how to do this, no matter *what* the
system, when it gets truly enormous.  Large companies and governments
have the same problem.  Hierarchical [my fingers want to type
`heir'!] breakdown works well up to some size, but eventually the
sheer number of levels in the hierarchy becomes a problem.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

g-rh@cca.CCA.COM (Richard Harter) (01/31/88)

In article <10392@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

>And when the number of protocols expands beyond some limit, it
>becomes incomprensible as a whole---just as when the number of
>labels in a Fortran program gets out of hand, the structure of a
>module dissolves.  While loops and case statements and such provide
>a way to keep the structure of modules, without having to break
>them into sand-grains rather than building blocks.  Object oriented
>message passing models provide a way to add structure to global
>interactions, but someday, someone will wonder whether to use a
>SemaphoredQueue or a MonitoredSparseArray, or was there a better
>type...?  Time to root through the 15 GB index of possible datatypes!

	Yep.  Some expounders of CS (but not Chris) don't seem to
understand this fundamental principle.

>>The problem of 'global data' is a manifestation of the problem of
>>organizing large systems.
 
>I think no one really knows how to do this, no matter *what* the
>system, when it gets truly enormous.  Large companies and governments
>have the same problem.  Hierarchical [my fingers want to type
>`heir'!] breakdown works well up to some size, but eventually the
>sheer number of levels in the hierarchy becomes a problem.

	Chris and I are really saying the same thing (I think).
One way to look at it, very loosely, is this.  Suppose that there
is a limit to the number of things that we can keep track of, call
it N.  Then we can understand any system with N or less items in it.
When we build a large system we use decomposition so that any particular
piece can be understood by understanding the piece and its connections.
This lets us expand the size of the system that we can deal with to
2**N items.

	There are three implications to this idea.  The first is the
implementation of the general scheme -- how to build the pieces and
their connections so that the scheme works.  The second is that the
scheme still fails when the size reaches a much larger limit.  The
third is that we have traded complete comprehension for a limited
comprehension.

	An interesting thought is that, if we want to deal with
really large systems, is that we ought to look at how life works.
Life (biological life, not the game) is a really immense system
that works.  A single cell, considered as a system, is more complex
than our largest programs.  A living animal has trillions of cells,
organized in multitudes of subsystems.  The biosphere has billions
of living creatures, interlocked in the general ecology.  And it all
works.  Maybe if we really understood how the human body works, we
would also know how to create really large systems software.


-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

levy@ttrdc.UUCP (Daniel R. Levy) (02/03/88)

In article <607@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
>     In any programming language you have to use the tools you've got.
> C's modularity is better than Fortran's (two routines can share a
> variable without having to making available to the whole world), but
> that isn't saying much.

FORTRASH can achieve a similar effect by having the two routines concatenated,
one of them being an ENTRY, with the "shared data" local to the combined
routine, using SAVE to preserve it between calls.  This is ugly but it sure
works.
-- 
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |  	<most AT&T machines>}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|

terry@terminus.UUCP (terry) (02/05/88)

In article <23838@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
> A single cell, considered as a system, is more complex
> than our largest programs.


I don't think you have ever seen a "portable" raw device interface for a
piece of communications software for UNIX/PC/MAC/VMS, where we are not
talking something as simple as UUCP or TIP.  One piece of code that runs
unalterd (I don't count #define... the byte-count doesn't change) on about
140 boxes can get pretty complicated if it's still fast...

Not to mention the BUA (Branch on User Alone) some optimizers insert.


>O,o<        Ack!  Thhhhtttppp!

Aren't I a little bored with sideways thingys?


+-----------------------------------------------------------------------------+
| Terry Lambert           UUCP: ...!decvax!utah-cs!century!terry              |
| @ Century Software       or : ...utah-cs!uplherc!sp7040!obie!terminus!terry |
| SLC, Utah                                                                   |
|                   These opinions are not my companies, but if you find them |
|                   useful, send a $20.00 donation to Brisbane Australia...   |
| 'There are monkey boys in the facility.  Do not be alarmed; you are secure' |
+-----------------------------------------------------------------------------+