[net.lang.c] Unix/C program modularity

papke@dicomed.UUCP (Kurt Papke) (10/16/85)

As a manager of programmers and engineers for 5 years and a practitioner
previous to that, I have noticed a disconcerting problem that tends to arise
in software systems designed under the Unix family of operating systems.
Since the vast majority of this software is written in C, this article is
being posted to Net.lang.c (if you feel this newsgroup is inappropriate,
tell me where to go :-))

The problem I have observed is that:

	Applications programs that have been designed to run under Unix
	tend to have a low percentage of re-usable code.

My observations are based on inspection of graphics applications (which
Dicomed is in the business of producing) which tend to be predominantly
user-interface stuff.  I am specifically NOT commenting on code used to
support program development, operating systems, and tools, but rather
applications programs that are used in a graphics production environment.

Why might this be the case ??  Further inspection of much code shows that
applications designed for the Unix environment tend to follow the spirit
of the Unix operating system: design your system as a series of small
programs and "pipe" them together (revelationary!)

As a result of this philosophy to design systems as a network of filters
piped together:

	o Much of the bulk of the code is involved in argument parsing,
	  most of which is not re-usable.

	o Error handling is minimal at best.  When your only link to the
	  outside world is a pipe, your only recourse when an error
	  occurs is to break the pipe.

	o Programs do not tend to be organized around a package concept,
	  such as one sees in Ada or Modula-2 programs.  The programs are
	  small, so data abstraction and hiding seem inappropriate.  Also
	  the C language support for these concepts is cumbersome, forcing
	  the programmer to use clumsy mechanisms such as ".h" files and
	  "static" variables to accomplish packaging tasks.

	o Programmers invent "homebrew" data access mechanisms to supplement
	  the lack of a standard Unix ISAM or other file management.  Much
	  of this code cannot be re-used because the programmer implemented
	  a primitive system to satisfy the needs of this one filter.

Despite all this, the graphics community is settling in on using Unix as
the operating system of choice.

Are we being lulled into using an O/S and language that allows us to whip
together quicky demos to demonstrate concepts, at the expense of long-term
usefulness as a finished product ??

(Speaker steps off soapbox amid a torrent of rotting vegetables)

My secondary intent here is to try to stimulate discussions in this newsgroup
that rise above disputes as to how far to indent your curly braces.  I
welcome counter-examples that would prove me merely mistaken.

	Kurt

guy@sun.uucp (Guy Harris) (10/17/85)

> Are we being lulled into using an O/S and language that allows us to whip
> together quicky demos to demonstrate concepts, at the expense of long-term
> usefulness as a finished product ??

Nothing in UNIX or C *forces* you to write applications in the style you
describe.  It is not appropriate to write every application under the sun as
a collection of filters, and lots of excellent applications running under
UNIX are not so written.

	Guy Harris

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/17/85)

> My observations are based on inspection of graphics applications (which
> Dicomed is in the business of producing) which tend to be predominantly
> user-interface stuff.  I am specificly NOT commenting on code used to
> support program development, operating systems, and tools, but rather
> applications programs that are used in a graphics production environment.

UNIX code comes in several flavors.  I am not familiar with Dicomed's
software design and coding practice.  Maybe it's just not very good?

> Why might this be the case ??  Further inspection of much code shows that
> applications designed for the Unix environment tend to follow the spirit
> of the Unix operating system: design your system as a series of small
> programs and "pipe" them together (revelationary!)

Exactly!  Re-usability is obtained at the higher, process, level.
Many new applications should be produced by combining existing tools
rather than by writing code in the traditional sense.  This works
especially well when one is trying to support a wide and growing
variety of graphic devices.

> As a result of this philosophy to design systems as a network of filters
> piped together:
> 
> 	o Much of the bulk of the code is involved in argument parsing,
> 	  most of which is not re-usable.

The shell is eminently reusable.  Within each process, argument
processing should be done via getopt(), in the standard library.
Beyond that, obviously different processes are going to have
different specific requirements.
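
Something like the following is typically all the per-process parsing
amounts to.  (A sketch only: the option letters and messages are
invented, and older getopt()s return EOF rather than -1 when the
options are exhausted.)

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>	/* getopt(), optarg, optind */

	int
	main(int argc, char *argv[])
	{
		int c;
		int verbose = 0;	/* set by -v */
		char *outfile = NULL;	/* set by -o file */

		while ((c = getopt(argc, argv, "vo:")) != -1) {
			switch (c) {
			case 'v':
				verbose = 1;
				break;
			case 'o':
				outfile = optarg;
				break;
			default:	/* getopt() has already complained */
				fprintf(stderr, "usage: prog [-v] [-o file] [file ...]\n");
				exit(2);
			}
		}
		if (verbose)
			fprintf(stderr, "output to %s\n", outfile ? outfile : "stdout");
		/* argv[optind] through argv[argc-1] are the remaining operands */
		exit(0);
	}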

> 	o Error handling is minimal at best.  When your only link to the
> 	  outside world is a pipe, your only recourse when an error
> 	  occurs is to break the pipe.

If a subordinate module is not able to perform its assigned task,
it should so indicate to its controlling module.  Error recovery
is best performed at the higher strategic levels.  UNIX processes
indeed do have a simple means of returning error status to their
parents.
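
A sketch of that mechanism as seen from a controlling process (the
command run by the child is arbitrary, and deliberately one that will
fail, so the parent has something to report):

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>	/* fork(), execlp() */
	#include <sys/types.h>
	#include <sys/wait.h>	/* wait(), WIFEXITED(), WEXITSTATUS() */

	int
	main(void)
	{
		int status;
		pid_t pid = fork();

		if (pid == -1) {
			perror("fork");
			exit(1);
		}
		if (pid == 0) {
			/* child: run a subordinate program; exec returns only on failure */
			execlp("ls", "ls", "-d", "/no/such/file", (char *)NULL);
			perror("ls");
			_exit(127);
		}
		/* parent: collect the child's verdict -- zero means success */
		if (wait(&status) == -1) {
			perror("wait");
			exit(1);
		}
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			printf("subordinate succeeded\n");
		else
			printf("subordinate failed (status 0x%x)\n", (unsigned)status);
		exit(0);
	}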

> 	o Programs do not tend to be organized around a package concept,
> 	  such as one sees in Ada or Modula-2 programs.  The programs are
> 	  small, so data abstraction and hiding seem inappropriate.  Also
> 	  the C language support for these concepts is cumbersome, forcing
> 	  the programmer to use clumsy mechanisms such as ".h" files and
> 	  "static" variables to accomplish packaging tasks.

There is no need to emulate Ada packages, if module interfaces are
clean and well-defined.  The UNIX process interface usually is.
Within processes, the facilities C provides are generally adequate,
although some prefer to spiffy up intra-process module design via
"classes", "Objective-C", "C++", or some other preprocessing scheme.
We have not felt much need for this in our UNIX graphics work.

> 	o Programmers invent "homebrew" data access mechanisms to supplement
> 	  the lack of a standard Unix ISAM or other file management.  Much
> 	  of this code cannot be re-used because the programmer implemented
> 	  a primitive system to satisfy the needs of this one filter.

It is relatively rare that UNIX applications have to be concerned
with detailed file access mechanisms.  There is as yet no standard
UNIX DBMS, so portable UNIX applications have to either work without
one or provide their own.  Most graphics applications do not need the
complexity of a DBMS, but can work with simple data formats.

> Despite all this, the graphics community is settling in on using Unix as
> the operating system of choice.

That's because it supports rapid development of good, flexible systems
that can be ported widely with little additional expense.

> Are we being lulled into using an O/S and language that allows us to whip
> together quicky demos to demonstrate concepts, at the expense of long-term
> usefulness as a finished product ??

You should make your own decisions.
Do you have a better approach to suggest?

laura@l5.uucp (Laura Creighton) (10/17/85)

In article <637@dicomed.UUCP> papke@dicomed.UUCP (Kurt Papke) writes:
>As a manager of programmers and engineers for 5 years and a practicioner
>previous to that, I have noticed a disconcerting problem that tends to arise
>in software systems designed under the Unix family of operating systems.
>
>The problem I have observed is that:
>
>	Applications programs that have been designed to run under Unix
>	tend to have a low percentage of re-usable code.
>
>My observations are based on inspection of graphics applications (which
>Dicomed is in the business of producing) which tend to be predominantly
>user-interface stuff.  I am specificly NOT commenting on code used to
>support program development, operating systems, and tools, but rather
>applications programs that are used in a graphics production environment.
>
>Why might this be the case ??  Further inspection of much code shows that
>applications designed for the Unix environment tend to follow the spirit
>of the Unix operating system: design your system as a series of small
>programs and "pipe" them together (revelationary!)

Boy!  Maybe I should go work for you.  The applications that I see the most
are these huge megaliths which reinvent the wheel all the way down the line.
There is a body of people who think that the best sort of application program
is one that does everything.  As a result you see programs which show the
strain marks as everything including the kitchen sync was jammed in to fit.

People who are working on fairly hostile O/S's like MS/DOS may be absolutely
correct in this perception for their environment, but every time I find a
unix application program that reimplements strcpy **AGAIN** or atoi or
any number of other things...but I digress. 


>
>As a result of this philosophy to design systems as a network of filters
>piped together:
>
>	o Much of the bulk of the code is involved in argument parsing,
>	  most of which is not re-usable.
>

I think that you have missed out on the unix design philosophy here. There
is nothing sacred in filters, per se. A good filter does one job well. If
people are rewriting the argument parsing for every new application, rather
than reusing existing argument parsing, then either argument parsing cannot
be done by a standard filter, or you have not written the filter you need
yet.  For a long time *all* unix programs did their own argument parsing.
Now we have getopt(3) <and some earlier programs which cannot be conveniently
converted to use getopt, alas>.  Getopt solves the parsing problem -- no one
need ever write an argument parser for a standard unix program again.

If your application programs are such that it is possible that one general
parser could parse all or most of them, then you should write that and then
you will have the reusable code that you want.  If your applications are
not structured this way and cannot be restructured this way, then you
will have to write an argument parser for each application. But I fail to see
that you are going to avoid this problem if you write it in any other style -
it seems inherent in the nature of such applications.


>	o Error handling is minimal at best.  When your only link to the
>	  outside world is a pipe, your only recourse when an error
>	  occurs is to break the pipe.

This is *wrong* *wrong* *wrong*.  Most unix programs do not check for errors;
this is true.  But this is because the programmers are either sloppy, or do
not know how to check for errors.  See *Real Programs Dump Core* by Ian Darwin
and Geoff Collyer.  I think that this paper was in the winter 84 usenix, but
if I am wrong I am sure that they will both post corrections....

Most unix filters do not break at the pipes.  They break because
you run out of file descriptors, or because malloc fails, or because you
cannot open a file for some reason, or because you try to divide by zero.
All of these things can be, and should be checked.  There is nothing in the
unix philosophy which says that you have to be sloppy or lazy about this.

[More and more I am coming to the conclusion that the problem is not sloppiness
or laziness, just sheer ignorance, by the way.  Do the world a favour. Teach a
friend to check the return codes of system calls. Then teach him to use lint.]
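
For the record, the checking being asked for costs only a couple of
lines per call.  A sketch (not lifted from any real program) of a
trivial filter that checks its open, its malloc, and its reads:

	#include <stdio.h>
	#include <stdlib.h>

	#define BUFSIZE 8192

	int
	main(int argc, char *argv[])
	{
		FILE *fp;
		char *buf;

		if (argc != 2) {
			fprintf(stderr, "usage: %s file\n", argv[0]);
			exit(2);
		}
		if ((fp = fopen(argv[1], "r")) == NULL) {	/* opens can fail */
			perror(argv[1]);
			exit(1);
		}
		if ((buf = malloc(BUFSIZE)) == NULL) {		/* so can malloc */
			fprintf(stderr, "%s: out of memory\n", argv[0]);
			exit(1);
		}
		while (fgets(buf, BUFSIZE, fp) != NULL)
			fputs(buf, stdout);
		if (ferror(fp)) {				/* and even reads */
			perror(argv[1]);
			exit(1);
		}
		free(buf);
		fclose(fp);
		exit(0);
	}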


>
>	o Programs do not tend to be organized around a package concept,
>	  such as one sees in Ada or Modula-2 programs.  The programs are
>	  small, so data abstraction and hiding seem inappropriate.  Also
>	  the C language support for these concepts is cumbersome, forcing
>	  the programmer to use clumsy mechanisms such as ".h" files and
>	  "static" variables to accomplish packaging tasks.

This is a real deficiency.  However, if you write your filters correctly, you
can view them as packages and treat them the same way. I have never used any
language which has modules for any serious work, but I have often wondered how
useful they actually are.  There are nights when I think that data abstraction
is a virtue because ``real programmers won't use lint'' and its chief virtue is
that it handles your casts for you.  Some modula-2 enthusiasts have agreed with
me about this, and I will probably get a lot of rotten tomatoes from the rest.
But I still don't know how to measure how useful classes and the like are.  I
don't know how to measure why I like programming in lisp more than programming
in C either, though.
>
>	o Programmers invent "homebrew" data access mechanisms to supplement
>	  the lack of a standard Unix ISAM or other file management.  Much
>	  of this code cannot be re-used because the programmer implemented
>	  a primitive system to satisfy the needs of this one filter.

What you need to do is to select your standard and then write the rest of 
your code to deal with it.  This is not a problem with the unix philosophy, but
a problem because you have not set a standard and required your code to use it.


>
>Despite all this, the graphics community is settling in on using Unix as
>the operating system of choice.
>
>Are we being lulled into using an O/S and language that allows us to whip
>together quicky demos to demonstrate concepts, at the expense of long-term
>usefulness as a finished product ??

It depends on how you run your company, of course.  If you do not have
re-usable code then I think that you need to identify what you are rewriting
and then make a standard and comply with it.  If you can't get re-usable
code with unix then I can't see why you expect to get it anywhere else...the
mechanisms seem the same to me.  I may be missing something, but I can't see
what.

In addition there are a fair number of existing unix graphics standards.
Couldn't you standardise around one of them?

>
>(Speaker steps off soapbox amid a torrent of rotting vegetables)
>

Well, I don't think that I was that bad, was I?

-- 
Laura Creighton		
sun!l5!laura		(that is ell-five, not fifteen)
l5!laura@lll-crg.arpa

jeff@isi-vaxa.ARPA (Jeffery A. Cavallaro) (10/18/85)

I have a suggestion:

Have all such programmers read the VMS manual:

	"Guide to Creating Modular Library Procedures"

Then, force them to work on VMS for awhile.

(The torrent of rotting vegetables has just shifted direction)

david@ecrhub.UUCP (David M. Haynes) (10/18/85)

> In <637@dicomed.UUCP>, Kurt Papke writes...
> 
> 	Applications programs that have been designed to run under Unix
> 	tend to have a low percentage of re-usable code.
> 
> Why might this be the case ??  Further inspection of much code shows that
> applications designed for the Unix environment tend to follow the spirit
> of the Unix operating system: design your system as a series of small
> programs and "pipe" them together (revelationary!)

Such an approach implies at least one scenario for this type of code.
Programmer A has been given an assignment and writes small, easily
debugged pieces of code to test the design and general algorithms of
the task. This done, s/he approaches his/her manager for approval of
the design. The manager says great! It works! Let's leave that code
alone. Here's your next assignment. Voila! Design code becomes production
code in one easy step! (I must have seen this one happen millions of times)

> Are we being lulled into using an O/S and language that allows us to whip
> together quicky demos to demonstrate concepts, at the expense of long-term
> usefulness as a finished product ??

Only when MANAGEMENT has not allowed sufficient time/resources for the
implementation of production code and has not set the STANDARDS required for
code to be accepted as production code. Can you say code review?

One of the things UNIX (and C) has allowed programmers and designers is
the opportunity to test concepts and algorithms before committing
major resources to a project. I have been involved in projects where the
end product would not work simply because the basic assumptions made
in the design were not valid, but there was no method for spotting the
problem a priori. UNIX allows designers to, at least, test the concepts
ahead of time.

-david-
-- 
--------------------------------------------------------------------------
They only asked me one question, and 		David M. Haynes
that was, "What is your name?"			Exegetics Inc.
And I got 75% on that one...			..!utzoo!ecrhub!david
[Peter Cook - Beyond the Fringe]

Exegetics Inc. is a legal convenience and does not care what I have to say.
Emerald City Research Inc. is very kind to let me use their machine, but
in no way is even remotely responsible for the stuff I post.

chris@umcp-cs.UUCP (Chris Torek) (10/18/85)

[This is in response to article <637@dicomed.UUCP> by
papke@dicomed.UUCP (Kurt Papke).]

Perhaps I should not speak of it, since I have not been involved in
any of the actual coding, but I believe I know of a counterexample.
The Center for Automation Research (nee Computer Vision Laboratory),
umcp-cs!cvl, has a very large body of reusable code:  the CVL
picture library.  I do not, however, know much about this, so I
may well be wrong.

But in any case, I think you have, as the saying goes, lost sight
of the forest for the trees.  Why *should* Unix programmers write
reusable code for each program?  Instead, or perhaps in addition
but more importantly, Unix programmers should---and at times do---
write reusable *programs*.  The very `Unix Philosophy' of which
you speak is that you should create a set of tools which can be
used together to solve many problems, though each tool solves only
a subset of any one problem.

To give an example, however contrived or even erroneous---as I
mentioned, I do not work for CfAR---consider taking a set of picture
files, performing some algebraic transformation on each pixel value,
applying histogram equalization, then halftoning and printing on
an Imagen laser printer:

	for i in *.pict; do
		lop "your operation here" < $i | histeq | ht | pi |
		qpr -q imagen-imp
	done

(I have made up some of these program names; CVL people may correct
me if I have important details wrong.  `lop' stands for Local
Operation on Picture, by the way.)  If instead you need to display
one of these on the Grinnell:

	lop "your operation here" < foo.pict | histeq | ht | put "params"

or without halftoning:

	grey	# Grinnel to B/W display
	lop "your operation here" < foo.pict | histeq | put "params"

The point of all this is that reuse of code itself is unnecessary
if the code is in a separate program.  All you need do is insert the
program at the appropriate point in the pipe.

Now, if you are talking about applying the same operation to thousands
of pictures a day, then (and *only* then) you should consider taking
the `guts' of each operation out of each of the programs in question,
building argument and error handling around them, and packaging that
up as an `application'.

I have tried to keep my response within the domain of computer
graphics, as that was the focus of your article; graphics has not
been one of my studies, and I ask those who are more knowledgeable
to forgive glaring errors, or to quietly correct them (i.e., `flames
to /dev/null').  But my point---that Unix *programs* should be
reusable---applies to many domains.  A proper set of tools is an
invaluable asset in any line of work.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

mojo@kepler.UUCP (Morris Jones) (10/18/85)

In article <637@dicomed.UUCP> papke@dicomed.UUCP (Kurt Papke) writes:
>Are we being lulled into using an O/S and language that allows us to whip
>together quicky demos to demonstrate concepts, at the expense of long-term
>usefulness as a finished product ??

Kurt,

Your observations and points in this posting were very astute!  Indeed
they may lead me to a possible scenario:  The most useful and coherent
UNIX applications may not be developed on UNIX originally!

That's certainly the case with WordStar 2000 (soon to be bundled with
every UNIX PC).

-- 
Mojo
... Morris Jones, MicroPro Product Development
{ptsfa,hplabs,glacier,lll-crg}!well!micropro!kepler!mojo

nz@wucs.UUCP (Neal Ziring) (10/18/85)

In article <637@dicomed.UUCP> papke@dicomed.UUCP (Kurt Papke) writes:
> .. I have noticed a disconcerting problem that tends to arise
>in software systems designed under the Unix family of operating systems.
>
>The problem I have observed is that:
>
>	Applications programs that have been designed to run under Unix
>	tend to have a low percentage of re-usable code.

	Even just within a University environment, I have noticed that, too.
However, I don't know if that low percentage is being viewed in perspective,
and what the low percentage is relative to.

>My observations are based on inspection of graphics applications 
>
>Why might this [low percentage ] be the case ??  Further inspection
>[of code] designed for the Unix environment tend to follow the spirit
>of the Unix operating system: design your system as a series of small
>programs and "pipe" them together (revelationary!)

	Yes, and the fact that Unix provides the elegant and powerful
functionality of pipes, redirection, and signals is a great aid to 
developing powerful software quickly.  But, having powerful toys to
play with does not encourage discipline (like programming in assembler 
on a PDP-11 :-)

>As a result of this philosophy to design systems as a network of filters
>piped together:
>
>	o Much of the bulk of the code is involved in argument parsing,
>	  most of which is not re-usable.

	I would rather see non-portable argument-parsing code than
lose the ability to read command-line arguments.  However, there
are argument parsing library routines on some versions of Unix.  If
a programmer re-invents the wheel just to kill time, or because he
doesn't want to read the manual...

>	o Error handling is minimal at best.  When your only link to the
>	  outside world is a pipe, your only recourse when an error
>	  occurs is to break the pipe.
>

	Alas, too true!  This can be alleviated somewhat by having
the parent program monitor things carefully, but recovery would probably
still be expensive.  The solution, in such cases, is to try to have
the filter fail soft (revert to `cat') or perhaps send a tell-tale
signal to its entire process.  I am waiting for the Ach (shell with an
Ada-like syntax :-) so I can use Exception Handling in my shell scripts.

>	o Programs do not tend to be organized around a package concept,
>	  such as one sees in Ada or Modula-2 programs.  The programs are
>	  small, so data abstraction and hiding seem inappropriate.  Also
>	  the C language support for these concepts is cumbersome, forcing
>	  the programmer to use clumsy mechanisms such as ".h" files and
>	  "static" variables to accomplish packaging tasks.

	Poor organization like that is just plain laziness.  In a 
large applications effort, I think, object libraries should be used
extensively.  I admit that C has no support for data abstraction and
hiding; I wish it did!  You can still set up package-like libraries
without those facilities though -- "static" variables are useful for this.
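
A sketch of what I mean, with invented names: the "package" is just a
header giving the public interface plus a .c file whose state is
file-scope static, so nothing outside that file can touch it:

	/* counter.h -- the public interface of the "package" */
	void ctr_reset(void);
	void ctr_bump(void);
	long ctr_value(void);

	/* counter.c -- the hidden implementation */
	#include "counter.h"

	static long count;	/* file-scope static: invisible to every other file */

	void ctr_reset(void) { count = 0; }
	void ctr_bump(void)  { count++; }
	long ctr_value(void) { return count; }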

>	o Programmers invent "homebrew" data access mechanisms to supplement
>	  the lack of a standard Unix ISAM or other file management.  Much
>	  of this code cannot be re-used because the programmer implemented
>	  a primitive system to satisfy the needs of this one filter.

	Laziness again.  The lack of standard high-level I/O facilities
is an invitation to the user to make up facilities that are no more complex
than he needs :-)  For serious applications, there are very good database
products available.  If a programmer is about to implement some standard
data structure (e.g. B-trees) he should consult with his fellows, and
see if they could use it, too!

>Despite all this, the graphics community is settling in on using Unix as
>the operating system of choice.

	Hooray!!

>Are we being lulled into using an O/S and language that allows us to whip
>together quicky demos to demonstrate concepts, at the expense of long-term
>usefulness as a finished product ??

	I don't think so.  Sure, Unix and C have problems.  They have LOTS
of nice aspects too, and many of the problems are shared by other systems
and languages.  For instance, try to do data abstraction in Fortran.   Ada
promises to provide all the lovely Computer Science concepts that we all
would like (packaging, object-orientation, parallelism) but so far those
services are not widely available in commercially viable language systems.
Ada has problems, too, but I'll save discussion of them for net.lang.ada.

>My secondary intent here is to try to stimulate discussions in this newsgroup
>that rise above disputes as to how far to indent your curly braces.  

	Good idea.  Any other comments from the net about the points
raised here, and in Mr. Papke's original article?
-- 
========
...nz (ECL - we're here to provide superior computing)
	Washington University Engineering Computer Laboratory

    "Now we'll see some proper action..." 

	old style:	... ihnp4!wucs!nz
	new style:	nz@wucs.UUCP

geoff@utcs.uucp (Geoff Collyer) (10/20/85)

In article <197@l5.uucp> laura@l5.uucp (Laura Creighton) writes:
>See *Real Programs Dump Core* by Ian Darwin
>and Geoff Collyer.  I think that this paper was in the winter 84 usenix, but
>if I am wrong I am sure that they will both post corrections....

"Can't Happen or /* NOTREACHED */ or Real Programs Dump Core" appears in
the proceedings of the Dallas (January 1985) Usenix conference.
Ian gave the talk and I'm told it was much funnier than the paper.

ln63fkn@sdcc7.UUCP (Paul van de Graaf) (10/21/85)

The problem here is not a question of programmers forever reinventing the wheel,
but rather a lack of uniformity of the various flavors of Unix.  Most defensive
programmers take into account that the programs they write might be ported to
v6 or XENIX or even a micro that only has an incomplete implementation of the
stdio package (if that much).  Writing a program with a myriad of fancy pipes
and filters may be faster and easier, but if you don't want to go the extra
mile to write emulations of certain system calls and a lot of ugly #ifdefs, 
you're going to lose if you try to port. 

The GNU project, the ANSI C standardization committee, and the /usr/group people
are all working on this problem from different directions, so don't expect an
answer too soon.  The GNU effort has great promise, because it could act as a
clearinghouse for all those system-call emulators.  Whether this fits in with
their plans, I can't say, but I'd love to be able to call them up and get the
102nd version of getopt() for my Hal 9000 for a nominal charge.  I kind of feel
the ANSI C people decided that it was boring to standardize C, so they spilled
over into Unix interface standardization.  C and Unix aren't always synonymous,
especially on micros:  witness the Amiga and the Atari ST.  If they don't stay
on track, they may jeopardize their whole standard.  The /usr/group folks talk
a lot, but nobody listens... they're JUST a user group anyway :-)!

Tough decisions need to be made in order to come up with a standard.
In the meanwhile, I'm coding defensively.

Paul van de Graaf	   sdcsvax!sdcc7!ln63fkn		U. C. San Diego

jss@sjuvax.UUCP (J. Shapiro) (10/21/85)

Doug, I have yet (to my recollection) to disagree with you on
anything, but I think I finally do...

The following comments were edited (no applause - just send grant
applications)...

----cast of characters---
> Doug's Comments...
> > someone else, whose name did not appear in Doug's posting (sorry).
-------------------------

> > ...Applications designed for the Unix environment tend to follow the
> > spirit of the Unix operating system: design your system as a series
> > of small programs and "pipe" them together (revelationary!)
> 
> Re-usability is obtained at the higher, process, level.

Doug, you are entirely correct, but this does not negate the case in
favor of lower level reusability.  Go figure out one day how much of
your disk is being used by copies of the C library. Compile a program

	main(){}

using the "-lc" option and multiply this size by the number of
executable files on your system.  I think you will be astonished.
Having modularity at the procedure level can be a great boon,
particularly if you have a loader which links in libraries at run time
and doesn't duplicate code unnecessarily.

> Many new applications should be produced by combining existing tools
> rather than by writing code in the traditional sense.

I agree, in a development environment.  In an applications environment
this leads to systems which are inconsistent, have no reasonable error
messages, are poorly documented, and are confusing as hell in
general.  This is not an intrinsic property of the approach, except to
the extent that the approach does not enforce programmer discipline in
such matters.

> This works especially well when one is trying to support a wide and growing
> variety of graphic devices.

Again, a general purpose device independent graphics library would
probably serve the need better, and would almost certainly allow for
more efficient collapsing of common code used.  It would also narrow
the scope of modifications necessary to support new devices, thereby
helping to minimize support costs.

> > As a result of this philosophy to design systems as a network of filters
> > piped together:
> > 
> > 	o Much of the bulk of the code is involved in argument parsing,
> > 	  most of which is not re-usable.

If your argument parsing is so long, you haven't done it sensibly, or
your argument conventions need to be rethought.  No matter how you
slice the program, it has to take arguments, and the shell is doing
all of the argument division anyway, so there is no added complexity
to speak of involved in the use of pipes.

> > 	o Error handling is minimal at best.  When your only link to the
> > 	  outside world is a pipe, your only recourse when an error
> > 	  occurs is to break the pipe.

This is not really true.  I do agree, however, that error recovery is
substantially harder if your disjoint pieces of code are not properly
modularized.  If you and the next guy in the pipe sequence need to
resynchronize your notion of the data stream, you have invoked a lot
of code AND protocol overhead.  C's modularity and data sharing
facilities are not what they could be.  It is one of the only features
of Modula-2 which I like, though I think the Modula-2 approach is a
pain.

> If a subordinate module is not able to perform its assigned task,
> it should so indicate to its controlling module.

Have you ever tried to implement this using only error(n)?  Talk about
rendering your code complex...

> Error recovery is best performed at the higher strategic levels.

Depends on the kind of error.  This argument comes back to the
modularization point I made above.

> > 	o Programmers invent "homebrew" data access mechanisms to supplement
> > 	  the lack of a standard Unix ISAM or other file management.  Much
> > 	  of this code cannot be re-used...

I know of no database system where speed is important which has ever
been distributed using a standard record support library.  I believe
that you will find that all of the major database products, even under
VMS, ultimately have given up and gone to doing their own thing
directly using Block I/O, because the provided record structure
facilities, by virtue of the fact that they are general, are
necessarily not well optimized to any particular task.  In particular,
the overhead associated with error checks which your database system
doesn't need is hideous.

On the other hand, a reasonable record structure facility is something
sorely lacking in UNIX, and is very useful when simply trying to get
the code running initially.  It allows you to leave the modifications
for a database hacker and get it running.  For non-critical code, or
code where the critical element is development time and ease of
support, this is crucial.

> It is relatively rare that UNIX applications have to be concerned
> with detailed file access mechanisms.

Is this cause, or effect?

I hope that my replies here are not unduly long.  I went over them,
and I believe that I have made my points succinctly: UNIX needs both a
standard database facility and an intelligent notion of run-time
libraries.

Jon Shapiro
Haverford College
-- 
Jonathan S. Shapiro
Haverford College

	"It doesn't compile pseudo code... What do you expect for fifty
		dollars?" - M. Tiemann

papke@dicomed.UUCP (Kurt Papke) (10/23/85)

In article <1898@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>[This is in response to article <637@dicomed.UUCP> by
>papke@dicomed.UUCP (Kurt Papke).]
>
>Perhaps I should not speak of it, since I have not been involved in
>any of the actual coding, but I believe I know of a counterexample.
>The Center for Automation Research (nee Computer Vision Laboratory),
>umcp-cs!cvl, has a very large body of reusable code:  the CVL
>picture library.  I do not, however, know much about this, so I
>well be wrong.

You may be, but it's probably my fault: one point I may not have made
sufficiently clear is that the domain of applications programs I was
referring to is that of "commercial" code, i.e. that used in a production
environment running 24 hours/day, with no "computer operator" intervention.
From my standpoint, an R&D lab makes a poor counterexample because the user
is assumed to be "computer-literate", manipulating the programs directly.

>But in any case, I think you have, as the saying goes, lost sight
>of the forest for the trees.  Why *should* Unix programmers write
>reusable code for each program?  Instead, or perhaps in addition
 ===============================
Why indeed ??  Because many of us have to write code fragments that
someday may not be running in the Unix environment.  For instance at Dicomed
(and by the way I'm not proud of this) we are currently selling systems
that run under RSX-11m, MS-DOS, Xenix, and RMX-86.

>but more importantly, Unix programmers should---and at times do---
>write reusable *programs*.  The very `Unix Philosophy' of which
>you speak is that you should create a set of tools which can be
>used together to solve many problems, though each tool solves only
>a subset of any one problem.
>
>To give an example, however contrived or even erroneous---as I
>mentioned, I do not work for CfAR---consider taking a set of picture
>files, performing some algebraic transformation on each pixel value,
>applying histogram equalization, then halftoning and printing on
>an Imagen laser printer:
>
>	for i in *.pict; do
>		lop "your operation here" < $i | histeq | ht | pi |
>		qpr -q imagen-imp
>	done
>
>(I have made up some of these program names; CVL people may correct
>me if I have important details wrong.  `lop' stands for Local
>Operation on Picture, by the way.)  If instead you need to display
>one of these on the Grinnell:
>
>	lop "your operation here" < foo.pict | histeq | ht | put "params"
>
>or without halftoning:
>
>	grey	# Grinnel to B/W display
>	lop "your operation here" < foo.pict | histeq | put "params"
>
>The point of all this is that reuse of code itself is unnecessary
>if the code is in a separate program.  All you need do insert the
>program at the appropriate point in the pipe.
>

I think this is an excellent example of the proper use of the Unix
design philosophy, and reinforces my above comment that in an R&D
environment one often wants to "re-pipe" the plumbing.  In the graphics
world image processing applications lend themselves well to this approach
because one is applying successive operators to an image.

Where often this falls down in a production situation, is that the overhead
involved in the successive pipes can often exceed the processing time
required for doing the "real" work.

>Now, if you are talking about applying the same operation to thousands
>of pictures a day, then (and *only* then) you should consider taking
>the `guts' of each operation out of each of the programs in question,
>building argument and error handling around them, and packaging that
>up as an `application'.

Precisely my point.  What you seem to be missing is that the time and
effort involved in doing this packaging for production software often
is several times greater than that required for "each operation".

gwyn@BRL.ARPA (VLD/VMB) (10/24/85)

I don't think that Jon and I disagree very much.  My posting was
in response to a fellow who appeared to have not fully assimilated
the use of processes as opposed to subroutines.  Therefore I
emphasized the worth of processes, perhaps giving the impression
that I don't think much of library functions.  The following
lengthy discussion is mostly about software engineering, not UNIX.

KEY:  >>> original worrier  >> me  > Jon

> > Re-usability is obtained at the higher, process, level.

> Doug, you are entirely correct, but this does not negate the case in
> favor of lower level reusability.  Go figure out one day how much of
> your disk is being used by copies of the C library. Compile a program
>
> 	main(){}
>
> using the "-lc" option and multiply this size by the number of
> executable files on your system.  I think you will be astonished.

342 disk bytes each, of which 216 bytes is overhead, for a total of
126 bytes of code and data.  I am astonished -- at how small it is!
Your example did not illustrate the point you were trying to make,
which is not important if you have "shared libraries" (see below).

I am for good software economics at all levels.  The ultimate "low
level" is the UNIX kernel, which offers nice reusable facilities.

> Having modularity at the procedure level can be a great boon,
> particularly if you have a loader which links in libraries at run time
> and doesn't duplicate code unnecessarily.

I do agree that "shared libraries" can cut disk space significantly.
They should only be used for modules with absolutely stable interface
definitions, however; otherwise, at some future date a module change
can instantly break a lot of formerly correct executable binaries.

I have argued for the use of libraries as well as for processes.
Libraries are good when they implement access routines for some
relatively complicated object, e.g. frame buffers or B-trees.  They
are also a nice way to provide generally useful programming support
functions, e.g. complex arithmetic, polynomials, vector math, list
structures, etc.  They can be handy in enforcing file structure,
although often just having good #include files is sufficient.  But
libraries have real problems in some cases (see below).

> > Many new applications should be produced by combining existing tools
> > rather than by writing code in the traditional sense.

Notice that I didn't say "all" or even "most".

Usually, an interesting application requires that its central
computational module(s) be implemented from scratch.  This
does not imply that the application should be a single monolithic
process, however.  I find that designing separate processes for
the hard computation and for the user interface often leads to
a better, more flexible, design.  At the last place I worked,
the Data Flow Diagram for a large new system ended up with
almost every bubble implemented as a separate UNIX process!
The user interface of that system consisted of a few screen-
oriented processes controlling and monitoring the data flows
between subordinate processes.  One of the nice things was
that the computational modules could be developed and tested
separately, they could be used in a batch mode quite easily,
and our blind programmer could operate the calculations via
a simple terminal interface (Bourne or C shell on a Braille
soft-copy terminal).  If we had bundled everything into one
bulky executable module, the system would have been much less
adaptable and effective.

> I agree, in a development environment.  In an applications environment
> this leads to systems which are inconsistent, have no reasonable error
> messages, are poorly documented, and are confusing as hell in
> general.  This is not an intrinsic property of the approach, except to
> the extent that the approach does not enforce programmer discipline in
> such matters.

As you say, not an intrinsic property of the approach.

I don't think any approach that really automatically enforces programmer
discipline is as good as simply having conscientious programmers.
UNIX was designed for skilled software developers for their own
use; it may well be true that it is not sufficiently rigid to keep
mediocre or incompetent programmers out of trouble.  I don't think
you can have it both ways.  (By the way, the people who were pushing
the raw UNIX shell interface as desirable for nontechnical end-users
were fools!  Such users need a controlled, "safe" environment, much
as poorer programmers need highly constrained language systems.)

> > This works especially well when one is trying to support a wide and growing
> > variety of graphic devices.

> Again, a general purpose device independent graphics library would
> probably serve the need better, and would almost certainly allow for
> more efficient collapseing of common code used.  It would also narrow
> the scope of modifications necessary to support new devices, thereby
> helping to minimize support costs.

In non-"shared library" environments, that approach would require
rebuilding all executable binaries before they could be used with
a new device.  With device-independent intermediate files/pipes,
the code that generates graphics is cleanly separated from the code
that displays them, which makes new display devices much less hassle.

Any viable form of modularity suffices to limit the scope of work
to add new devices.

The argument is often heard that interactive graphics requires a full-
duplex connection between the graphics-generating application and the
display/input device (this is true) and that that cannot be achieved
efficiently enough by separate processes (this is false).  We have a
counterexample in daily production use.

The best interaction is obtained with device-specific code handling
the interaction, but that should not affect most of the application.

> > > As a result of this philosophy to design systems as a network of filters
> > > piped together:
> > > 
> > > 	o Much of the bulk of the code is involved in argument parsing,
> > > 	  most of which is not re-usable.

> If your argument parsing is so long, you haven't done it sensibly, or
> your argument conventions need to be rethought.  No matter how you
> slice the program, It has to take arguments, and the shell is doing
> all of the argument division anyway, so there is no added complexity
> to speak of involved in the use of pipes.

I think maybe he meant that there was a loss of efficiency in turning
binary data into character arguments to pass to a process and to decode
them in the invoked process.  If so, the counter-argument is that there
should not be much information passed as process arguments; if a large
number of parameters have to cross the process-process interface, then
either they should be in the major data flows (files, pipes) or someone
has not designed the module interfaces right.

> > > 	o Error handling is minimal at best.  When your only link to the
> > > 	  outside world is a pipe, your only recourse when an error
> > > 	  occurs is to break the pipe.

> This is not really true.  I do agree, however, that error recovery is
> substantially harder if your disjoint pieces of code are not properly
> modularized.  If you and the next guy in the pipe sequence need to
> resynchronize your notion of the data stream, you have invoked a lot
> of code AND protocol overhead.  C's modularity and data sharing
> facilities are not what they could be.  It is one of the only features
> of Modula-2 which I like, though I think the Modula-2 approach is a
> pain.

All part of proper module design, no matter how implemented.

> > If a subordinate module is not able to perform its assigned task,
> > it should so indicate to its controlling module.

> Have you ever tried to implement this using only error(n)?  Talk about
> rendering your code complex...

Success/failure return requires only one bit, and UNIX already has
an established convention for this.  If a lot of complicated dialog
between master and slave modules is required to establish what has
gone wrong, then that too is part of necessary module interface
design no matter how implemented.  Often a pipe is used with a very
simple protocol for such communications.

An important point is that each level in the module hierarchy should
make its own assessment of the situation based on what its slaves
report, and after taking appropriate actions it should return a
boiled-down report to its own master.  The complexity at each level
of module interface should be about the same.

> > Error recovery is best performed at the higher strategic levels.

> Depends on the kind of error.  This argument comes back to the
> modularization point I made above.

Strategic decisions at low levels actually harm the goal of
reusability; if they are inappropriate for the application,
near-duplicate substitutes for the low levels must be developed.
A notorious example was the 4.2BSD network library module that
printed on stderr when a failure was detected.  Really!

> > > 	o Programmers invent "homebrew" data access mechanisms to supplement
> > > 	  the lack of a standard Unix ISAM or other file management.  Much
> > > 	  of this code cannot be re-used...

> I know of no database system where speed is important which has ever
> been distributed using a standard record support library.  I believe
> that you will find that all of the major database products, even under
> VMS, ultimately have given up and gone directly to doing their own
> thing directly using Block I/O because the provided record structure
> facilities, by virtue of the fact that they are general, are
> necessarily not well optomized to any particular task.  In particular,
> the overhead associated with error checks which your database system
> doesn't need is hideous.
>
> On the other hand, a resaonable record structure facility is something
> sorely lacking in UNIX, and is very useful when simply trying to get
> the code running initially.  It allows you to leave the modifications
> for a database hacker and get it running.  For non-critical code, or
> code where the critical element is development time and ease of
> support, this is crucial.

UNIX almost trivially supports fixed-length records; for more complex
file organizations, as you imply, no matter what might be provided it
would probably not be what a particular application really needed.
This would not be a problem if a general facility can be made "good
enough", and sometimes that is possible.  I think a "good enough"
record locking primitive already exists (in some UNIXes), but there
is not yet a standard database access library that is "good enough".
(Not counting ones that nobody can get their hands on.)
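
(By "almost trivially" I mean something like the following sketch, in
which record n of a file of fixed-length records is one lseek() away.
The record layout and names are of course invented.)

	#include <sys/types.h>
	#include <unistd.h>	/* lseek(), read() */

	struct rec {		/* an invented fixed-length record */
		char	name[32];
		long	balance;
	};

	/* fetch record number n from an open file; returns 1 on success, 0 on failure */
	int
	getrec(int fd, long n, struct rec *rp)
	{
		if (lseek(fd, (off_t)(n * sizeof(struct rec)), SEEK_SET) == (off_t)-1)
			return 0;
		return read(fd, (char *)rp, sizeof(struct rec)) == sizeof(struct rec);
	}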

> > It is relatively rare that UNIX applications have to be concerned
> > with detailed file access mechanisms.

> Is this cause, or effect?

> I hope that my replies here are not unduly long.  I went over them,
> and I believe that I have made my points succintly: UNIX needs both a
> standard database facility and an intelligent notion of run-time
> libraries.

Amen! to the need for a database facility.  It need not be in the
kernel; simply nice ISAM and B-tree libraries would be a good start.
If these became generally available, then one could consider
establishing a "good enough" standard; that would be premature now.

More generally-available support library functions would be welcome;
I find the new ones in the System V standard C library to be quite
useful, but there are many others one can think of that should be
added (it looks like the directory access routines are on their way
to becoming standard, at long last).
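
(The routines in question are the opendir()/readdir() family.  A sketch
of their use, assuming the <dirent.h> flavor; on 4.2BSD the header and
structure names differ:)

	#include <stdio.h>
	#include <dirent.h>	/* opendir(), readdir(), closedir() */

	int
	main(int argc, char *argv[])
	{
		char *dir = argc > 1 ? argv[1] : ".";
		DIR *dp;
		struct dirent *de;

		if ((dp = opendir(dir)) == NULL) {
			perror(dir);
			return 1;
		}
		while ((de = readdir(dp)) != NULL)	/* "." and ".." come back too */
			printf("%s\n", de->d_name);
		closedir(dp);
		return 0;
	}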

brooks@lll-crg.ARPA (Eugene D. Brooks III) (10/24/85)

In article <2426@sjuvax.UUCP> jss@sjuvax.UUCP (J. Shapiro) writes:
>Doug, you are entirely correct, but this does not negate the case in
>favor of lower level reusability.  Go figure out one day how much of
>your disk is being used by copies of the C library. Compile a program

An interesting proposition, I found megabytes tied up with copies of
1 and 0 in executable programs.  We of course used find to ferret them
out and remove them "$%&')(%$#%&'

sh: postnews not found					:-)

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/24/85)

One advantage of processes as opposed to library functions
is that they are a natural way to exploit parallelism on
multiprocessor architectures.

guy@sun.uucp (Guy Harris) (10/27/85)

> The problem here is not a question of programmers forever reinventing the
> wheel, but rather a lack of uniformity of the various flavors of Unix.
> Most defensive programmers take into account that the programs they write
> might be ported to v6 or XENIX or even a micro that only has an incomplete
> implementation of the stdio package (if that much).

If you're talking about a micro with only an incomplete standard I/O
implementation, you're talking about the lack of uniformity of various C
implementations, not of various flavors of UNIX.  Every UNIX since V7 has
had a standard I/O library, although there are some differences between
them.  The original V6 didn't, but V6 without the "50 changes" and
"Phototypesetter, Version 7" is sufficiently different from all subsequent
UNIXes that trying to write code that builds under it and other systems is
extremely difficult.

> Writing a program with a myriad of fancy pipes and filters may be faster
> and easier, but if you don't want to go the extra mile to write emulations
> of certain system calls...

Again, if you're talking about "standard" UNIX system calls, you're dealing
with the problem of porting between different operating systems.  I know of
few languages at the approximate level of C which permit you to
transparently port applications which run as screen editors, or which run
other programs, or....  The whole reason the people at Bell Labs ported UNIX
was to get around the problem of dealing with multiple machine architectures
*and* operating systems:

	The realization that the operating systems of the target
	machines were as great an obstacle to portability as their
	hardware architecture led us to a seemingly radical suggestion:
	to evade that part of the problem altogether by moving the
	operating system itself.  (S. C. Johnson and D. M. Ritchie,
	"Portability of C Programs and the UNIX System", BSTJ Vol. 57,
	No. 6, Part 2, July-August 1978, pp. 2021-2048)

At that time, both machines (PDP-11 and Interdata 8/32) were (I presume)
running V7.  Since then, several cooks (the UNIX Support Group/UNIX System
Development Laboratory, U. C. Berkeley's Computer Science Research Group,
10000 other universities, Microsoft, 10000 other UNIX vendors, etc., etc.)
have made their contributions to the broth.  I hear that lots of
applications broke when moving from VMS 3.x to VMS 4.x also...

> C and Unix aren't always synonymous, especially on micros:  witness the
> Amiga and the Atari ST.  If they (the ANSI C standards committee) don't
> stay on track, they may jeapordize their whole standard.

Amen.  The trouble is that C has, for example, no built-in I/O constructs,
so the original UNIX C implementation had an I/O library.  A portable
version was written (two, actually - the Portable I/O library and its
replacement, the Standard I/O library), but it still had a UNIX flavor to
it.  The ANSI C committee is doing some really dumb things like building the
signal mechanism into their standard.  Anybody who wants to write an
application in *any* language which they want to run under, say, VMS and
UNIX, and which makes use of the operating system's facilities in ways that
can't be subsumed by the language's built-in I/O capabilities is going to
have to build an OS interface library and hide the OS dependencies there
anyway.

	Guy Harris

jss@UCB-VAX.Berkeley.EDU (10/27/85)

I am delighted to receive your comments, and I suspect that in great bulk
we agree with each other, though your statements are more refined than mine.

A question about database systems, however.  Unix provides fixed length record
access, but not locking.  While I agree that a database library should be
external to the kernel (sp?), I think that named and unnamed semaphores and
file byterange locking should be added to the kernel, as these would provide
the basis for a great deal of flexibility not presently available.

As to the comment that UNIX does not protect mediocre or bad programmers, I
am inclined to agree. Sadly, there are all too few good programmers, including
those who wrote UNIX (i.e. they weren't ALL good).

I would be curious to find out what you feel the minimal changes to the UNIX
kernel would be to support network interprocess communication and secure
database transactions, particularly in a distributed environment.

Jon

P.S., if either of you wishes to be taken off of the cc list, please let me 
know.

cottrell@NBS-VMS.ARPA (COTTRELL, JAMES) (10/29/85)

/*
> The problem I have observed is that:
> 
> 	Applications programs that have been designed to run under Unix
> 	tend to have a low percentage of re-usable code.

Exactly! What this means is that you get to keep writing NEW stuff! The
common fragments have mostly been culled out and stuck into a library
for you. This is the mark of a successful design. 

	jim		cottrell@nbs
*/
------

henry@utzoo.UUCP (Henry Spencer) (10/30/85)

> > 	Applications programs that have been designed to run under Unix
> > 	tend to have a low percentage of re-usable code.
> 
> Exactly! What this means is that you get to keep writing NEW stuff! The
> common fragments have mostly been culled out and stuck into a library
> for you. This is the mark of a successful design. 

Actually, "mostly" is overstating the situation.  As various people, notably
the Software Tools folks, have pointed out, it's so easy to do various
things in Unix that nobody gets around to making library routines out of
them -- it's too easy to reinvent the wheel each time.  There ought to be
rather more use of libraries than there is.  Things like getopt(3) and the
SysV string(3) routines are forward steps.  (Note to listeners:  both of
these examples exist in public-domain versions that have been posted to
the net repeatedly, so there is NO EXCUSE for not having them.)  Some of
the functions in Kernighan&Pike are also good candidates for putting in
libraries.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

kenny@uiucdcsb.CS.UIUC.EDU (11/03/85)

/* Written  3:20 pm  Oct 28, 1985 by cottrell@NBS-VMS.ARPA in uiucdcsb:net.lang.c */
> This is the mark of a successful design. 

	jim cottrell@nbs */ ------ /* End of text from uiucdcsb:net.lang.c
*/

Not a suxessful design, Jim?  If you're advocating orthographic reforms, at
least be consistent about them.

heiby@cuae2.UUCP (Heiby) (11/06/85)

In article <2474@brl-tgr.ARPA> jss@UCB-VAX.Berkeley.EDU writes:
>A question about database systems, however.  Unix provides fixed length record
>access, but not locking.  While I agree that a database library should be
>external to the kernel (sp?), I think that named and unnamed semaphores and
>file byterange locking should be added to the kernel, as these would provide
>the basis for a great deal of flexibility not presently available.

I'm afraid that jss shows his UCB orientation in the above.  UNIX does have
file and record locking (byterange) in the kernel.  References are the
1984 /usr/group Standard prepared by the /usr/group Standards Committee
November 14, 1984 and the System V Interface Definition, Issue 1, Spring 1985.
The former is available from /usr/group, the latter from AT&T (select code
is 307-127).  What jss probably meant to say was that these features should
be incorporated into 4bsd, with which I tend to agree.  (I know that there
are no named semaphores in UNIX.  It sounds like an interesting idea.)
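
For the curious, the byte-range locking described in those documents is
done through fcntl() and a struct flock.  A sketch (the wrapper function
and its names are my own invention):

	#include <fcntl.h>	/* fcntl(), struct flock, F_SETLKW */
	#include <unistd.h>	/* SEEK_SET */

	/* lock "len" bytes starting at byte "off" of fd, sleeping until we get it */
	int
	lockrange(int fd, long off, long len)
	{
		struct flock fl;

		fl.l_type = F_WRLCK;		/* exclusive (write) lock */
		fl.l_whence = SEEK_SET;		/* l_start is measured from the file start */
		fl.l_start = off;
		fl.l_len = len;			/* 0 would mean "to end of file" */
		return fcntl(fd, F_SETLKW, &fl);	/* returns -1 on error */
	}

Unlocking is the same call with l_type set to F_UNLCK.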
-- 
Ron Heiby {NAC|ihnp4}!cuae2!heiby   Moderator: mod.newprod & mod.unix
AT&T-IS, /app/eng, Lisle, IL	(312) 810-6109
"I am not a number!  I am a free man!" (#6)

guy@sun.uucp (Guy Harris) (11/11/85)

(Redirected to net.unix because it has nothing to do with C.)

> I'm afraid that jss shows his UCB orientation in the above.

Well, actually the "Berkeley.EDU" in his name was there because his message
was to the ARPANET INFO-C mailing list, and got gatewayed onto "net.lang.c"
by the gateway machine for that mailing list/newsgroup, which is (surprise!)
UCB-VAX.Berkeley.EDU.

> UNIX does have file and record locking (byterange) in the kernel.

Make that "some UNIX implementations have file and record locking (byte
range) in the kernel."  Some which don't include

	V7
	2.9BSD
	32/V
	4.xBSD
	System III
	System V, Release 1
	System V, Release 2, Version N, for values of N less than some
		machine-dependent value - yes, it seems to be a different
		value for VAX S5R2 and 3B20 S5R2

> (I know that there are no named semaphores in UNIX.  It sounds like an
> interesting idea.)

What's in a name?  S5's semaphores can be referred to by a 32-bit unique
identifier, which could be considered a name.  There is a routine "ftok"
which turns a file name into a 32-bit unique identifier by jamming the
device and i-number of that file together with an 8-bit code; this can be
used as a way of binding a name in the file system to a semaphore set.
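
I.e., something along these lines (a sketch: the path is invented and
must name an existing file, and the semop() calls that would actually
use the set are omitted):

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/ipc.h>	/* ftok(), IPC_CREAT */
	#include <sys/sem.h>	/* semget() */

	int
	main(void)
	{
		/* bind a file system name to a System V semaphore set */
		key_t key = ftok("/usr/spool/mydb/lockfile", 'a');
		int semid;

		if (key == (key_t)-1) {
			perror("ftok");
			return 1;
		}
		if ((semid = semget(key, 1, IPC_CREAT | 0666)) == -1) {
			perror("semget");
			return 1;
		}
		printf("semaphore set %d bound to key 0x%lx\n", semid, (long)key);
		return 0;
	}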

	Guy Harris