[comp.software-eng] Coding standards

jharkins@sagpd1.UUCP (Jim Harkins) (11/28/89)

In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>I would set strict standards that deal with well written programs
>(e.g. IMHO a. few if any globals, b. one routine per file, c. well
>documented, etc.).

I've never understood why one routine per file is such a *good* thing and
thou shalt not deviate from this under penalty of dirty looks.  In a lot
of cases I feel that routines that belong together should be in the same
file.  An example is malloc/free.  They use the same data structures, nobody
else has any business seeing these structures, so why not have them in the
same file.  Other times I'll write a function to do something and, in the
interests of small functions doing one thing the body of the function
is "call fred, call wilma, call barney, return something".  Fred, wilma, and
barney are usually declared statics as they aren't much use to anyone else.
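
A minimal sketch of that layout, assuming a hypothetical file and a
hypothetical exported routine called bedrock(); only the helper names
come from the post, and their bodies are placeholders:

```c
/* flintstones.c: hypothetical file; the function bodies are placeholders. */

/* static gives each helper internal linkage, so no other file can
 * call it or collide with its name. */
static int fred(int x)   { return x + 1; }
static int wilma(int x)  { return x * 2; }
static int barney(int x) { return x - 3; }

/* The only symbol this file exports: call fred, call wilma,
 * call barney, return something. */
int bedrock(int x)
{
    return barney(wilma(fred(x)));
}
```

With one routine per file, fred, wilma, and barney would each need
external linkage just so bedrock() could reach them.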

I can understand the 1 routine 1 file idea, but in the real world it falls on
its face pretty often.  The real solution is to change C.  As most software
projects are divided into pieces, it seems I should be able to control which
variable/function names are visible by other pieces.  This calls for 2
different types of globals: a half-baked global that's visible amongst
files in my piece, and full globals that are visible to other pieces.
This is more of a linker issue than a C issue; all C needs is a new keyword
to declare half-baked globals.

I'm sure you language designers out there have a term for what I'm talking
about in that last paragraph, but hopefully you all understand my point.

jim
"Congress.  Outside of Zsa Zsa the most bloated, conceited, self-indulgent
 entity in the world."

hollombe@ttidca.TTI.COM (The Polymath) (11/29/89)

In article <546@sagpd1.UUCP> jharkins@sagpd1.UUCP (Jim Harkins) writes:
}In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
}>I would set strict standards that deal with well written programs
}>(e.g. IMHO ... one routine per file ...

}I've never understood why one routine per file is such a *good* thing and
}thou shalt not deviate from this under penalty of dirty looks.  In a lot
}of cases I feel that routines that belong together should be in the same
}file.  ...

I think this goes back to the days of Data General FORTRAN compilers that
_required_ one routine per file.  I used to do a lot of work with those
beasts.  Keeping the common blocks straight was such a pain the company
wrote an elaborate utility to automate most of it (and, incidentally,
implement a pretty nifty data dictionary).  This is the only environment
I've ever worked in where the "one routine per file" rule was even
considered.  The only real advantage I can see for it is cheaper (faster)
compiles because only the changed routines get re-compiled.  In these days
of relatively cheap CPU cycles that's a poor trade off for the hassle of
keeping track of all those little files.

I agree with Jim's attitude and have found it to be the de facto situation
in most of the jobs I've done (and am doing).  Related routines and
functions should be grouped together and invisible to anything that
doesn't need to use them directly.

-- 
The Polymath (aka: Jerry Hollombe, hollombe@ttidca.tti.com)  Illegitimis non
Citicorp(+)TTI                                                 Carborundum
3100 Ocean Park Blvd.   (213) 452-9191, x2483
Santa Monica, CA  90405 {csun|philabs|psivax}!ttidca!hollombe

hue@netcom.UUCP (Jonathan Hue) (11/30/89)

>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>I would set strict standards that deal with well written programs
>>(e.g. IMHO a. few if any globals, b. one routine per file, c. well
>>documented, etc.).

(Following assumes programming in C)

My $0.02: I don't agree with (b) at all.  In practice, (a) and (b) are
often mutually exclusive.  Sometimes two or more functions will need to
use the same variable, and if you can shove the functions that use it in
one file and make the variable static, you save polluting your program's
global name space.  If you adhere to (b), you've no choice but to make it
global.  You can also hide functions with static that no one else is
interested in.  I think adhering to (b) would tempt you into writing
functions longer than you really should (Aw, I don't want to come up with
another filename for this 40 line chunk of code.  I'll just shove it in
inline wherever I need it).
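
A small sketch of the conflict between (a) and (b), assuming a
hypothetical event counter; none of these names appear in the post:

```c
/* counter.c: hypothetical example; all names are illustrative. */

/* File scope plus static: every function below can use the variable,
 * but the linker never sees its name. */
static unsigned long event_count = 0;

void count_event(void)          { event_count++; }
unsigned long events_seen(void) { return event_count; }
void reset_events(void)         { event_count = 0; }
```

Split these three functions into three files and event_count has to
become an ordinary global that they all reach through extern.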

At my last job we had a guy that adhered to (b) because he "didn't like
searching around to find out where a function was".  I suggested using
tags, but since he used Microsoft Word on a Mac to edit programs (then
he would upload them to a Sun to compile via TOPS - I'm not kidding!) tags
weren't very useful.  Because of his strict adherence to (b), and his desire
to keep the number of files down, he would write 700 to 2000 line functions.

Regarding (c), well documented to me doesn't necessarily mean lots of comments.
At one place I worked there were separate documents which described how
each subsystem of the product worked. The documents gave an overview, and
had separate sections for each of its parts which described what all the
functions did and how they worked.  The comments in the code described anything
which wouldn't be obvious to someone who had read the documentation.  Having
this type of documentation is extremely valuable when bringing new people on
board.  Much better than sitting down with 100K lines of code and going through
it with a new hire.  'Course, none of this ever gets written until the first
release goes out...


-Jonathan

dmt@pegasus.ATT.COM (Dave Tutelman) (11/30/89)

In article <4727@netcom.UUCP> hue@netcom.UUCP (Jonathan Hue ) writes:
>>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>>(e.g. IMHO a. few if any globals, b. one routine per file...
>
>(Following assumes programming in C)
>
>My $0.02: I don't agree with (b) at all.  In practice, (a) and (b) are
>often mutually exclusive.  Sometimes two or more functions will need to
>use the same variable, and if you can shove the functions that use it in
>one file and make the variable static, you save polluting your program's
>global name space.  
	I'm glad you posted that; I was thinking of doing so.  Right on!

>I suggested using tags...
	Ditto!

>.... but since he used Microsoft Word on a Mac to edit programs, tags
>weren't very useful.
	I have a similar problem, in that most of my editors on the PC
	don't support tags, but I use "stevie" when I need tags on the PC.
	I also use "cpr" to generate function indices in my hard copy.

There IS one argument, in some cases a compelling one, for "one function
per file".  In general, linkers aren't smart enough to link just
PART of a binary file (.OBJ or .o), when that file contains a function
needed by the link.  Consider, therefore, developing a library of functions
to be used by several programs.  For instance, consider a library that
manipulates a widget, and can open, close, bump, and bash the widget.

As Jonathan correctly notes, these functions are likely to share a
set of "limited global data" about the widget, private to the outside
but public among open(), close(), bump(), and bash() of widget.
So we are tempted to write the four functions in a single C file, and
declare the limited-globals as "static".

However, suppose the library will be used by an application that:
   1.	Has strict memory constraints, and
   2.	The only thing it does to the widget is "bash()" it.
The linker wouldn't be able to separate out the bash() function from
the binary file, and the application would carry the memory burden
of all the widget functions.

The alternative is to put each of the four public widget functions
in its own file, compiling to its own binary, and use a librarian
program to combine them into a library file.  Decent linkers can
easily link part of a library file, in quanta of the original object
files.

How do we deal with the shared data?  There are two ways, one ugly
and one clean (but more effort, and a little bigger which partially
offsets the memory savings):
   -	UGLY: pollute the global data space, and choose names not
	likely to be used by the application (like "widg_lib_firstone").
   -	HARDER: write a "data-manager" function, in yet another file,
	which owns the static variables and responds to requests for
	them from the library function.  The variable names can be
	translated into integers through a header file common to the
	library functions.  The only pollution of the global name
	space is the name of the data manager for widget data.  So
	calls would look like:
		firstone = widg_lib_datamgr( GET, FIRSTONE );
		error    = widg_lib_datamgr( SET, FIRSTONE, firstone );

I've occasionally had to write libraries where this sort of thing
was important.
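
One way the data manager above might be written, as a sketch under
several assumptions: the variables are all ints, the second variable
LASTONE is invented for illustration, the optional third argument is
handled with <stdarg.h>, and -1 signals a bad request (which means a
stored value of -1 cannot be told apart from an error on GET):

```c
#include <stdarg.h>

/* These would live in the header shared by the library's source files. */
enum { GET, SET };                        /* requests */
enum { FIRSTONE, LASTONE, WIDG_NVARS };   /* LASTONE is hypothetical */

/* datamgr.c: the only file that can see the widget's data.
 * Parameters are plain ints so that va_start() is well defined. */
int widg_lib_datamgr(int op, int which, ...)
{
    static int vars[WIDG_NVARS];   /* private widget state, zeroed at start */
    va_list ap;

    if (which < 0 || which >= WIDG_NVARS)
        return -1;                 /* unknown variable */

    if (op == GET)
        return vars[which];

    if (op == SET) {
        va_start(ap, which);
        vars[which] = va_arg(ap, int);   /* the new value */
        va_end(ap);
        return 0;
    }
    return -1;                     /* unknown request */
}
```

The only name added to the global name space is widg_lib_datamgr
itself; GET, SET, and the variable codes stay in the library's
private header.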

Hope this helps.
+---------------------------------------------------------------+
|    Dave Tutelman						|
|    Physical - AT&T Bell Labs  -  Lincroft, NJ			|
|    Logical -  ...att!pegasus!dmt				|
|    Audible -  (201) 576 2194					|
+---------------------------------------------------------------+

sccowan@watmsg.waterloo.edu (S. Crispin Cowan) (12/01/89)

In article <4727@netcom.UUCP> hue@netcom.UUCP (Jonathan Hue ) writes:
>>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>>I would set strict standards that deal with well written programs
>>>(e.g. IMHO a. few if any globals, b. one routine per file, c. well
>>>documented, etc.).
>
>(Following assumes programming in C)
[good stuff]
>At my last job we had a guy that adhered to (b) because he "didn't like
>searching around to find out where a function was".  I suggested using
>tags, but since he used Microsoft Word on a Mac to edit programs (then
>he would upload them to a Sun to compile via TOPS - I'm not kidding!) tags
>weren't very useful.  Because of his strict adherence to (b), and his desire
>to keep the number of files down, he would write 700 to 2000 line functions.
I would want anyone who produced 2000 line functions fired, unless
they had REALLY good reasons, and "I don't like vi" doesn't even come
CLOSE to cutting it.

>-Jonathan

----------------------------------------------------------------------
Login name:	sccowan			In real life: S. Crispin Cowan
Office:		DC3548	x3934		Home phone: 570-2517
Post Awful:	60 Overlea Drive, Kitchener, N2M 1T1
UUCP:		watmath!watmsg!sccowan
Domain:		sccowan@watmsg.waterloo.edu

"We have to keep pushing the pendulum so that it doesn't get stuck in
the extremes--only the middle is worth having."
	Orwell, Videobanned
		-- Kim Kofmel

tim@hoptoad.uucp (Tim Maroney) (12/02/89)

In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>I would set strict standards that deal with well written programs
>(e.g. IMHO a. few if any globals, b. one routine per file, c. well
>documented, etc.).

In article <4727@netcom.UUCP> hue@netcom.UUCP (Jonathan Hue ) writes:
>My $0.02: I don't agree with (b) at all.
[...]
>I think adhering to (b) would tempt you into writing
>functions longer than you really should (Aw, I don't want to come up with
>another filename for this 40 line chunk of code.  I'll just shove it in
>inline wherever I need it).
>
>At my last job we had a guy that adhered to (b) because he "didn't like
>searching around to find out where a function was".
[...]
>Because of his strict adherence to (b), and his desire
>to keep the number of files down, he would write 700 to 2000 line functions.

I share your distaste for the rule of one routine per file. Short
functions are almost always easier to read than long ones.  A
medium-sized project (say, 15,000 lines) with functions averaging out
at a reasonable size (say, forty lines) would have 375 source files
using this rule.  What an atrocity!

>Regarding (c), well documented to me doesn't necessarily mean lots of comments.

Absolutely.  Clear code shouldn't *need* a lot of comments; a
programmer should be able to read it and understand what's going on
from the routine names, the variable names, and the flow of control,
with just a few added comments if any.  A lot of extraneous comments
about things that would be perfectly clear just from reading the code
actually damages code readability; the control structures become much
harder to follow.

There are a lot of people who adhere to a rule that more comments are
always better.  I worked with a piece of code like that this year.  I
couldn't make heads or tails out of the commented version, which wound
up a few hundreds of lines.  So I sat down and ruthlessly stripped out
all the comments, and when the code was reduced to a few tens of lines,
I then reduced the control structures to the simpler forms which
emerged when you could actually start to see the forest for the trees.
After that, it became comprehensible.

In summary:  Clear code is far more important than extensive comments.

>At one place I worked there were separate documents which described how
>each subsystem of the product worked. The documents gave an overview, and
>had separate sections for each of its parts which described what all the
>functions did and how they worked.  The comments in the code described anything
>which wouldn't be obvious to someone who had read the documentation.  Having
>this type of documentation is extremely valuable when bringing new people on
>board.  Much better than sitting down with 100K lines of code and going through
>it with a new hire.  'Course, none of this ever gets written until the first
>release goes out...

Again, I agree.  External documentation is very useful; far more so than
most code comments.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

"Every institution I've ever been associated with has tried to screw me."
	-- Stephen Wolfram

wayne@dsndata.uucp (Wayne Schlitt) (12/04/89)

In article <9157@hoptoad.uucp> tim@hoptoad.uucp (Tim Maroney) writes:
> In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
> >I would set strict standards that deal with well written programs
> >(e.g. IMHO a. few if any globals, b. one routine per file, c. well
> >documented, etc.).
>
> [ .... ]
> 
> I share your distaste for the rule of one routine per file. Short
> functions are almost always easier to read than long ones.  A
> medium-sized project (say, 15,000 lines) with functions averaging out
> at a reasonable size (say, forty lines) would have 375 source files
> using this rule.  What an atrocity!
> 


hmmm...  one of the first things i usually do to code that i get off
the net is break it up into one function per file.  i am not that
dogmatic about it, i just do it because it seems to me to be easier to
work with.

take your example of a project with 15,000 lines of C code.  the two
extremes are one file of 15,000 lines, or 375 files with around 40
lines per file.  if given only the choice between these two extremes,
i would definitely pick the latter.  the compile time alone will kill
you if you choose just one file.

in practice i usually do not go to the extreme of always just one
function per file, but i rarely let any given file go over 1000 lines.
i only put functions in the same file if they are closely related.  if
the directory gets too many files, i break the directory up into sub
directories.

maybe some of the reason why i find it easier to work with lots of
files is that i use emacs and it is very easy to have several files
loaded at the same time.  when i am looking at the code it is very
easy to see what another function does by doing a "Ctl-x 4 f function_name.c"
and then i can see both the calling function and the called function
on the screen at the same time.  going to the beginning or end of a
function is easy as going to the beginning or the end of a file.
searching and replacing text doesn't spill over into other functions.

yes, you can do these things when everything is in one file too, but
to me it seems simpler and easier to have one function per file.  

another reason why i may lean toward one function per file is that i
am used to working on large projects (>100k lines) and when you get to
that size, you almost _have_ to work with lots of files, directories
of directories, libraries and such.  


oh well... to each their own...

-wayne

crm@romeo.cs.duke.edu (Charlie Martin) (12/04/89)

The problem with (do/do not) use one file per function is that it's an
optimization.  I'll assume that we're talking about C, not Ada or
Fortran, say.

There are some really good reasons to use more than one function per
file in C; one of the best is to take advantage of C's feature(?) of
file scope.  You could, for example, implement a stack object of a
hidden type by having the stack itself declared static in file stack.c,
then implementing push, pop, etc as functions that access this static
stack.  The type of the stack representation and the details of storage
are hidden from the user.
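
A minimal sketch of such a stack.c, assuming a fixed-capacity stack of
ints; the capacity, the error convention, and stack_depth() are
assumptions added for illustration:

```c
/* stack.c: hypothetical sketch of a stack hidden behind file scope. */
#include <stddef.h>

#define STACK_MAX 64              /* assumed capacity */

/* The representation: static, so invisible outside this file. */
static int    items[STACK_MAX];
static size_t depth = 0;

/* Returns 0 on success, -1 if the stack is full. */
int push(int value)
{
    if (depth == STACK_MAX)
        return -1;
    items[depth++] = value;
    return 0;
}

/* Stores the top item in *out; returns 0, or -1 if the stack is empty. */
int pop(int *out)
{
    if (depth == 0)
        return -1;
    *out = items[--depth];
    return 0;
}

size_t stack_depth(void) { return depth; }
```

Callers see only the three function names, so the array could later
become a linked list without touching any other file.  The price is
that there is exactly one stack per program.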

This same trick can be done in, say, Fortran, using functions and a
BLOCK DATA subprogram, but Fortran doesn't care much about how many
files are used.

(Does anyone know of any other languages than C -- other than C++ etc --
that have this file scope mechanism?)

On the other hand, there are real good reasons not to put a whole
10KSLOC program into one file: editing and compilation time strike me.

What the optimum is between one function per file and only one file is
probably determined by the problem and program architecture.

Charlie Martin (crm@cs.duke.edu,mcnc!duke!crm) 

perry@apollo.HP.COM (Jim Perry) (12/05/89)

In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>In article <4727@netcom.UUCP> hue@netcom.UUCP (Jonathan Hue ) writes:
>>>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>>>(e.g. IMHO a. few if any globals, b. one routine per file...
>>
>>My $0.02: I don't agree with (b) at all.  In practice, (a) and (b) are
>	I'm glad you posted that; I was thinking of doing so.  Right on!
>
>There IS one argument, in some cases a compelling one, for "one function
>per file".  In general, linkers aren't smart enough to link just
>PART of a binary file (.OBJ or .o), when that file contains a function
>needed by the link.  Consider, therefore, developing a library of functions
...
>The linker wouldn't be able to separate out the bash() function from
>the binary file, and the application would carry the memory burden
>of all the widget functions.
>
>The alternative is to put each of the four public widget functions
>in its own file, compiling to its own binary, and use a librarian
>program to combine them into a library file.  Decent linkers can
>easily link part of a library file, in quanta of the original object
>files.
>
>How do we deal with the shared data?  There are two ways, one ugly
>and one clean (but more effort, and a little bigger which partially
...
>
>I've occasionally had to write libraries where this sort of thing
>was important.

Instead of having zillions of programmers standing on their heads like
this in order to get their jobs done, how about a couple of people
smarten up the linkers to do the job right?  I'm not familiar with the
particular linkage conventions used by the compilers/linkers that
affect this group (presumably unix and popular-PC ones), but there's
nothing fundamental keeping linkers from separating out the bash()
function from the binary file, because I've used such linkers/loaders.
Jim Perry   perry@apollo.com    HP/Apollo, Chelmsford MA
This particularly rapid unintelligible patter 
isn't generally heard and if it is it doesn't matter.

perry@apollo.HP.COM (Jim Perry) (12/05/89)

In article <9157@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>Regarding (c), well documented to me doesn't necessarily mean lots of comments.
>
>Absolutely.  Clear code shouldn't *need* a lot of comments; a
>programmer should be able to read it and understand what's going on
>from the routine names, the variable names, and the flow of control,
>with just a few added comments if any.  A lot of extraneous comments
>about things that would be perfectly clear just from reading the code
>actually damages code readability; the control structures become much
>harder to follow.
>
>There are a lot of people who adhere to a rule that more comments are
>always better.  I worked with a piece of code like that this year.  I
>couldn't make heads or tails out of the commented version, which wound
>up a few hundreds of lines.  So I sat down and ruthlessly stripped out
>all the comments, and when the code was reduced to a few tens of lines,
>I then reduced the control structures to the simpler forms which
>emerged when you could actually start to see the forest for the trees.
>After that, it became comprehensible.
>
>In summary:  Clear code is far more important than extensive comments.

Clear code and clear comments are both important.  As you observe, it's
quite possible to obfuscate a program in any number of ways.  However,
this example doesn't say much, other than that you were presented with
a small program you didn't understand (presumably because it was badly
written/commented), and by extensively editing it, and substantially
rewriting a significant percentage of it, came to understand it.  Let's
assume that you've now rearranged the code to the optimal C language
(again, an assumption) description of the solution, but no comments.  I
submit that I can then pass over that file adding comments, and by so doing
produce an even better program.

My definition of "even better"?  I assign an arbitrary engineer, who's
never seen that piece of code (or who last worked on it six months ago,
effectively the same thing), to make some functional modification to the
program.  The sooner the correct new solution is reached, the better the
(original) program.  

>>  Much better than sitting down with 100K lines of code and going through
>>it with a new hire.  'Course, none of this ever gets written until the
>>release goes out...
>
>Again, I agree.  External documentation is very useful; far more so than
>most code comments.

Again, you're throwing out the baby with the bathwater.  External
documentation has a fundamental flaw alluded to by dopey (no offense): it's
not generally there, and it's out of date.  "Most code comments" are also
missing or out of date, but only because most code is poorly documented.
As Fred Brooks says in The Mythical Man-Month:

     "[external] Program documentation is notoriously poor, and its
    maintenance is worse.  Changes made in the program do not promptly,
    accurately, and invariably appear in the paper."
     "The solution, I think, is to merge the files, to incorporate the
    documentation into the source program.  This is at once a powerful
    incentive toward proper maintenance, and an insurance that the
    documentation will always be handy to the program user.  Such programs
    are called *self_documenting*".

The proper rule, of course, is not that more comments are always better,
but that sufficient comments are always better.  In your example there were
presumably too many comments, but then the code was apparently not clearly
written either.  It is true that what Knuth calls a literate programmer
must have both the skill of coding, and that of documenting.  All
programmers are in effect technical writers, documenting their work for
other programmers who will see it/work on it.  Not all current programmers
excel at both of these skills, but it is a goal to aspire to.  

>>  Much better than sitting down with 100K lines of code and going through
>>it with a new hire.

Well, of course this is the heart of the matter.  A few-hundred or few-ten
line program tells us very little about real life software engineering
situations.  Actually, if the code is properly self-documenting, then the
new hire *can* just sit down with the code and learn from the code itself. 
Documentation, like code, is hierarchical.  At the beginning of each
program, library, whatever, is a broad overview of that unit.  More
specific comments would be associated with modules, functions, algorithms,
etc.

For instance, let's say I've been asked to change the memory allocation
implementation of a moderately large program I've never seen before.  From
the documentation of the program I determine generally what it does and
what sort of data it deals with, and further that it's internally broken
down into twelve modules, one of which deals with storage allocation.  In
that module's primary .c file is a description of the general memory model,
a breakdown of the operations on that memory (functions in the module), and
perhaps a summary of what the cost and benefit of that model are compared
to likely alternatives.  At each subsidiary function the particular
algorithms used are described, potential pitfalls, potential interaction
with other functions.  Within a function the variables are described, and
the high points of the algorithm, such as potential trouble sites for
concurrency, etc.

There's not much time overhead in generating this documentation, assuming a
basic competence at technical writing to one's own level.  At design time
most of this information is probably either already written down or on the
forefront of the programmer's brain (I often design code by writing the 
documentation).  This sort of information *can't* easily be reconstructed
from reading C code.  ("now WHY was I cocky enough to code this loop
without explicitly guarding against interrupts?")  I experienced an
epiphany once when I realized that for the fourth time in two years I was
drawing little linked-list boxes-and-lines to prove to myself that a list
handling function was correct in all cases.  I put that diagram into the
code (and subsequently did in fact refer to it a few times on later
occasions, saving myself significant time).  I hope if subsequent
maintainers have had occasion to visit that code they benefit from it, but
it doesn't really matter, in this case, I've already benefitted myself.
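
A sketch of what keeping such a diagram in the code might look like.
The post does not show the function in question, so the sorted-insert
routine, the struct, and every name below are assumptions chosen for
illustration:

```c
#include <stddef.h>

struct node {
    int          key;
    struct node *next;
};

/*
 * list_insert: link n into the list at *head, keeping keys ascending.
 *
 * link walks the chain of "next" fields, starting with the head
 * pointer itself, and stops at the one that should point at n:
 *
 *   before:   *link ----------> [B] --> ...
 *
 *   after:    *link --> [n] --> [B] --> ...
 *
 * Cases checked against the picture:
 *   empty list     link == head and *link == NULL; n is the only node
 *   new smallest   link == head; n goes in front of the old first node
 *   new largest    *link == NULL at the tail; n->next ends up NULL
 */
void list_insert(struct node **head, struct node *n)
{
    struct node **link = head;

    while (*link != NULL && (*link)->key < n->key)
        link = &(*link)->next;

    n->next = *link;
    *link = n;
}
```

The three cases in the comment are the ones that would otherwise be
redrawn on paper each time the function is revisited.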
-
Jim Perry   perry@apollo.com    HP/Apollo, Chelmsford MA
This particularly rapid unintelligible patter 
isn't generally heard and if it is it doesn't matter.

tada@athena.mit.edu (Michael J Zehr) (12/05/89)

In article <473ae701.20b6d@apollo.HP.COM> perry@apollo.HP.COM (Jim Perry) writes:
>There's not much time overhead in generating [hierarchical
>documentation], assuming a
>basic competence at technical writing to one's own level.  At design time
>most of this information is probably either already written down or on the
>forefront of the programmer's brain (I often design code by writing the 
>documentation).  This sort of information *can't* easily be reconstructed
>from reading C code.  ("now WHY was I cocky enough to code this loop
>without explicitly guarding against interrupts?")  
>-
>Jim Perry   perry@apollo.com    HP/Apollo, Chelmsford MA

I heartily agree with this, but unfortunately i rarely see it in
practice.  The largest project i ever did entirely by myself (a library
of calls to handle a user interface, including things like 'make button'
and 'call this function when the user clicks this region', etc.)
amounted to around 2000 lines of C code.  After designing it, I started
the coding by writing 3-5 lines describing each function.  When i went
to write the actual code, i was able to do it all very quickly, and
those 3-5 lines of description for each function have saved enormous
time and effort enhancing the library.

On top of that, I don't think it took any extra time to write, even to
start with.  I was basically sitting at the terminal thinking "how do i
start this" and decided typing what i knew was the best way to start.  I
think the time to write the comments plus the time to write the code was
less than if i had just started in on the code.  

Unfortunately, i've gotten a lot of comments along the lines of "i'm too
busy writing code to write a comment describing it" when suggesting it
to others.  I guess keeping maintenance costs high is what keeps some
people in business... :-\ 

-michael j zehr

tim@hoptoad.uucp (Tim Maroney) (12/05/89)

In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>There IS one argument, in some cases a compelling one, for "one function
>per file".  In general, linkers aren't smart enough to link just
>PART of a binary file (.OBJ or .o), when that file contains a function
>needed by the link.

WHAT?  What year is this?  I don't think I've ever used a linker that
didn't eliminate unused routines.  Any such linker would be seriously
brain damaged.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

"Everything that gives us pleasure gives us pain to measure it by."
    -- The Residents, GOD IN THREE PERSONS

bill@twwells.com (T. William Wells) (12/05/89)

I'll add my two cents about commenting. I am a big fan of
*useful* comments. And I despise over-commenting. The latter is
any comment that is not the former.

I have three simple rules:

     1) Comments should explain the purpose of the code and its
	relationship to the rest of the program.

     2) Comments should explain the *abstract* character of the
	code.

     3) Comments should *never* explain how the code works or what
	it is doing *unless* you have done something tricky.
	(Tricky: if, six months later, after reading the comments
	and the code you have to think about it, it is tricky.)

BTW, the abstract character of the code and its purpose, while
they can usually be explained simultaneously, are not always
close enough that this will work. A routine that converts binary
time to a useful printable form (this is its abstract character)
might have as its purpose permitting printing binary time
usefully; in that case, one comment easily enough serves two
purposes. On the other hand, you might have this complicated date
conversion routine whose character would be described by its I/O
relationships. However, the purpose of the routine might be: "to
satisfy the wants of our VP from Alpha Centauri". :-)
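
As an illustration of the three rules, here is a sketch built around a
hypothetical elapsed-time formatter.  The function, its name, and the
stated requirement are all invented for the example:

```c
#include <stdio.h>

/*
 * format_elapsed: render a count of seconds as "HH:MM:SS".
 *
 * Purpose: operators read job run times in the log, and a raw
 * seconds count is hard to scan.  (Hypothetical requirement.)
 *
 * Abstract character: depends only on its arguments; writes at most
 * size bytes into buf, NUL-terminated when size > 0; hours are not
 * wrapped at 24.  Returns buf.
 */
char *format_elapsed(unsigned long seconds, char *buf, size_t size)
{
    snprintf(buf, size, "%02lu:%02lu:%02lu",
             seconds / 3600, (seconds / 60) % 60, seconds % 60);
    return buf;
}
```

The header covers rules 1 and 2.  Under rule 3 the arithmetic gets no
comment, because nothing about it is tricky.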

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

nick@lfcs.ed.ac.uk (Nick Rothwell) (12/05/89)

In article <9157@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>Absolutely.  Clear code shouldn't *need* a lot of comments; a
>programmer should be able to read it and understand what's going on
>from the routine names, the variable names, and the flow of control,
>with just a few added comments if any.

As long as we're forced to use old fashioned low-level languages like
C, where it's impossible to express the pure algorithm directly in the
target language, there's a need for comments. There are two reasons.
The first is that the original algorithm might use concepts which
can't be expressed directly in C (higher order functions, or
polymorphic data objects, for example). The second is that there has
to be some low-level implementation of the things which were assumed
as part of the "universe" of the high-level description (e.g. garbage
collection).

Let me see you write a garbage collector (for example), where it's
clear exactly what GC algorithm you're using and what assumptions
you're making about the format, storage, invariants of the objects,
in C, without comments.

		Nick.
--
Nick Rothwell,	Laboratory for Foundations of Computer Science, Edinburgh.
		nick@lfcs.ed.ac.uk    <Atlantic Ocean>!mcvax!ukc!lfcs!nick
~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~
           "You're gonna jump!?"       "No, Al. I'm gonna FLY!"

bill@polygen.uucp (Bill Poitras) (12/05/89)

In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
%In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
%%There IS one argument, in some cases a compelling one, for "one function
%%per file".  In general, linkers aren't smart enough to link just
%%PART of a binary file (.OBJ or .o), when that file contains a function
%%needed by the link.
%
%WHAT?  What year is this?  I don't think I've ever used a linker that
%didn't eliminate unused routines.  Any such linker would be seriously
%brain damaged.
%-- 
%Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com
Yes you have.  ANY linker you have does this.  What you are thinking of is
a LIBRARY, i.e. a .lib file (lib*.a if you're a Unix person).  When a library
is used in the link process, only the functions used by the program being
linked get linked.  Although I'm not a compiler/linker expert, I'm almost
positive that this is true.

+-----------------+---------------------------+-----------------------------+
| Bill Poitras    | Polygen Corporation       | {princeton mit-eddie        |
|     (bill)      | Waltham, MA USA           |  bu sunne}!polygen!bill     |
+-----------------+---------------------------+-----------------------------+

bill@twwells.com (T. William Wells) (12/05/89)

In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
: In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
: >There IS one argument, in some cases a compelling one, for "one function
: >per file".  In general, linkers aren't smart enough to link just
: >PART of a binary file (.OBJ or .o), when that file contains a function
: >needed by the link.
:
: WHAT?  What year is this?  I don't think I've ever used a linker that
: didn't eliminate unused routines.  Any such linker would be seriously
: brain damaged.

Then you have been in a very limited universe.

Most linkers will not take, from a single object file, just those
routines needed by the rest of the program. Most linkers *will*
take only those object files needed from an archive, but that is
not the same thing.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

jef@well.UUCP (Jef Poskanzer) (12/06/89)

In the referenced message, bill@twwells.com (T. William Wells) wrote:
}In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
}: WHAT?  What year is this?  I don't think I've ever used a linker that
}: didn't eliminate unused routines.  Any such linker would be seriously
}: brain damaged.
}
}Most linkers will not take, from a single object file, just those
}routines needed by the rest of the program.

Tim is (almost certainly) wrong that he has never used such a brain
damaged linker, since every Unix linker is brain damaged in this fashion.

However, T. Bill is wrong that most linkers have this brain damage, since
pretty much every NON-Unix linker works correctly.
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
     "Kirk to Enterprise -- beam down Yeoman Rand and a six-pack."

sccowan@watmsg.waterloo.edu (S. Crispin Cowan) (12/06/89)

In article <1989Dec5.115934.24535@twwells.com> bill@twwells.com (T. William Wells) writes:
>I'll add my two cents about commenting. I am a big fan of
>*useful* comments. And I despise over-commenting. The latter is
>any comment that is not the former.
I don't understand the problem with over-commenting.  First of all, it
is _very_ rare (both :-) and :-(), and secondly, just skip it if it
bugs you.

>I have three simple rules:
[good list of three rules]

I also like to see ALL variables described.  I can figure out what a
for-loop is doing, but it's not at all obvious what the
trans_rec_count variable is a count of (total transactions, to-date,
how many gropple-grommits in this shipment, etc.).  Unless it's just a
scratch counter such as `i', it should be commented.

>Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
>bill@twwells.com
----------------------------------------------------------------------
(S.) Crispin Cowan, CS grad student, University of Waterloo
Office:		DC3548	x3934		Home phone: 570-2517
Post Awful:	60 Overlea Drive, Kitchener, N2M 1T1
UUCP:		watmath!watmsg!sccowan
Domain:		sccowan@watmsg.waterloo.edu

"The most important question when any new computer architecture is
introduced is `So what?'"
	-someone on comp.arch
	(if it was you, let me know & I'll credit it)

dmt@pegasus.ATT.COM (Dave Tutelman) (12/06/89)

>%In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>%%There IS one argument, in some cases a compelling one, for "one function
>%%per file".  In general, linkers aren't smart enough to link just
>%%PART of a binary file (.OBJ or .o), when that file contains a function
>%%needed by the link.

>In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>%WHAT?  What year is this?  I don't think I've ever used a linker that
>%didn't eliminate unused routines.  Any such linker would be seriously
>%brain damaged.

In article <600@fred.UUCP> bill@fred.UUCP (Bill Poitras) writes:
>Yes you have.  ANY linker you have does this.  What you are thinking of is
>a LIBRARY, i.e. a .lib file (lib*.a if you're a Unix person).  When a library
>is used in the link process, only the functions used by the program being
>linked get linked.  Although I'm not a compiler/linker expert, I'm almost
>positive that this is true.

Thanks, Bill.  I generally use "dumb" linkers, though the linkers that Tim
claims to use wouldn't violate any laws of thermodynamics.  Consider:
   -	It isn't too difficult for a C compiler to demark beginning and
	end of function in a .OBJ or .o, or even just guarantee
	adjacency within a single function.  (Of course, the last function
	would have to be demarked.)
   -	The external variables (static or otherwise) ALLOCATED in that
	file could be loaded, depending on whether they are referenced
	by any of the loaded functions.

So why don't any of the linkers I use get this smart?  Because their
authors wanted to have a single linker that would handle arbitrary
object files, without depending on their being generated by their favorite
C compiler.  In particular, object files from hand-coded assembler
could also be linked in.  (This is an IMPORTANT feature of MSDOS linkers,
since a lot of programs use a little assembler for their lowest-level
routines.)

Hand-coded assembly code yields object files that CAN'T be split up
into the functions that are actually called.  Just a few of the
obvious things that make it impossible are:
   -	Self-modifying code.
   -	Gotos whose scope isn't restricted to be in the function.

This problem is solved, as Bill notes, by keeping enough information
in libraries to keep the object files separate.  If you write one
function per file, then the linker only loads the essential functions.
If you write several functions per file, then all the functions from
that file (but not all the functions in the library) get loaded if
ANY function from the file does.
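
A minimal sketch of that behaviour, as a toy model in C rather than a real
linker.  The member names and link_from_archive() are made up for
illustration, and references between library functions are ignored; the only
point is that selection happens per object file, never per function.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* One archive member, i.e. one object file stored in a library. */
struct member {
    const char *name;               /* hypothetical object file name */
    const char *defines[MAX_SYMS];  /* functions defined here, NULL-padded */
    int loaded;                     /* set once the linker pulls it in */
};

/* Return nonzero if member m defines symbol sym. */
static int defines_symbol(const struct member *m, const char *sym)
{
    size_t i;

    for (i = 0; i < MAX_SYMS && m->defines[i] != NULL; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return 1;
    return 0;
}

/*
 * Mark every member that defines one of the wanted symbols as loaded,
 * and return the total number of functions that end up in the program.
 * Whole members are loaded; there is no per-function selection.
 */
int link_from_archive(struct member *lib, size_t nlib,
                      const char *const *wanted, size_t nwanted)
{
    size_t i, j, k;
    int total = 0;

    for (i = 0; i < nlib; i++) {
        lib[i].loaded = 0;
        for (j = 0; j < nwanted; j++)
            if (defines_symbol(&lib[i], wanted[j]))
                lib[i].loaded = 1;
        if (lib[i].loaded)
            for (k = 0; k < MAX_SYMS && lib[i].defines[k] != NULL; k++)
                total++;
    }
    return total;
}
```

With three one-function members, asking for one function loads one function.
With a single member that defines all three, asking for the same function
drags in the other two as well.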

When I wrote the base note, I thought about including this discussion,
but didn't because the note was long enough.  Bad decision?
+---------------------------------------------------------------+
|    Dave Tutelman						|
|    Physical - AT&T Bell Labs  -  Lincroft, NJ			|
|    Logical -  ...att!pegasus!dmt				|
|    Audible -  (201) 576 2194					|
+---------------------------------------------------------------+

bill@twwells.com (T. William Wells) (12/06/89)

In article <14836@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
: In the referenced message, bill@twwells.com (T. William Wells) wrote:
: }In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
: }: WHAT?  What year is this?  I don't think I've ever used a linker that
: }: didn't eliminate unused routines.  Any such linker would be seriously
: }: brain damaged.
: }
: }Most linkers will not take, from a single object file, just those
: }routines needed by the rest of the program.
:
: Tim is (almost certainly) wrong that he has never used such a brain
: damaged linker, since every Unix linker is brain damaged in this fashion.
:
: However, T. Bill is wrong that most linkers have this brain damage, since
: pretty much every NON-Unix linker works correctly.

Eh? I've worked on a dozen or so non-Unix machines. Only a few of
them were capable of taking apart an object file and using only
the routines you needed. And those linkers could not be used with
a C compiler that did not play games with static variable names.
(They had no notion of static at all.)

I'll admit that many of those machines were used over eight years
ago, so things might be better now, but I doubt it. IBM, for
example, does not change all that quickly.

Care to name some specific systems where the linker could take
apart an object file, and for which a reasonable C compiler
exists?

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

jef@well.UUCP (Jef Poskanzer) (12/07/89)

In the referenced message, bill@twwells.com (T. William Wells) wrote:
}Care to name some specific systems where the linker could take
}apart an object file, and for which a reasonable C compiler
}exists?

Why the second requirement, Bill?

To be crystal clear about what is being discussed, it is the ability to
make a library from a single source file, and then at link time extract
only the referenced routines from that library.  No one is talking
about eliminating unreferenced routines from the main program.  Everyone
who is getting hysterical about their call-by-string hacks can stop
screaming now.

Anyway, the last time this discussion came up, I posted a transcript of
a session with the VMS FORTRAN compiler and the VMS linker.  They have
no problem at all separating a single source file into one object
module per routine.  The reaction then was, "Oh sure, FORTRAN can do
that, but we were discussing *real* languages."  "Real" languages
meaning C, of course.

So, why the second requirement, Bill?  Have you ever actually checked
whether any of the non-Unix systems you've used have this ability?  Are
you afraid of what you might find?
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"An object never serves the same function as its image - or its name."
                           -- Rene Magritte

bill@twwells.com (T. William Wells) (12/07/89)

In article <14850@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
: In the referenced message, bill@twwells.com (T. William Wells) wrote:
: }Care to name some specific systems where the linker could take
: }apart an object file, and for which a reasonable C compiler
: }exists?
:
: Why the second requirement, Bill?

Specifically because those few linkers I know of that permit
disassembling an object file and using just the pieces work with
object files that are, essentially, archives. That is to say, if
you compiled several functions into the one object file, each
function occupied a physically distinct part of the file; taking
the object file apart was little more complex than just copying
some particular part of the file.

None of these machines ran C. They would have had real problems
with C, since it would have been hard to implement file scope
with those linkers. (I know, I had to try to do something similar
with one of them.)

The second requirement is there, not as an absolute requirement,
but as a "reasonableness" requirement. None of those linkers
would have been useful in a modern environment. Certainly a linker
that a C compiler exists for is minimally "reasonable". I'd be
willing to entertain other linkers, so long as they aren't overly
restrictive.

: So, why the second requirement, Bill?  Have you ever actually checked
: whether any of the non-Unix systems you've used have this ability?  Are
: you afraid of what you might find?

An ad hominem deserves an ad hominem in response: fuck you, Mr.
Poskanzer. I do not appreciate personal attacks.

And, to answer your question: yes, of course I looked.

We now have one linker (a VMS linker, mentioned in a deleted part
of the article). But I'd like to see some more.

After all, the point under discussion is:

: }Most linkers will not take, from a single object file, just those
: }routines needed by the rest of the program.

Which is to say that there *are* some linkers that will. I happen
not to have used any recently, and the ones that I did were
really brain damaged, but I can see how one would do such a
linker. (BTW, it happens that I've never used VMS.)

*One* linker certainly does not negate "most". So, without at
least a few more examples, there isn't any reason to doubt the
"most". And, unless someone comes up with a few more, there is no
point in discussing this further.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

murphyn@cell.mot.COM (Neal P. Murphy) (12/07/89)

bill@polygen.uucp (Bill Poitras) writes:

>In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>%In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>%%...
>%%per file".  In general, linkers aren't smart enough to link just
>%...
>%WHAT?  What year is this?  I don't think I've ever used a linker that
>%...
>Yes you have.  ANY linker you have does this.  What you are thinking of is 
>...

Of course, if you want to get into esoterica, you might as well mention the
linker/loader on the old TOPS-10 and TOPS-20 O/S from DEC. They would read
every module from every object file linked, unless the object file was
specified as name.rel/LIB, whereupon it would be treated as a library. Of
course, this action requires more memory, ... Strike that, the TOPS-10
compiler ran in 30k words. If I might opine, such selective action usually
requires extra thought on the part of linker designers, and if they DGAS
(dontgivea...), they don't design it in.

NPN

perry@apollo.HP.COM (Jim Perry) (12/07/89)

bill@polygen.uucp (Bill Poitras) writes:
>In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>%In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>%%There IS one argument, in some cases a compelling one, for "one function
>%%per file".  In general, linkers aren't smart enough to link just
>%%PART of a binary file (.OBJ or .o), when that file contains a function
>%%needed by the link.
>%
>%WHAT?  What year is this?  I don't think I've ever used a linker that
>%didn't eliminate unused routines.  Any such linker would be seriously
>%brain damaged.
>%-- 
>%Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com
>Yes you have.  ANY linker you have does this.  What you are thinking of is
>a LIBRARY, i.e. a .lib file (lib*.a if you're a Unix person).  When a library
>is used in the link process, only the functions used by the program being
>linked get linked.  Although I'm not a compiler/linker expert, I'm almost
>positive that this is true.

This is true only in the very narrow context of UNIX; in more advanced
systems (both predating and postdating UNIX) the output of the compilers
(foo.o or a.out in UNIX terms) is what you think of as a library, and
the associated linkers/loaders/library editors can do the right thing,
for instance doing type-checking on cross-module function calls, but
certainly including only procedures and data that are actually referenced.

On another point, speaking of a.out, it's relatively rare in the world at
large for compilers to generate *assembly source*; that's another UNIXism.
(They compile directly to the appropriate machine language -- in a
relocatable library form, of course).  People on both sides are surprised
by this, for some reason.  Anyway, I suspect there may be some relation
between this issue and the object module thing, but I don't know for sure. 

It's also true that the traditional UNIX philosophy calls for multiple
simple tools rather than complex tools, and ease of writing over ease or
efficiency of use; thus rather than a cc that knows about libraries, you
have cpp/cc/as/ar/ranlib.  Easier to write and almost as good.  (Or, if you
will, "brain damaged" :-).

-
Jim Perry   perry@apollo.hp.com    HP/Apollo, Chelmsford MA
This particularly rapid unintelligible patter 
isn't generally heard and if it is it doesn't matter.

scott@bbxsda.UUCP (Scott Amspoker) (12/08/89)

In article <4748ed31.20b6d@apollo.HP.COM> perry@apollo.HP.COM (Jim Perry) writes:
>On another point, speaking of a.out, it's relatively rare in the world at
>large for compilers to generate *assembly source*; that's another UNIXism.
>(They compile directly to the appropriate machine language -- in a
>relocatable library form, of course).  People on both sides are surprised
>by this, for some reason.  Anyway, I suspect there may be some relation
>between this issue and the object module thing, but I don't know for sure. 

I think it has more to do with the fact that an assembler already exists.
Why re-invent the wheel?  The code generation phase of a compiler is
much simpler if it outputs assembly source, resulting in a compiler that
is more portable.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott

dplatt@coherent.com (Dave Platt) (12/08/89)

In article <14850@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
> Anyway, the last time this discussion came up, I posted a transcript of
> a session with the VMS FORTRAN compiler and the VMS linker.  They have
> no problem at all separating a single source file into one object
> module per routine.  The reaction then was, "Oh sure, FORTRAN can do
> that, but we were discussing *real* languages."  "Real" languages
> meaning C, of course.

The Honeywell CP-6 object language and linker supports this sort of
feature, as well.  Object-language records are stored in a "keyed file"
(a B-tree file, in effect);  it's possible to store many object-language
packages in a single keyed file with no interference.

The linker brings in the necessary object-language modules, by searching
the "external functions defined" record for each module, and loading only
those modules which define a function that's actually needed.

All external (global) data variables are, by definition, contained
within an object module.. and hence within a function.  This isn't
entirely consistent with the C model, which defines globals as being
those variables which lie _outside_ of any function.

One way in which a savvy C compiler could resolve this, would be to bundle
all of the global variables into a dummy module.  Each real module
(function) in the source-file would access the global variables as if
they had been declared "extern".  If the linker fetched a function-module
from the object file, it would "see" that the module was accessing some
extern variables, would search the "external variables defined" records
in the object file, and link in the dummy module (which defined the
globals) as a result.

There's a cost to this approach, though.  It requires that all global
variables be accessed as "externs", even if they were defined in the
same source-file as they're being used.  This makes it difficult to
have "static" variables of file scope... because the very use of the
"static" keyword requires that the variables' names not be exported
(for reasons of data-hiding, name-space pollution, etc.)

An effective C compiler can get around this problem by hashing the names
of the static variables into some horrible string that's guaranteed not
to collide with any other name.
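
As a rough sketch of the idea, here is one way such a name could be built.
The scheme (a fixed prefix, the length of the file name, the file name with
punctuation replaced, then the variable name) is purely an assumption for
illustration, and mangle_static() is a hypothetical helper, not anything a
particular compiler does.  It is also not a proof of uniqueness: it relies on
no two source files in the link sharing a name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a link-time name for a file-scope static.  The layout of the
 * result is hypothetical; a real compiler would pick whatever its
 * linker's symbol syntax allows.
 * Returns 0 on success, -1 if the result does not fit.
 */
int mangle_static(const char *file, const char *var, char *out, size_t outsz)
{
    char clean[64];
    size_t i, n = strlen(file);
    int written;

    if (n >= sizeof clean)
        return -1;

    /* Replace anything a linker symbol might not allow with '_'. */
    for (i = 0; i < n; i++)
        clean[i] = isalnum((unsigned char)file[i]) ? file[i] : '_';
    clean[n] = '\0';

    written = snprintf(out, outsz, "__static_%lu_%s_%s",
                       (unsigned long)n, clean, var);
    return (written < 0 || (size_t)written >= outsz) ? -1 : 0;
}
```

Two files that each declare "static int count;" then export different names
to the linker, and neither name is one a programmer could write by accident.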
-- 
Dave Platt                                             VOICE: (415) 493-8805
  UUCP: ...!{ames,apple,uunet}!coherent!dplatt   DOMAIN: dplatt@coherent.com
  INTERNET:       coherent!dplatt@ames.arpa,  ...@uunet.uu.net 
  USNAIL: Coherent Thought Inc.  3350 West Bayshore #205  Palo Alto CA 94303

bill@twwells.com (T. William Wells) (12/08/89)

In article <32359@watmath.waterloo.edu> sccowan@watmsg.waterloo.edu (S. Crispin Cowan) writes:
: In article <1989Dec5.115934.24535@twwells.com> bill@twwells.com (T. William Wells) writes:
: >I'll add my two cents about commenting. I am a big fan of
: >*useful* comments. And I despise over-commenting. The latter is
: >any comment that is not the former.
: I don't understand the problem with over-commenting.  First of all, it
: is _very_ rare (both :-) and :-(), and secondly, just skip it if it
: bugs you.

This is the syndrome where someone writes:

	/* Increment a. */

	++a;

Such comments don't help, but they do waste the time needed to
read them. You don't know ahead of time whether the comment is
important, so you have to read it. To no purpose.

This particular kind of commenting is all too common.

: >I have three simple rules:
: [good list of three rules]
:
: I also like to see ALL variables described.  I can figure out what a
: for-loop is doing, but it's not at all obvious what the
: trans_rec_count variable is a count of (total transactions, to-date,
: how many gropple-grommits in this shipment, etc.).  Unless it's just a
: scratch counter such as `i', it should be commented.

     1) Comments should explain the purpose of the code and its
	relationship to the rest of the program.

applies to variables as well as executable code. BTW, I follow a
consistent rule when commenting variables. If the variable needs a
complex description, that goes before the declaration. On the
declaration goes a short comment which indicates what the variable
is; that comment is always a noun phrase with any leading
determiner deleted.
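
A small sketch of that convention in C.  The meaning given to
trans_rec_count here is invented for the example, as are start_batch() and
note_record(); the point is only the split between the long description
above the declaration and the short noun phrase on it.

```c
/*
 * Transactions are counted only once they have passed validation, so
 * this can lag behind the number of records read from the input.
 * It is reset to zero at the start of each batch.
 */
static int trans_rec_count;	/* validated transaction records in batch */

/* Begin a new batch. */
void start_batch(void)
{
	trans_rec_count = 0;
}

/* Account for one record; return the running count for the batch. */
int note_record(int valid)
{
	if (valid)
		++trans_rec_count;
	return trans_rec_count;
}
```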

I'll add a fourth rule to my list:

     4) Comments are literary objects; write them with your
	audience in mind. Write clear and standard English (or
	whatever). Avoid unnecessary abbreviations and ellipses.
	Read a few good books on writing and take at least *some*
	of their suggestions to heart; Strunk & White's _The
	Elements of Style_ is, at least, short and entertaining
	(not to mention useful) and there are many others.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

dricejb@drilex.UUCP (Craig Jackson drilex1) (12/08/89)

In article <14836@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>In the referenced message, bill@twwells.com (T. William Wells) wrote:
>}In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>}: WHAT?  What year is this?  I don't think I've ever used a linker that
>}: didn't eliminate unused routines.  Any such linker would be seriously
>}: brain damaged.
>}
>}Most linkers will not take, from a single object file, just those
>}routines needed by the rest of the program.
>
>Tim is (almost certainly) wrong that he has never used such a brain
>damaged linker, since every Unix linker is brain damaged in this fashion.

Tim has been on the Mac for a while; I think Mac linkers *may* be
different in this regard.

>However, T. Bill is wrong that most linkers have this brain damage, since
>pretty much every NON-Unix linker works correctly.
>  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef

I know for a fact that MS-DOS linkers have this 'brain damage', even though
real librarians are available.  The semantics of C make it somewhat
harder to eliminate the dead code.  (Not impossible. The problem is making
sure the static stuff is handled correctly.)

I suspect that the MIPS linker can do this, because it can do link-time
optimization.  I haven't seen it, but the linker for Unisys A-Series C may be
able to do this, because it can do this for other languages on that machine.
-- 
Craig Jackson
dricejb@drilex.dri.mgh.com
{bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}

tim@hoptoad.uucp (Tim Maroney) (12/08/89)

In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
>There IS one argument, in some cases a compelling one, for "one function
>per file".  In general, linkers aren't smart enough to link just
>PART of a binary file (.OBJ or .o), when that file contains a function
>needed by the link.

In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>WHAT?  What year is this?  I don't think I've ever used a linker that
>didn't eliminate unused routines.  Any such linker would be seriously
>brain damaged.

bill@polygen.uucp (Bill Poitras) wrote:
> Yes you have.  ANY linker you have does this.  What you are thinking of is
> a LIBRARY, i.e. a .lib file (lib*.a if you're a Unix person).  When a library
> is used in the link process, only the functions used by the program being
> linked get linked.

The linkers I use every day on the Macintosh routinely remove all
unreferenced routines from output files.  On the other hand, it turns
out that very few, if any, UNIX linkers do this.  I *have* used UNIX
linkers (I used to use them all the time, in fact), so my impression
was incorrect, as a number of people have kindly informed me by
e-mail.  I was going to apologize, but I've just read all ten pages of
the ld(1) manual page, and it never explicitly says this, so I feel
justified in the error.  I'm not used to Mac development tools being
smarter than UNIX!
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

FROM THE FOOL FILE:
"The men promise to provide unconditionally for their wives.  The women in turn
 serve unconditionally to provide the other household services necessary for the
 men to fulfill their obligations to the women.  The women are satisfied because
 they have the men working for THEM." -- Colin Jenkins, soc.women

murphyn@cell.mot.COM (Neal P. Murphy) (12/08/89)

bill@twwells.com (T. William Wells) writes:

>...
>*One* linker certainly does not negate "most". So, without at
>least a few more examples, there isn't any reason to doubt the
>"most". And, unless someone comes up with a few more, there is no
>point in discussing this further.
>...

A few more examples?

DEC10/TOPS10 and DEC20/TOPS20 linking loader would extract only the
functions that were referenced, provided it was informed that it
should use the object file as a library, e.g.,

.algol fubar,jnil,myfncs      ; compile the three ALGOL sources
.                             ; creating fubar.rel, jnil.rel, and
.                             ; myfncs.rel
.load fubar,jnil,myfncs/LIB   ; load the desired functions into memory
.save fubar                   ; save the image in fubar.run

The procedure on a DEC20 would be similar. It's been ten years
since I used this DEC10, so there could be a minor error in my
syntax. This was a KA-10 processor, 196k words memory, 45 jobs,
swapping drum and 602 monitor, so memory size is no reason for
not performing selective linking.

NPN

peter@ficc.uu.net (Peter da Silva) (12/09/89)

In article <41413@improper.coherent.com> dplatt@coherent.com (Dave Platt) writes:
> One way in which a savvy C compiler could resolve this, would be to bundle
> all of the global variables into a dummy module.
...
> An effective C compiler can get around this problem by hashing the names
> of the static variables into some horrible string that's guaranteed not
> to collide with any other name.

Just maintain the address of the start of the dummy module and access the
variables as __module_start+offset. This is a common trick for dealing
with C's "file common" scope.
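
Roughly, and only as a hand-written imitation of what a compiler would
generate: the struct, the STATIC_AT macro and the two functions below are
all made up for the example, and module_start stands in for the
__module_start symbol (spelled without the leading underscores, since those
names are reserved for the implementation).

```c
#include <stddef.h>

/*
 * What the programmer wrote, conceptually:
 *     static int  hits;
 *     static long total;
 */

/* What the compiler might lay out instead: one block per source file. */
struct file_statics {
    int  hits;
    long total;
};

static struct file_statics module_block;        /* the "dummy module" */
static char *const module_start = (char *)&module_block;

/* Every access to a file-scope static becomes module_start + offset. */
#define STATIC_AT(type, member) \
    (*(type *)(module_start + offsetof(struct file_statics, member)))

void record_hit(long amount)
{
    STATIC_AT(int, hits) += 1;
    STATIC_AT(long, total) += amount;
}

long average(void)
{
    int n = STATIC_AT(int, hits);

    return n ? STATIC_AT(long, total) / n : 0;
}
```

Only one symbol per file has to be visible to the linker, and the individual
variable names never leave the compiler.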

No, there's really no reason that you can't have a C compiler in a
module-oriented system with a function-scope linker. It's just that in general
the savings from this aren't very great. For example, look at the common
situation where you have two routines to allocate and deallocate a resource:
they might as well go in one file... and you're unlikely to be using one of
them without the other. And besides, if you're designing a library you
really should be doing more than sticking all the modules in a file
and compiling them anyway.
-- 
`-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
 'U`  Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.

      "If you want PL/I, you know where to find it." -- Dennis

throopw@sheol.UUCP (Wayne Throop) (12/09/89)

> tim@hoptoad.uucp (Tim Maroney)
> The linkers I use every day on the Macintosh routinely remove all
> unreferenced routines from output files.  On the other hand, it turns
> out that very few, if any, UNIX linkers do this.

There are many different reasons for this, but it is perhaps worth
noting that the problem isn't purely a linker problem.  If the compiler
doesn't generate object code with the separable portions marked out,
the linker simply can't separate them.

There are valid reasons for generating unravelable object files, having
to do with just what tradeoffs one wishes to make between compile, link,
and runtime efficiency.  In the general Mac environment, the motives for
making this tradeoff differently than traditional mainframe and
minicomputer language systems do are compelling.

> but I've just read all ten pages of
> the ld(1) manual page, and it never explicitly says this, so I feel
> justified in the error.

I think the real meat of it is in the a.out(5) (or is that (6)) man page,
that is, the executable/object-file data format definition.  Reading
between the lines of this, it becomes apparent that it would be difficult
(though not impossible) for a compiler to generate separable object files,
and thus at least as difficult for a linker to separate them.  (This is,
of course, still not an explicit statement of functional deficit.)


Question: is it possible to convince the Mac language environment to
leave the unreferenced routines and data in the executable image
equivalent? If not, unit testing from a good debugger becomes more
difficult.  (Mind you, not impossible....  one could simply artificially
reference every routine you want to unit test, but this could be tedious
or even problematical.)
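
One way the artificial reference might look in C, as a sketch only.  The
routines twice() and negate() and the table keep_for_testing are placeholders
invented for the example.  Whether a table like this is enough depends on the
linker: the table itself has to be reachable from something the linker keeps.

```c
#include <stddef.h>

/* Two routines nothing in the program proper calls yet. */
int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

typedef int (*unit_fn)(int);

/* Taking each routine's address is the artificial reference. */
unit_fn const keep_for_testing[] = { twice, negate };

/* Number of routines held in the table. */
size_t keep_count(void)
{
    return sizeof keep_for_testing / sizeof keep_for_testing[0];
}
```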
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

tim@hoptoad.uucp (Tim Maroney) (12/09/89)

In article <1989Dec6.154103.2078@twwells.com> bill@twwells.com (T.
William Wells) writes:
>Eh? I've worked on a dozen or so non-Unix machines. Only a few of
>them were capable of taking apart an object file and using only
>the routines you needed. And those linkers could not be used with
>a C compiler that did not play games with static variable names.
>(They had no notion of static at all.)
>
>Care to name some specific systems where the linker could take
>apart an object file, and for which a reasonable C compiler
>exists?

MPW for the Apple Macintosh.  I just elaborately verified that the
linker does this for a skeptical friend.  The C compiler is nearly a
full ANSI C, and it certainly does include "statics".  I just did
another test, and it even deletes unused "static" functions.  It's
actually a pretty strong development system overall; I've been praising
it on the net ever since I was one of the original beta testers.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

This message does represent the views of Eclectic Software.

perry@apollo.HP.COM (Jim Perry) (12/09/89)

In article <41413@improper.coherent.com> dplatt@coherent.com (Dave Platt) writes:
>[in one implementation,]
>All external (global) data variables are, by definition, contained
>within an object module.. and hence within a function.  This isn't
>entirely consistent with the C model, which defines globals as being
>those variables which lie _outside_ of any function.
>
>One way in which a savvy C compiler could resolve this, would be to bundle
>all of the global variables into a dummy module.  Each real module
>(function) in the source-file would access the global variables as if
>they had been declared "extern".  If the linker fetched a function-module
>from the object file, it would "see" that the module was accessing some
>extern variables, would search the "external variables defined" records
>in the object file, and link in the dummy module (which defined the
>globals) as a result.
>
>There's a cost to this approach, though.  It requires that all global
>variables be accessed as "externs", even if they were defined in the
>same source-file as they're being used.  This makes it difficult to
>have "static" variables of file scope... because the very use of the
>"static" keyword requires that the variables' names not be exported
>(for reasons of data-hiding, name-space pollution, etc.)
>
>An effective C compiler can get around this problem by hashing the names
>of the static variables into some horrible string that's guaranteed not
>to collide with any other name.

One scenario based on linkers I've used (which now support C but
didn't when I used them, so this is hypothetical but doable for C):

For every external symbol a module defines (the code of the non-static
functions, and initialized extern variables), a linkage segment is
generated.

File-scope static variables are put in a separate segment, with the
compiled code referencing variables in it as offsets into the segment
(i.e., in C terms, as if all file-scope variables were gathered
together in a single extern struct).  This segment has a name for
linking purposes (all the functions in the file refer to it), but it's
not accessible from the language space.
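
As a rough illustration of the static-segment idea above (a sketch only,
written in C rather than in linker terms): the two statics shown in the
first comment are what the programmer would write, and the struct is
roughly what the compiled code would address by offset.  The names
file_statics, record, and average are made up for the example; in the
scenario described, the segment's name would exist only for the linker
and could not be spelled in C source.

```c
/* What the programmer writes in the source file:
 *
 *     static int  call_count;
 *     static long total_bytes;
 */

/* What the compiled code would effectively address: one segment that
 * holds every file-scope static, with each variable at a fixed offset.
 * The name file_statics is hypothetical. */
struct {
    int  call_count;
    long total_bytes;
} file_statics;

/* Both functions refer to the same segment, so the linker must keep
 * the segment whenever it keeps either function. */
void record(long nbytes)
{
    file_statics.call_count  += 1;
    file_statics.total_bytes += nbytes;
}

long average(void)
{
    if (file_statics.call_count == 0)
        return 0;                       /* nothing recorded yet */
    return file_statics.total_bytes / file_statics.call_count;
}
```

Because record and average each reference the segment by name, a linker
that pulls in only one of them still pulls in the shared data.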

For every external symbol a function uses, its linkage segment
includes a reference to that symbol.  An external integer reference is
not much different from an external function, as far as linking them
together.  The linker flags multiply-defined symbols, and generates
storage to back uninitialized external variables (referenced but never
defined).  Undefined functions generally are flagged as an error. 
Incidentally, the system I have in mind tags symbol definitions and
references with their datatype, so type conflicts (including function
signatures) are detected.
-
Jim Perry   perry@apollo.hp.com    HP/Apollo, Chelmsford MA
This particularly rapid unintelligible patter 
isn't generally heard and if it is it doesn't matter.

decot@hpisod2.HP.COM (Dave Decot) (12/09/89)

The questions I want answered by comments are:

	What do the possible values of this variable or type represent,
	within the user's abstraction?

    and

	What does this code expect, and what does it assume, about the
	values of its arguments and surrounding variables?

    and

	What, abstractly, does this code guarantee when it's done?

For instance,

	int status;	/* GOOD if the laser hit the target, BAD if not */

	status = zap(plane2);	/* try to blast the enemy plane */

	if (status == GOOD)	/* the airplane was successfully destroyed */
	{
	    ++k;			/* bump the death toll */
	    kill(pid2, SIGUSR1);	/* notify the air traffic controller */
	}

I don't care what bits you're twiddling, I want to know what it's supposed
to be for, and what it means abstractly.
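
The fragment above mostly answers the first question.  A sketch of how
the other two (what the code expects, what it guarantees) might look on
the function itself follows.  The struct plane type, its fields, and
this body for zap are all invented for illustration; only the GOOD/BAD
convention comes from the fragment above.

```c
enum zap_status {
    BAD  = 0,       /* the laser did not destroy the target */
    GOOD = 1        /* the laser hit and destroyed the target */
};

struct plane {
    int in_range;   /* nonzero if the laser can currently reach the plane */
    int destroyed;  /* nonzero once the plane has been shot down */
};

/*
 * zap: fire the laser once at an enemy plane.
 *
 * Expects:    target points to a valid plane.
 * Guarantees: returns GOOD if and only if this call destroyed the
 *             plane; on BAD the plane is left exactly as it was.
 */
enum zap_status zap(struct plane *target)
{
    if (!target->in_range || target->destroyed)
        return BAD;             /* out of reach, or nothing left to hit */
    target->destroyed = 1;
    return GOOD;
}
```

None of these comments mention bits or loops; each one could be lifted
from a sentence in a specification.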

The interesting thing about comments like this is you can grab them straight
out of a specification document.  >> HINT!!! <<

Dave Decot
hpda!decot

hue@netcom.UUCP (Johathan Hue) (12/09/89)

In article <473ae701.20b6d@apollo.HP.COM>, perry@apollo.HP.COM (Jim Perry) writes:
> Again, you're throwing out the baby with the bathwater.  External
> documentation has a fundamental flaw alluded to by dopey (no offense): it's
> not generally there, and it's out of date.  "Most code comments" are also
> missing or out of date, but only because most code is poorly documented.
> As Fred Brooks says in The Mythical Man-Month:

One of my rules is "You shouldn't have to look at the code to understand
what it does and how it works".  If the external documentation isn't
there, then your programmers have no discipline and your managers are a
bunch of wimps.  Maybe HP will be able to whip you Apollo boys into shape. :-)

> Well, of course this is the heart of the matter.  A few-hundred or few-ten
> line program tells us very little about real life software engineering
> situations.  Actually, if the code is properly self-documenting, then the
> new hire *can* just sit down with the code and learn from the code itself. 
> Documentation, like code, is hierarchical.  At the beginning of each
> program, library, whatever, is a broad overview of that unit.  More
> specific comments would be associated with modules, functions, algorithms,
> etc.

I'm sorry, this just doesn't work for me.  If you're going to read the
code, you're going to have a listing of several hundred to several
thousand pages, and you're going to be forever flipping through it
trying to trace the flow of control and read the comment boxes which
describe what your functions do.

I once worked on a device driver and in the external documentation I drew a
state machine of how one part of it worked.  Are you going to use ascii text
graphics to draw boxes and arrows and put that in your comments?

-Jonathan

hue@netcom.UUCP (Johathan Hue) (12/09/89)

In article <WAYNE.89Dec3140323@dsndata.uucp>, wayne@dsndata.uucp (Wayne Schlitt) writes:
> hmmm...  one of the first things i usually do to code that i get off
> the net is break it up into one function per file.  i am not that
> dogmatic about it, i just do it because it seems to me to be easier to
> work with.

You're giving up some features of the language if you do that.  You
no longer have static functions, or static variables outside of functions,
so everything becomes global, and good luck if someone decided to have
a global and a static with the same name.

You also lose some compiler optimizations.  For instance, most C compilers
can't do inline functions if caller and callee aren't in the same file.
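
A small sketch of what is at stake, assuming a hypothetical file
stats.c (all names here are invented for the example).  Split this into
one function per file and both the helper and the counter would have to
become external symbols.

```c
/* stats.c: two exported routines plus private helpers. */

static int calls;               /* file-scope static: invisible to other files */

static int square(int x)        /* static function: callable only from this file */
{
    return x * x;
}

/* Exported: sum of the squares of the first n elements of v. */
int sum_of_squares(const int *v, int n)
{
    int i, total = 0;

    calls++;
    for (i = 0; i < n; i++)
        total += square(v[i]);  /* callee is in the same file, so a
                                   compiler that inlines can do so here */
    return total;
}

/* Exported: how many times sum_of_squares has been called. */
int sum_of_squares_calls(void)
{
    return calls;
}
```

Another file is free to define its own square or calls without any
clash, which is exactly what is lost once everything is global.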

-Jonathan

bill@twwells.com (T. William Wells) (12/10/89)

In article <9228@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
: In article <1989Dec6.154103.2078@twwells.com> bill@twwells.com (T.
: William Wells) writes:
: >Care to name some specific systems where the linker could take
: >apart an object file, and for which a reasonable C compiler
: >exists?
:
: MPW for the Apple Macintosh.  I just elaborately verified that the
: linker does this for a skeptical friend.  The C compiler is nearly a
: full ANSI C, and it certainly does include "statics".  I just did
: another test, and it even deletes unused "static" functions.  It's
: actually a pretty strong development system overall; I've been praising
: it on the net ever since I was one of the original beta testers.

When I had to work on the Mac, my development environment
consisted of a cross compiler from a VAX and a downloader
originally written in *machine code* to be executed from BASIC!
(My first program was a shell with a download command. Surprised?)

There *was* no C compiler for the Mac then. I considered it a
major revolution to get Manx C, with its tiny shell and a native
compiler.

I'm glad that things are better now. :-)

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

tim@hoptoad.uucp (Tim Maroney) (12/10/89)

In article <14850@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>No one is talking
>about eliminating unreferenced routines from the main program.

Yes, we are.  Get a clue.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

"There's a real world out there, with real people.  Go out and play there for
 a while and give the Usenet sandbox a rest.  It will lower your stress
 levels and make the world a happier place for us all." -- Gene Spafford

jef@well.UUCP (Jef Poskanzer) (12/11/89)

In the referenced message, tim@hoptoad.UUCP (Tim Maroney) wrote:
}In article <14850@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
}>No one is talking
}>about eliminating unreferenced routines from the main program.
}
}Yes, we are.  Get a clue.

You are right, I was wrong.  Let me rephrase my statement.  No one
who has been paying attention to the discussion is talking about
eliminating unreferenced routines from the main program.
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
 "...for DEATH awaits you all, with nasty sharp pointy teeth!" -- Tim

tim@hoptoad.uucp (Tim Maroney) (12/13/89)

In article <14913@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>In the referenced message, tim@hoptoad.UUCP (Tim Maroney) wrote:
>}In article <14850@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>}>No one is talking
>}>about eliminating unreferenced routines from the main program.
>}
>}Yes, we are.  Get a clue.
>
>You are right, I was wrong.  Let me rephrase my statement.  No one
>who has been paying attention to the discussion is talking about
>eliminating unreferenced routines from the main program.

Wrong again.  This has been the subject of every message bearing on
the topic.  Only you are saying that this is not what the discussion
is about.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

"Americans will buy anything, as long as it doesn't cross the thin line
 between cute and demonic." -- Ian Shoales

psrc@pegasus.ATT.COM (Paul S. R. Chisholm) (12/13/89)

In articles <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman),
<9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney), <600@fred.UUCP>
bill@fred.UUCP (Bill Poitras), and <4304@pegasus.ATT.COM>
dmt@pegasus.ATT.COM (Dave Tutelman again) argue about whether their
linkers are smart enough to ignore unused names.

I'd like to point out that Borland's Turbo Pascal for MS-DOS lets
you do something very much like this:  "Turbo Pascal 5.0's built-in
linker automatically removes unused code and data when building an
[executable] file.  Procedures, functions, variables, and typed
constants that are part of the compilation, but never get referenced,
are removed in the [executable] file.  The removal of unused code takes
place on a per procedure basis, and the removal of unused data takes
place on a per declaration section basis."  (Source:  Turbo Pascal 5.0
Reference Guide, p. 220.)

But Borland cheated, sort of.  Turbo Pascal doesn't use the .OBJ file
format that Intel defined.  Instead, each separate compilation is a
"unit", more like a library than what a C programmer would think of as
an object file.  Since Borland defined what a unit looks like, they
could set it up to allow for smart linking.  (TP 5.0 and 5.5 can also
link in ordinary MS-DOS .OBJ files, but I think the implication is that
these are just dragged in whole hog.)  Since TP 5.5 offers object
oriented extensions, this smart linking can come in extremely handy.

I'm not sure if Pascal's block structure makes smart linking easier or
harder.

Paul S. R. Chisholm, AT&T Bell Laboratories
att!pegasus!psrc, psrc@pegasus.att.com, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.

meissner@dg-rtp.dg.com (Michael Meissner) (12/14/89)

In article <600@fred.UUCP> bill@polygen.uucp (Bill Poitras) writes:

|  In article <9185@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
|  %In article <4290@pegasus.ATT.COM> dmt@pegasus.ATT.COM (Dave Tutelman) writes:
|  %%There IS one argument, in some cases a compelling one, for "one function
|  %%per file".  In general, linkers aren't smart enough to link just
|  %%PART of a binary file (.OBJ or .o), when that file contains a function
|  %%needed by the link.
|  %
|  %WHAT?  What year is this?  I don't think I've ever used a linker that
|  %didn't eliminate unused routines.  Any such linker would be seriously
|  %brain damaged.
|  %-- 
|  %Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com
|  Yes you have.  ANY linker you have does this.  What you are thinking of is
|  a LIBRARY, i.e. a .lib file (lib*.a if you're a Unix person): when one is
|  used in the link process, only the functions used in the program being
|  linked get linked.  Although I'm not a compiler/linker expert, I'm almost
|  positive that this is true.

I suspect that most real world linkers work the way UNIX does (with
regard to loading the entire contents of an object file, instead of
just the 'functions' that are needed).

However, if the linker is this 'helpful', it breaks a debugging
paradigm that we in DG languages and also GNU C use.  Namely, you
include some functions, otherwise unused in the program, that
take a pointer to some internal datatype and print the datatype out
in an implementation-defined manner.  For example, here is a fragment
of a dbx debugging session on GCC that calls the function 'debug_rtx'
to print out an RTL tree (reformatted to fit in 80 columns):

(2) Stopped in final (first=(rtx) 0x412010,
	file=(struct FILE *) 0x405820,
	write_symbols=NO_DEBUG,
	optimize=0,
	prescan=0), file final.c, line 537

537	    insn = final_scan_insn (insn, file, write_symbols, optimize,
(dbx) print insn
insn = (rtx) 0x4121b8
(dbx) print debug_rtx(insn)

(insn 7 6 8 (set (reg:SI 2)
       (symbol_ref:SI ("*@LC0"))) 89 (nil)
   (nil)
   (nil))
debug_rtx(insn) = 1
(dbx) 
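
For readers who haven't seen the technique, here is a minimal sketch of
such a debugger-only helper.  It is not GCC's debug_rtx; the struct node
type and the name debug_list are invented for the example.

```c
#include <stdio.h>

struct node {
    int value;
    struct node *next;
};

/*
 * debug_list: never called by the program itself.  It exists so that a
 * debugger session can issue something like "print debug_list(head)".
 * It is deliberately not static: an unreferenced function is exactly
 * what a linker that strips unused routines would throw away.
 * Returns the number of nodes printed.
 */
int debug_list(const struct node *p)
{
    int n = 0;

    for (; p != NULL; p = p->next, n++)
        fprintf(stderr, "(node %d)\n", p->value);
    return n;
}
```

With a linker that loads whole object files, debug_list is always
present in the executable; with one that discards unreferenced
functions, the debugger would have nothing to call.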

--
Michael Meissner, Data General.
Until 12/15:	meissner@dg-rtp.DG.COM
After 12/15:	meissner@osf.org

ts@cup.portal.com (Tim W Smith) (12/31/89)

In article <9157@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>Absolutely.  Clear code shouldn't *need* a lot of comments; a
>programmer should be able to read it and understand what's going on
>from the routine names, the variable names, and the flow of control,
>with just a few added comments if any.  A lot of extraneous comments
>about things that would be perfectly clear just from reading the code
>actually damages code readability; the control structures become much
>harder to follow.

I tend to agree.  When I code, I tend to comment on the difficult parts,
and leave the obvious parts to speak for themselves.  On the other hand,
something that may be obvious to me because of my several years of
experience may not be obvious to a junior member of the staff who is
getting his first look at whatever arcane thing is being dealt with.

So, what should one do?  Include a lot of "extraneous" stuff to help
the jr. guys, or should one be minimalist to avoid distracting the
other sr. engineers when they look at the code?

I think we came up with a pretty good solution to this problem
where I work.

The comments in the source code are aimed at a person who has a
detailed knowledge of the field that the code is part of.  For
example, if the code was a Unix disk driver, it is assumed that
the person reading the code is an expert on Unix disk drivers.

Part of the final documentation for any code we write is something
we call "left hand pages".  This is a detailed commentary on the
code written by the implementor of the code.  We call it "left hand
pages" because we print a copy of the source code and put it in a
binder together with the "left hand pages" in a way that causes
one to have the source code on the right and the commentary on the
left.

The LHPs are aimed at a much lower level.  For example, if the code
is for a Unix disk driver, the LHPs would assume that you can spell
"Unix" and know that by "disk" we don't mean "Frisbee".  Well, perhaps
not quite this low a level...:-).

For example, a recent Unix disk driver we did had about 45k of source
code.  The LHPs came to about 120k.

As a side effect, we've discovered that quite a few bugs are found
when the implementor is producing the LHPs.

						Tim Smith

adw@otter.hpl.hp.com (David Wells) (01/04/90)

>decot@hpisod2 (Dave Decot) writes:
>The questions I want answered by comments are:
>	What do the possible values of this variable or type represent,
>	within the user's abstraction?
>
>[additional good comments and example deleted]
>
>The interesting thing about comments like this is you can grab them straight
>out of a specification document.  >> HINT!!! <<

Exactly. Only a tiny fraction of software can be effectively maintained without
a knowledge of its intended purpose. I look to comments to provide the
following information:

i) How the implementation relates to the next-higher level of specification.
For simple software, this may be the requirements spec., or it may be a
design based on the requirements, etc.

Examples: /* This algorithm positions the dialog box so as not to obscure the
             focussed window (S-12-35 para 3.32) */
          /* Doubly-linked list used here to achieve necessary browsing
             performance (section 5.3.1) */

ii) How the *actual* implementation relates to the *ideal* implementation. Many
languages don't have extensive (or any) support for complex types,
preconditions, postconditions, encapsulation, etc. (If the software is being
developed particularly carefully, there may be a specification at this
"ideal implementation" level, e.g. in pseudocode, and these comments will
reduce to i) above).

Examples: /* "token" must have been obtained from xxAllocate() */
          /* This structure is intended as a link in a doubly-linked list */
          /* This type is PRIVATE to module xx */

iii) WHY the implementation design decisions documented in i) and ii) were
taken. This is most-often-omitted, I think, and often causes frustration and
delay while the maintainer wonders "*why* did he/she do it like *that*!? Was
it a mistake, or was the original developer anticipating a spec. change I
haven't thought of..?" These comments are particularly useful when (as often)
the specification is tightened after "release 1".

Examples: /* singly-linked list used because backward scan speed unimportant */
          /* This table could be compressed, but will probably be small */
          /* Use green menus because it's my favourite colour */
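
A sketch of the second kind of comment in use, for a language (C) that
can state neither the precondition nor the privacy rule directly.  Only
the name xxAllocate comes from the examples above; xxUse, xxFree,
XX_MAGIC, and the token layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define XX_MAGIC 0x5858     /* arbitrary tag marking tokens made by xxAllocate */

/* This type is PRIVATE to module xx; callers should treat it as opaque.
 * C cannot enforce that in a single file, so the comment carries the rule. */
struct xxToken {
    int magic;              /* XX_MAGIC while the token is live */
    int uses;               /* number of completed xxUse calls */
};

/* Returns a fresh token, or NULL if memory is exhausted. */
struct xxToken *xxAllocate(void)
{
    struct xxToken *t = malloc(sizeof *t);

    if (t != NULL) {
        t->magic = XX_MAGIC;
        t->uses = 0;
    }
    return t;
}

/* "token" must have been obtained from xxAllocate().
 * Returns how many times the token has been used, including this call. */
int xxUse(struct xxToken *token)
{
    /* The assert checks at run time the precondition C cannot declare. */
    assert(token != NULL && token->magic == XX_MAGIC);
    return ++token->uses;
}

/* Releases a token; a NULL token is ignored. */
void xxFree(struct xxToken *token)
{
    if (token != NULL) {
        token->magic = 0;   /* make later misuse more likely to trip the assert */
        free(token);
    }
}
```

The assert catches only some violations; the comment is still the
place where the intended rule is actually stated.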

In an ideal world, the comments would be coupled to the code via a tool
(e.g. WEB, folding editors) rather than simply plonked into it, and coupled
to the specification via a hypertext-like tool to facilitate crossreferencing
and allow parallel version-control.

Dave Wells