[comp.software-eng] Soft-Eng Digest V4 #16

MDAY@XX.LCS.MIT.EDU ("Mark S. Day") (03/11/88)

Soft-Eng Digest             Thu, 10 Mar 88       Volume 4 : Issue  16 

Today's Topics:
                   Configuration Management (5 msgs)
                           Linkers (7 msgs)
          Configuration Management and Language Choice (5 msgs)

----------------------------------------------------------------------

Date: 17 Feb 88 17:29:03 GMT
From: wor-mein!pete@uunet.uu.net  (Pete Turner)
Subject: Configuration Management 

>[W]ouldn't
>you want the insert, delete, and search functions for a hash table
>implementation in a single file "hash.c" for readability and maintenance?

No, personally I wouldn't. I would put the files insert.c, delete.c
and search.c (or HS_insert.c, HS_delete.c and HS_search.c) in the directory
"hash". I just don't see any advantage in putting more than one function
in each file.

In this case, I think the boss has a good point. I've dealt with CM issues
on large projects ( > 100K lines), involving a dozen or more developers,
and things were a lot easier once we decided to have only one function
per file. Also, it is a good idea to provide the "client" with a separate 
include file for each interface to a given "service". For example, if you're 
using a storage service (maybe a hash table implementation, maybe some other;
you don't need to know, as long as it performs to your requirements) and you
want to use the delete function, assuming you're writing in C, just include
ST_delete.h and call ST_delete(....). ST_delete() may be a function call or
it may be a macro - why should you care, as long as it works?
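
For illustration, such a header-and-client pair might look like this
(the names are made up, not from any real service):

	/* ST_delete.h -- the client's entire view of the delete
	 * operation.  Whether ST_delete() is a real function or a
	 * macro over some other primitive is the service's secret.
	 */
	#ifndef ST_DELETE_H
	#define ST_DELETE_H

	typedef struct ST_table ST_table;   /* opaque to the client */
	extern int ST_delete();             /* returns 0 on success */

	#endif

	/* client.c -- includes only the interface it actually uses */
	#include "ST_delete.h"

	int drop_entry(tab, key)
	ST_table *tab;
	char *key;
	{
		return ST_delete(tab, key);
	}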

------------------------------

Date: 21 Feb 88 17:02:49 GMT
From: linus!philabs!gcm!dc@husc6.harvard.edu  (Dave Caswell)
Subject: Configuration Management

=The basic problem with keeping several procedures in one physical file
=is that it becomes more difficult (both conceptually and physically) to
=manipulate individual procedures.  If you know that you never have
=to treat a particular procedure as an individual unit, then placing
=a group in one file makes more sense (eg, as with Hash_get, Hash_put, etc).

The computer system we just finished is 73,000 lines.  It is in 128 source
files and has 1874 functions.  I couldn't imagine the complexity of having
1874 separate files.   How could a person possibly tell what is related to 
what?   Each file reads from top to bottom like a book.  We weren't at all
concerned with time to load the editor.    The project took 3 person-years,
or over $250,000.    The time was spent debugging, designing, and learning
the application.  It wasn't spent waiting for emacs to start up.

------------------------------

Date: 22 Feb 88 12:02:52 GMT
From: ihnp4!homxb!whuts!mtune!akgua!sortac!pls@ucbvax.Berkeley.EDU  (Pat Sullivan)
Subject: Configuration Management

 >The library contains object modules, possibly with multiple entry points,
 >and if you reference one you get them all.
                                  ^^^^^^^^
This is true, but the statement is not entirely clear: if you reference
one of the entry points in an *OBJECT*, even if you just refer to a global
variable declared in an object, you get the entire object.  You do not
automatically get all the objects in the library (archive).
This is one more reason to limit the contents of an object to only those
functions and data that are tightly related.
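
A small sketch of the point (assuming the usual ar/ld toolchain):

	/* hash.c -- compiled into the single object hash.o, which
	 * then goes into libhash.a along with other objects.
	 */
	int hash_insert(key) int key; { return  key; }
	int hash_delete(key) int key; { return -key; }

	/* main.c -- references only hash_insert(), so the linker
	 * copies all of hash.o (hash_delete() included, as dead
	 * code) into a.out, but it touches none of the other
	 * objects in libhash.a.
	 */
	extern int hash_insert();

	int main()
	{
		hash_insert(42);
		return 0;
	}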

Pat Sullivan - {gatech|akgua|ihnp4}!sortac!pls - voice 404-257-7382

------------------------------

Date: 26 Feb 88 16:18:09 GMT
From: ptsfa!jmc@AMES.ARC.NASA.GOV  (Jerry Carlin)
Subject: Configuration Management

>Recap: I pointed out that the Unix linker includes all the routines that
>were compiled from one file if any single routine from that file is referenced.
>Since other linkers are smart enough to include just the desired routine...

Actually, since this group is comp.software-eng, I'd like to state my
opinion that one function per source file is a good way to go. If multiple
functions per source file are useful for a given situation, they should
all be strongly related, so that if you plan to use one you would
typically use all of them, rendering the problem moot.

Given the UNIX System V.3 shared library facility, where only one copy
of the routines is needed and it is not duplicated in every executable,
there is now another way to keep the size of binaries to a minimum.

Jerry Carlin (415) 823-2441 {ihnp4,lll-crg,ames,qantel,pyramid}!ptsfa!jmc
soon: {ihnp4,lll-crg,ames,qantel,pyramid}!pacbell!ptsfa!jmc

------------------------------

Date: 5 Mar 88 03:26:29 GMT
From: trwrb!aero!venera.isi.edu!raveling@ucbvax.Berkeley.EDU  (Paul Raveling)
Subject: Configuration Management

	I believe the best software engineering criterion to use
	is to organize functions into modules in such a way as to
	minimize overall complexity.  

	The IBM (Itty Bitty Modules) approach leads to excess complexity
	and lack of appropriate structure in any but the simplest
	software.  I've worked with two sets of software that used
	this approach and paid for it in maintainability -- 
	One was OS/360, the other was various MIL spec software,
	mainly for air data computers.

	For example, one function per file eliminates the ability
	to share a set of data among related functions while still
	encapsulating it within that set.  The biggest maintenance
	headaches I've encountered have tended to be tracking down
	accesses to public data.

	On the other hand, lumping too many functions into the
	same source file produces the same kind of lack of structure
	and lack of encapsulation as one function per source file.

	In my experience the best-organized software probably has
	averaged 4-6 functions per module, but it's inappropriate
	to look for a general rule of "n functions per module".
	Minimizing complexity sometimes DOES dictate one function
	per file, but sometimes it may be 10.
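
	For illustration, a minimal C sketch (names invented) of
	the encapsulation that one-function-per-file gives up:

		/* counter.c -- several related functions share one
		 * piece of state that no other file can touch.
		 * Split these into three files and `count' must
		 * become public data -- exactly the maintenance
		 * headache described above.
		 */
		static int count;

		void counter_reset() { count = 0; }
		void counter_bump()  { count++; }
		int  counter_value() { return count; }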


Paul Raveling
Raveling@vaxa.isi.edu

------------------------------

Date: 23 Feb 88 14:35:43 GMT
From: uh2@psuvm.bitnet  (Lee Sailer)
Subject: Linkers

How does this "smart linker" business tie into the "shared libraries"
in Unix V.3?  As I understand it, (1) when I need a module, the whole
library is loaded, but (2) when another program needs a module from
the library, it shares the core image that is already in memory.

So, for example, at any given moment there is only one copy of the
stdio (that's standard input-output in Unix-speak) stuff in memory,
and all programs that need it share it.  (This also makes the
executables smaller and saves disk space and load time.)

                                     Just asking,       Lee

------------------------------

Date: 1 Mar 88 20:43:39 GMT
From: mcvax!enea!sommar@uunet.uu.net  (Erland Sommarskog)
Subject: Linkers

Well, I know nothing of shared libraries or even System V.3 as such.
But I guess it looks much like shareable images in VMS.
  If you really want to save space for your binaries under VMS, you put
them in a shareable image.  No matter how many of these procedures you
call, none of them will be included in your binary; it contains mere
references to the shareable image.
  Slowly I am beginning to realize that this concept is not standard
under Unix.  Well, that explains why even the simplest of programs
(Pascal, f77 and Ada) exceeds 100 kbytes when linked.  Library routines,
or even entire libraries, in the language environment are included in
my private executable.  Needless to say, all such routines are provided
in shareable images in VMS, unless you explicitly tell the linker not
to use them.

To make it even more fun, VMS permits you to install these images,
just as other heavily used programs such as compilers, editors and the
usual utilities are installed.  My exact notion of this "installation"
is uncertain, but I believe it means that the file header is kept
permanently loaded in physical memory.  (To INSTALL may also involve
other things, such as privileges, but that is beside the subject.)
Does Unix have such a concept?

As a whole: many Unix fans have reacted to the criticism of the Unix
linker with: "It does what you want, if you just use it in the right
way."  Remember that this strikes back at you the next time you flame
another OS.  Some maneuvers are the way to go under Unix, but meet
problems under VMS.  And vice versa.  Often this is because you don't
know the best way under the other operating system.  But if you look,
you very often find that you can easily do what you like, "if you just
use it the right way."  But sometimes you fall flat.  And depending on
where you stumble, you pick your favourite system, which need not be
Unix.  It's not mine.

Erland Sommarskog       
ENEA Data, Stockholm        
sommar@enea.UUCP

------------------------------

Date: 24 Feb 88 13:53:55 GMT
From: mnetor!utzoo!yunexus!geac!daveb@uunet.uu.net  (David Collier-Brown)
Subject: Linkers

[...] the Unix linker was a conscious cheap-and-dirty.  The Multics
system avoided the whole IDEA of static linkers[1], and most if not
all commercial systems not derived from Unix have better linkers.
Good Lord, the IBM /360 had a better linker than Unix! (And I
wouldn't recommend the /3sickly and its linker to my worst enemy).  

   In order to learn C, a non-unix programmer of my acquaintance
ported a subset compiler (Ron Cain's Small C), and taught it to
generate code for his assembler/linker set, placing each function in
a linkable "procedure record", and emitting "symref records" for all
externally required datums of the function, including a symref to a
(specially named) record which contained the static data for the
module (i.e., the file-level statics).

  Not hard at all.  A suitable project for learning the language...

In pseudo-linkeese:
DSECT _.filename
	DW 1	    ; static int foo; /* A file-level static */
SYMDEF _function
CSECT _function
	LD A1,Sp    ; function(p,q) char *p, *q; {
	LD D1,_.filename+0 ; if (foo) {
	...
SYMREF _.filename

--dave (those who know not history.... piss me off) c-b

[1] It had a thing called "binder", which produced
    almost-fully-resolved modules, more or less for use as
    efficiently-loadable public libraries.
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   
 350 Steelcase Road,Markham, Ontario, 
 CANADA, L3R 1B3 (416) 475-0525 x3279 

------------------------------

Date: 24 Feb 88 22:34:46 GMT
From: mnetor!utzoo!utgpu!water!watmath!dvlmarv!alanm@uunet.uu.net  (Alan Matsuoka)
Subject: Linkers


>Now, why can't you do this with a single file such as what
>	cc -c ugh.c
>gives you?  Not because the linker is stupid, but because it is in
>general impossible.  Suppose you have
>
>	cat <<EOF >foo.c
>	static f(...){...}
>	static g(...){...}
>	h(...){... f() ...}
>	i(...){... g() ...}
>	EOF
>
>If you use h(), you'd like just h() and f(), right?  But how is the
>linker supposed to know that h() uses f()?  The compiler has to tell
>it, and UNIX compilers don't do that.  

Yes, but only when the loader text is defined in the UNIX tradition.
The problem here is the fact that all symbols and their references
are defined relative to a single compilation unit.  If the loader text
contained directives (as it does on many other systems) that allowed
you to define separately named sections, then it isn't too hard to
write a linker that accomplishes the same thing as having separate
files.  The problem is really one of granularity.
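
To make the quoted foo.c concrete, here is a compilable version
(a sketch only):

	/* foo.c -- a compilable version of the quoted skeleton.
	 * f() and g() are static, so their names never appear in
	 * foo.o's symbol table; only h() and i() do.  All four
	 * functions land in the one object foo.o, so a program
	 * calling only h() still links in g() and i() as dead
	 * code.  Per-function sections in the loader text would
	 * let a linker discard them.
	 */
	static int f(x) int x; { return x + 1; }
	static int g(x) int x; { return x - 1; }

	int h(x) int x; { return f(x); }
	int i(x) int x; { return g(x); }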

I suppose that it wouldn't be too hard to then build the static call
graph of the code (why not? gprof can do it), sort the addresses,
and rearrange the code during the final writing phase to
allow for better locality.

The other issue is that in the context of UNIX systems a lot
of people don't really care if there is some dead code loaded or not.
In the case of a heavily loaded system on a small machine, I WOULD
care but in view of the fact that horsepower is getting cheaper and
memory even more so it becomes less of an issue.

On the other hand, I can remember somebody else pointing out that
if you can improve the execution time of a program running on something
like a heavily loaded 3090 by 1%, then you free up enough MIPS to
equal 100 PCs (or something like that).

> In fact, there's no law that
>says a smart compiler can't notice that h() and i() look pretty similar
>and decide to share some code between them.  Indeed, in the special
>case of string literals, there are lots of C compilers that DO this.

And in some cases code as well.  As I recall, some experimental
code-space optimizers can look at the entire hierarchical directed
graph of a compilation unit, find the common subgraphs, create
the appropriate functions and procedures and their calls,
and build a program that runs in a smaller code space.
Sorry, I don't know of any existing C compilers that will do this.
The one I saw was written at a university for Pascal, and it was
nowhere near becoming a production compiler.
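
The transformation would look roughly like this (a hand-made sketch,
not the output of any real tool):

	/* before.c -- h() and i() have structurally identical
	 * bodies differing only in a constant.
	 */
	int h(x) int x; { return x * x + 1; }
	int i(x) int x; { return x * x + 2; }

	/* after.c -- what such an optimizer might emit: the common
	 * subgraph hoisted into one routine, h() and i() reduced
	 * to calls on it.
	 */
	static int common(x, k) int x, k; { return x * x + k; }
	int h(x) int x; { return common(x, 1); }
	int i(x) int x; { return common(x, 2); }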

------------------------------

Date: 25 Feb 88 22:01:57 GMT
From: metavax!chris@umix.cc.umich.edu  ( PSA)
Subject: Linkers

[...]

Ok, so now I have a new question: Unix has been around a few years,
and I would hope that those who are developing new versions of Unix
know something of its history.  They therefore know that the linker is
"cheap and dirty" and lacks functionality clearly evident in linkers
on other systems.  So where is the non-"cheap and dirty" linker?

Is there another way to do what these non-Unix linkers do?  Just to
clarify, here's exactly what I want to be able to do:  I have libraries
with multiple subprograms which are related in functionality.  I want
to be able to compile each of these libraries all at once, and then 
when linking an executable extract only the individual subprograms
the executable requires.  I now do this on VM/CMS, VAX/VMS, and MVS/TSO.
Others at my organization claim this can be done on Unisys 1100 machines.
Other posters on this group have stated this works on RSTS and PDP-11's.

    ------ 
   /MM/\MM\          META SYSTEMS, LTD.
  /MM/  \MM\         315 E. Eisenhower
 /MM/ /\ \MM\            Suite 200
 ===  ==  ===       Ann Arbor, MI  48108
 \SS\ \/ /SS/
  \SS\  /SS/        Chris Collins, Senior Programmer
   \SS\/SS/
    ------

------------------------------

Date: 26 Feb 88 10:26:04 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject: Linkers

I suggest that you take a look at some of the limits of the /360 linker
(e.g. number of entry points per load module).  If you want overlays,
it may well be just what you want.  (The BSD ld(1) is definitely not
state-of-the-art with respect to overlays.  Thank goodness.)

The problem is not the UNIX **linker**.  ld(1) is perfectly capable of
pulling out just the pieces it needs *IF IT IS GIVEN THE RIGHT KIND OF
FILE*.  The problem is the UNIX **compilers**, which don't generate
that sort of file.  There is no reason in principle why the Fortran
compiler, for argument's sake, couldn't generate a '.a' file instead
of a '.o' file.  In fact you can hack that with a shell script:

	#!/bin/sh
	#   NAME:  fca
	#   SYNOPSIS:  fca x.f y.f ....
	#   DESCRIPTION: much the same as f77 -c x.f y.f ...
	#   except that it generates .a files rather than .o files
	#   BUGS: this thing does NO error checking at all!
	#
	Directory=tmp$$
	mkdir $Directory
	cd $Directory
	for File in $*
	    do
		(cd .. ; cat $File) | fsplit
		f77 -c *.f
		Archive=../`basename $File .f`.a
		if [ -f $Archive ]
		    then
			rm $Archive
		    fi
		ar q $Archive *.o
		rm *.o
		ranlib $Archive
	    done
	cd ..
	rm -r $Directory
	exit

What's the problem with doing this to C?  Well, what do you do with
static variables and functions?  (If you can solve this, you can solve
the "shared literals" problem.)  The problem is that if a C source
file is split into pieces (separately loadable segments), the static
variables and functions must be visible to other pieces *from the
same source file*.  So you need THREE levels of names (see the sketch
after this list):

    --	names which are strictly local to a single segment (e.g. labels,
	static variables inside functions)

    --	names which are visible within a cluster of segments, but
	not outside that cluster (e.g. shared literals, static
	variables and functions at file level)

    --	names which are visible between clusters.
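
In C terms, the three levels are (a minimal sketch):

	int shared;		/* level 3: visible between clusters */

	static int private;	/* level 2: visible within this file's
				 * cluster of segments only */

	int f()
	{
		static int calls;  /* level 1: strictly local to this
				    * one segment */
		return ++calls + private + shared;
	}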

ld(1) only provides two levels of names.  The missing level could be
simulated by taking a timestamp and the cpuid and using them as a
prefix.  For example,
	static int fred;
might turn into
	M.1200005b22254953.fred
{that's what
	/* do this once per source file */
	sprintf(prefix, "M.%8lx%8lx.", gethostid(), time((long*)0));
	/* do this once per file-level static symbol */
	printf("%s%s\n", prefix, "fred");
 just printed on my terminal.
}
This is not entirely satisfactory, and requires a loader without any
stupid restrictions on the lengths of names (NOT one of the /360's
features...), but combining this with the fca script above shows that
there is no reason why a UNIX compiler could not provide the required
feature without requiring any change to the loader.  Of course, the
debuggers might give you some trouble too...

How about someone providing this as an option in GNU CC?

I know that some other loaders also support only two levels of names.
It wouldn't surprise me if the VMS loader supported three.  But since
most old loaders were developed with Fortran and such in mind, and since
Fortran only needs two levels of names, I'd be surprised if many of them
had the three levels that C needs.

We could profitably turn this into a survey of what the linker
requirements of various languages are:  could an Ada compiler easily
use the UNIX / VMS / MVS / PR1MOS linker?

------------------------------

Date: 1 Mar 88 21:01:04 GMT
From: necntc!linus!philabs!pwa-b!mmintl!franka@AMES.ARC.NASA.GOV  (Frank Adams)
Subject: Linkers

|could an Ada compiler easily
|use the UNIX / VMS / MVS / PR1MOS linker?

In a word, no.  A linker for Ada must check for matching of user-defined
types in different modules.
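
For contrast, here is the classic C mismatch that ld(1) resolves
purely by name and never checks; an Ada implementation is required to
reject the equivalent inconsistency (sketch only):

	/* a.c */
	int counter;		/* defined here as an int ... */

	/* b.c -- ld(1) matches the two symbols by name alone, so
	 * the mismatch goes undetected and silently misbehaves on
	 * any machine where int and long differ in size.
	 */
	extern long counter;	/* ... declared there as a long */

	long bump() { return ++counter; }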


Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

------------------------------

Date: 23 Feb 88 13:33:00 GMT
From: apollo!marc@eddie.mit.edu  (Marc Gibian)
Subject: Configuration Management and Language Choice

>The natural package in C
>is one where the implementation "secrets" are kept as static global
>data structures (and internal support routines are static as well, to
>avoid name clashes with the client).  To do this, of course, the visible
>operations must be in the same file at compile time -- and I argue
>that they form a unified abstraction that *should* be in one file.  If

This is simply one of the limitations of the c language which should be
considered when going through the language selection process.  If
strict configuration management is highly valued in your project, then
this becomes a significant vote -AGAINST- c.

It seems that very few projects go through a language selection process.
Instead, a language is chosen because it is the -IN- language.  I believe
that if more attention were paid to this issue, there would be fewer
cases of people trying to make particular languages do what they just
plain can not do, or can not do well.

c is a good language for many projects.  But it tends to get into trouble
as the size of a project grows.  There have been many fine articles on
this subject and I do not intend to write my own here.  I simply want
to point out that there are many projects out there using c that
probably should be using some other language.  And this results in
a great deal of agony for the engineers working on these projects.

Marc S. Gibian
email:  marc@apollo.UUCP

------------------------------

Date: 25 Feb 88 11:55:15 GMT
From: mcvax!enea!sommar@uunet.uu.net  (Erland Sommarskog)
Subject: Configuration Management and Language Choice

As this discussion have continued, I'm getting quite convinced that the
issue is quite related to the language in use. Some language may support
one of the different approaches better than the other. First of all the
language must be modular at least in some sense. In standard Pascal you
would have to put the whole project in one file :-)

I have worked a good deal with EriPascal, which is a Pascal extention
by a famous Swedish telephone company with modulariztion and real-time
features, reminding of Modula-2. Anyway, EriPascal does not allow variables
to be exported, so the hash table must in be one module.

In Ada both flavours are available, although I would recommend to have
them in all in one file as a package, this is the natural Ada approach.
Specifically this is very useful, when constructing a generic unit.
Note, however, that Ada permits procedures in a package to be separate,
which is the way to go when the procedure gets big.

As for C, which have been mentioned most, one procedure - one file
may be better, but since I don't know C I don't really have any opinion.

Generally I think that modules should be kept small in size, just
as they are structured in some way. Typing DIR and getting 120
source files listed is just a night mare. As Pete pointed out: put
related files in a directory.

Erland Sommarskog       
ENEA Data, Stockholm        
sommar@enea.UUCP           

------------------------------

Date: 25 Feb 88 14:16:54 GMT
From: uh2@psuvm.bitnet  (Lee Sailer)
Subject: Configuration Management and Language Choice

SAS is a million lines of source, and is written in C, though I believe
it is a translation from the original PL/I.  They must have some extra
tools they use--are they homebrew or commercial?

------------------------------

Date: 28 Feb 88 04:59:21 GMT
From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject: Configuration Management and Language Choice

The story I heard is that SAS was originally written in /370 assembly
code.  In 1979, if you asked them "when can we get it for other machines"
their joke was "wait 5 years, and we'll send you a /370 chip with the
tape."  I believe that the rewrite into PL/I enabled the VAX/VMS and
PR1ME/PRIMOS ports, and that the rewrite into C was motivated by the
fact that PL/I isn't all that common on micros and workstations...

SAS is to VM/CMS what AWK is to UNIX, only more so.

------------------------------

Date: Fri, 4 Mar 88 04:05:47 PST
From: metavax!john@umix.cc.umich.edu  (John Mitchell )
Subject: Configuration Management and Language Choice

My understanding is that SAS develops on IBM systems.  A few years ago they
decided to convert their software to C, but there was not an adequate C
compiler available on IBM mainframes.  So they ported the Lattice C compiler
to IBM VM/CMS and MVS/TSO; this compiler was so successful internally that
SAS decided to market it as a product.

This compiler creates object code and has its own link editor that allows
you to link in one subroutine from a source file that contains many 
subroutines.  I am not affiliated with SAS in any way, other than as a 
customer.  These are my own views, and are not necessarily shared by my
employer.

john      

------------------------------

End of Soft-Eng Digest
******************************