MDAY@XX.LCS.MIT.EDU ("Mark S. Day") (03/11/88)
Soft-Eng Digest             Thu, 10 Mar 88       Volume 4 : Issue 16

Today's Topics:
                     Configuration Management (5 msgs)
                               Linkers (7 msgs)
          Configuration Management and Language Choice (5 msgs)
----------------------------------------------------------------------

Date: 17 Feb 88 17:29:03 GMT
From: wor-mein!pete@uunet.uu.net (Pete Turner)
Subject: Configuration Management

>[W]ouldn't you want the insert, delete, and search functions for a hash
>table implementation in a single file "hash.c" for readability and
>maintenance?

No, personally I wouldn't. I would put the files insert.c, delete.c and
search.c (or HS_insert.c, HS_delete.c and HS_search.c) in the directory
"hash". I just don't see any advantage in putting more than one function in
each file.

In this case, I think the boss has a good point. I've dealt with CM issues
on large projects (> 100K lines) involving a dozen or more developers, and
things were a lot easier once we decided to have only one function per
file.

Also, it is a good idea to provide the "client" with a separate include
file for each interface to a given "service". For example, suppose you are
using a storage service (maybe a hash table implementation, maybe some
other; you don't need to know, as long as it performs to your requirements)
and you want to use the delete function. Assuming you're writing in C, just
include ST_delete.h and call ST_delete(....). ST_delete() may be a function
call or it may be a macro; why should you care, as long as it works?

------------------------------

Date: 21 Feb 88 17:02:49 GMT
From: linus!philabs!gcm!dc@husc6.harvard.edu (Dave Caswell)
Subject: Configuration Management

=The basic problem with keeping several procedures in one physical file
=is that it becomes more difficult (both conceptually and physically) to
=manipulate individual procedures. If you know that you never have
=to treat a particular procedure as an individual unit, then placing
=a group in one file makes more sense (eg, as with Hash_get, Hash_put, etc).
The computer system we just finished is 73,000 lines. It is in 128 source
files and has 1874 functions. I couldn't imagine the complexity of having
1874 separate files. How could a person possibly tell what is related to
what? Each file reads from top to bottom like a book. We weren't at all
concerned with the time to load the editor. The project took 3 person-years,
or over $250,000. The time was spent debugging, designing, and learning the
application. It wasn't spent waiting for emacs to start up.

------------------------------

Date: 22 Feb 88 12:02:52 GMT
From: ihnp4!homxb!whuts!mtune!akgua!sortac!pls@ucbvax.Berkeley.EDU (Pat Sullivan)
Subject: Configuration Management

>The library contains object modules, possibly with multiple entry points,
>and if you reference one you get them all.
                                  ^^^^^^^^

This is true, but the statement is not entirely clear: if you reference one
of the entry points in an *OBJECT*, even if you just refer to a global
variable declared in an object, you get the entire object. You do not
automatically get all the objects in the library (archive). This is one
more reason to limit the contents of an object to only those things that
are tightly related.

Pat Sullivan - {gatech|akgua|ihnp4}!sortac!pls - voice 404-257-7382

------------------------------

Date: 26 Feb 88 16:18:09 GMT
From: ptsfa!jmc@AMES.ARC.NASA.GOV (Jerry Carlin)
Subject: Configuration Management

>Recap: I pointed out that the Unix linker includes all the routines that
>were compiled from one file if any single routine from that file is
>referenced. Since other linkers are smart enough to include just the
>desired routine...

Actually, since this group is comp.software-eng, I'd like to state my
opinion that one function per source file is a good way to go. If multiple
functions per source file are useful for a given situation, they should all
be strongly related, so that if you are planning to use one you would
typically use all of them, rendering the problem moot.
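[The archive granularity Pat Sullivan describes is easy to see for oneself.
A hypothetical sketch, with invented file and symbol names, using the
standard cc(1), ar(1) and nm(1): the linker extracts whole *objects* from a
library, so with one function per object a client links only what it calls.]

```shell
# Two one-function "library" sources (names invented for illustration)
cat > HS_insert.c <<'EOF'
int HS_insert(int key) { return key; }
EOF
cat > HS_delete.c <<'EOF'
int HS_delete(int key) { return -key; }
EOF
cc -c HS_insert.c HS_delete.c
ar rcs libhash.a HS_insert.o HS_delete.o   # one function per archive member

# A client that references HS_insert only
cat > client.c <<'EOF'
int HS_insert(int key);
int main(void) { return HS_insert(0); }   /* never references HS_delete */
EOF
cc -o client client.c libhash.a

./client
# HS_delete.o was never extracted from the archive, so its symbol is absent:
nm client | grep HS_delete || echo "HS_delete.o was never pulled in"
```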
Given the UNIX V.3 shared library facility, where only one copy of the
routines is needed and is not present in every executable, the desire to
limit binaries to the minimum size can be satisfied another way.

Jerry Carlin (415) 823-2441 {ihnp4,lll-crg,ames,qantel,pyramid}!ptsfa!jmc
soon: {ihnp4,lll-crg,ames,qantel,pyramid}!pacbell!ptsfa!jmc

------------------------------

Date: 5 Mar 88 03:26:29 GMT
From: trwrb!aero!venera.isi.edu!raveling@ucbvax.Berkeley.EDU (Paul Raveling)
Subject: Configuration Management

I believe the best software engineering criterion is to organize functions
into modules in such a way as to minimize overall complexity. The IBM (Itty
Bitty Modules) approach leads to excess complexity and lack of appropriate
structure in any but the simplest software. I've worked with two sets of
software that used this approach and paid for it in maintainability: one
was OS/360, the other was various MIL spec software, mainly for air data
computers.

For example, one function per file eliminates the ability to share a set of
data among related functions while still encapsulating it within that set.
The biggest maintenance headaches I've encountered have tended to be
tracking down accesses to public data.

On the other hand, lumping too many functions into the same source file
produces the same kind of lack of structure and lack of encapsulation as
one function per source file. In my experience the best-organized software
has probably averaged 4-6 functions per module, but it's inappropriate to
look for a general rule of "n functions per module". Minimizing complexity
sometimes DOES dictate one function per file, but sometimes it may be 10.

Paul Raveling
Raveling@vaxa.isi.edu

------------------------------

Date: 23 Feb 88 14:35:43 GMT
From: uh2@psuvm.bitnet (Lee Sailer)
Subject: Linkers

How does this "smart linker" business tie into the "shared libraries" in
Unix V.3?
As I understand it, (1) when I need a module, the whole library is loaded,
but (2) when another program needs a module from the library, it shares the
core image that is already in memory. So, for example, there is only one
copy of all the stdio (that's standard input-output in Unix-speak) stuff in
memory at any given moment, and all programs that need it share it. (This
also makes the executables smaller and saves disk space and load time.)

Just asking,

     Lee

------------------------------

Date: 1 Mar 88 20:43:39 GMT
From: mcvax!enea!sommar@uunet.uu.net (Erland Sommarskog)
Subject: Linkers

Well, I know nothing of shared libraries, or even System V.3 as such, but I
guess it looks much like shareable images in VMS. If you really want to
save space for your binaries under VMS, you put the routines in a shareable
image. No matter how many of these procedures you call, none will be
included in your executable, merely references to the shareable image.

Slowly I am beginning to realize that this concept is not standard under
Unix. Well, that explains why even the simplest of programs exceeds 100
kbytes when linked (Pascal, f77 and Ada): library routines, or even entire
libraries, in the language environment are included in my private
executable. Needless to say, all such routines are provided in shareable
images in VMS, unless you explicitly tell the linker not to use them.

To make it even more fun, VMS permits you to install these images, just as
other heavily used programs like compilers, editors and the usual utilities
are installed. My exact notion of this "installation" is uncertain, but if
I'm right, the file header is kept constantly loaded in physical memory.
(To INSTALL may also involve other things, such as privileges, but that is
outside the subject.) Does Unix have such a concept?

As a whole: many Unix fans have reacted to the criticism of the Unix linker
with: "It does what you want, if you just use it in the right way."
Remember that this strikes back at you the next time you flame another OS.
Some maneuvers are the way to go under Unix, but run into problems under
VMS, and vice versa, often because you don't know the best way under the
other operating system. But if you look, you very often find that you can
easily do what you like, "if you just use it in the right way." Yet
sometimes you fall flat. And depending on where you stumble, you pick your
favourite system, which doesn't have to be Unix by necessity. It's not
mine.

Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP

------------------------------

Date: 24 Feb 88 13:53:55 GMT
From: mnetor!utzoo!yunexus!geac!daveb@uunet.uu.net (David Collier-Brown)
Subject: Linkers

[...] the Unix linker was a conscious cheap-and-dirty. The Multics system
avoided the whole IDEA of static linkers[1], and most if not all commercial
systems not derived from Unix have better linkers. Good Lord, the IBM /360
had a better linker than Unix! (And I wouldn't recommend the /3sickly and
its linker to my worst enemy.)

In order to learn C, a non-Unix programmer of my acquaintance ported a
subset compiler (Ron Cain's Small C) and taught it to generate code for his
assembler/linker set, placing each function in a linkable "procedure
record" and emitting "symref records" for all externally required datums of
the function, including a symref to a (specially named) record which
contained the static data for the module (ie, the file-level statics). Not
hard at all. A suitable project for learning the language...

In pseudo-linkeese:

        DSECT   _.filename
        DW      1                   ; static int foo;  /* A file-level static */
        SYMDEF  _function
        CSECT   _function
        LD      A1,Sp               ; function(p,q) char *p, *q; {
        LD      D1,_.filename+0     ;     if (foo) { ...
        SYMREF  _.filename

--dave (those who know not history.... piss me off) c-b

[1] It had a thing called "binder", which produced almost-fully-resolved
modules, more or less for use as efficiently-loadable public libraries.

--
David Collier-Brown.
{mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc., 350 Steelcase Road, Markham, Ontario,
CANADA, L3R 1B3 (416) 475-0525 x3279

------------------------------

Date: 24 Feb 88 22:34:46 GMT
From: mnetor!utzoo!utgpu!water!watmath!dvlmarv!alanm@uunet.uu.net (Alan Matsuoka)
Subject: Linkers

>Now, why can't you do this with a single file such as what
>	cc -c ugh.c
>gives you? Not because the linker is stupid, but because it is in
>general impossible. Suppose you have
>
>	cat <<EOF >foo.c
>	static f(...){...}
>	static g(...){...}
>	h(...){... f() ...}
>	i(...){... g() ...}
>	EOF
>
>If you use h(), you'd like just h() and f(), right? But how is the
>linker supposed to know that h() uses f()? The compiler has to tell
>it, and UNIX compilers don't do that.

Yes, but only when the loader text is defined in the UNIX tradition. The
problem here is the fact that all symbols and their references are defined
relative to a single compilation unit. If the loader text contained
directives (like many other systems) that would allow you to define
separately named sections, then it isn't too hard to write a linker that
can accomplish the same thing as having separate files. The problem is
really one of granularity.

I suppose that it wouldn't be too hard to then build the static call graph
of the code (why not? gprof can do it), sort the addresses and rearrange
the code during the final writing phase to allow for better locality.

The other issue is that in the context of UNIX systems a lot of people
don't really care if there is some dead code loaded or not. In the case of
a heavily loaded system on a small machine I WOULD care, but in view of the
fact that horsepower is getting cheaper and memory even more so, it becomes
less of an issue.
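[The separately named per-function sections described above did eventually
arrive on Unix. A sketch, assuming a modern GNU toolchain and its invented
flag names as the equivalent: -ffunction-sections puts each function in its
own section, and the linker's --gc-sections then discards the ones nothing
references, exactly the granularity being asked for.]

```shell
# foo.c from the quoted example, made concrete (names invented)
cat > ugh.c <<'EOF'
static int f(void) { return 1; }
static int g(void) { return 2; }
int h(void)        { return f(); }
int unused_i(void) { return g(); }
EOF
cat > main.c <<'EOF'
int h(void);
int main(void) { return h() - 1; }   /* uses h() only; unused_i and g are dead */
EOF

cc -ffunction-sections -c ugh.c main.c   # one section per function
cc -Wl,--gc-sections -o demo main.o ugh.o
./demo
# unused_i's section was unreferenced, so the linker dropped it:
nm demo | grep unused_i || echo "unused_i was discarded"
```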
On the other hand, I can remember somebody else pointing out that if you
can improve the execution time of a program running on something like a
heavily loaded 3090 by 1%, then you free up enough MIPS to equal 100 PCs
(or something like that).

>                                       In fact, there's no law that
>says a smart compiler can't notice that h() and i() look pretty similar
>and decide to share some code between them. Indeed, in the special
>case of string literals, there are lots of C compilers that DO this.

And in some cases code as well. As I recall, some experimental code-space
optimizers can look at the entire hierarchical directed graph of a
compilation unit, find the common subgraphs, create the appropriate
functions and procedures and their calls, and build a program that runs in
a smaller code space. Sorry, I don't know of any existing C compilers that
will do this. The one that I saw was written at a university for Pascal,
and it was nowhere near to becoming a production compiler.

------------------------------

Date: 25 Feb 88 22:01:57 GMT
From: metavax!chris@umix.cc.umich.edu ( PSA)
Subject: Linkers

[...] OK, so now I have a new question: Unix has been around a few years,
and I would hope that those who are developing new versions of Unix know
something of its history. Therefore, they know that the linker is "cheap
and dirty" and does not contain functionality clearly evident in linkers on
other systems. So where is the non-"cheap and dirty" linker? Is there
another way to do what these non-Unix linkers do?

Just to clarify, here's exactly what I want to be able to do: I have
libraries with multiple subprograms which are related in functionality. I
want to be able to compile each of these libraries all at once, and then,
when linking an executable, extract only the individual subprograms the
executable requires. I now do this on VM/CMS, VAX/VMS, and MVS/TSO. Others
at my organization claim this can be done on Unisys 1100 machines.
Other posters on this group have stated this works on RSTS and PDP-11s.

  /MM/\MM\          META SYSTEMS, LTD.
  /MM/ \MM\         315 E. Eisenhower
  /MM/ /\ \MM\      Suite 200
  ===  ==  ===      Ann Arbor, MI 48108
  \SS\ \/ /SS/
  \SS\   /SS/       Chris Collins, Senior Programmer
  \SS\/SS/

------------------------------

Date: 26 Feb 88 10:26:04 GMT
From: quintus!ok@unix.sri.com (Richard A. O'Keefe)
Subject: Linkers

I suggest that you take a look at some of the limits of the /360 linker
(e.g. number of entry points per load module). If you want overlays, it may
well be just what you want. (The BSD ld(1) is definitely not
state-of-the-art with respect to overlays. Thank goodness.)

The problem is not the UNIX **linker**. ld(1) is perfectly capable of
pulling out just the pieces it needs *IF IT IS GIVEN THE RIGHT KIND OF
FILE*. The problem is the UNIX **compilers**, which don't generate that
sort of file. There is no reason in principle why the Fortran compiler, for
argument's sake, couldn't generate a '.a' file instead of a '.o' file. In
fact you can hack that with a shell script:

	#!/bin/sh
	# NAME: fca
	# SYNOPSIS: fca x.f y.f ....
	# DESCRIPTION: much the same as f77 -c x.f y.f ...
	#	except that it generates .a files rather than .o files
	# BUGS: this thing does NO error checking at all!
	#
	Directory=tmp$$
	mkdir $Directory
	cd $Directory
	for File in $*
	do
		(cd .. ; cat $File) | fsplit
		f77 -c *.f
		Archive=../`basename $File .f`.a
		if [ -f $Archive ]
		then
			rm $Archive
		fi
		ar q $Archive *.o
		rm *.o
		ranlib $Archive
	done
	cd ..
	rm -r $Directory
	exit

What's the problem with doing this to C? Well, what do you do with static
variables and functions? (If you can solve this, you can solve the "shared
literals" problem.) The problem is that if a C source file is split into
pieces (separately loadable segments), the static variables and functions
must be visible to other pieces *from the same source file*. So you need
THREE levels of names:

	-- names which are strictly local to a single segment (e.g.
	   labels, static variables inside functions)
	-- names which are visible within a cluster of segments, but not
	   outside that cluster (e.g. shared literals, static variables
	   and functions at file level)
	-- names which are visible between clusters.

ld(1) only provides two levels of names. The missing level could be
simulated by taking a timestamp and the cpuid and using them as a prefix.
For example,

	static int fred;

might turn into

	M.1200005b22254953.fred

{that's what

	/* do this once per source file */
	sprintf(prefix, "M.%8lx%8lx.", gethostid(), time((long*)0));
	/* do this once per file-level static symbol */
	printf("%s%s\n", prefix, "fred");

just printed on my terminal.}

This is not entirely satisfactory, and it requires a loader without any
stupid restrictions on the lengths of names (NOT one of the /360's
features...), but combining this with the fca script above shows that there
is no reason why a UNIX compiler could not provide the required feature
without requiring any change to the loader. Of course, the debuggers might
give you some trouble too... How about someone providing this as an option
in GNU CC?

I know that some other loaders also support only two levels of names. It
wouldn't surprise me if the VMS loader supported three. But since most old
loaders were developed with Fortran and such in mind, and since Fortran
only needs two levels of names, I'd be surprised if many of them had the
three levels that C needs. We could profitably turn this into a survey of
what the linker requirements of various languages are: could an Ada
compiler easily use the UNIX / VMS / MVS / PR1MOS linker?

------------------------------

Date: 1 Mar 88 21:01:04 GMT
From: necntc!linus!philabs!pwa-b!mmintl!franka@AMES.ARC.NASA.GOV (Frank Adams)
Subject: Linkers

|could an Ada compiler easily
|use the UNIX / VMS / MVS / PR1MOS linker?

In a word, no. A linker for Ada must check for matching of user-defined
types in different modules.
Frank Adams        ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate        52 Oakland Ave North        E. Hartford, CT 06108

------------------------------

Date: 23 Feb 88 13:33:00 GMT
From: apollo!marc@eddie.mit.edu (Marc Gibian)
Subject: Configuration Management and Language Choice

>The natural package in C
>is one where the implementation "secrets" are kept as static global
>data structures (and internal support routines are static as well, to
>avoid name clashes with the client). To do this, of course, the visible
>operations must be in the same file at compile time -- and I argue
>that they form a unified abstraction that *should* be in one file. If

This is simply one of the limitations of the C language which should be
considered when going through the language selection process. If strict
configuration management is highly valued in your project, then this
becomes a significant vote -AGAINST- C.

It seems that very few projects go through a language selection process.
Instead, a language is chosen because it is the -IN- language. I believe
that if more attention were paid to this issue, there would be fewer cases
of people trying to make particular languages do what they just plain can
not do, or can not do well.

C is a good language for many projects, but it tends to get into trouble as
the size of a project grows. There have been many fine articles on this
subject and I do not intend to write my own here. I simply want to point
out that there are many projects out there using C that probably should be
using some other language, and this results in a great deal of agony for
the engineers working on those projects.

Marc S. Gibian
email: marc@apollo.UUCP

------------------------------

Date: 25 Feb 88 11:55:15 GMT
From: mcvax!enea!sommar@uunet.uu.net (Erland Sommarskog)
Subject: Configuration Management and Language Choice

As this discussion has continued, I'm getting quite convinced that the
issue is closely related to the language in use.
Some languages may support one of the different approaches better than the
others. First of all, the language must be modular at least in some sense;
in standard Pascal you would have to put the whole project in one file :-)

I have worked a good deal with EriPascal, which is a Pascal extension by a
famous Swedish telephone company, with modularization and real-time
features reminiscent of Modula-2. Anyway, EriPascal does not allow
variables to be exported, so the hash table must be in one module.

In Ada both flavours are available, although I would recommend having them
all in one file as a package; this is the natural Ada approach.
Specifically, this is very useful when constructing a generic unit. Note,
however, that Ada permits procedures in a package to be separate, which is
the way to go when a procedure gets big.

As for C, which has been mentioned most, one procedure - one file may be
better, but since I don't know C I don't really have an opinion.

Generally I think that modules should be kept small in size, provided they
are structured in some way. Typing DIR and getting 120 source files listed
is just a nightmare. As Pete pointed out: put related files in a directory.

Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP

------------------------------

Date: 25 Feb 88 14:16:54 GMT
From: uh2@psuvm.bitnet (Lee Sailer)
Subject: Configuration Management and Language Choice

SAS is a million lines of source, and is written in C, though I believe it
is a translation from the original PL/I. They must have some extra tools
they use: are they homebrew or commercial?

------------------------------

Date: 28 Feb 88 04:59:21 GMT
From: quintus!ok@unix.sri.com (Richard A. O'Keefe)
Subject: Configuration Management and Language Choice

The story I heard is that SAS was originally written in /370 assembly code.
In 1979, if you asked them "when can we get it for other machines?" their
joke was "wait 5 years, and we'll send you a /370 chip with the tape."
I believe that the rewrite into PL/I enabled the VAX/VMS and PR1ME/PRIMOS
ports, and that the rewrite into C was motivated by the fact that PL/I
isn't all that common on micros and workstations... SAS is to VM/CMS what
AWK is to UNIX, only more so.

------------------------------

Date: Fri, 4 Mar 88 04:05:47 PST
From: metavax!john@umix.cc.umich.edu (John Mitchell)
Subject: Configuration Management and Language Choice

My understanding is that SAS develops on IBM systems. A few years ago they
decided to convert their software to C, but there was not an adequate C
compiler available on IBM mainframes, so they ported the Lattice C compiler
to IBM VM/CMS and MVS/TSO. This compiler was so successful internally that
SAS decided to market it as a product. The compiler creates object code and
has its own link editor that allows you to link in one subroutine from a
source file that contains many subroutines.

I am not affiliated with SAS in any way, other than as a customer. These
are my own views, and are not necessarily shared by my employer.

john

------------------------------

End of Soft-Eng Digest
******************************