[comp.unix.wizards] Is your system polluted?

pcg@aber-cs.UUCP (Piercarlo Grandi) (12/23/89)

In article <8912211630.aa04575@ICS.UCI.EDU> rfg@ICS.UCI.EDU writes:
    
    As part of the work I'm doing on protoize/unprotoize, I decided that it would
    be a good idea to be able to find out (for any given system) what the
    names of all of the functions declared in system include files are.
    I wrote the following script to do part of the job.
    
    The results that I got from running this script on one system are very
    saddening.  It appears that (for some systems at least) there is an awful
    lot of pollution of various name spaces contained in the system include
    files.  Specifically, there are lots of clashes of names where one name
    is used for two (or more) different things in two (or more) different
    include files.  This means that you may/will get errors if particular
    pairs of include files are included into the same base file. :-(

Actually things are even worse than Ron Guilmette says. Not only do a lot
of second-rate hackers put duplicate names in system headers, but they do
the following things as well:

	1) Internal kernel entities are declared in headers meant for
	application use. A very bad offender here is System V.3.2; some BSD
	versions at least make an attempt to bracket these within #ifdef
	KERNEL #endif (which is still unsatisfactory).

	2) A more general problem is that a lot of user-level packages also
	declare in their headers entities that are used only internally by
	the package.

	3) even worse, a lot of libraries contain externals that are not
	declared static. This is very dangerous, because you may unwittingly
	use the same name in your program, and then all hell breaks loose. A
	particularly bad offender is curses.

In C++ this is less troublesome as you can stuff things within the walls of
a class, and their scope will then be local to it. Except for typedefs,
unfortunately, but at least C++ 2.0 allows encapsulation of enums (and class
names, but that is virtually unavoidable).

In C, where we don't have a proper modularization facility, the following
guidelines ought to be followed:

	1) All global entities declared by a module should start with a well
	advertised module prefix, including #defines, procedures, variables,
	enums, structs, typedefs,... This has already been partially done with
	existing libraries, e.g. for prefixes 'str', 'f' (stdio), 'w'
	(curses), but usually in a half-baked way. As a solution it is not
	complete, in that you may then have clashes of prefixes, but at
	least the problem becomes an order of magnitude less severe. In C++
	this is done by putting as much as possible within class boundaries.

	2) File names should also start with the module's prefix, both
	headers and sources. Such names can be either of the form
	<prefix><suffix>.h (e.g. StreamIn.h, StreamOut.h, StreamRw.h) or
	<prefix>/<suffix>.h (e.g. Inet/Udp.h, Inet/Tcp.h, ...), depending
	usually on their number (or the length of the name under System V).

	3) Published headers should contain only the client interface of a
	module. Actually, for sophisticated modules, the client interface
	should be split into several headers, each containing only a subset
	of entities likely to be used together. Eschew all-inclusive header
	files (e.g. "builtin.h" in libg++).

	4) The internal interfaces of a module should be in a separate set
	of headers that is not published.  For example, my tree library has
	two headers, "Tree.h" and "Tree/Own.h", and the latter contains the
	declarations of utility entities used by the other sources in the
	library, and is not published. Splitting the header is better than
	bracketing with #ifdef KERNEL #endif.

	5) Under Unix, published headers ought to be in /usr/include if they
	are for modules implemented at the user level, /usr/include/sys if
	they are for kernel level modules. Internal interfaces ought not to
	be in either; they ought to be in /usr/sys/h or the directory that
	holds the module sources, e.g. /usr/src/lib/libc. If there are
	multiple headers, according to rule 2),

	6) All file-global entities internal to a module should be declared
	static. If they cannot be, because the module is split into several
	source files, then adherence to rule 1 is absolutely essential.
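
Rules 1 and 6 above can be sketched in a single file. This is only an
illustration under stated assumptions: the "Tree" module, the Tree_ and
TREE_ prefixes, and every function name below are hypothetical and are not
taken from the tree library mentioned in rule 4. The first half is what
might go in a published Tree.h; the second half would stay in the source.

```c
/* tree.c: hypothetical "Tree" module; all names are invented for illustration. */
#include <assert.h>
#include <stddef.h>

/* --- what might be published in Tree.h: every name carries the prefix --- */
typedef struct Tree_Node Tree_Node;
struct Tree_Node {
    int        key;
    Tree_Node *left, *right;
};

enum Tree_Status { TREE_OK, TREE_DUPLICATE };

enum Tree_Status Tree_Insert(Tree_Node **root, Tree_Node *node);
size_t           Tree_Count(const Tree_Node *root);

/* --- what stays private to the source file --- */

/* Internal helper: declared static, so the linker never sees it and it
 * cannot clash with a find_slot() in the client program. */
static Tree_Node **find_slot(Tree_Node **root, int key)
{
    while (*root != NULL && (*root)->key != key)
        root = key < (*root)->key ? &(*root)->left : &(*root)->right;
    return root;
}

/* Links node into the tree unless its key is already present. */
enum Tree_Status Tree_Insert(Tree_Node **root, Tree_Node *node)
{
    Tree_Node **slot;

    assert(root != NULL && node != NULL);
    slot = find_slot(root, node->key);
    if (*slot != NULL)
        return TREE_DUPLICATE;
    node->left = node->right = NULL;
    *slot = node;
    return TREE_OK;
}

/* Counts the nodes reachable from root. */
size_t Tree_Count(const Tree_Node *root)
{
    return root == NULL ? 0 : 1 + Tree_Count(root->left) + Tree_Count(root->right);
}
```

Only Tree_Insert and Tree_Count end up as external symbols; a client is
free to define its own find_slot without any conflict.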

Naturally all these rules are palliatives; what we should really have, but
given C, C++, Unix and other similar operating systems will not have, is a
tree of symbol tables. The best way to get this is an object store, as in
RSRE Flex or Cambridge CAP, or some Lisp machines or systems, but this is
wishful thinking... Second best would be something like Multics, as usual.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

rfg@ics.uci.edu (Ron Guilmette) (12/23/89)

In article <1552@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Actually things are even worse than Ron Guilmette says...

I know, but I didn't want to scare people.

>of second rate hackers put duplicate names in system headers, but they do
>the following things as well:
>
[stuff deleted]
>
>	3) even worse, a lot of libraries contain externals that are not
>	declared static. This is very dangerous, because you may unwittingly
>	use the same name in your program, and then all hell breaks loose. A
>	particularly bad offender is curses.

Since people generally seem to be so lazy about this particular aspect
of "good" coding, I was thinking of suggesting a -fdefault-static option
for GCC which would make the default linkage (or "storage-class", as you
prefer) in the absence of an explicit specification "static" rather than
"extern".  This could even be useful for old code because you could compile
a given system with it, and then try to link.  The linker would tell you
which items ought to be explicitly declared as extern, and you could then
go and fix *just* those declarations up to be explicitly extern and recompile
again with -fdefault-static, thereby minimizing externally visible symbols.
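
As a rough sketch of where that fix-up pass would leave a source file:
-fdefault-static is only a suggestion made in this post, not an option
assumed to exist, so the file below spells out every linkage by hand and
should compile with any C compiler. The counter_ names are hypothetical.

```c
/* counter.c: hypothetical file after the fix-up pass described above. */
#include <assert.h>

/* Nothing outside this file refers to these, so they stay static
 * (the linkage the proposed option would supply by default). */
static int count;

static int clamp(int v)
{
    return v < 0 ? 0 : v;
}

/* The link step reported these two names as undefined elsewhere, so they
 * are the only ones marked explicitly extern. */
extern int counter_add(int delta)
{
    count = clamp(count + delta);
    assert(count >= 0);          /* clamp() guarantees this */
    return count;
}

extern int counter_get(void)
{
    return count;
}
```

The object file then exports counter_add and counter_get and nothing else.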

>In C, where we don't have a proper modularization facility, the following
>guidelines ought to be followed:
>
[stuff deleted]
>
>	2) File names should also start with the module's prefix...

Too late.  ANSI C mandates several include file names which do not
follow this rule.

>Naturally all these rules are palliatives; what we should really have...

What we should really do is to start all over, but I'd rather not. :-)

// rfg

pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (12/26/89)

In article <259323F0.15070@paris.ics.uci.edu> rfg@ics.uci.edu (Ron Guilmette) writes:

   Since people generally seem to be so lazy about this particular aspect
   of "good" coding, I was thinking of suggesting a -fdefault-static option
   for GCC which would make the default linkage (or "storage-class", as you
   prefer) in the absence of an explicit specification "static" rather than
   "extern".  This could even be useful for old code because you could compile
   a given system with it, and then try to link.  The linker would tell you
   which items ought to be explicitly declared as extern, and you could then
   go and fix *just* those declarations up to be explicitly extern and recompile
   again with -fdefault-static, thereby minimizing externally visible symbols.

I agree 100%. I agree so much that then I can propose the
equivalently safe, but almost 100% backward trick that makes the
ridiculous volatile keyword useless without virtually any loss in
optimization ability and with absolute safety:

Have an option to make "register" the *default* storage class for
block local variables. The only variables that need to be
explicitly declared "auto" are then those whose address is taken,
and the compiler will flag them for you without any problem.

If you have instead the equivalent trick of having variables
non-volatile by default, you need to manually tag as volatile those
that need to be, and if you don't, you get nasty bugs.

I reckon that an option to disable volatile and make register the
default storage class for locals would provide virtually all the
benefits as to optimization (caching globals is virtually
irrelevant), without any risk, and would make it very easy to
develop or upgrade existing programs that relied on Classic C
semantics (e.g. the Unix kernel) in a multithreaded environment.
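
A small sketch of what the register-by-default idea would amount to, with
the storage classes written out by hand since no such compiler option is
assumed to exist. The function names are hypothetical. Taking the address
of a "register" variable is a constraint violation in C, which is why the
compiler can point out exactly which locals need to be "auto".

```c
#include <assert.h>

/* Under the proposed option, i and total would be "register" without
 * saying so; here it is spelled out. */
int sum(const int *v, int n)
{
    register int i;
    register int total = 0;    /* writing &total here would not compile */

    assert(v != 0 || n == 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Helper that writes through a pointer, forcing its argument into memory. */
static void read_into(int *out)
{
    *out = 42;
}

int needs_address(void)
{
    auto int value = 0;        /* address is taken below, so it stays "auto" */

    read_into(&value);
    return value;
}
```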

Consider GNU C under Mach...

--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

brooks@maddog.llnl.gov (Eugene Brooks) (12/26/89)

In article <PCG.89Dec25215018@rupert.cs.aber.ac.uk> pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:
>I agree 100%. I agree so much that then I can propose the
>equivalently safe, but almost 100% backward trick that makes the
>ridiculous volatile keyword useless without virtually any loss in
>optimization ability and with absolute safety:
The volatile keyword is not ridiculous, and it is very useful.
Its original domain, to support device register access where memory
values would change in a spontaneous way, has expanded to shared memory
multiprocessing.  I am sure that the ANSI committee did not have multiprocessing
in mind when they hatched volatile, but they did us a big favor with it.
It is best not to statically declare a variable as volatile, however;
it is better to declare a specific reference volatile with a cast when
you need to be sure that the compiler does not screw you.
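
The cast at a specific reference might look like the sketch below. The
variable, macro, and function names are hypothetical, and whether a
compiler treats a volatile-qualified access to an object that was not
declared volatile as a real memory access is an assumption about the
implementation, although the idiom is widely relied upon.

```c
#include <assert.h>

int shared_flag;    /* hypothetical shared variable, NOT declared volatile */

/* Force a fresh load at this one reference only; every other use of
 * shared_flag stays fully optimizable. */
#define VOLATILE_READ(x) (*(volatile int *)&(x))

/* Polls shared_flag up to limit times and returns how many polls saw it
 * still zero. In real use another processor or device would set the flag. */
int poll_flag(int limit)
{
    int polls = 0;

    assert(limit >= 0);
    while (polls < limit && VOLATILE_READ(shared_flag) == 0)
        polls++;
    return polls;
}
```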

brooks@maddog.llnl.gov, brooks@maddog.uucp

swirsky@olivee.olivetti.com (Robert Swirsky) (12/27/89)

There's another place where the volatile keyword is useful--if there's
a variable whose value is set in an interrupt service routine.

Many C compilers (e.g., Microsoft's C for Unix) have an interrupt keyword
so that a function saves all its registers and ends with an IRET.

If the compiler does invariant code optimization, it can incorrectly
see code as invariant without the "volatile" keyword. For example:

do {   
	/* blah blah blah */
} while (flag==0);

If "flag" is set in an interrupt service routine, some compilers would
move the test for flag being zero out of the loop, because no statement
within the loop changes its value. The result is a loop that never exits.
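
A portable approximation of the fix, with a standard C signal handler
standing in for the interrupt service routine (an assumption made so the
sketch runs anywhere; a real ISR would use the compiler's own mechanism).
The function names are hypothetical. Declaring the flag volatile tells
the compiler that each test in the loop must reload it.

```c
#include <assert.h>
#include <signal.h>

/* Set asynchronously by the handler, so it must be volatile. */
static volatile sig_atomic_t flag = 0;

static void on_signal(int signo)
{
    (void)signo;
    flag = 1;
}

/* Loops until the handler sets flag; returns the number of iterations. */
int wait_for_flag(void)
{
    int iterations = 0;
    void (*previous)(int);

    flag = 0;
    previous = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);

    do {
        if (++iterations == 3)
            raise(SIGINT);          /* simulate the interrupt arriving */
    } while (flag == 0);

    signal(SIGINT, previous);       /* restore the earlier disposition */
    return iterations;
}
```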

I've seen this happen with MSC for DOS...

"All opinions are my own, and may not be those of my employer."