[comp.lang.c] C style guide

peter@ficc.uu.net (Peter da Silva) (01/07/89)

OK, does anyone have a copy of the Indian Hills Style Guide? We're in the
process of developing coding standards for 'C' and would like some input.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Work: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.   `-_-'
Home: bigtex!texbell!sugar!peter, peter@sugar.uu.net.                 'U`
Opinions may not represent the policies of FICC or the Xenix Support group.

daves@hpopd.HP.COM (Dave Straker) (01/10/89)

I, too, am on the lookout for _any_ info on C style, and would be terribly
grateful for any information/standards/etc. that anyone has.

Thanks in advance...

Dave Straker

Hewlett Packard,
Nine Mile Ride,
Wokingham,
Berks,
England

reggie@pdn.UUCP (George W. Leach) (01/11/89)

In article <2657@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>OK, does anyone have a copy of the Indian Hills Style Guide? We're in the
>process of developing coding standards for 'C' and would like some input.


      It was only available as a collection of Bell Labs Technical Memos (TM) 
in 1978.  Unless it has been published somewhere along the line, it is *only*
available to AT&T employees and under grandfather clauses to Bellcore and the
BOCs.



-- 
George W. Leach					Paradyne Corporation
..!uunet!pdn!reggie				Mail stop LG-129
Phone: (813) 530-2376				P.O. Box 2826
						Largo, FL  USA  34649-2826

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/11/89)

In article <350001@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
>I, too, am on the lookout for _any_ info on C style, and would be terribly
>grateful for any information/standards/etc. that anyone has.

The problem I have with C "style guides" (by the way, Plum Hall
publishes one) is that they tend to encourage the notion that
all one has to do is slavishly adhere to a set of rules and
good style will automatically result.  But really good code
requires careful thought, not just following rules.  For this
reason I would rather have C programmers read good tutorials
such as Kernighan & Plauger's "The Elements of Programming
Style" (2nd Ed.), even though the C language is not dealt with
directly in that book.  Koenig's "C Traps and Pitfalls" would
make a good follow-on for beginning to intermediate C coders.

Religious debates about the placement of braces may be fun,
but really, any such formatting style that is not too
outlandish can be dealt with a whole lot more easily than
code that is poorly conceived.  I don't care HOW you format
	char c;
	while ((c = getchar()) != EOF)
		putchar(c);
since this code is a bug waiting to happen from the outset.
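
A corrected sketch follows. It is generalized to any pair of streams, and the name copy_stream is only illustrative. The fix is to keep the value returned by getc() in an int, so that EOF stays distinct from every valid character value.

```c
#include <stdio.h>

/*
 * Copy everything from in to out; return the number of characters copied.
 * c must be an int: getc() returns any unsigned char value or EOF,
 * and a plain char cannot hold all of those distinctly.
 */
long
copy_stream(FILE *in, FILE *out)
{
	int c;		/* int, not char */
	long n = 0;	/* characters copied so far */

	while ((c = getc(in)) != EOF) {
		putc(c, out);
		n++;
	}
	return n;
}
```

With `char c`, the result depends on the machine. Where plain char is signed, a 0xFF byte compares equal to EOF and the copy stops early. Where plain char is unsigned, c never equals EOF and the loop never ends.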

desnoyer@Apple.COM (Peter Desnoyers) (01/12/89)

In article <350001@hpopd.HP.COM> daves@hpopd.HP.COM (Dave Straker) writes:
>I, too, am on the lookout for _any_ info on C style, and would be terribly
>grateful for any information/standards/etc. that anyone has.

There is a very thorough set of C standards used by the Network
Utilities group at BBN that I saw when I worked there. It has its
problems, but most of those are just due to over-specification. It
might have been produced under government contract, and if so it might
be technically available to the public (i.e., under the purview of the
Freedom of Information Act). Anyone from BBN care to comment?

				Peter Desnoyers

henry@utzoo.uucp (Henry Spencer) (01/16/89)

In article <5309@pdn.UUCP> reggie@pdn.UUCP (George W. Leach) writes:
>>OK, does anyone have a copy of the Indian Hills Style Guide? ...
>
>      It was only available as a collection of Bell Labs Technical Memos (TM) 
>in 1978.  Unless it has been published somewhere along the line, it is *only*
>available to AT&T employees and under grandfather clauses to Bellcore and the
>BOCs.

In fact, several years ago I typed it in (from an Nth-generation photocopy
that reached me by obscure routes), added some commentary of my own, and
posted it to the net.  I am not sure, in retrospect, that this was a proper
thing to do, but I *did* do it, and consequently the IH style guide is
widely distributed outside AT&T.  I was never hassled over it, although
I am sufficiently unsure of the propriety of my actions that I'm not
willing to post it again.

(My uncertainty is primarily over copyright issues.  I was never an
employee of the Bell System or any of its current fragments, so there
is no non-disclosure issue for me personally, although there might be
one for somebody further up the lengthy path by which it reached me.
Things were a bit more casual in those days.)
-- 
"God willing, we will return." |     Henry Spencer at U of Toronto Zoology
-Eugene Cernan, the Moon, 1972 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

pardo@cs.washington.edu (David Keppel) (12/09/89)

For your bedtime reading, if you care.
I'll be posting new versions every once in a while.
If you have flames, please flame me rather than the net.

	;-D on  ( Rogue format: turn style )  Pardo

		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

------------------- %snip ------------- %snip ------------------------
.\" ---------------- %snip --------------- % cut here -----------------
.\"
.\" Version of Indian Hill Style Manual (U of T amended) revision 5.3
.\"
.\" make with  ``... | tbl | {t,n}roff -ms ..''
.\"
.\" This document was really written with `troff' in mind.  You will
.\" need to do significant hacking to get nice output with `nroff'.
.\"
.\" You may have comments, suggestions, bug fixes, etc.  Send them to
.\" me and I will try to incorporate them (one way or another) into
.\" a future version.   If you change this document, please add a note
.\" that it has been modified and change the minor version number
.\" (e.g., version 5.0 becomes 5.1 or 5.0.zork, or whatever) and the
.\" last date of modification (printed in the footer of each page).
.\"
.\" pardo@cs.washington.edu or
.\" {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
.\"
.\"
.\"--------------------
.\" Footnote numbering
.ds f \\u\s-2\\n+f\\s+2\d
.nr f 0 1
.ds F \\n+F.
.nr F 0 1
.\"--------------------
.\" Select a font and a format for blocks of code.
.\" If your system has fixed-width fonts, then that's
.\" probably what you want to use.  If your system doesn't
.\" support fixed-width fonts, then use the default.
.\" Really aggressive hackers will want to use vgrind (grind).
.\"
.\" `Ex': start example.
.de Ex
.DS \\$1
.ft C
.\" .DS \\$1		\" Use w/ any fixed-width font!
.\" .ft C		\" Fixed-width font.
.\" .ft R		\" Default font if you don't have fixed-width.
.\" .vS			\" Use vgrind.  (BROKE?)
..
.\"
.\" `Ee': end example.
.de Ee
.DE
.\" .vE			\" End vgrind block.
.\" .DE			\" One of the fonts.
..
.\" Same idea, select a font for program text appearing `inline' in
.\" the text.  Use the same selection choices as for code blocks.
.\" Prepend `\&' in case the trailing arg starts w/ a period.
.\"
.\" Usage:
.\"	.Ep foo         (`foo' in code font)
.\"	.Ep foo mp      (`foomp', `foo' in code font, `mp' not)
.\"	.Ep foo mp ka   (`kafoomp', `foo' in code font, rest not)
.\"
.de Ep
\&\\$3\fC\\$1\fP\\$2\"		fixed-width font
.\" \&\\$3\fB\\$1\fP\\$2\"	default (general) font
..
.\"	Same idea, select a font for `ideas' (concepts) that appear
.\"	in the text.
.de Ec
\&\\$3\fI\\$1\fP\\$2
..
.\"--------------------
.RP
.TL
Recommended C Style and Coding Standards
.AU
L.W. Cannon
R.A. Elliott
L.W. Kirchhoff
J.H. Miller
J.M. Milner
R.W. Mitze
E.P. Schan
N.O. Whittington
.AI
Bell Labs
.AU
Henry Spencer
.AI
Zoology Computer Systems
University of Toronto
.AU
David Keppel
.AI
EECS, UC Berkeley
CS, University of Washington
.AB
This document is an updated version of the
\fIIndian Hill C Style and Coding Standards\fR
paper,
with modifications by the last two authors.
It describes a recommended coding standard for
C
programs.
The scope is coding style, not functional organization.
.AE
.\"--------------------
.\" Headers/footers must be in double quotes because most versions
.\" of .OF, .OH, ... are BROKE.  (They work until you get 10 arguments
.\" and then silently truncate.)
.\"
.OF "'Recommended C Coding Standards'Revision 5.3'18 November 1989'"
.EF "'Recommended C Coding Standards'Revision 5.3'18 November 1989'"
.nr PO 1.25i
.ta 0.5in 1.0in 1.5in 2.0in 3.0in
.\"--------------------
.NH
Introduction
.PP
This document
is a modified version of a document from
a committee formed at Indian Hill to establish
a common set of coding standards and recommendations for the
Indian Hill community.
The scope of this work is C coding style,
rather than the functional organization of programs
or general issues such as the use of \fIgoto\fRs.
We\*f
.FS
.IP \*F
The opinions in this document
do not reflect the opinions of all authors.
This is still an evolving document.
Please send comments and suggestions to
pardo@cs.washington.edu or
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
.FE
have tried to combine previous work [1,6,8] on C style into a uniform
set of standards that should be appropriate for any project using C,
although parts are biased towards particular systems.
Of necessity, these standards cannot cover all situations.
Experience and informed judgement count for much.
Programmers who encounter unusual situations should
consult
(1) experienced C programmers
or
(2) code written by experienced C programmers,
preferably following these rules.
.PP
The standards in this document are not of themselves required, but
individual institutions or groups may adopt part or all of them
as a part of program acceptance.
It is therefore likely that others at your institution will code in
a similar style.
Ultimately, the goal of these standards is to
increase portability, reduce maintenance, and above all
improve clarity.
.PP
Many of the style choices here are somewhat arbitrary.
Mixed coding style is harder to maintain than bad coding style.
When changing existing code it is better to conform to the
style (indentation, spacing, commenting, naming conventions)
of the existing code than it is to blindly follow this document.
.QP
``To be clear is professional; not to be clear
is unprofessional.'' \(em Sir Ernest Gowers.
.NH
File Organization
.PP
A file consists of various sections that should be separated by
several blank lines.
Rows of asterisks used as separators
present little information compared to the time it takes to scroll past,
and are discouraged.
Although there is no maximum length limit for source files,
files with more than about 1000 lines are cumbersome to deal with:
the editor may not have enough temp space to edit the file,
compilations will go more slowly,
and so on.
Lines longer than 80 columns are not handled well by all terminals
and should be avoided if possible.
Excessively long lines which result from deep indenting are often
a symptom of poorly-organized code.
.NH 2
File Naming Conventions
.PP
File names are made up of a base name,
and an optional period and suffix.
The first character of the name should be a letter,
and all other characters (except the period)
should be lower-case letters or digits.
The base name should be 8 or fewer characters and the
suffix should be 3 or fewer characters
(four, if you include the period).
These rules apply to both program files and
default files used and produced by the program
(e.g., ``rogue.sav'').
.PP
Some compilers and tools require
certain suffix conventions for names of files [5].
The following suffixes are required:
.IP \(bu
C source file names must end in \fI.c\fR
.IP \(bu
Assembler source file names must end in \fI.s\fR
.LP
The following conventions are universally followed:
.IP \(bu
Relocatable object file names end in \fI.o\fR
.IP \(bu
Include header file names end in \fI.h\fR\*f.
.FS
.IP \*F
An alternate convention that may
be preferable in multi-language environments
is to suffix both the language type and \fI.h\fR
(e.g., ``foo.c.h'' or ``foo.ch'').
.FE
.IP \(bu
Yacc source file names end in \fI.y\fR
.IP \(bu
Lex source file names end in \fI.l\fR
.PP
C++ has compiler-dependent suffix conventions,
including \fI.c\fP, \fI..c\fP, \fI.cc\fP, \fI.c.c\fR, and \fI.C\fP.
Since much C code is also C++ code, there is no clear solution.
.PP
In addition,
it is conventional to use `Makefile' (not `makefile') for the
control file for \fImake\fR (for systems that support it)
and `README' for a summary of the contents
of the directory or directory tree.
.\"
.\" Having `README' in caps breaks the "monocase" rule, but is
.\" convention.  Same for `Makefile'.
.\"
.NH 2
Program Files
.PP
The suggested order of sections for a program file is as follows:
.IP 1.
First in the file is a prologue that tells what is in that file.
A description of the purpose of the objects in the file (whether
they be functions, external data declarations or definitions, or
something else) is more useful than a list of the object names.
The prologue may optionally contain author(s),
revision control information, references, etc.
.IP 2.
Any header file includes should be next.
If the include is for a non-obvious reason,
the reason should be commented.
In most cases, system include files like \fIstdio.h\fR should be
included before user include files.
.IP 3.
Any defines and typedefs that apply to the file as a whole are next.
One normal order is to have
``constant'' macros first,
then ``function'' macros, then typedefs and enums.
.IP 4.
Next come the global (external) data declarations,
usually in the order: externs, non-static globals, static globals.
If a set of defines applies to a particular piece of global data
(such as a flags word), the defines should be immediately after
the data declaration or embedded in structure declarations,
indented to put the \fIdefine\fRs one level
deeper than the first keyword of the declaration to which they apply.
.IP 5.
The functions come last,
and should be in some sort of meaningful order.
Like functions should appear together.
A ``breadth-first''
approach (functions on a similar level of abstraction together) is
preferred over depth-first (functions defined as soon as possible
before or after their calls).
Considerable judgement is called for here.
If defining large numbers of essentially-independent utility
functions, consider alphabetical order.
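.PP
Below is a minimal sketch of a program file laid out in the order above.
The file name \fIwordcount.c\fR and every name in it are hypothetical.
They were chosen only to show where each section goes.
.Ex
```c
/*
 *	wordcount.c: count words in strings.  (Prologue: say what the
 *	file is for, not just which names it contains.)
 */

#include <ctype.h>		/* isspace() */

#define	OUT_WORD	0	/* "constant" macros first */
#define	IN_WORD		1
#define	ISBLANK(c)	isspace((unsigned char)(c))	/* then "function" macros */

typedef int	wstate_t;	/* then typedefs */

static long	total_words;	/* static globals: words counted so far */

/*
 *	Return the number of whitespace-separated words in s.
 */
	int
count_words(const char *s)
{
	wstate_t	state = OUT_WORD;	/* are we inside a word? */
	int		n = 0;			/* words seen in s */

	for (; *s != 0; s++) {
		if (ISBLANK(*s)) {
			state = OUT_WORD;
		} else if (state == OUT_WORD) {
			state = IN_WORD;
			n++;
		}
	}
	total_words += n;
	return n;
}
```
.Ee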
.NH 2
Header Files
.PP
Header files are files that are included in other files prior to
compilation by the C preprocessor.
Some are defined at the system level like \fIstdio.h\fR
which must be included by any program using the standard I/O library.
Header files are also used to contain data declarations and defines
that are needed by more than one program.
Header files should be functionally organized,
i.e., declarations for separate subsystems
should be in separate header files.
Also, if a set of declarations is likely to change when code is
ported from one machine to another, those declarations should be
in a separate header file.
.PP
Avoid private header filenames that are the same
as library header filenames.
The statement
.Ep #include
.Ep math.h '' ''
.\"
.\" BUG: that should be " instead of '', but there's no way I know of
.\" to get double quotes to a troff macro argument.  They get snarfed
.\" up and turned into non-printing chars, while \" gets turned into
.\" a comment!
.\"
will include the standard library math header file
if the intended one is not
found in the current directory.
If this is what you \fIwant\fR to happen,
comment this fact.
Don't use absolute pathnames for header files.
Use the
.Ec <name>
construction for getting them from a standard
place, or define them relative to the current directory.
The ``include-path'' option of the C compiler
(\-I on many systems)
is the best way to handle
extensive private libraries of header files;  it permits reorganizing
the directory structure without having to alter source files.
.PP
Defining variables in a header file is often a poor idea.
Frequently it is a symptom of poor partitioning of code between files.
Some objects like typedefs and initialized data definitions cannot be
seen twice by the compiler in one compilation.
On some systems, repeating uninitialized declarations
without the \fIextern\fR keyword also causes problems.
Repeated declarations can happen if include files are nested
and will cause the compilation to fail.
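.PP
The usual alternative can be sketched as follows, assuming a hypothetical header \fIerrs.h\fR and a hypothetical source file \fIerrs.c\fR.
The header carries only
.Ec extern
declarations.
Exactly one source file supplies the definitions and any initializers.
Both parts are shown in one block here so that it compiles on its own.
.Ex
```c
/* ---- errs.h (hypothetical): declarations only ---- */
extern int		err_count;	/* errors reported so far */
extern const char	*err_last;	/* text of the latest error */
void			err_report(const char *msg);

/* ---- errs.c (hypothetical): the one set of definitions ---- */
int		err_count = 0;		/* defined and initialized once */
const char	*err_last;		/* null pointer until the first error */

/*
 *	Record an error message and count it.
 */
	void
err_report(const char *msg)
{
	err_last = msg;
	err_count++;
}
```
.Ee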
.PP
Header files should not be nested.
.\"
.\" Many people disagree strongly with this.
.\" However, if you are to use \fIone\fR style, then this is best.
.\" The #ifndef/#define/.../#endif approach (below) often causes
.\" compilations to go much slower.
.\" A #endinput directive would be nice.
.\"
The prologue for a header file should, therefore, describe what
other headers need to be #included for the header to be functional.
In extreme cases, where a large number of header files are to be
included in several different source files,
it is acceptable to put all common #includes in one include file.
.PP
It is common to put the following into each
.Ec .h
file
to prevent accidental double-inclusion.
.Ex
#ifndef EXAMPLE_H
#define EXAMPLE_H
\&...	\fI/* body of example.h file */\fP
#endif /* EXAMPLE_H */
.Ee
.LP
This double-inclusion mechanism should not be relied upon,
particularly to perform nested includes.
.NH
Comments
.QP
.ad r
``\fIWhen the code and the comments disagree,
both are probably wrong.\fR'' \(em Norm Schreyer
.\" \fIBumper-Sticker Computer Science\fR,
.\" John Bentley's \fIProgramming Pearls\fR column,
.\" Communications of the ACM (CACM),
.\" September 1985, Volume 28, Number 9.
.br
.ad b
.PP
The comments should describe \fIwhat\fR is happening,
\fIhow\fR it is being done,
what parameters mean,
which globals are used and which are modified,
and any restrictions or bugs.
Avoid, however, comments that are clear from the code.
Such information rapidly gets out of date.
Comments that disagree with the code are of negative value.
Short comments should be
\fIwhat\fR comments, such as ``compute mean value'',
rather than \fIhow\fR comments such as
``sum of values divided by n''.
C is not assembler;
putting a comment at the top of a 3\-10 line section telling what it
does overall is often more useful than a comment on each line
describing micrologic.
.PP
Comments should justify offensive code.
The justification should be that something bad will happen if
unoffensive code is used.
Just making code faster is not enough to rationalize a hack;
the performance must be \fIshown\fR to be unacceptable
without the hack.
The comment should explain the unacceptable behavior and describe why
the hack is a ``good'' fix.
.PP
Comments that describe data structures, algorithms, etc., should be
in block comment form with the opening
.Ep /*
in column one, a
.Ep *
in column 2 before each line of comment text,
and the closing
.Ep */
in columns 2-3.
An alternative is to have
.Ep **
in columns 1-2, and put the closing
.Ep */
also in columns 1-2.
.Ex L
/*
 *	Here is a block comment.
 *	The comment text should be tabbed or spaced over uniformly.
 *	The opening slash-star and closing star-slash are alone on a line.
 */
.Ee
.Ex L
/*
** Alternate format for block comments
*/
.Ee
.PP
Note that \fIgrep ^.\e*\fR will catch all block comments in the
file\*f.
.FS
.IP \*F
Some automated program-analysis
packages use different characters before comment lines as
a marker for lines with specific items of information.
In particular, a line with a
.Ep \- ' `
in a comment preceding a function
is sometimes assumed to be a one-line summary of the function's
purpose.
.FE
Very long block comments such as drawn-out discussions and copyright
notices often start with
.Ep /*
in column one, no leading
.Ep *
before lines of text, and the closing
.Ep */
in columns 1-2.
Block comments inside a function are appropriate, and
they should be tabbed over to the same tab setting as the code that
they describe.
One-line comments alone on a line should be indented to the tab
setting of the code that follows.
.Ex
if (argc > 1) {
	/* Get input file from command line. */
	if (freopen(argv[1], "r", stdin) =\^= NULL) {
		perror (argv[1]);
	}
}
.Ee
.PP
Very short comments may appear on the same line as the code they
describe,
and should be tabbed over to separate them from the statements.
If more than one short comment appears in a block of code
they should all be tabbed to the same tab setting.
.Ex
if (a =\^= 2) {
	return(TRUE);			/* special case */
} else {
	return(isprime(a));		/* works only for odd a */
}
.Ee
.NH
Declarations
.PP
Global declarations should begin in column 1.
All external data declarations should be preceded by the
.Ec extern
keyword.
If an external variable is an array that is defined with an explicit
size, then the array bounds must be repeated in the extern
declaration unless the size is always encoded in the array
(e.g., a read-only character array that is always null-terminated).
Repeated size declarations are
particularly beneficial to someone picking up code written by another.
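.PP
Here is a minimal sketch using the hypothetical names \fITAB_SIZE\fR and \fIhist\fR.
The bound comes from one shared macro and is repeated in the
.Ec extern
declaration.
As a result,
.Ec sizeof
works in every file that sees the declaration.
With \fIextern int hist[];\fR the array type would be incomplete,
and the
.Ec sizeof
expression would not compile.
.Ex
```c
#include <stddef.h>		/* size_t */

#define	TAB_SIZE	16		/* shared bound, e.g. from a header */

extern int	hist[TAB_SIZE];		/* bound repeated in the declaration */

int	hist[TAB_SIZE];			/* the definition (in one file only) */

/*
 *	Clear the histogram.
 */
	void
hist_clear(void)
{
	size_t	i;			/* index into hist */

	for (i = 0; i < sizeof hist / sizeof hist[0]; i++)
		hist[i] = 0;
}
```
.Ee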
.PP
The ``pointer'' qualifier,
.Ep * ', `
should be with the variable name rather
than with the type.
.Ex
char		*s, *t, *u;
.Ee
instead of
.Ex
char*	s, t, u;
.Ee
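.LP
The reason is that the
.Ep *
binds to the declarator, not to the type.
In the second form only \fIs\fR is a pointer, and \fIt\fR and \fIu\fR are plain
.Ec char s.
The small sketch below makes the difference visible; all of its names are hypothetical.
.Ex
```c
#include <stddef.h>		/* size_t */

char	*ps, *pt;	/* two pointers: the star is written on each name */
char*	qs, qt;		/* misleading: qs is a pointer, qt is only a char */

/*
 *	Report the sizes the compiler actually gave pt and qt.
 */
	size_t
size_of_pt(void)
{
	return sizeof pt;	/* sizeof(char *) */
}

	size_t
size_of_qt(void)
{
	return sizeof qt;	/* sizeof(char), always 1 */
}
```
.Ee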
.PP
Unrelated declarations, even of the same type,
should be on separate lines.
A comment describing the role of the object being declared should be
included, with the exception
that a list of
.Ec #define d
constants does not need comments
if the constant names are sufficient documentation.
The names, values, and comments
should be tabbed so that they line up underneath each other.
Use the tab character rather than blanks.
For structure and union template declarations,
each element should be alone on a line
with a comment describing it.
The opening brace
.Ep { \^\^) (\^
should be on the same line as the structure
tag, and the closing brace
.Ep } \^) (\^\^
should be in column 1.
.Ex
struct boat {
	int		wllength;	/* water line length in meters */
	int		type;		/* see below */
	long		sailarea;	/* sail area in square mm */
};

/*
 * defines for boat.type
 */
#define	KETCH	(1)
#define	YAWL	(2)
#define	SLOOP	(3)
#define	SQRIG	(4)
#define	MOTOR	(5)
.Ee
.LP
These defines are sometimes put right after the declaration of
.Ep type ,
within the
.Ep struct
declaration, with enough tabs after the
.Ep # \^' `
to indent
.Ep define
one level more than the structure member declarations.
When the actual values are unimportant,
the
.Ec enum
facility is better\*f.
.FS
.IP \*F
.Ec enum s
might be better anyway.
.FE
.Ex
enum bt_t { KETCH, YAWL, SLOOP, SQRIG, MOTOR };
struct boat {
	int		wllength;	/* water line length in meters */
	enum bt_t	type;		/* what kind of boat */
	long		sailarea;	/* sail area in square mm */
};
.Ee
.PP
Any variable whose initial value is important should be
\fIexplicitly\fR initialized, or at the very least should be commented
to indicate that C's default initialization to zero
is being relied upon.
The empty initializer,
.Ep {\^} '', ``
should never be used.
Structure
initializations should be fully parenthesized with braces.
Constants used to initialize longs should be explicitly long.
.Ex
int		x = 1;
char		*msg = "message";
struct boat	winner[] = {
	{ 40, YAWL, 6000000L },
	{ 28, MOTOR, 0L },
	{ 0 },
};
.Ee
.PP
In any file which is part of a larger whole rather than a self-contained
program, maximum use should be made of the
.Ec static
keyword to make functions and variables local to single files.
Variables in particular should be accessible from other files
only when there is a clear
need that cannot be filled in another way.
Such usages should be commented to make it clear that another file's
variables are being used; the comment should name the other file.
If your debugger hides static objects you need to see during
debugging,
declare them as
.Ep STATIC
and #define
.Ep STATIC
as needed.
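.PP
One way to arrange this is sketched below.
The control macro name \fIDEBUG_STATICS\fR is an assumption, not a standard flag.
Compiling with \-DDEBUG_STATICS makes the objects external so that the debugger can see them,
and the normal build keeps them local to the file.
.Ex
```c
#ifdef DEBUG_STATICS
#define	STATIC			/* debug build: visible to the debugger */
#else
#define	STATIC	static		/* normal build: local to this file */
#endif

STATIC int	hits;		/* calls to bump() so far */

/*
 *	Count one call and return the new total.
 */
	int
bump(void)
{
	return ++hits;
}
```
.Ee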
.PP
The most important few types should be highlighted by typedeffing
them, even if they are only integers,
as the unique name makes the program easier to read (as long as there
are only a \fIfew\fR things typedeffed to integers!).
Structures may be typedeffed when they are declared.
Give the struct and the typedef the same name.
.Ex
typedef struct splodge_t {
	int sp_count;
	char *sp_name, *sp_alias;
} splodge_t;
.Ee
.PP
The return type of functions should always be declared.
If function prototypes are available, use them.
One common mistake is to omit the
declaration of external math functions that return
.Ec double .
The compiler then assumes that
the return value is an integer and the bits are dutifully
converted into a (meaningless) floating point value.
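.PP
The following is a minimal sketch.
Including \fImath.h\fR supplies the declaration \fIdouble fabs(double)\fR,
so the result is used as a
.Ec double .
Without that declaration, a compiler following pre-C99 rules would assume that \fIfabs\fR returns
.Ec int ,
which is exactly the failure described above.
The helper \fIclose_enough\fR is hypothetical.
.Ex
```c
#include <math.h>	/* declares double fabs(double) */

/*
 *	Return nonzero if a and b differ by no more than tol.
 */
	int
close_enough(double a, double b, double tol)
{
	return fabs(a - b) <= tol;
}
```
.Ee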
.NH
Function Declarations
.PP
Each function should be preceded by a block comment prologue
that gives a short description of what the function does
and (if not clear) how to use it.
Discussion of non-trivial design decisions and
side\-effects is also appropriate.
Avoid duplicating information clear from the code.
.PP
The function return type should be alone on a line,
indented one stop\*f.
.FS
.IP \*F
``Tabstops'' can be blanks (spaces) inserted by your editor in clumps
of 2, 4, or 8.
Use actual tabs where possible.
.FE
Do not default to
.Ec int ;
if the function does not return a value then it should be given
return type \fIvoid\fR\*f.
.FS
.IP \*F
.Ep "#define"
.Ep void
or
.Ep "#define"
.Ep void
.Ep int
for compilers without the
.Ec void
keyword.
.FE
If the value returned requires a long explanation,
it should be given in the prologue;
otherwise it can be on the same line as the return type, tabbed over.
The function name
(and the formal parameter list)
should be alone on a line, in column 1.
Destination (return value) parameters
should generally be first (on the left).
All formal parameter declarations,
local declarations and code within the function body
should be tabbed over one stop.
The opening brace of the function body should be alone on a line
beginning in column 1.
.PP
Each parameter should be declared (do not default to
.Ec int ).
In general each variable declaration should be on a separate line with
a comment describing the role played by the variable in the function.
Loop counters called ``i'' and string pointers called ``s''
are typically excluded.
If a group of functions all have a like parameter or local variable,
it helps to call the repeated variable by the same name in all
functions.
Like parameters should also appear in the same place in the various
argument lists.
.PP
Comments for parameters and local variables should be
tabbed so that they line up underneath each other.
Local variable declarations should be separated
from the function's statements by a blank line.
.\"
.\"	int i;				/* loop index */
.\"	struct some_really_long_type_name *tree;
.\"					/* root of the parse tree */
.\"	char *s;			/* variable name */
.\"
.\" If a variable has an extremely long definition, the comment
.\" should come \fIafter\fR the declaration.  Multiline comments
.\" for variables should be moved to the header and referenced
.\" from the comment.
.\"
.\"	* Note 1: The variable `zork' has two lives.  It first ...
.\"	*/
.\"
.\"		void *zork;		/* See header note #1. */
.\"
.PP
Be careful when you use or declare functions
that take a variable number of arguments (``varargs'').
There is no truly portable way to do varargs in C.
It is better to design an interface that uses a fixed number of arguments.
If you must have varargs,
use the library macros for declaring functions with
variant argument lists.
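.PP
For ANSI C compilers those macros are \fIva_start\fR, \fIva_arg\fR and \fIva_end\fR from \fIstdarg.h\fR.
Older systems used \fIvarargs.h\fR instead.
The minimal sketch below uses the hypothetical function \fIsum_ints\fR.
Its first argument says how many
.Ec int s
follow, because the called function has no way to discover that on its own.
.Ex
```c
#include <stdarg.h>

/*
 *	Return the sum of the count int arguments that follow count.
 */
	long
sum_ints(int count, ...)
{
	va_list	ap;		/* walks the variable arguments */
	long	total = 0;	/* running sum */
	int	i;

	va_start(ap, count);
	for (i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);
	return total;
}
```
.Ee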
.PP
If the function uses any external variables (or functions)
that are not declared globally in the file,
these should have their
own declarations in the function body using the
.Ec extern
keyword.
.PP
Avoid local declarations that override declarations at higher levels.
In particular, local variables
should not be redeclared in nested blocks.
Although this is valid C, the potential confusion is
enough that
\fIlint\fR will complain about it when given the \-h option.
.NH
Whitespace
.QP
.ad r
\fIint i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\\
.br
o, world!\\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}\fR
.br
\(em Dishonorable mention, Obfuscated C Code Contest, 1984.
.br
Author requested anonymity.
.br
.ad b
.PP
Use whitespace generously, vertically and horizontally.
Indentation and spacing should reflect the block structure of the code;
e.g.,
there should be at least 2 blank lines between the end of one function
and the comments for the next.
.PP
A long string of conditional operators should be split
onto separate lines.
.Ex
if (foo->next=\^=NULL && totalcount<needed && needed<=MAX_ALLOT
	&& server_active(current_input)) { ...
.Ee
Might be better as
.Ex
if (foo->next =\^= NULL
	&& totalcount < needed && needed <= MAX_ALLOT
	&& server_active(current_input))
{
	...
.Ee
Similarly, elaborate
.Ec for
loops should be split onto different lines.
.Ex
for (curr = *listp, trail = listp;
	curr != NULL;
	trail = &(curr->next), curr = curr->next)
{
\&	...
.Ee
Other complex expressions, particularly those using the ternary
.Ec ?\^: ) (
operator,
are best split onto several lines, too.
.Ex
c = (a =\^= b)
	? d + f(a)
	: f(b) - d;
.Ee
.\" .PP
.\" Finally, the closing brace of long functions and very long blocks
.\" should include an ``end function'' or ``end block'' comment:
.\" .DS
.\" \&	}	/* end  for (each list element) */
.\" \&	...
.\" }	/* end function_name() */
.\" .DE
.NH
Examples
.PP
.Ex
/*
 *	Determine if the sky is blue by checking that it isn't night.
 *	CAVEAT: Only sometimes right.  May return TRUE when the answer
 *	is FALSE.
 *	NOTE: Uses `hour' from `hightime.c'.  Returns `int' for
 *	compatibility with the old version.
 */
	int				/* TRUE or FALSE */
skyblue()
{
	extern int	hour;		/* current hour of the day */

	if (hour < MORNING || hour > EVENING) {
		return (FALSE);		/* black */
	} else {
		return (TRUE);		/* blue */
	}
}
.Ee
.Ex
/*
 *	Find the last element in the linked list
 *	pointed to by nodep and return a pointer to it.
 *	Return NULL if there is no last element.
 */
	node_t *
tail(nodep)
	node_t		*nodep;		/* pointer to head of list */
{
	register node_t	*np;		/* advances to NULL */
	register node_t	*lp;		/* follows one behind np */

	if (nodep =\^= NULL)
		return (NULL);
	np = lp = nodep;
	while ((np = np->next) != NULL) {
		lp = np;
	}
	return (lp);
}
.Ee
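.PP
The second example needs a definition of \fInode_t\fR before it will compile on its own.
The sketch below assumes a minimal singly-linked node, in which the \fIvalue\fR field is hypothetical.
It also rewrites \fItail\fR with an ANSI prototype and keeps the same layout.
.Ex
```c
#include <stddef.h>		/* NULL */

typedef struct node_t {
	int		value;		/* payload (illustrative) */
	struct node_t	*next;		/* next node, or NULL */
} node_t;

/*
 *	Find the last element in the linked list
 *	pointed to by nodep and return a pointer to it.
 *	Return NULL if there is no last element.
 */
	node_t *
tail(node_t *nodep)
{
	register node_t	*np;		/* advances to NULL */
	register node_t	*lp;		/* follows one behind np */

	if (nodep == NULL)
		return (NULL);
	np = lp = nodep;
	while ((np = np->next) != NULL) {
		lp = np;
	}
	return (lp);
}
```
.Ee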
.NH
Simple Statements
.PP
There should be only one statement per line unless the statements are
very closely related.
.Ex
case FOO:	  oogle (zork);  boogle (zork);  break;
case BAR:	  oogle (bork);  boogle (zork);  break;
case BAZ:	  oogle (gork);  boogle (bork);  break;
.Ee
Always document a null body for a
.Ec for
or
.Ec while
statement so that it is clear that the null body is intentional
and not missing code.
.Ex
while (*dest++ = *src++)
	;	/* VOID */
.Ee
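.PP
A self-contained version of that loop is sketched below.
The name \fIstr_copy\fR is hypothetical.
It behaves like \fIstrcpy\fR and assumes that \fIdest\fR is large enough.
The test is written explicitly, following the advice in the next paragraph.
.Ex
```c
/*
 *	Copy the string src, including its terminating null, into dest.
 */
	void
str_copy(char *dest, const char *src)
{
	while ((*dest++ = *src++) != 0)
		;	/* VOID */
}
```
.Ee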
.PP
Do not default the test for non-zero, i.e.
.Ex
if (f(\^) != FAIL)
.Ee
is better than
.Ex
if (f(\^))
.Ee
even though
.Ep FAIL
may have the value 0 which C considers to be false.
An explicit test will help you out later when somebody decides that a
failure return should be \-1 instead of 0.
Explicit comparison should be used even if the comparison value will
never change; e.g., 
.Ep "if (!(bufsize % sizeof(int)))" '' ``
should be written instead as
.Ep "if ((bufsize % sizeof(int)) =\^= 0)" '' ``
to reflect the \fInumeric\fR (not \fIboolean\fR) nature of the test.
A frequent trouble spot is using
.Ep strcmp
to test for string equality, where the result should \fInever\fR
\fIever\fR be defaulted.
The preferred approach is to define a macro \fISTREQ\fR.
.Ex
#define STREQ(a, b) (strcmp((a), (b)) =\^= 0)
.Ee
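.PP
Here is a short sketch of the macro in use.
The function \fIis_help_flag\fR and the option strings are hypothetical.
The macro needs \fIstring.h\fR for \fIstrcmp\fR.
.Ex
```c
#include <string.h>

#define	STREQ(a, b)	(strcmp((a), (b)) == 0)

/*
 *	Return nonzero if arg asks for help.
 */
	int
is_help_flag(const char *arg)
{
	return STREQ(arg, "-h") || STREQ(arg, "--help");
}
```
.Ee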
.PP
The non-zero test \fIis\fR often defaulted for predicates
and other functions or expressions which meet the following
restrictions:
.IP \(bu
Returns 0 for false, nothing else.
.IP \(bu
Is named so that the meaning of (say) a `true' return
is absolutely obvious.
Call a predicate \fIisvalid\fR or \fIvalid\fR, not \fIcheckvalid\fR.
.PP
It is common practice to declare a boolean type
.Ep bool '' ``
in a global include file.
The special names improve readability immensely.
.Ex
typedef int	bool;
#define FALSE	0
#define TRUE	1
.Ee
or
.Ex
typedef enum { NO=0, YES } bool;
.Ee
.LP
Even with these declarations,
do not check a boolean value for equality with 1 (TRUE, YES, etc.);
instead test for inequality with 0 (FALSE, NO, etc.).
Most functions are guaranteed to return 0 if false,
but only non-zero if true.
Thus,
.Ex
if (func() =\^= TRUE) { ...
.Ee
must be written
.Ex
if (func() != FALSE) { ...
.Ee
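.PP
A small sketch shows why comparing against \fITRUE\fR is unsafe.
The predicate \fIhas_flag\fR is hypothetical: like many real predicates, it returns 0 for false
but an arbitrary non-zero value, not necessarily 1, for true.
.Ex
#define FALSE	0
#define TRUE	1

/* Non-zero (the matching bits themselves) if any bit of mask is set. */
int
has_flag(int flags, int mask)
{
	return (flags & mask);
}

/* Correct: tests for inequality with FALSE. */
int
is_set(int flags, int mask)
{
	return (has_flag(flags, mask) != FALSE);
}

/* Wrong: has_flag(6, 4) returns 4, which is true but not equal to TRUE. */
int
is_set_buggy(int flags, int mask)
{
	return (has_flag(flags, mask) == TRUE);
}
.Ee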
.PP
There is a time and a place for embedded assignment statements.
In some constructs there is no better way to accomplish the results
without making the code bulkier and less readable.
.Ex
while ((c = getchar()) != EOF) {
	process the character
}
.Ee
The
.Ep ++
and
.Ep \-\^\^\-
operators count as assignment statements.
So, for many purposes, do functions with side effects.
Using embedded assignment statements to improve run-time performance
is also possible.
However, one should consider the tradeoff between increased speed and
decreased maintainability that results when embedded assignments are
used in artificial places.
For example,
.Ex
a = b + c;
d = a + r;
.Ee
should not be replaced by
.Ex
d = (a = b + c) + r;
.Ee
even though the latter may save one cycle.
In the long run the time difference between the two will
decrease as the optimizer gains maturity, while the difference in
ease of maintenance will increase as the human memory of what's
going on in the latter piece of code begins to fade.
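.PP
The \fIgetchar\fR loop shown earlier can be turned into a complete function.
The name \fIcount_char\fR is hypothetical; it reads any stdio stream and counts
occurrences of one character.
Note that \fIc\fR is declared \fIint\fR, not \fIchar\fR, so that the result of the embedded
assignment can still be told apart from \fIEOF\fR whether plain characters are signed or unsigned.
.Ex
#include <stdio.h>

/* Count how many times target appears in the rest of stream fp. */
long
count_char(FILE *fp, int target)
{
	int c;		/* int, not char: EOF must stay distinguishable */
	long n = 0;

	while ((c = getc(fp)) != EOF) {
		if (c == target)
			n++;
	}
	return (n);
}
.Ee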
.PP
Goto statements should be used sparingly, as in any well-structured
code.
The main place where they can be usefully employed is to break out
of several levels of
.Ec switch ,
.Ec for ,
and
.Ec while
nesting,
although the need to do such a thing may indicate
that the inner constructs should be broken out into
a separate function, with a success/failure return code.
.Ex
	for (...) {
		while (...) {
			...
			if (disaster)
				goto error;
	    
		}
	}
	\&...
error:
	clean up the mess
.Ee
When a
.Ec goto
is necessary the accompanying label should be alone
on a line and tabbed one stop to the left of the
code that follows.
The goto should be commented (possibly in the block header)
as to its utility and purpose.
.Ec Continue
should be used sparingly and near the top of the loop.
.Ec Break
is less troublesome.
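.PP
A complete version of the nested-loop example above might look like the following sketch.
The function \fIcopy_nonneg\fR and its task are hypothetical; the point is that every failure
inside the loops reaches the same cleanup code through one label,
which sits one tab stop to the left of the code around it.
.Ex
#include <stdlib.h>

/*
 * Copy the n-by-n matrix a into newly allocated storage, failing
 * if any element is negative.  Return the copy, or NULL on error.
 */
int *
copy_nonneg(const int *a, int n)
{
	int *out;
	int i, j;

	out = malloc((size_t) n * n * sizeof(int));
	if (out == NULL)
		return (NULL);
	for (i = 0; i < n; i++) {
		for (j = 0; j < n; j++) {
			if (a[i * n + j] < 0)
				goto error;	/* leave both loops at once */
			out[i * n + j] = a[i * n + j];
		}
	}
	return (out);

error:
	free(out);		/* clean up the mess */
	return (NULL);
}
.Ee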
.NH
Compound Statements
.PP
A compound statement is a list of statements enclosed by braces.
There are many common ways of formatting the braces.
Be consistent with your local standard, if you have one,
or pick one and use it consistently.
When editing someone else's code, \fIalways\fR use the style
used in that code.
.Ex
control {
\ \ \ \ \ \ \ \ statement;
\ \ \ \ \ \ \ \ statement;
}
.Ee
.LP
The style above is called ``K\^&\^R style'', and is
preferred if you haven't already got a favorite.
With K&R style, the
.Ep else
part of an
\fIif-else\fR statement
and the
.Ep while
part of a \fIdo-while\fR statement
should appear on the same line as the close brace.
With most other styles, the braces are always alone on a line.
.PP
When a block of code has several labels
(unless there are a lot of them),
the labels are placed on separate lines.
The fall-through feature of the C \fIswitch\fR statement
(that is, when there is no
.Ep break
between a code segment and the next
.Ep case
statement)
must be commented for future maintenance.
A lint-style comment/directive is best.
.Ex
switch (expr) {
	case ABC:
	case DEF:
		statement;
		break;
	case UVW:
		statement;
		/*FALLTHROUGH*/
	case XYZ:
		statement;
		break;
}
.Ee
.PP
Here, the last
.Ep break
is not strictly necessary, but it should be included anyway
because it prevents a fall-through error if another
.Ep case
is added later after the last one.
The
.Ep default
case, if used, should be last and does not require a
.Ep break .
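.PP
The sketch below shows an intentional fall-through marked with the lint-style comment.
The access levels and permission bits are hypothetical: each level gains its own
permission and then falls through to collect those of the levels beneath it.
.Ex
#define CAN_READ	01
#define CAN_WRITE	02
#define CAN_ADMIN	04

enum level { LEVEL_NONE, LEVEL_READ, LEVEL_WRITE, LEVEL_ADMIN };

/* Return the permission bits granted to an access level. */
int
permissions(enum level lv)
{
	int perms = 0;

	switch (lv) {
		case LEVEL_ADMIN:
			perms |= CAN_ADMIN;
			/*FALLTHROUGH*/
		case LEVEL_WRITE:
			perms |= CAN_WRITE;
			/*FALLTHROUGH*/
		case LEVEL_READ:
			perms |= CAN_READ;
			break;
		default:
			perms = 0;	/* LEVEL_NONE or an unknown value */
	}
	return (perms);
}
.Ee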
.PP
Whenever an
.Ec if-else
statement has more than one statement in the
.Ec if
or
.Ec else
section, the statements of both the
.Ec if
and
.Ec else
sections should be enclosed in braces
(called \fIfully bracketed syntax\fR).
.PP
.Ex
if (expr) {
	statement;
} else {
	statement;
	statement;
}
.Ee
.PP
An \fIif-else\fR with many \fIelse if\fR statements should
be written with the \fIelse\fR conditions left-justified.
.Ex
if (STREQ (reply, "yes")) {
	statements for yes
	...
} else if (STREQ (reply, "no")) {
	...
} else if (STREQ (reply, "maybe")) {
	...
} else {
	statements for default
	...
}
.Ee
The format then looks
like a generalized \fIswitch\fR statement and the
tabbing reflects the switch between exactly one of several
alternatives rather than a nesting of statements.
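.PP
Filled in, the \fIelse if\fR chain might look like the following sketch.
The function \fIparse_reply\fR and its return codes are hypothetical;
\fISTREQ\fR is repeated from its definition earlier in this document so that the sketch stands alone.
.Ex
#include <string.h>

#define STREQ(a, b)	(strcmp((a), (b)) == 0)

#define REPLY_YES	1
#define REPLY_NO	2
#define REPLY_MAYBE	3
#define REPLY_OTHER	4

/* Map a reply string to one of the REPLY_ codes. */
int
parse_reply(const char *reply)
{
	int code;

	if (STREQ(reply, "yes")) {
		code = REPLY_YES;
	} else if (STREQ(reply, "no")) {
		code = REPLY_NO;
	} else if (STREQ(reply, "maybe")) {
		code = REPLY_MAYBE;
	} else {
		code = REPLY_OTHER;
	}
	return (code);
}
.Ee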
.PP
The following code is very dangerous:
.Ex
#ifdef CIRCUIT
#	define CLOSE_CIRCUIT(circno)	{ close_circ(circno); }
#else
#	define CLOSE_CIRCUIT(circno)
#endif

\&	...
	if (expr)
		statement;
	else
		CLOSE_CIRCUIT(x)
	++i;
.Ee
Note that on systems where CIRCUIT is not defined
the statement
.Ep ++i; '' ``
will only
get executed when
.Ep expr
is false!
This example points out both the value
of naming macros with CAPS and
of making code fully-bracketed.
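.PP
One way to remove the danger is sketched below, assuming that a function
\fIclose_circ\fR is defined elsewhere whenever
.Ep CIRCUIT
is set.
The macro now expands to an expression in both configurations, so every call needs a
semicolon just like a function call, and the
.Ec if-else
is fully bracketed.
The wrapper \fIstep\fR is hypothetical.
.Ex
#ifdef CIRCUIT
extern void close_circ(int circno);	/* assumed to exist elsewhere */
#	define CLOSE_CIRCUIT(circno)	(close_circ(circno))
#else
#	define CLOSE_CIRCUIT(circno)	((void) 0)	/* still an expression */
#endif

/* Return i + 1, plus 10 more when expr is true. */
int
step(int expr, int i)
{
	if (expr) {
		i += 10;
	} else {
		CLOSE_CIRCUIT(i);
	}
	++i;		/* now runs whether or not expr is true */
	return (i);
}
.Ee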
.NH
Operators
.PP
Generally, all binary operators
except
.Ep . ' `
and
.Ep \-> ' `
should be separated from their operands by blanks.
Some judgement is called for in the case of complex expressions,
which may be clearer if the ``inner'' operators are not surrounded
by spaces and the ``outer'' ones are.
In addition, keywords that are followed by expressions in parentheses
should be separated from the left parenthesis by a blank.
.Ec Sizeof "" (
is an exception.)
Blanks should also appear after commas in argument lists to help
separate the arguments visually.
On the other hand, macro definitions with arguments must
not have a blank between the name and the left parenthesis.
The C preprocessor requires the left parenthesis
to be immediately after the macro name or else the argument list
will not be recognized.
Unary operators should not be separated from their single operand.
.PP
If you think an expression will be hard to read,
consider breaking it across lines.
Splitting at the lowest-precedence operator near the break is best.
Since C has some unexpected precedence rules,
expressions involving mixed operators should be parenthesized.
Too many parentheses, however,
can make a line \fIharder\fR to read
because humans aren't good at matching parentheses.
.PP
There is a time and place for the binary comma operator,
but generally it should be avoided.
The comma operator is most useful
to provide multiple initializations or operations,
as in \fIfor\fR statements.
Complex expressions,
for instance those with nested
.Ec ?\^:
(ternary) operators,
can be confusing and should be avoided if possible.
There are some macros like
.Ep getchar
where both the ternary
operator and comma operators are useful.
The logical expression operand before the
.Ec ?\^:
should be parenthesized and both return values must be the same type.
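.PP
The sketch below uses the comma operator where the text recommends it, in a
.Ec for
statement that steps two indices together,
plus a single parenthesized
.Ec ?\^:
whose two results have the same type.
Both function names are hypothetical.
.Ex
#include <string.h>

/* Reverse string s in place. */
void
reverse(char *s)
{
	size_t i, j, n;
	char tmp;

	n = strlen(s);
	if (n < 2)
		return;
	for (i = 0, j = n - 1; i < j; i++, j--) {
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
	}
}

/* Return the larger of a and b; both results are int. */
int
max_int(int a, int b)
{
	return ((a > b) ? a : b);
}
.Ee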
.NH
Naming Conventions
.PP
Individual projects will no doubt have their own naming conventions.
There are some general rules however.
.IP \(bu
Names with leading and trailing underscores are reserved for system
purposes and should not be used for any user-created names.
Most systems use them for names
that the user should not have to know.
If you must have your own private identifiers,
begin them with a letter or two identifying the
package to which they belong.
.IP \(bu
#define constants should be in all CAPS.
.IP \(bu
Enum tags are Capitalized or in all CAPS.
.IP \(bu
Function, structure tag, typedef, and variable names should be in
lower case.
.IP \(bu
Many macro ``functions'' are in all CAPS.
Some macros (such as
.Ep getchar
and
.Ep putchar )
are in lower case
since they may also exist as functions.
Lower-case macro names are only acceptable if the macros behave
like a function call,
that is, they evaluate their parameters \fIexactly\fR once and
do not assign values to named parameters.
Sometimes it is impossible to write a macro that behaves like a
function even though the arguments are evaluated exactly once.
.IP \(bu
Avoid names that differ only in case, like \fIfoo\fR and \fIFoo\fR.
Similarly, avoid \fIfoobar\fR and \fIfoo_bar\fR.
The potential for confusion is considerable.
.PP
In general, global names (including
.Ec enum s)
should have a
common prefix identifying the module that they belong with.
They may alternatively be grouped in a global structure.
Typedeffed names often have
.Ep _t '' ``
appended to their name.
.PP
Avoid names that might conflict with various standard
library names.
Some systems will include more library code than you want.
Also, your program may be extended someday.
.NH
Constants
.PP
Numerical constants should not be coded directly.
Symbolic constants make the code
easier to change and easier to read.
At the very least, any directly-coded numerical constant must have a
comment explaining the derivation of the value.
.PP
The
.Ec #define
feature of the C preprocessor should be used to
give constants meaningful names.
Defining the value in one place
also makes it easier to administer large programs since the
constant value can be changed uniformly by changing only the
#define.
The enumeration data type is a better way to declare variables
that take on only a discrete set of values, since
additional type checking is often available.
.PP
Constants should be defined consistently with their use;
e.g. use
.Ep 540.0
for a float instead of
.Ep 540
with an implicit float cast.
There are some cases where the constants 0 and 1 may appear as
themselves instead of as defines.
For example if a
.Ec for
loop indexes through an array, then
.Ex
for (i = 0; i < ARYBOUND; i++)
.Ee
is reasonable while the code
.Ex
qval = opens(door[i], 7);
if (qval =\^= 0)
	error("can't open %s\\\\n", door[i]);
.Ee
is not.
In the last example
.Ep qval
is a pointer.
When a value is a pointer it should be compared to
.Ep NULL
instead of 0.
.Ec NULL
is available
either as part of the standard I/O library's header file \fIstdio.h\fR
or in \fIstdlib.h\fR for newer systems.
Even simple values like 1 or 0 are often better expressed using
defines like
.Ec TRUE
and
.Ec FALSE
(sometimes
.Ec YES
and
.Ec NO
read better).
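.PP
The following sketch combines these guidelines:
the loop bound and the failure value are named constants, the loop index starts at a literal 0,
and the pointer returned by
.Ep strchr
is compared with
.Ep NULL
rather than 0.
The table \fIdoor\fR, the constant \fINO_DOOR\fR, and the function \fIfind_door\fR are hypothetical.
.Ex
#include <stdio.h>	/* for NULL */
#include <string.h>

#define ARYBOUND	4	/* number of entries in door[] */
#define NO_DOOR		(-1)	/* returned when no name matches */

static const char *door[ARYBOUND] = { "north", "south", "east", "west" };

/* Return the index of the first door name containing ch, or NO_DOOR. */
int
find_door(int ch)
{
	int i;

	for (i = 0; i < ARYBOUND; i++) {
		if (strchr(door[i], ch) != NULL)
			return (i);
	}
	return (NO_DOOR);
}
.Ee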
.PP
Simple character constants should be defined as character literals
rather than numbers.
Non-text characters are discouraged as non-portable.
If non-text characters are necessary,
particularly if they are used in strings,
they should be written using an escape sequence of three octal digits
rather than one
(e.g.
.Ep '\\\\\&007' ).
Such usage should be considered machine-dependent and treated as such.
.NH
Macros
.PP
Complex expressions can be used as macro parameters,
and operator-precedence problems can arise unless all occurrences of
parameters have parentheses around them.
There is little that can be done about the problems caused by side
effects in parameters
except to avoid side effects in expressions (a good idea anyway)
and, when possible,
to write macros that evaluate their parameters exactly once.
There are times when it is impossible to write macros that act exactly
like functions.
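.PP
A minimal sketch of the parenthesization rule follows.
Both macros are hypothetical and differ only in their parentheses;
with an argument such as \fI1 + 2\fR, only the fully parenthesized version yields 9.
Both still evaluate their argument twice, so neither should be handed an expression
with side effects such as \fIi++\fR.
.Ex
/* Wrong: BAD_SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5. */
#define BAD_SQUARE(x)	x * x

/* Right: every use of the parameter, and the whole body, is parenthesized. */
#define SQUARE(x)	((x) * (x))
.Ee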
.\" .PP
.\" Here are some classic macros.
.\" .DS
.\" #define INV(val)	1/val
.\" \&...
.\" 	y = INV(*x);		/* turns in to ``start comment''! */
.\" .DE
.\" .DS
.\" #define MAX(a,b)	(((a)>(b)) ? (a) : (b) )
.\" \&...
.\" 	k = MAX(i++,j++);
.\" .DE
.PP
Some macros also exist as functions (e.g.,
.Ep getc
and
.Ep fgetc ).
The macro should be used in implementing the function
so that changes to the macro
will be automatically reflected in the function.
Care is needed when interchanging macros and functions since functions
pass their parameters by value whereas macros pass their arguments by
name substitution.
Carefree use of macros requires care when they are defined.
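.PP
One common way to keep a macro and its function twin in step is sketched below.
The name \fIis_octal\fR is hypothetical.
The macro evaluates its argument exactly once, so a lower-case name is acceptable.
The function is written in terms of the macro, and the parentheses around its name
keep the preprocessor from expanding the macro in the definition.
.Ex
/* Non-zero if c is one of the digits '0' through '7'; c is evaluated once. */
#define is_octal(c)	((unsigned) ((c) - '0') < 8u)

/* Function version, for callers that need its address. */
int
(is_octal)(int c)
{
	return (is_octal(c));	/* expands to the macro above */
}
.Ee
A caller that writes \fIis_octal(ch)\fR gets the macro, while taking the address
with a bare \fIis_octal\fR gets the function.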
.PP
Macros should avoid using globals, since the global name may be
covered by a local declaration.
Macros that change named parameters (rather than the storage they
point at) or may be used as the left-hand side of an assignment
should mention this in their comments.
Macros that take no parameters but reference variables,
are long,
or are aliases for function calls
should be given an empty parameter list, e.g.,
.Ex
#define	OFF_A(\^\^)	(a_global+OFFSET)
#define	BORK(\^\^)	(zork(\^))
#define	SP3(\^\^)	if (b) { av+=1; bv+=1; cv+=1; }
.Ee
.PP
Macros save function call/return overhead,
but when a macro gets long, the effect of the call/return
becomes negligible, so a function should be used instead.
.PP
In some cases it is appropriate to make the compiler
ensure that a macro is terminated with a semicolon.
.Ex
if (x==3)
    SP3(\^\^);
else
    BORK(\^\^);
.Ee
If the semicolon is omitted after the call to
.Ep SP3 ,
then the
.Ep else
will (silently!) become associated with the
.Ep if
in the
.Ep SP3
macro.
With the semicolon, the
.Ep else
doesn't match \fBany\fR
.Ep if !
The macro
.Ep SP3
can be written safely as
.Ex
#define SP3(\^\^)	do { av+=1; bv+=1; cv+=1; } while (0)
.Ee
Writing out the enclosing
.Ec do-while
by hand is awkward and some compilers and tools
may complain that there is a constant in the
.Ep while '' ``
conditional.
A macro for declaring statements may make programming easier.
.Ex
#ifdef lint
	static int ZERO;
#else
#	define ZERO 0
#endif
#define STMT( stuff )		do { stuff } while (ZERO)
.Ee
Declare
.Ep SP3
with
.Ex
#define SP3(\^\^)	STMT( if (b) { av+=1; bv+=1; cv+=1; } )
.Ee
Using
.Ep STMT
will help prevent small typos from silently changing programs.
.PP
Except for hacks such as the above,
macros should contain keywords only if the entire macro
is surrounded by braces.
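.PP
The sketch below puts the
.Ec do-while
form to work.
Unlike the document's
.Ep SP3 ,
this hypothetical version takes the condition \fIb\fR as a parameter and
writes \fIwhile (0)\fR directly instead of using
.Ep STMT .
Because the expansion is a single statement that the caller ends with a semicolon,
the
.Ec else
binds to the caller's
.Ec if
as intended.
.Ex
static int av, bv, cv;

/* Bump all three counters when b is non-zero; expands to one statement. */
#define SP3(b)	do { if (b) { av += 1; bv += 1; cv += 1; } } while (0)

/* Hypothetical caller: apply SP3 when flag is set, else reset the counters. */
int
tally(int b, int flag)
{
	if (flag != 0)
		SP3(b);
	else
		av = bv = cv = 0;
	return (av + bv + cv);
}
.Ee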
.NH
Debugging
.PP
If you use
.Ec enum s,
the first tag should have a non-zero value,
or the first tag should indicate an error.
.Ex
typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;
typedef enum { VAL_NEW=1, VAL_NORMAL, VAL_DYING, VAL_DEAD } value_t;
.Ee
Uninitialized values will then often ``catch themselves''.
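.PP
The payoff can be sketched as follows.
The connection table and the functions \fIconn_ready\fR and \fIconn_start\fR are hypothetical.
Objects with static storage duration start out zeroed, so a slot that was never
set up reads as
.Ep STATE_ERR
instead of masquerading as a valid state.
.Ex
typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;

struct conn {
	state_t	state;
	int	fd;
};

static struct conn table[4];	/* static storage: every member starts at 0 */

/* Non-zero if slot n has been set up. */
int
conn_ready(int n)
{
	return (table[n].state != STATE_ERR);
}

/* Mark slot n as started on descriptor fd. */
void
conn_start(int n, int fd)
{
	table[n].state = STATE_START;
	table[n].fd = fd;
}
.Ee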
.PP
Check for error return values, even from functions that ``can't''
fail.
Consider that
.Ep close(\^)
and
.Ep fclose(\^)
can and do fail, even when all prior file operations have succeeded.
Write your own functions so that they test for errors
and return error values or abort the program in a well-defined way.
Include a lot of debugging and error-checking code
and leave most of it in the finished product.
Check even for ``impossible'' errors. [8]
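.PP
A minimal sketch of checking every return value, including the one from
.Ep fclose ,
follows.
The function \fIsave_text\fR and its convention of returning 0 on success and \-1 on failure
are assumptions made for illustration.
.Ex
#include <stdio.h>

/*
 * Write text to the file named path.
 * Return 0 on success, -1 if any step fails.
 */
int
save_text(const char *path, const char *text)
{
	FILE *fp;

	if ((fp = fopen(path, "w")) == NULL)
		return (-1);
	if (fputs(text, fp) == EOF) {
		(void) fclose(fp);
		return (-1);
	}
	if (fclose(fp) == EOF)	/* flushing buffered output can fail here */
		return (-1);
	return (0);
}
.Ee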
.PP
Use the
.Ec assert
facility to insist that
each function is being passed well-defined values,
and that intermediate results are well-formed.
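.PP
For instance, the hypothetical function \fIaverage\fR below uses
.Ep assert
from \fIassert.h\fR to insist on a non-null array and a non-zero count.
The checks disappear when the program is compiled with
.Ep NDEBUG
defined, so they must not contain side effects that the program relies on.
.Ex
#include <assert.h>
#include <stddef.h>

/* Return the mean of the n values in v; n must be positive. */
double
average(const int *v, size_t n)
{
	size_t i;
	double sum = 0.0;

	assert(v != NULL);
	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return (sum / n);
}
.Ee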
.PP
Build in the debug code using as few #ifdefs as possible.
For instance, if
.Ep mm_malloc '' ``
is a debugging memory allocator, then
.Ep MALLOC
selects the appropriate allocator,
avoids littering the code with #ifdefs,
and makes clear the difference between allocation calls being debugged
and extra memory that is allocated only during debugging.
.Ex
#ifdef DEBUG
#	define MALLOC(size)  (mm_malloc(size))
#else
#	define MALLOC(size)  (malloc(size))
#endif
.Ee
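.PP
A self-contained version of the same idea is sketched below.
The counting allocator \fImm_malloc\fR here is a hypothetical stand-in for a real debugging allocator,
and \fIDEBUG\fR would normally be set on the compiler command line rather than in the source.
.Ex
#include <stdlib.h>

#define DEBUG		/* normally supplied as a compiler flag */

static unsigned long mm_count;	/* allocations made through mm_malloc */

/* Hypothetical debugging allocator: count the call, then use malloc. */
void *
mm_malloc(size_t size)
{
	mm_count++;
	return (malloc(size));
}

#ifdef DEBUG
#	define MALLOC(size)	(mm_malloc(size))
#else
#	define MALLOC(size)	(malloc(size))
#endif
.Ee
Call sites then write \fIMALLOC(n)\fR throughout and need no conditional compilation of their own.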
.PP
Check bounds even on things that ``can't'' overflow.
A function that writes to variable-sized storage
should take an argument
.Ep maxsize
that is the size of the destination.
If there are times when the size of the destination is unknown,
some `magic' value of
.Ep maxsize
should mean ``no bounds checks''.
When bounds checks fail,
make sure that the function does something useful
such as abort or return an error status.
.Ex
/*
 * INPUT: A null-terminated source string `src' to copy from and
 * a `dest' string to copy to.  `maxsize' is the size of `dest'
 * or UINT_MAX if the size is not known.  `src' and `dest' must
 * both be shorter than UINT_MAX, and `src' must be no longer than
 * `dest'.
 * OUTPUT: The address of `dest' or NULL if the copy fails.
 * `dest' is modified even when the copy fails.
 */
	char *
copy (dest, maxsize, src)
	char *dest, *src;
	unsigned maxsize;
.\"
.\" That should be `size_t', rather than `unsigned'?
.\"
{
	char *dp = dest;

	while (maxsize-- > 0)
		if ((*dp++ = *src++) == '\\\\0')
			return (dest);

	return (NULL);
}
.Ee
.PP
In all, remember that
a program that produces wrong answers twice as fast is infinitely
slower.
The same is true of programs that crash occasionally
or clobber valid data.
.QP
.ce
``\fIC Code.  C code run.  Run, code, run...  PLEASE!!!\fR'' \(em Barbara Toungue
.NH
Conditional Compilation
.PP
Conditional compilation is useful for things like
machine-dependencies,
debugging,
and for setting certain options at compile-time.
Beware of conditional compilation.
Various controls can easily combine in unforeseen ways.
If you #ifdef machine dependencies,
make sure that when no machine is specified,
the result is an error, not a default.
If you #ifdef optimizations,
the default should be the unoptimized code
rather than an uncompilable program.
Be sure to test the unoptimized code.
.PP
Put #ifdefs in header files instead of source files when possible.
Use the #ifdefs to define macros
that can be used uniformly in the code.
For instance, a header file for checking memory allocation
might look like (omitting definitions for
.Ep REALLOC
and
.Ep FREE ):
.Ex
#ifdef DEBUG
	extern char *mm_malloc();
#	define MALLOC(size) (mm_malloc(size))
#else
	extern char *malloc();
#	define MALLOC(size) (malloc(size))
#endif
.Ee
.PP
Conditional compilation should generally be
on a feature\-by\-feature basis.
Machine or operating system dependencies
should be avoided in most cases.
.Ex
#ifdef BSD4
	long t = time((long *)NULL);
#endif
.Ee
The preceding code is poor for two reasons:
there may be 4BSD systems for which there is a better choice,
and there may be non-4BSD systems for which the above \fIis\fR the
best code.
Instead, use \fIdefine\fR symbols
such as
.Ep TIME_LONG
and
.Ep TIME_STRUCT
and define the appropriate one
in a configuration file such as \fIconfig.h\fR.
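.PP
A minimal sketch of that arrangement follows.
The meanings given to
.Ep TIME_LONG
and
.Ep TIME_STRUCT
here are assumptions: the first stands for a system whose \fItime\fR result can be used as a long count of seconds,
the second for a system that should use the POSIX \fIgettimeofday\fR instead.
If neither symbol is defined, compilation stops with an error rather than falling back to a guessed default.
.Ex
/* In a real program this line would live in config.h. */
#define TIME_LONG

#if defined(TIME_LONG)
#include <time.h>

/* Current time in seconds since the epoch. */
long
now_seconds(void)
{
	return ((long) time((time_t *) NULL));
}
#elif defined(TIME_STRUCT)
#include <stddef.h>
#include <sys/time.h>

long
now_seconds(void)
{
	struct timeval tv;

	(void) gettimeofday(&tv, NULL);
	return ((long) tv.tv_sec);
}
#else
#error "config.h must define TIME_LONG or TIME_STRUCT"
#endif
.Ee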
.NH
Portability
.QP
.ad r
``\fIC combines the power of assembler with
the portability of assembler.\fR''
.br
\(em Bill Thacker, misquoted by anonymous.
.br
.ad b
.PP
The advantages of portable code are well known.
This section gives some guidelines for writing portable code.
Here, ``portable'' means that a source file
can be compiled and executed on different machines
with the only change being the inclusion of possibly
different header files and the use of different compiler flags.
The header files will contain #defines and typedefs that may vary from
machine to machine.
In general, a new ``machine'' is different hardware,
a different operating system, a different compiler,
or any combination of these.
Reference [1] contains useful information on both style and portability.
.\" Does it really?
The following is a list of pitfalls to be avoided and recommendations
to be considered when designing portable code:
.IP \(bu
Write portable code first,
worry about detail optimizations only on machines where they
prove necessary.
Optimized code is often obscure.
Optimizations for one machine may produce worse code on another.
Document performance hacks and localize them as much as possible.
Documentation should explain \fIhow\fR it works and \fIwhy\fR
it was needed (e.g., ``loop executes 6 zillion times'').
.IP \(bu
Recognize that some things are inherently non-portable.
Examples are code to deal with particular hardware registers such as
the program status word,
and code that is designed to support a particular piece of hardware,
such as an assembler or I/O driver.
Even in these cases there are many routines and data organizations
that can be made machine independent.
.IP \(bu
Organize source files so that the machine-independent
code and the machine-dependent code are in separate files.
Then if the program is to be moved to a new machine,
it is a much easier task to determine what needs to be changed.
Comment the machine dependence in the headers of the appropriate
files.
.IP \(bu
Any behavior that is described as ``implementation defined''
should be treated as a machine (compiler) dependency.
Assume that the compiler or hardware does it some completely screwy
way.
.IP \(bu
Pay attention to word sizes.
Objects may be non-intuitive sizes.
Pointers are not always the same size as \fIint\fRs,
the same size as each other,
or freely interconvertible.
The following table shows bit sizes for basic types in C for various
machines and compilers.
.br
.ne 2i
.TS
center;
l c c c c c c c
l c c c c c c c
l r r r r r r r.
type	pdp11	VAX/11	68000	Cray-2	Unisys	Harris	80386
	series		family		1100	H800	
_
char	8	8	8	8	9	8	8
short	16	16	8/16	64(32)	18	24	8/16
int	16	32	16/32	64(32)	36	24	16/32
long	32	32	32	64	36	48	32
char*	16	32	32	64	72	24	16/32/48
int*	16	32	32	64(24)	72	24	16/32/48
int(*)(\^\^)	16	32	32	64	576	24	16/32/48
.TE
.\" 
.\" blarson%skat.usc.edu@oberon.usc.edu (Bob Larson) sez for a pr1me
.\" int=16 is a compile-time option.  pointer size depends on which
.\" instruction set you generate code for, only 32 bits are significant
.\" on non-char* pointers (extra 16 bits allocated but not used.)
.\" 
.\" beaver.cs.washington.edu!cornell!calvin!johns (John Sahr) sez
.\" the Harris H800/H100 series has 3-byte words.  Float and double
.\" are the same bit-size but are different precision; two bytes are
.\" thrown away for floats.  Int* and char* are same size but 2 bits
.\" are reserved for the byte pointer within a word.  H1000/H12000
.\" have software triple and quad precision for FORTRAN, 9 & 12 bytes.
.\"
.\" Theodore Stevens Norvell <norvell@csri.toronto.edu> on the Control
.\" Data Cyber-180 (aka Cyber 900).  Pointers hold only 48 bits of
.\" useful data (44 for virtual byte address, 4 for security) but are
.\" padded to make them more interchangeable with ints.
.\" 
.\" DEEBE@SCIENCE.UTAH.EDU (Nelson H.F. Beebe) on 36-bit DEC-20:
.\" 4 compilers, including PCC-20 (Johnson's PCC ported to TOPS-20 by
.\" Lereau@cs.utah.edu), KCC-20 (Kok Chen at Stanford, Ken Harrenstien
.\" and Ian Macky at SRI), New Mexico Tech C, Sargasso C compiler
.\" (from BBN, he thinks).  Most still using DEC-20's use KCC.
.\" [*] Note that KCC-20 has 4 pointer formats based on local/global
.\" and char*/non-char* usage.  The following fails:
.\"     int *p = malloc( sizeof(int) );
.\"	free( p );
.\" It works correctly with casts to int* from malloc and to char* for
.\" free.
.\" 
.\" type	pr1me	H800	Cyber	PCC-20	KCC-20
.\"				900
.\"
.\" char	8	8	8	36	9
.\" short	16	24	32	36	18
.\" int		16/32	24	64	36	36
.\" long	32	48	64	36	36
.\" char*	32(48)	24	64	36	36[*]
.\" int*	32(48)	?	64	36	36[*]
.\" int(*)()	32(48)	24	64	36	36[*]
.\" float	?	48	64	36	36
.\" double	?	48	64	36	72
.\" long double	?	?	128	<none>	<none>
.\" 
Some machines have more than one possible size for a given type.
The size you get can depend both on the compiler
and on various compile-time flags.
The following table shows ``safe'' type sizes on the majority of
systems.
Unsigned numbers are the same bit size as signed numbers.
.TS
center;
c c c
l r c.
Type	Minimum	No Smaller
	# Bits	Than
_
char	8
short	16	char
int	16	short
long	32	int
float	24
double	38	float
any *	14
char *	15	any *
void *	15	any *
.TE
.IP \(bu
The
.Ec void*
type is guaranteed to have enough bits
of precision to hold a pointer to any data object.
The
.Ec void(*)(\^\^)
type is guaranteed to be able to hold a pointer to any function.
Use these types when you need a generic pointer.
(Use
.Ec char*
and
.Ec char(*)(\^\^) ,
respectively, in older compilers.)
.\"
.\" Any return value should do; `int(*)()' makes more sense,
.\" but then it's hard to #define back and forth between dpANS
.\" (void means void) and older compilers (#define void ...).
.\" You still bite the farm if the compiler understands void but
.\" not void*.
.\"
Be sure to cast pointers back to the correct type before using them.
.IP \(bu
Even when, say, a
.Ec void*
and a
.Ec char*
are the same \fIsize\fR, they may have different \fIformats\fR.
For example, the following will fail on some machines that have
.Ep sizeof(int*)
equal to
.Ep sizeof(char*) .
The code fails because
.Ep free
expects a
.Ec char*
and gets passed an
.Ec int* .
.\" See the comment above about the KCC compiler for DEC-20s
.Ex
int *p = (int *) malloc (sizeof(int));
free (p);
.Ee
.\"
.\" Another example:
.\" Consider the \fBqsort\fR routine, which takes a pointer to an array
.\" of `things', the size of each element, and a comparison function.
.\" Sorting an array of \fBchar*\fR, you may be tempted to say
.\" .DS
.\" qsort ((void*)argv, argc, sizeof(char*), strcmp);
.\" .DE
.\" This will surely bomb on some machines, however.
.\" \fBStrcmp(\^)\fR takes pointers to two \fBchar*\fRs,
.\" while \fBqsort(\^)\fR will \fIcall\fR it with two \fBvoid*\fRs.
.\"
.IP \(bu
Note that
the \fIsize\fP of an object does not guarantee the \fIprecision\fP of
that object.
The Cray-2 may use 64 bits to store an
.Ec int ,
but a \fIlong\fR cast into an
.Ec int
and back to a
.Ec long
may be truncated to 32 bits.
.IP \(bu
The integer
.Ec constant
zero may be cast to any pointer type.
The resulting pointer is called a
\fInull pointer\fR
for that type, and is different from any other pointer of that type.
A null pointer always compares equal to the constant zero.
A null pointer might \fInot\fR compare equal with a variable
that has the value zero.
Null pointers are \fInot\fR always stored with all bits zero.
Null pointers for two different types are sometimes different.
A null pointer of one type cast into a pointer of another
type becomes the null pointer for that second type.
.IP \(bu
On ANSI compilers, when two pointers of the same type access
the same storage, they will compare as equal.
When non-zero integer constants are cast to pointer types,
they may become identical to other pointers.
On non-ANSI compilers, pointers that
access the same storage may compare as different.
The following two pointers, for instance,
may or may not compare equal,
and they may or may not access the same storage.
.Ex
((int *) 2 )
((int *) 3 )
.Ee
.\"
.\" This is true, for instance, on the 8086, where the least-
.\" -significant bit is always ignored, except when accessing
.\" byte-sized values.  The pointer comparison (==) uses \fIall\fR
.\" bits, so the two pointers do \fInot\fR compare the same.
.\"
If you need `magic' pointers other than NULL,
either allocate some storage or treat the pointer as
a machine dependence.
.Ex
extern int x_int_dummy;		/* in x.c */
#define X_FAIL	(NULL)
#define X_BUSY	(&x_int_dummy)
.Ee
.Ex
#define X_FAIL	(NULL)
#define X_BUSY	MD_PTR1		/* MD_PTR1 from "machine.h" */
.Ee
.IP \(bu
Floating-point numbers have both a \fIprecision\fR and a \fIrange\fR.
These are independent of the size of the object.
Thus, overflow (underflow) for a 32-bit floating-point number will
happen at different values on different machines.
Also,
4.99999999999
times
5.00000000001
will yield
two different numbers on two different machines.
Differences in rounding and truncation can give surprisingly
different answers.
.IP \(bu
On some machines,
a
.Ec double
may have \fIless\fR range or precision than a
.Ec float .
.IP \(bu
On some machines the first half of a
.Ec double
may be a
.Ec float
with similar value.
Do \fInot\fR depend on this.
.IP \(bu
Watch out for signed characters.
On the VAX, for instance,
characters are sign extended when used in expressions,
which is not the case on many other machines.
Code that assumes signed/unsigned is unportable.
For example,
.Ep a[c]
won't work if
.Ep c
is supposed to be positive and is instead signed and negative.
If you must assume signed or unsigned characters, comment them as
.Ep SIGNED
or
.Ep UNSIGNED .
.IP \(bu
Avoid assuming \s-1ASCII\s+1.
If you must assume, document and localize.
Remember that characters may hold (much) more than 8 bits.
.IP \(bu
Code that takes advantage of the two's complement representation of
numbers on most machines should not be used.
Optimizations that replace arithmetic operations with equivalent
shifting operations are particularly suspect.
If absolutely necessary, machine-dependent code should be #ifdeffed
or operations should be performed by #ifdeffed macros.
You should weigh the time savings against the potential for obscure
and difficult bugs when your code is moved.
.IP \(bu
In general, if the word size or value range is important,
typedef ``sized'' types.
Large programs should have a central header file which supplies
typedefs for commonly-used width-sensitive types, to make
it easier to change them and to aid in finding width-sensitive code.
Unsigned types other than
.Ec "unsigned int"
are highly compiler-dependent.
If a simple loop counter is being used where either 16 or 32 bits will
do, then use
.Ec int ,
since it will get the most efficient (natural)
unit for the current machine.
.\"
.\" <side comment>
.\" Actually, there are many machines that use ``unnatural''
.\" int sizes to cope with ``the world is a VAX'' problems.
.\" The rule int == natural is still true, though.
.\" Modern compilers have a switch that lets you select either
.\" efficiency or bogus-VAX-code-compatibility.
.\" On the other hand, this is still a lie, because the libraries
.\" must work in any event.
.\" On the other (third?) hand, modern systems are being fixed.
.\"
.IP \(bu
Data \fIalignment\fR is also important.
For instance,
on various machines a 4-byte integer may start at any address,
start only at an even address, or start only at a multiple-of-four
address.
Thus, a particular structure may have its elements
at different offsets on different machines,
even when given elements are the same size on all machines.
Indeed, a structure of a 32-bit pointer and an 8-bit character may be
3 sizes on 3 different machines.
As a corollary, pointers to objects may not be interchanged freely;
saving an integer through a pointer
to 4 bytes starting at an odd address
will sometimes work,
sometimes cause a core dump,
and sometimes fail silently (clobbering other data in the process).
.\"
.\" In particular, the VAX will work, the 68000 (tho' not necessarily
.\" other family members) will dump, and the 8086 (tho' not
.\" necessarily other members) will ignore the lowest bit.
.\" The IBM RT will silently round the address down to the nearest
.\" multiple of four.
.\"
Pointer-to-character is a particular trouble spot on machines which
do not address to the byte.
Alignment considerations and loader peculiarities make it very rash
to assume that two consecutively-declared variables are together
in memory, or that a variable of one type is aligned appropriately
to be used as another type.
.IP \(bu
The bytes of a word are of increasing significance with increasing
address on machines such as the VAX (little-endian)
and of decreasing significance with increasing address on other
machines such as the 68000 (big-endian).
Hence any code that depends on the left-right orientation of bits
in a word deserves special scrutiny.
Bit fields within structure members will only be portable so long as
two separate fields are never concatenated and treated as a unit. [1,3]
Actually, it is nonportable to concatenate \fIany\fR two variables.
.IP \(bu
There may be unused holes in structures.
Suspect unions used for type cheating.
Specifically, a value should not be stored as one type and retrieved as
another.
An explicit tag field for unions may be useful.
.IP \(bu
Different compilers use different conventions for returning
structures.
This causes a problem when libraries return structure values
to code compiled with a different compiler.
Structure pointers are not a problem.
.IP \(bu
Do not make assumptions about the parameter passing mechanism,
especially pointer sizes and parameter evaluation order, size, etc.
The following code, for instance, is \fIvery\fR nonportable.
.Ex
	c = foo (*cp++, *cp++);

	char
foo (c1, c2, c3)
	char c1, c2, c3;
{
	char bar = *(&c1 + 1);
	return (bar);			/* often won't return c2 */
}
.\"
.\" It can be argued that if this *does* return c2, then
.\" sizeof(char) == sizeof(int).
.\"
.Ee
This example has lots of problems.
The stack may grow up or down
(indeed, there need not even be a stack!).
Parameters may be widened when they are passed,
so a
.Ec char
might be passed as an
.Ec int ,
for instance.
Arguments may be pushed left-to-right, right-to-left,
in arbitrary order, or passed in registers (not pushed at all).
The order of evaluation may differ from the order in which
they are pushed.
One compiler may use several (incompatible) calling conventions.
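A portable version of the call above performs each increment in its own statement, so the characters are fetched in a known order, and lets the callee refer to its parameters by name rather than by address arithmetic.
A minimal sketch; \fIsecond_of\fR and \fIget_second\fR are hypothetical names.

```c
/*
 * Hypothetical rewrite of the example above.  The callee names both
 * parameters instead of finding the second one by address arithmetic.
 */
	static char
second_of (char c1, char c2)
{
	(void) c1;		/* unused, kept to mirror the original call */
	return (c2);
}

	char
get_second (const char *cp)
{
	char c1, c2;

	c1 = *cp++;		/* each increment is its own statement, */
	c2 = *cp++;		/* so the order of the fetches is fixed */
	return (second_of (c1, c2));
}
```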
.\"
.\" <side comment>
.\" One machine (??), for instance pushes R-to-L for compatibility
.\" with Pascal, except for varargs, which are passed L-to-R to
.\" make varargs work.  This always works since Pascal functions
.\" are never called varargs.
.\"
.IP \(bu
On some machines, the null character pointer
.Ep "((char *)0)"
is treated the same way as a pointer to a null string.
Do \fInot\fR depend on this.
.IP \(bu
Do not modify string constants\*f.
.FS
.IP \*F
Some libraries attempt to modify and then restore read-only
string variables.
Programs sometimes won't port because of these broken libraries.
The libraries are getting better.
.FE
One particularly notorious (bad) example is
.Ex
s = "/dev/tty??";
strcpy (&s[8], ttychars);
.Ee
.IP \(bu
The address space may have holes.
Simply \fBcomputing\fR the address
of an unallocated element in an array
(before or after the actual storage of the array)
may crash the program.
If the address is used in a comparison,
sometimes the program will run but clobber data, give wrong answers,
or loop forever.
The only exception is that a pointer into an array of objects may
legally point to the first element after the end of the array.
This ``outside'' pointer may not be dereferenced.
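Loops that walk an array backwards are a common way to compute the forbidden address one element before the start.
The sketch below (the function \fIlast_negative\fR is hypothetical) starts from the legal one-past-the-end pointer and decrements before each use, so it never forms an address outside the array.

```c
#include <stddef.h>

/*
 * Hypothetical example: return a pointer to the last negative
 * element of a[0..n-1], or NULL if there is none.  The loop stops
 * while p still equals a, so a - 1 is never computed.
 */
	const int *
last_negative (const int *a, size_t n)
{
	const int *p = a + n;	/* one past the end: legal to compute */

	while (p > a) {
		--p;		/* step back before dereferencing */
		if (*p < 0)
			return (p);
	}
	return (NULL);
}
```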
.IP \(bu
Only the
.Ep =\^=
and
.Ep !\^=
comparisons are defined for all pointers of a given type.
It is only portable to use
.Ep < ,
.Ep <= ,
.Ep > ,
or
.Ep >= 
to compare pointers when they both point into
(or to the first element after) the same array.
It is likewise only portable to use arithmetic operators on pointers
that both point into the same array or the first element afterwards.
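The following sketch keeps every relational comparison and every subtraction within one array, using the one-past-the-end pointer as the loop limit; \fIindex_of\fR is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical example: return the index of the first occurrence
 * of c in buf[0..len-1], or -1.  p and end both point into buf (or
 * just past it), so p < end and p - buf are portable.
 */
	long
index_of (const char *buf, size_t len, char c)
{
	const char *p;
	const char *end = buf + len;

	for (p = buf; p < end; p++)
		if (*p == c)
			return ((long) (p - buf));
	return (-1L);
}
```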
.IP \(bu
Word size also affects shifts and masks.
The following code will clear only the three rightmost bits of an
\fIint\fR on \fIsome\fR 68000s (those whose compilers use a 16-bit \fIint\fR).
On machines with a 32-bit \fIint\fR it will also clear the upper two bytes.
.Ex
x &= 0177770;
.Ee
Use instead
.Ex
x &= ~07;
.Ee
which works properly on all machines\*f.
.FS
.IP \*F
The or operator (\ |\ ) does not have these problems,
nor do bitfields.
.FE
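The difference is easy to see with an \fIunsigned long\fR, which has at least 32 bits everywhere.
In this illustrative sketch (both function names are hypothetical), the octal constant is only 16 bits wide, so it clears the high bits as well, while the complement form adapts to the width of the operand.
The \fIUL\fR suffix keeps the complement in the same type as the value being masked.

```c
/* 0177770 is 0xFFF8: every bit above bit 15 of x is cleared too. */
	unsigned long
clear_low3_wrong (unsigned long x)
{
	return (x & 0177770);
}

/* ~07UL is all ones except the low three bits, at any width. */
	unsigned long
clear_low3 (unsigned long x)
{
	return (x & ~07UL);
}
```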
.IP \(bu
Side effects within expressions can result in code
whose semantics are compiler-dependent, since C's order of evaluation
is explicitly undefined in most places.
Notorious examples include the following.
.Ex
a[i] = b[i++];
.Ee
In the above example, we know only that
the subscript into
.Ep b
has not been incremented.
The index into
.Ep a
could be the value of
.Ep i
either before or after the increment.
.Ex
struct bar_t { struct bar_t *next; } *bar;
bar->next = bar = tmp;
.Ee
In the second example, the address of
.Ep bar->next '' ``
may be computed before the value is assigned to
.Ep bar ''. ``
Compilers do differ.
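Under ANSI C both examples above have undefined behavior, not merely compiler-dependent results, because a variable is modified and also read for another purpose with no sequence point in between.
Writing each side effect as its own statement states the intended order explicitly.
In the sketch below the helper names are hypothetical, and the list example assumes that the intent was to link \fItmp\fR after the current node and then advance to it.

```c
struct bar_t { struct bar_t *next; };

/*
 * Hypothetical helper: link tmp after the current tail and advance
 * the tail.  Each assignment is its own statement, so the order is
 * no longer up to the compiler.
 */
	struct bar_t *
append (struct bar_t *tail, struct bar_t *tmp)
{
	tail->next = tmp;	/* old tail points at the new node */
	tail = tmp;		/* then the new node becomes the tail */
	return (tail);
}

/* a[i] = b[i++], written so that both subscripts use the old i. */
	void
copy_and_advance (int *a, const int *b, int *ip)
{
	a[*ip] = b[*ip];
	(*ip)++;
}
```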
.IP \(bu
Be suspicious of numeric values appearing in the code (``magic
numbers'').
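Naming each constant once, with a #define or an ANSI C \fIenum\fR, tells the reader what the number means and confines any change to one place.
A hypothetical sketch:

```c
#define SCREEN_COLS	80	/* characters per line */
#define SCREEN_ROWS	24	/* lines per screen */

enum { TAB_WIDTH = 8 };		/* an ANSI C enum constant works too */

/* Hypothetical: column of the next tab stop, kept on the screen. */
	int
next_tab_stop (int col)
{
	int stop = (col / TAB_WIDTH + 1) * TAB_WIDTH;

	return (stop < SCREEN_COLS ? stop : SCREEN_COLS - 1);
}

	int
screen_cells (void)
{
	return (SCREEN_COLS * SCREEN_ROWS);
}
```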
.IP \(bu
Avoid preprocessor tricks.
Tricks such as using
.Ep /**/
for token pasting
and macros that rely on argument string expansion will break reliably.
.Ex
#define FOO(string)	(printf("string = %s",(string)))
\&...
FOO(filename);
.Ee
The call will only sometimes be expanded to
.Ex
(printf("filename = %s",(filename)))
.Ee
Be aware, however, that tricky preprocessors may cause macros to break
\fIaccidentally\fR on some machines.
Consider the following two versions of a macro.
.Ex
#define LOOKUP(c)	(a['c'+(c)])	/* Sometimes breaks. */
#define LOOKUP(chr)	(a['c'+(chr)])	/* Works. */
.Ee
The first version of
.Ep LOOKUP
can be expanded in two different ways
and will cause code to break mysteriously.
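ANSI C provides portable replacements for both tricks: the # operator turns a macro argument into a string literal, and the ## operator pastes two tokens together.
A sketch assuming a standard preprocessor; the buffer parameter added to \fIFOO\fR and the helpers \fIdescribe\fR and \fIpasted\fR are hypothetical, introduced only so the result can be examined.

```c
#include <stdio.h>
#include <string.h>

/*
 * #string becomes "filename" for FOO(buf, filename), and adjacent
 * string literals are joined, giving the format "filename = %s".
 */
#define FOO(buf, string)	(sprintf ((buf), #string " = %s", (string)))

/* a ## b glues two tokens into one, replacing the empty-comment trick. */
#define PASTE(a, b)	a ## b

static const char *filename = "core";

	int
describe (char *buf)
{
	return (FOO (buf, filename));
}

	int
pasted (void)
{
	int PASTE (max, _len) = 42;	/* declares max_len */

	return (max_len);
}
```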
.IP \(bu
Become familiar with existing library functions and defines.
(But not \fItoo\fR familiar.
The internal details of library facilities, as opposed to their
external interfaces, are subject to change without warning.
They are also often quite unportable.)
You should not be writing your own string compare routine,
terminal control routines, or making
your own defines for system structures.
``Rolling your own'' wastes your time and
makes your code less readable, because another reader has to
figure out whether you're doing something special in that reimplemented
stuff to justify its existence.
It also prevents your program
from taking advantage of any microcode assists or other
means of improving performance of system routines.
Furthermore, it's a fruitful source of bugs.
If possible, be aware of the \fIdifferences\fR between the common 
libraries (such as ANSI, POSIX, and so on).
.IP \(bu
Use \fIlint\fR\*f.
.FS
.IP \*F
\fILint\fR is not available on many systems.
.FE
It is a valuable tool for finding machine-dependent constructs as well
as other inconsistencies or program bugs that pass the compiler.
If your compiler has switches to turn on warnings, use them.
.IP \(bu
Suspect labels inside blocks with the
associated
.Ec switch
or
.Ec goto
outside the block.
.IP \(bu
Wherever the type is in doubt,
parameters should be cast to the appropriate type.
Always cast NULL when it appears in non-prototyped function calls.
Do not use function calls as a place to do type cheating.
C has confusing promotion rules, so be careful.
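Variable argument lists are one place where even a prototype does not help, since it cannot describe the types of the trailing arguments.
A minimal sketch, using a hypothetical function \fIcount_strings\fR that stops at a null pointer sentinel:

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical variadic function: count string arguments up to a
 * null pointer.  The caller must write (char *) NULL, because a bare
 * NULL may be passed as an int 0, which need not have the size or
 * representation of a char pointer.
 */
	int
count_strings (const char *first, ...)
{
	va_list ap;
	const char *s;
	int n = 0;

	va_start (ap, first);
	for (s = first; s != NULL; s = va_arg (ap, char *))
		n++;
	va_end (ap);
	return (n);
}
```

A call such as \fIcount_strings ("a", "b", (char *) NULL)\fR counts two strings.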
.IP \(bu
Use explicit casts when doing arithmetic
that mixes signed and unsigned values.
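In a comparison between an \fIint\fR and an \fIunsigned int\fR, the signed operand is converted to unsigned, so a negative value compares as a large positive one.
A hypothetical sketch of the trap and of an explicit version:

```c
/* i is converted to unsigned, so -1 < 1u is false here. */
	int
naive_less (int i, unsigned int u)
{
	return (i < u);
}

/* Explicit: handle negative i first, then cast deliberately. */
	int
signed_less (int i, unsigned int u)
{
	if (i < 0)
		return (1);	/* negative is below any unsigned value */
	return ((unsigned int) i < u);
}
```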
.IP \(bu
The inter-procedural goto,
.Ec longjmp ,
should be used with caution.
Many implementations ``forget'' to restore values in registers.
Declare critical values as
.Ep volatile
if you can or comment them as
.Ep VOLATILE .
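ANSI C makes the rule explicit: an automatic variable that is changed between \fIsetjmp\fR and \fIlongjmp\fR has an indeterminate value afterwards unless it is declared \fIvolatile\fR.
A minimal sketch; the function names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical worker that bails out through longjmp. */
	static void
fail (void)
{
	longjmp (env, 1);
}

/*
 * count is changed after setjmp and examined after longjmp returns,
 * so it is volatile; otherwise it might sit in a register whose
 * value longjmp does not restore.
 */
	int
attempts_before_failure (void)
{
	volatile int count = 0;

	if (setjmp (env) != 0)
		return (count);
	count = 3;
	fail ();
	return (-1);		/* not reached */
}
```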
.IP \(bu
Some linkers convert names to lower-case
and
some only recognize the first six letters as unique.
Programs may break quietly on these systems.
.IP \(bu
Beware of compiler extensions.
If you use them, document them and
treat them as machine dependencies.
.IP \(bu
.\"
.\" <interesting, but most folks don't care?>
.\"
A program cannot generally execute code in the data
segment or write into the code segment.
Even when it can, there is no guarantee that it can do so reliably.
.\"
.\" Examples: the 80386 default protection won't let you write to the
.\" code segment or execute from the data segment.  An 88000 will let
.\" you execute from the data segment, but unless the I-cache is told
.\" \fIexplicitly\fR to watch for invalidations, there is no way to
.\" tell when the I-cache will be updated.  And some of the bytes may
.\" be updated while others are left unchanged!
.\"
.NH
ANSI C
.PP
Modern C compilers support some or all of the ANSI proposed standard C.
Write code to run under standard C whenever possible and use
features such as function prototypes, constant storage, and volatile
storage.
Standard C improves program performance by giving better information
to optimizers.
Standard C improves portability by ensuring that all compilers
accept the same input language and by providing mechanisms
that try to hide machine dependencies or emit warnings about
code that may be machine-dependent.
.NH 2
Compatibility
.PP
Write code that is easy to port to older compilers.
For instance,
conditionally #define new (standard) keywords such as
.Ec const
and
.Ec volatile
in a global \fI.h\fR file.
Standard compilers pre-define the preprocessor symbol
.Ep _\^\^_STDC_\^\^_ .
The
.Ec void*
type is hard to get right simply,
since some older compilers understand
.Ep void
but not
.Ep void* .
It is easiest to create a new
(machine- and compiler-dependent)
.Ep VOIDP
type, usually
.Ec char*
on older compilers.
.Ex
#ifdef _\^\^_STDC_\^\^_
	typedef void *VOIDP;
#	define COMPILER_SELECTED
#endif
#ifdef A_TARGET
#	define const
#	define volatile
#	define void int
	typedef char *VOIDP;
#	define COMPILER_SELECTED
#endif
#ifdef ...
	\&...
#endif
#ifdef COMPILER_SELECTED
#	undef COMPILER_SELECTED
#else
	{ NO TARGET SELECTED! }
#endif
.\"
.\" Alternatively, we could do
.\"
.\" #ifdef __STDC__
.\"	..
.\" #	define CONST const
.\" #	define VOLATILE volatile
.\" #else
.\"	..
.\" #endif
.\"
.\" Is one of these better?  Probably not, it will be a strange
.\" anacronism when everybody has forgotten that const once didn't
.\" exist.
.\"
.Ee
.NH 2
Formatting
.PP
The style for ANSI C is the same as for regular C,
with two notable exceptions: storage qualifiers
and parameter lists.
.PP
Because
.Ep const
and
.Ep volatile
have strange binding rules,
.\"
.\" In particular, "char const *s, *t" means both `t' and `s' point
.\" to constant storage, while "char * const s, *t" means that s is
.\" a constant, but `t' isn't.
.\"
.\" I think.
.\"
.\" `*' binds differently.
.\"
each
.Ec const
or
.Ec volatile
object should have a separate declaration.
.Ex
int const *s;		/* YES */
int const *s, *t;	/* NO */
.Ee
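The two placements of \fIconst\fR mean different things.
In the style used above, with the qualifier written after what it modifies, \fIconst\fR applies to whatever appears immediately to its left.
A hypothetical illustration:

```c
	int
const_demo (void)
{
	int a = 1, b = 2;
	int const *p = &a;	/* pointer to constant int: *p is read-only */
	int *const q = &a;	/* constant pointer: q itself is read-only */

	p = &b;			/* allowed: p may point elsewhere */
	*q = 10;		/* allowed: the int q points at may change */
	/* *p = 3;  would not compile */
	/* q = &b;  would not compile */
	return (*p + a);	/* b + the updated a */
}
```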
.PP
Prototyped functions merge parameter declaration
and definition into one list.
Parameters should be commented in the function comment.
.Ex
/*
 * `bp': boat trying to get in.
 * `stall': a list of stalls, never NULL.
 * returns stall number, 0 => no room.
 */
	int
enter_pier (boat_t const *bp, stall_t *stall)
{
	\&...
.Ee
.\" .NH 2
.\" Storage Qualifiers
.NH 2
Prototypes
.PP
Function prototypes should be used
to make code more robust and to make it run faster.
Unfortunately, the prototyped \fBdeclaration\fR
.Ex
extern void bork (char c);
.Ee
is incompatible with the \fBdefinition\fR
.Ex
	void
bork (c)
	char c;
\&...
.Ee
The prototype says that
.Ep c
is to be passed as the most natural type for the machine,
probably a byte.
The non-prototyped (backwards\-compatible) definition implies that
.Ep c
is always passed as an
.Ec int \*f.
.FS
.IP \*F
Such automatic type promotion is called
.Ec widening .
For older compilers, the widening rules require that
all
.Ec char
and
.Ec short
parameters are passed as
.Ec int s
and that
.Ec float
parameters are passed as
.Ec double s.
.FE
If a function has promotable parameters, then
the caller and callee must be compiled identically.
Either both must use function prototypes
or neither can use prototypes.
The problem can be avoided if parameters are promoted when the program
is designed.
For example,
.Ec bork
can be defined to take an
.Ec int
parameter.
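A sketch of that design choice, using a hypothetical counting function.
The character parameter is declared \fIint\fR, as the standard \fIstrchr\fR and \fIputc\fR declare theirs, so a caller that sees a prototype and one that does not both pass an \fIint\fR.

```c
/*
 * Hypothetical example: count occurrences of c in s.  Because c is
 * already an int, widening cannot make prototyped and
 * non-prototyped callers disagree about how it is passed.
 */
	int
count_char (const char *s, int c)
{
	int n = 0;

	for (; *s != 0; s++)
		if (*s == (char) c)
			n++;
	return (n);
}
```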
.PP
The above declaration works if the definition is prototyped.
.Ex
	void
bork (char c)
{
	\&...
.Ee
Unfortunately,
the prototyped syntax will cause non-ANSI compilers to reject the
program.
.\"
.\" There is no obvious way to define the function so that
.\" prototypes are used only when an ANSI compiler is used.
.\" Prototyped and nonprototyped declarations can be #ifdeffed on
.\" .Ep _\^\^_STDC_\^\^_ ,
.\" but the extra #ifdeffing causes maintainance problems
.\" and makes the code hard to read.
.\"
.\" Oh yeah, try
.\"
.\"	int DEFUN (foo, (a, p), int a AND char *p)
.\"	int foo FUN2(int, a, char *, p)
.\"
.\" But beware: ``Don't change the syntax via macro substitution.''
.\"
.PP
It \fIis\fR easy to write external declarations that work with both
prototyping and with older compilers\*f.
.FS
.IP \*F
Note that using
.Ep PROTO
violates the rule ``don't change the syntax via macro substitution.''
It is regrettable that there isn't a better solution.
.FE
.Ex
#ifdef _\^\^_STDC_\^\^_
#	define PROTO\^(x) x
#else
#	define PROTO\^(x) (\^)
#endif

extern char **ncopies PROTO((char *s, short times));
.Ee
Note that
.Ep PROTO
must be used with \fIdouble\fR parentheses.
.PP
In the end,
it may be best to write in only one style (e.g., with prototypes).
When a non-prototyped version is needed, it is generated using an
automatic conversion tool.
.NH 2
Pragmas
.PP
Pragmas
are used to introduce machine-dependent code in a controlled way.
Obviously, pragmas should be treated as machine dependencies.
Unfortunately, the syntax of ANSI pragmas
makes it impossible to isolate them in machine-dependent headers.
.\"
.\" <side note>
.\" Because it is of the form ``#pragma'' instead of ``pragma(args)''.
.\" You can't put the #pragma in an include file, as it will get
.\" interpreted there.
.\"
.\" You are also prevented from embedding pragmas in macros.
.\"
.PP
Pragmas are of two classes.
.Ec Optimizations
may safely be ignored.
Pragmas that change the system behavior (``required pragmas'')
may not.
Required pragmas should be #ifdeffed so that compilation will abort if
no pragma is selected.
.PP
Two compilers may use a given pragma in two very different ways.
For instance, one compiler may use
.Ep haggis '' ``
to signal an optimization.
Another might use it to indicate that a given statement,
if reached, should terminate the program.
Thus, when pragmas are used,
they must always be enclosed in machine-dependent #ifdefs.
Pragmas must always be #ifdeffed out for non-ANSI compilers.
Be sure to indent the octothorpe (#) on the
.Ep #pragma ,
as older preprocessors will halt on it otherwise.
.Ex
#if defined(_\^\^_STDC_\^\^_) && defined(USE_HAGGIS_PRAGMA)
	#pragma (HAGGIS)
#endif
.Ee
.QP
``\fIThe `#pragma' command is specified in the ANSI standard to have an
arbitrary implementation-defined effect.
In the GNU C preprocessor, `#pragma' first attempts to run the game
`rogue'; if that fails, it tries to run the game `hack'; if that
fails, it tries to run GNU Emacs displaying the Tower of Hanoi; if
that fails, it reports a fatal error.
In any case, preprocessing does not continue.\fR''
.br
.ad r
\(em Manual for the GNU C preprocessor for GNU CC 1.34.
.br
.ad b
.\" NEED MORE STUFF!!
.NH
Special Considerations
.PP
This section contains some miscellaneous do's and don'ts.
.\"
.\" This should probably be either "dos and don'ts" or
.\" "do's and don't's", but neither looks quite right.
.\"
.IP \(bu
Don't change syntax via macro substitution.
It makes the program unintelligible to all but the perpetrator.
.IP \(bu
Don't use floating-point variables where discrete values are needed.
Using a
.Ec float
for a loop counter is a great way to shoot yourself in the foot.
Always test floating-point numbers with \fB<=\fR or \fB>=\fR;
never use an exact comparison (\fB=\^=\fR or \fB!=\fR).
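A minimal sketch of both points, with hypothetical function names: the loop is controlled by an integer and derives the floating-point value from it, and equality is tested within a tolerance.
The tolerance itself is an assumption that must suit the magnitudes involved.

```c
/* Integer counter: exactly 11 passes, however 0.1 is rounded. */
	double
sum_tenths (void)
{
	int i;
	double sum = 0.0;

	for (i = 0; i <= 10; i++)
		sum += i / 10.0;	/* 0.0, 0.1, ..., 1.0 */
	return (sum);
}

/* Compare within a tolerance instead of with == */
	int
nearly_equal (double a, double b, double tol)
{
	double diff = a - b;

	if (diff < 0)
		diff = -diff;
	return (diff <= tol);
}
```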
.IP \(bu
Compilers have bugs.
Common trouble spots include structure assignment and bitfields.
You cannot generally predict which bugs a compiler has.
You \fIcould\fR write a program that avoids all constructs that are
known to be broken on any compiler.
You won't be able to write anything useful,
you might still encounter bugs,
and the compiler might get fixed in the meantime.
Thus, you should write ``around'' compiler bugs only when you are
\fIforced\fR to use a particular buggy compiler.
.IP \(bu
Do not rely on automatic beautifiers.
The main person who benefits from good program style is the
programmer him/herself,
especially during the early design of handwritten algorithms
or pseudo-code.
Automatic beautifiers can only be applied to complete, syntactically
correct programs and hence are not available when the need for
attention to white space and indentation is greatest.
Programmers can do a better job of making clear
the complete visual layout of a function or file, with the normal
attention to detail of a careful programmer
(in other words, some of the visual layout is dictated by intent
rather than syntax
and beautifiers cannot read minds).
Sloppy programmers should learn to be careful programmers instead of
relying on a beautifier to make their code readable.
Finally, since beautifiers are non-trivial programs
that must parse the source,
a sophisticated beautifier
costs more to build and maintain than the benefits it provides.
Beautifiers are best for gross formatting of machine-generated code.
.IP \(bu
Accidental omission of the second
.Ep = '' ``
of the logical compare is a problem.
Use explicit tests.
Avoid assignment with implicit test.
.Ex
abool = bbool;
if (abool) { ...
.Ee
When embedded assignment \fIis\fR used, make the test explicit
so that it doesn't get ``fixed'' later.
.Ex
while ((abool = bbool) != FALSE) { ...
.Ee
.Ex
while (abool = bbool) { ...	/* VALUSED */
.Ee
.Ex
while (abool = bbool, abool) { ...
.Ee
.IP \(bu
Comment explicitly
variables that are changed out of the normal control flow,
or other code that is likely to break during maintenance.
.IP \(bu
Modern compilers will put variables in registers automatically.
Use the
.Ec register
keyword sparingly to indicate the variables that you think are most critical.
In extreme cases, mark the 2-4 most critical values as
.Ep register
and mark the rest as
.Ep REGISTER .
The latter can be #defined to
.Ep register
on those machines with many registers.
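A sketch of that arrangement; the configuration symbol \fIMANY_REGISTERS\fR is hypothetical and would be set in a machine-dependent header.

```c
#ifdef MANY_REGISTERS
#	define REGISTER register
#else
#	define REGISTER
#endif

/* Hypothetical example: sum n ints. */
	long
sum_array (const int *a, int n)
{
	register const int *p;		/* most critical: used every pass */
	REGISTER long total = 0;	/* register only where plentiful */
	const int *end = a + n;

	for (p = a; p < end; p++)
		total += *p;
	return (total);
}
```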
.NH
Lint
.PP
\fILint\fR is a C program checker [2] that examines C source files to
detect and report type incompatibilities, inconsistencies between
function definitions and calls,
potential program bugs, etc.
The use of \fIlint\fR on all programs is strongly recommended,
and it is expected that most projects will require programs to use
\fIlint\fR as part of the official acceptance procedure.
.PP
It should be noted that the best way to use \fIlint\fR is not as a
barrier that must be overcome before official acceptance of a program,
but rather as a tool to use during and after changes or additions to
the code.
\fILint\fR
can find obscure bugs and ensure portability before problems occur.
Many messages from \fIlint\fR really do indicate something wrong.
One fun story is about a program that was missing
an argument to
.Ep fprintf '. `
.Ex
fprintf ("Usage: foo -bar <file>");
.Ee
The \fIauthor\fR never had a problem.
But the program dumped core every time an ordinary user made a mistake
on the command line.
Many versions of \fIlint\fR will catch this.
.PP
The \-h, \-p, \-a, \-x, and \-c options are worth learning.
All of them will complain about some legitimate things, but they will
also pick up many botches.
Note that \-p checks function-call type-consistency for only a subset
of Unix library routines, so programs should be linted both with and
without \-p for the best ``coverage''.
.PP
\fILint\fR also recognizes several special comments in the code.
These comments both shut up \fIlint\fR when the code
otherwise makes it complain,
and they also document special code.
.NH
Make
.PP
One other very useful tool is \fImake\fR [7].
During development,
\fImake\fR recompiles only those modules that have been changed
since the last time \fImake\fR was used. 
Some common conventions include:
.TS
center;
r l.
all	\fIalways\fP makes all binaries
clean	remove all intermediate files
debug	make a test binary 'a.out' or 'debug'
depend	make transitive dependencies
install	install binaries
lint	run lint
print/list	make a hard copy of all source files
shar	make a shar of all source files
spotless	make clean, use revision control to put away sources.
	Note: doesn't remove Makefile, although it is a source file
sources	undo what spotless did
tags	run ctags (using the -t flag is suggested)
rdist	distribute sources to other hosts
\fIfile.c\fR	check out the named file
.TE
In addition, command-line defines
can be given to define either Makefile values
(such as ``CFLAGS'')
or values in the program
(such as ``DEBUG'').
.NH
Project Dependent Standards
.PP
Individual projects may wish to establish additional standards beyond
those given here.
The following issues are some of those that should be addressed by
each project program administration group.
.IP \(bu
What additional naming conventions should be followed?
In particular, systematic prefix conventions for functional grouping
of global data and also for structure or union member names can be
useful.
.IP \(bu
What kind of include file organization is appropriate for the
project's particular data hierarchy?
.IP \(bu
What procedures should be established for reviewing \fIlint\fR
complaints?
A tolerance level needs to be established in concert with the \fIlint\fR
options to prevent unimportant complaints from hiding complaints about
real bugs or inconsistencies.
.IP \(bu
If a project establishes its own archive libraries, it should plan on
supplying a lint library file [2] to the system administrators.
The lint library file allows \fIlint\fR to check for compatible use of
library functions.
.IP \(bu
What kind of revision control needs to be used?
.NH
Conclusion
.PP
A set of standards has been presented for C programming style.
Among the most important points are:
.IP \(bu
The proper use of white space and comments
so that the structure of the program is evident from
the layout of the code.
The use of simple expressions, statements, and functions
so that they may be understood easily.
.IP \(bu
To keep in mind that
you or someone else will likely be asked to modify code or make
it run on a different machine sometime in the future.
Craft code so that it is portable to obscure machines.
Localize optimizations since they are often confusing
and may be ``pessimizations'' on other machines.
.IP \(bu
Many style choices are arbitrary.
Having a style that is consistent
(particularly with group standards)
is more important than following absolute style rules.
Mixing styles is worse than using any single bad style.
.PP
As with any standard, it must be followed if it is to be useful.
If you have trouble following any of these standards,
don't just ignore them.
Talk with your local guru,
or an experienced programmer at your institution.
.bp
.ce 1
\fBReferences\fR
.sp 2
.IP [1]
B.A. Tague, \fIC Language Portability\fP, Sept 22, 1977.
This document issued by department 8234 contains three memos by
R.C. Haight, A.L. Glasser, and T.L. Lyon dealing with style and
portability.
.IP [2]
S.C. Johnson, \fILint, a C Program Checker\fP,
USENIX
.UX
Supplementary Documents, November 1986.
.IP [3]
R.W. Mitze, \fIThe 3B/PDP-11 Swabbing Problem\fP, Memorandum for File,
1273-770907.01MF,
September 14, 1977.
.IP [4]
R.A. Elliott and D.C. Pfeffer, \fI3B Processor Common Diagnostic
Standards- Version 1\fP,
Memorandum for File, 5514-780330.01MF, March 30, 1978.
.IP [5]
R.W. Mitze,
\fIAn Overview of C Compilation of
.UX
User Processes on the 3B\fR,
Memorandum for File, 5521-780329.02MF, March 29, 1978.
.IP [6]
B.W. Kernighan and D.M. Ritchie,
\fIThe C Programming Language\fR, 
Prentice-Hall 1978.
.IP [7]
S.I. Feldman,
\fIMake \(em A Program for Maintaining Computer Programs\fR,
USENIX
.UX
Supplementary Documents, November 1986.
.IP [8]
Ian Darwin and Geoff Collyer,
\fICan't Happen or /* NOTREACHED */ or Real Programs Dump Core\fR,
USENIX Association Winter Conference, Dallas 1985 Proceedings.
.IP [9]
B.W. Kernighan and P.J. Plauger,
\fIThe Elements of Programming Style\fP,
McGraw-Hill, 1974.
.IP [10]
J.E. Lapin,
\fIPortable C and Unix System Programming\fR,
Prentice-Hall, 1987.
.Ee
.bp
\s+1
.ce
\fBThe Ten Commandments for C Programmers\fR
\s-1
.sp 2
.ce
\fIHenry Spencer\fR
.sp 2
.IP 1
Thou shalt run \fIlint\fR frequently and study its pronouncements with
care, for verily its perception and judgement oft exceed thine.
.IP 2
Thou shalt not follow the NULL pointer,
for chaos and madness await thee at its end.
.IP 3
Thou shalt cast all function arguments to the expected type
if they are not of that type already,
even when thou art convinced that this is unnecessary,
lest they take cruel vengeance upon thee when thou least expect it.
.IP 4
If thy header files fail to declare the return types
of thy library functions,
thou shalt declare them thyself with the most meticulous care,
lest grievous harm befall thy program.
.IP 5
Thou shalt check the array bounds of all strings (indeed, all arrays),
for surely where thou typest ``foo'' someone someday shall type
``supercalifragilisticexpialidocious''.
.IP 6
If a function be advertised to return an error code in the event of
difficulties,
thou shalt check for that code, yea, even though the checks
triple the size of thy code and produce aches in thy typing fingers,
for if thou thinkest ``it cannot happen to me'',
the gods shall surely punish thee for thy arrogance.
.IP 7
Thou shalt study thy libraries and strive not to re-invent them
without cause,
that thy code may be short and readable and thy days pleasant and
productive.
.IP 8
Thou shalt make thy program's purpose and structure
clear to thy fellow man by using the
One True Brace Style,
even if thou likest it not,
for thy creativity is better used in solving problems than in creating
beautiful new impediments to understanding.
.IP 9
Thy external identifiers shall be unique in the first six characters,
though this harsh discipline be irksome and the years of its necessity
stretch before thee seemingly without end,
lest thou tear thy hair out and go mad on that fateful day when
thou desirest to make thy program run on an old system.
.IP 10
Thou shalt foreswear, renounce,
and abjure the vile heresy which claimeth
that ``All the world's a VAX'', and have no commerce with the
benighted heathens who cling to this barbarous belief,
that the days of thy program may be long even though the days of thy
current machine be short.
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo

msb@sq.sq.com (Mark Brader) (12/17/89)

The revised Indian Hill style guide recently posted here is excellent
in many respects, but the example function copy() in section 14 contains
no fewer than four bugs, of which three are in the same line.  Here's a
patch which I have tested and which fixes all of them.  Finding them
without reading the patch is left as an exercise for the reader.

Patch follows:


*** ihill.old	Sun Dec 17 12:55:21 1989
--- ihill	Sun Dec 17 12:57:38 1989
***************
*** 1478,1490 ****
  .\"
  {
! 	char *retval = dest;
  
! 	while (*dest++ = *++src && maxsize-- > 0)
! 		;		/* VOID */
  
! 	if (maxsize == 0)
! 		retval = NULL;
! 
! 	return (retval);
  }
  .Ee
--- 1478,1488 ----
  .\"
  {
! 	char *p = dest;
  
! 	while (maxsize-- > 0)
! 		if ((*p++ = *src++) == '\\\\0')
! 			return dest;
  
! 	return (char *) NULL;
  }
  .Ee


-- 
Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
	"I'm a little worried about the bug-eater," she said.  "We're embedded
	in bugs, have you noticed?"		-- Niven, "The Integral Trees"

This article is in the public domain.