[comp.lang.c] Requests for nominations of Great C Code

jeffrey@algor2.UUCP (Jeffrey Kegler) (04/01/89)

In article <354@cbnewsc.ATT.COM> isaac@cbnewsc.ATT.COM (isaac.j.champagne) writes:
> It seems like the "how to" of programming may be more important than
> lots of theories.

To become a "great" C programmer, I feel the best way is to study
the works of the masters.

I am very interested in the answers others have to the following:  What ts
the best C code you have ever read?  Let me impose these restrictions:

1.) The code should be published, public domain or freerighted (in the GNU
style), so that anyone wishing to study it has reasonable access to it.

2.) Of course, we are all too modest to start posting our own code.

3.) It should be large and complex enough so that someone who has waded
through it and understood can feel he now "knows" C.  A collection of small
programs qualifies, but not a 3 page programming pearl.

4.) Please nominate only stuff you have closely read from beginning to end.
I have worked for some time on code for which I cannot make this claim
(not that I would nominate it anyway).  This requirement in combination
with the previous means the best of us will only know of a handful of such
examples, at most.

5.) I want nominations, but not votes.  One or two authoritative and
reasoned postings from some of our net mavens (we know who they are) will
carry a lot more weight with me, at least, that a lot of people saying "I
have heard John Doe's _The_Nerd's_Complete_Database_Library_in_C_ is good."
-- 

Jeffrey Kegler, President, Algorists,
jeffrey@algor2.UU.NET or uunet!algor2!jeffrey
1788 Wainwright DR, Reston VA 22090

hascall@atanasoff.cs.iastate.edu (John Hascall) (04/02/89)

In article <433@algor2.UUCP> jeffrey@algor2.UUCP (Jeffrey Kegler) writes:
>In article <354@cbnewsc.ATT.COM> isaac@cbnewsc.ATT.COM (isaac.j.champagne) writes:
>> It seems like the "how to" of programming may be more important than
>> lots of theories.

>To become a "great" C programmer, I feel the best way is to study
>the works of the masters.

>I am very interested in the answers others have to the following:  What ts
>the best C code you have ever read?  Let me impose these restrictions:
 
   I'd like to cast a negative vote for
   the 4.2 BSD Un*x source, wholly unreadable.  :-)
 
   John Hascall
   ISU Comp Center

libes@cme.nbs.gov (Don Libes) (04/04/89)

>>I am very interested in the answers others have to the following:  What ts
>>the best C code you have ever read?
> 
>   I'd like to cast a negative vote for
>   the 4.2 BSD Un*x source, wholly unreadable.  :-)

I'd like to cast both positive and negative votes for Gosling's emacs.
And who could forget display.c?  I have reproduced the beginning of it
here for your amusement.

/********************************************************\
* 							 *
* 	Ultra-hot screen management package		 *
* 							 *
\********************************************************/

/****************************************************************

          		 /-------------\
			/		\
		       /		 \
		      /			  \
		      |	  XXXX	   XXXX	  |
		      |	  XXXX	   XXXX	  |
		      |	  XXX	    XXX	  |
		      \		X	  /
		       --\     XXX     /--
			| |    XXX    | |
			| |	      | |
			| I I I I I I I |
			|  I I I I I I	|
			 \	       /
			  --	     --
			    \-------/
		    XXX			   XXX
		   XXXXX		  XXXXX
		   XXXXXXXXX	     XXXXXXXXXX
			  XXXXX	  XXXXX
			     XXXXXXX
			  XXXXX	  XXXXX
		   XXXXXXXXX	     XXXXXXXXXX
		   XXXXX		  XXXXX
		    XXX			   XXX

			  **************
			  *  BEWARE!!  *
			  **************

			All ye who enter here:
		    Most of the code in this module
		       is twisted beyond belief!

			   Tread carefully.

		    If you think you understand it,
			      You Don't,
			    So Look Again.

 ****************************************************************/


Needless to say, the comment was not an exageration.

Don Libes          libes@cme.nbs.gov      ...!uunet!cme-durer!libes

djones@megatest.UUCP (Dave Jones) (04/04/89)

From article <433@algor2.UUCP>, by jeffrey@algor2.UUCP (Jeffrey Kegler):
> 
> To become a "great" C programmer, I feel the best way is to study
> the works of the masters.
> 
> I am very interested in the answers others have to the following:  What ts
> the best C code you have ever read?  Let me impose these restrictions:
> 

   [...]
> 
> 2.) Of course, we are all too modest to start posting our own code.
> 

Whhhh???  Have you been reading this group very long?

I would be surprised if there were ONE contributer to this group whom
modesty would prevent from informing you that he wrote the best code
he had ever read -- the best damn code ANYBODY ever read, dammit!
I mean, one simply can't allow unrestrained modesty to hinder the
furthering the One True Way, can one?  What vanity that would be!

If you allowed people to submit their own code, you would be overwhelmed
with entries.  By restricting it to OPC (Other People's Code) you will
only get entries from cliques.  (I'll show him yours if you'll show
him mine.)

Now if you really want to see an example of the best code possible,
send me a private email request.  You see, I have some things in my
personal library...



		Dave J.

mouse@mcgill-vision.UUCP (der Mouse) (04/14/89)

In article <1115@muffin.cme.nbs.gov>, libes@cme.nbs.gov (Don Libes) writes:
[the >>> lines are someone else; I forget who]
>>> I am very interested in the answers others have to the following:
>>> What ts the best C code you have ever read?

Very hard question.  I've seen so little good code, and I remember the
bad code for longer.  On thought, I'm not sure I've seen any really
good code on the size scale you're looking for (and that includes my
own code, too :-).

> I'd like to cast both positive and negative votes for Gosling's
> emacs.  And who could forget display.c?  I have reproduced the
> beginning of it here for your amusement.

> [skull-and-crossbones comment from display.c.  Following words have
> been reformatted to save lines.  Orignal was much prettier.]

> All ye who enter here:  Most of the code in this module is twisted
> beyond belief!  Tread carefully.  If you think you understand it,
> 			      You Don't,
> 			    So Look Again.

He wasn't kidding, either.  I once tried to rewrite it preparatory to,
I think, making the cost formulas conform more closely to reality for
the Sun console (where, for example, insert and delete operations have
high overhead and slightly *negative* per-count cost).  All I succeeded
in doing was slowing it down by about a factor of four, and introducing
some bugs to boot.

I ended up retrofitting the new cost formulas into the old code, and
fudging to pretend that insert/delete operations were purely overhead.
Someday when I have a spare month or two, perhaps....

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

chris@mimsy.UUCP (Chris Torek) (04/16/89)

In article <1500@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse)
writes:
[re jag's display.c skull-and-crossbones:]
>>All ye who enter here:  Most of the code in this module is twisted
>>beyond belief!  Tread carefully.  If you think you understand it,
>>			      You Don't,
>>			    So Look Again.

>He wasn't kidding, either.  I once tried to rewrite it preparatory to,
>I think, making the cost formulas conform more closely to reality for
>the Sun console (where, for example, insert and delete operations have
>high overhead and slightly *negative* per-count cost).  All I succeeded
>in doing was slowing it down by about a factor of four, and introducing
>some bugs to boot.

Part of the problem was that the description of the algorithm was
contained in a CACM article; display.c had only a few hints in comments
as to what was going on.  But it was not actually all that hard to
work out.  My version of his display.c does have separate overhead
and per-count costs; it turns out, though, that on the Sun console,
if you have direct access to the frame buffer, it is faster just to
rewrite.

If anyone is curious: what is going on with the `M' matrix is
surprisingly simple.  You fill the thing with a bunch of arrows
pointing either to the left, up, or backwards diagonally.  Each
arrow thus points at a neighbour arrow, except for those along the
left edge and top row.  To fill the matrix, set up the arrows along
the left edge to point up; set those along the top to point left.
Set the one at (0,0) to `done':

	* < < < <
	^ . . . .
	^ . . . .
	^ . . . .
	^ . . . .

(This represents a four-line screen.)  An up arrow represents a line
deletion; a left arrow is an insert; and a back arrow is a rewrite in
place.  The row index (called `i') represents the contents of the
current screen at line `i', and the column index (called `j')
represents the contents of the desired screen at line `j'.  Then fill
in each `.' with the arrow that gives the least cost for transforming
the screen according to an insert, delete, or rewrite given i and j.
If the filled-in matrix looks like this:

	*  <  <  <  <

	^  \  <  \  \

	^  ^  \  ^  \

	^  \  ^  ^  \

	^  ^  ^  \  \

then we would (start at the bottom right) move back, up, up, back,
left, left, and done, rewriting line 4 in place, deleting line 3,
deleting line 2, writing line 1 in place, inserting (moving 1 down to
2), writing 1 in place again, inserting again (moving 1 and 2 to 2 and
3), and finally writing line 1 in place again.  All those other unused
arrows are there only because we did not know, before we finished,
whether they would be used.  The inserts and deletes we gather
together along the way, since some terminals can do multi-line
operations, so we really get `rewrite 4, goto 2, 2*delete, goto 1,
rewrite, goto 2, 2*insert, rewrite 2, goto 3, rewrite 3, done'.
(This is actually a stupid sequence and would never occur in
practise.)

What makes it hard is getting the right arrows into the cells.  Each
cell also has a cost (in terms of `characters sent to the terminal')
attached to it.  The cost for a delete is simply the cost of that
delete: if an ESC-M deletes a line, the cost for delete is 2.  (This
ignores padding, which complicates matters; more on that later.)  The
cost for an insert is the cost of that insert (2 for, e.g., ESC-L) plus
the cost for drawing line j (the one you want to show on the screen).
The cost for moving straight back (diagonally upward) is just the cost
for rewriting line i into line j.  If the text on line i matches that
on line j, this is free; otherwise it should be the number of
characters required, but as an optimisation, the code just uses the
number of characters on line j (otherwise it might have to compare n^2
lines; as it is, the comparison is `if the hashes match, the lines are
the same and the rewrite is free').

As an optimisation, instead of having each `up' or `left' arrow point
to its neighbour cell, if that cell is also an up, or is also a left,
we have it point to where its neighbour points (leapfrogging over all
the same sort of arrows, so to speak).  This gives a nice way to see
how many inserts or deletes are to be used.

On some terminals (ANS X3.64), insert or delete operations have a fixed
overhead no matter how many lines are inserted or deleted, but have a
variable additional factor.  For instance, ESC [ 4 L might insert four
lines, with essentially the same cost as ESC [ 1 L (inserts 1 line),
except for padding.  The padding cost depends on how far above the
bottom of the screen the cursor is at the time of the operation: to
insert four lines, the terminal must move (rows-4) lines downward, and
then must clear the four new lines.  On a 30-line screen, this affects
at least eight lines (23,24,25,26 move down to 27,28,29,30; then clear
23,24,25,26) and at most 30 (1,2,3,...,26 move to 5,6,7,...,30; then
clear 1,2,3,4).  The same rule applies to padding for deletion.  On
some stupidly-coded terminals (H19 in ANSI mode), padding is remarkably
significant: the H19 inserts 6 lines by inserting 1 line 6 times!
This can take quite a while and must be accounted for.  It is simplest
to define a fixed cost, a variable (per n lines) cost, with each
split into `overhead' and `padding due to being above the bottom of
the screen':

	let
		c_f = <fixed cost>
		c_n = <per-operation cost>
		n = <number of lines to insert or delete>
		k = <rows on screen> + 1 - <position of operation>

	then
		cost = n * (c_n.overhead + k * c_n.padding) +
			    c_f.overhead + k * c_f.padding

der Mouse's Sun costs are then

		c_f = (overhead = <large>, padding = ?)
		c_n = (overhead = <slightly negative>, padding = 0)

while those on a fast terminal that uses ESC-L and ESC-M for each
insert and delete are

		c_f = (overhead = 0, padding = 0)
		c_n = (overhead = 2, padding = 0)

Within Emacs, the padding cost represents actual NULs sent to the terminal
to keep it from sending ^S/^Q flow control; but even if flow control
is available, the pad costs should still be counted, as they represent
time taken by the terminal to do the operation.

So: a comment in my display.c:

   The algorithm used is a dynamic programming one, where
      M[i,j] = min (M[i-1,j]+dcost,		# UP,   implies delete
		    M[i,j-1]+icost+redraw cost,	# LEFT, implies ins+redraw
		    M[i-1,j-1]+rewrite cost)	# BACK, implies rewrite

   (It is not necessary to count a redraw cost for UPs as they force an
    equivalent number of LEFTs at some point.)

   The rewrite cost for a line is its redraw cost, unless line i matches
   line j, where it is free.  The UP and LEFT cases must also add the
   cost for doing an insert or delete.  This is actually rather tricky:
   the cost for an ID varies with the position above the bottom of the
   screen, and sometimes with the number of line IDs.  As it turns out
   we can just look at the UP'th or LEFT'th cell to see whether this is
   the first of a sequence.

   After we have finished putting UPs, LEFTs, and BACKs in the matrix,
   we start at the lower right corner and work backward, following the
   arrows up, left, or back, eventually reaching the top left.  Going
   up implies a deletion; going left implies insertion and rewrite; and
   every back implies a rewrite of old line i to new line j.  There is
   an exception:  UPs along the right edge of the matrix, and LEFTs
   along the bottom, are merely establishing a starting position and
   do not cause I/D operations (doing I/D there would be pointless
   anyway).  (LEFTs do still cause redraws.)

   Note that the elements along the left edge (j=0) are constant, and
   that the elements along the top (i=0) are constants plus the redraw
   sum for all lines <= j.

As it happens, we can precompute a great deal of these factors.  The
code to fill in the M cost matrix is order n^2, so getting the constant
factor down helps (n is typically between 24 and 60).  A new algorithm
would help more, but this one is provably optimal if the cost factors
are correct (which, alas, they are not).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris