[comp.unix.programmer] looking for sysv sum

herbie@dec07.cs.monash.edu.au (Andrew Herbert) (06/03/91)

Hello all.

Can anyone tell me where I can find a description of the SysV sum(1)
checksum algorithm, or some code which implements it?  I am using
SysVR4, but couldn't find anything to do this in the standard libraries.

Thanks,
Andrew Herbert

tchrist@convex.COM (Tom Christiansen) (06/03/91)

From the keyboard of herbie@dec07.cs.monash.edu.au (Andrew Herbert):
:Hello all.
:
:Can anyone tell me where I can find a description of the SysV sum(1)
:checksum algorithm, or some code which implements it?  I am using
:SysVR4, but couldn't find anything to do this in the standard libraries.

I think I can tell you.  I have no SysVr4 source code, so I had to reverse
engineer what's going on by taking a working emulation of a SysV sum(1)
program written in perl (after confirming that it really does give the same
output as sum(1)) and then looking to see what perl's doing inside.  But I
get the same results, so something here must be right.

To start with, this perl code seems to emulate the sum(1) command fairly 
well, as found on a SysV system I have lying around here:

    while (<>) {
	$checksum += unpack("%31C*", $_);
	$checksum %= 65535;
	$bytes += length;
	if (eof) {
	    printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
	    $checksum = $bytes = 0;
	}
    }

Speed freaks might take note that the following rendition is actually
faster than the C code!  Big buffers pay off.

    while ($ARGV = shift) {
	warn("can't open $ARGV: $!"), next unless open ARGV;
	while (read(ARGV,$_,16 * 512)) {
	    $checksum += unpack("%31C*", $_);
	    $checksum %= 65535;
	    $bytes += length;
	} 
	printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
	$checksum = $bytes = 0;
    } 


Of course, this doesn't really help you to know what's going on until you
know what unpack() is doing.  Looking in perl/src/doio.c, in the function
do_unpack(), you find that what's happening is basically the following
(loosely transcribed):

    unsigned char *sp = string; /* string is a (char *) pointing to $_ */
    unsigned long sum = 0;
    int checksum = 31;          /* from the %31C in unpack */
    while (len-- > 0)           /* len is length($_); NUL bytes count too */
        sum += *sp++;
    sum &= (1UL << checksum) - 1;
    return sum;

That's what's happening for each record.  If you look at the above perl
code, we add this sum to our running $checksum variable each time
through the perl while loop, and then reduce it modulo 65535 each time (not
65536) to keep it small.  Then when each file runs out, we output this
value, the number of 512-byte blocks (rounded up), and the file's name.
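
For anyone who wants it in C rather than perl, here is a minimal sketch of
the same steps.  It follows the perl emulation above, not AT&T source, so
treat it as an approximation.  The names sysv_sum, sysv_sum_update and
sysv_sum_blocks are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running state for one file (hypothetical type, not a library API). */
struct sysv_sum {
    uint32_t checksum;  /* reduced byte sum, always below 65535 */
    uint64_t bytes;     /* total number of bytes seen so far */
};

/* Fold one buffer into the state, mirroring one pass of the perl loop. */
void sysv_sum_update(struct sysv_sum *s, const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    /* Plain sum of the unsigned byte values; NUL bytes are included. */
    for (i = 0; i < len; i++)
        sum += buf[i];

    /* The %31 mask from unpack("%31C*").  It changes nothing unless a
     * single buffer sums past 2^31 - 1 (roughly 8 MB of 0xFF bytes). */
    sum &= (UINT64_C(1) << 31) - 1;

    /* Reduce modulo 65535 (not 65536) after every buffer. */
    s->checksum = (uint32_t)((s->checksum + sum) % 65535);
    s->bytes += len;
}

/* Number of 512-byte blocks, rounded up. */
uint64_t sysv_sum_blocks(const struct sysv_sum *s)
{
    return (s->bytes + 511) / 512;
}
```

Zero a struct sysv_sum, call sysv_sum_update() on each buffer you read,
then print checksum, sysv_sum_blocks() and the file name.  Since the
running total is reduced modulo 65535 after every buffer, the buffer size
does not affect the result as long as the mask above stays a no-op.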

Hope this helps.  

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	    "Perl is to sed as C is to assembly language."  -me

thad@public.BTR.COM (Thaddeus P. Floryan) (06/03/91)

In article <herbie.675920143@dec07> herbie@dec07.cs.monash.edu.au (Andrew Herbert) writes:
>[...]
>Can anyone tell me where I can find a description of the SysV sum(1)
>checksum algorithm, or some code which implements it?  I am using
>SysVR4, but couldn't find anything to do this in the standard libraries.
>[...]

Around mid to late 1987 a program named "vitals" appeared on the net, most
likely comp.sources.misc, which will do what you want.  The program works
fine on the following systems I've tested: 3b1 (3.51 & 3.51m), SunOS (4.0.3
and 4.1.1), MightyFrame (CTIX 5.* and 6.*), VAX/VMS (4.7 and 5.*), AmigaDOS
(1.3.*, 2.*), MS-DOS (3.3), PS/2 (A/IX 1.2.1), Mac A/UX (1.* and 2.*), SGI
IRIX (various), and even on HP-UX where it was developed (though I haven't
(yet) seen it as a standard part of the HP-UX distribution as its author
claims it would be :-)

Enclosed is the relevant part of the original posting to help you find it;
it should be available at ANY archive site.  The "sum" result of vitals is
the same as every UNIX "sum" I've tested:

+ Submitted-by: Alan Silverstein <hpda!hpfcla!hpfcdt!ajs>
+ Posting-number: Volume 11, Issue 66
+ Archive-name: vitals
+ 
+ This program was developed by Hewlett-Packard and will be part of our
+ HP-UX product offering.  We have found it useful.  It is most useful
+ when most widely and commonly shared, so here it is.
+ 
+ We have not tested it except on HP-UX, which is mainly AT&T-compatible.
+ However, it should be pretty portable.  Caveat emptor.  Oh, and the
+ usual disclaimer that I'm not really an official HP spokesperson.
+ 
+ Alan Silverstein, Hewlett-Packard Systems Software Operation, Fort Collins,
+ Colorado; {ihnp4 | hplabs}!hpfcla!ajs; 303-229-3053; (lat-long on request :-)

and some misc. extractions from the vitals.1 file:

[...]
	.SH NAME
	vitals \- crc, sum, line, word, and character counts
[...]
	.SH DESCRIPTION
	.I Vitals
	checks data integrity by
	computing vital statistics related to the data in the given
	file(s) or standard input (by default).
	The statistics include a four-digit hex CRC, a 16-bit byte sum (similar
	to
	.IR sum (1)
	without the block count)
[...]
	.SH AUTHOR
	.I Vitals
	was developed by Hewlett-Packard.
	.SH "SEE ALSO"
	sum(1), wc(1).

Thad Floryan [ thad@btr.com (OR) {decwrl, mips, fernwood}!btr!thad ]