[comp.lang.perl] Calculating XOR checksums? Fast splitting?

gnb@bby.oz.au (Gregory N. Bond) (06/03/91)

Given a string, what is the fastest way to calculate the XOR of all
bytes?  Here is what I used:

    $bcc = 0;
    grep ($bcc ^= $_, unpack("C" x length($sc_data), $sc_data));

But is there a better way without having to construct and destroy the
array? (I need this to be fairly quick...).

And a second point, what is likely to be faster to split a string into
several subfields?  A regexp:
	$str =~ /(.{32})(.{24})(.{8})/;
	($a, $b, $c) = ($1, $2, $3);
or unpack:
	($a, $b, $c) = unpack("c32 c24 c8", $str);
or substr:
	$a = substr($str, 0, 32);
	$b = substr($str, 32, 24);
	$c = substr($str, 32 + 24, 8);

Or something I haven't considered?

Greg, performance weenie (at least for this application!)
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

tchrist@convex.COM (Tom Christiansen) (06/04/91)

From the keyboard of gnb@bby.oz.au (Gregory N. Bond):
:Given a string, what is the fastest way to calculate the XOR of all
:bytes?  Here is what I used:
:
:    $bcc = 0;
:    grep ($bcc ^= $_, unpack("C" x length($sc_data), $sc_data));
:
:But is there a better way without having to construct and destroy the
:array? (I need this to be fairly quick...).

Well, you should use "C*" instead of what you have to save about 15%.
The real shame is that it has to be an XOR checksum; if you could
tolerate an additive one, then you could just use this:

    $bcc = unpack('%31C*', $sc_data);

and have it run in about 5% of the time that your current loop takes.  If
XOR checksums are that common, you might prevail upon Larry
to add such a feature.  Maybe "^31C*" or some such.
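
To make the two checksums concrete, here is a minimal sketch, assuming a
Perl 5 interpreter (for "my" and the statement-modifier "for").  The
names xor_checksum and additive_checksum are hypothetical helpers, not
anything built in.

```perl
use strict;
use warnings;

# XOR of all bytes, using the "C*" template instead of "C" x length(...).
sub xor_checksum {
    my ($data) = @_;
    my $bcc = 0;
    $bcc ^= $_ for unpack("C*", $data);
    return $bcc;
}

# Additive checksum: the %31 prefix makes unpack return the sum of the
# unpacked bytes, kept to 31 bits, instead of the bytes themselves.
sub additive_checksum {
    my ($data) = @_;
    return unpack("%31C*", $data);
}

printf "xor=%d add=%d\n", xor_checksum("abc"), additive_checksum("abc");
```

The two results are different checksums and are not interchangeable; the
additive form is only an option when the protocol does not insist on XOR.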

:And a second point, what is likely to be faster to split a string into
:several subfields?  A regexp:
:	$str =~ /(.{32})(.{24})(.{8})/;
:	($a, $b, $c) = ($1, $2, $3);
:or unpack:
:	($a, $b, $c) = unpack("c32 c24 c8", $str);
:or substr:
:	$a = substr($str, 0, 32);
:	$b = substr($str, 32, 24);
:	$c = substr($str, 32 + 24, 8);



You could save about 11% if you did a direct assignment for the regexp:

    ($a, $b, $c) = $str =~ /(.{32})(.{24})(.{8})/;

By using unpack, you save an additional 25%.  Your unpack, by the way,
is wrong.  You need to be using an A not a C format there.
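
The difference between the two formats can be seen in a small sketch,
assuming Perl 5 syntax and a made-up 64-byte record ($rec is
hypothetical).  The "c" format yields one signed number per byte, so a
three-variable assignment would only receive the first three byte values;
the "A" format yields one string per field.

```perl
use strict;
use warnings;

# Hypothetical record: 32 + 24 + 8 bytes.
my $rec = ("x" x 32) . ("y" x 24) . ("z" x 8);

my @as_numbers = unpack("c32 c24 c8", $rec);   # 64 small integers
my @as_fields  = unpack("A32 A24 A8", $rec);   # 3 strings

printf "c: %d values, first is %d\n", scalar(@as_numbers), $as_numbers[0];
printf "A: %d values, first is %s\n", scalar(@as_fields),  $as_fields[0];
```

Note that "A" strips trailing spaces and nulls from each field; lowercase
"a" returns the bytes untouched.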

Perhaps counterintuitively, it is a wee bit faster in this
case to use the 3 substr()s over the unpack().  However, when 
you have 10 fields, it is faster to use unpack() than either
of the other methods, with substr()s taking ~7% longer and
regexp taking ~20% longer.
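
As a sanity check before timing anything, the three approaches can be
run side by side to confirm they return the same fields.  This is a
sketch assuming Perl 5 and a hypothetical sample record.  Two details are
my own substitutions: lowercase "a" instead of "A", so that trailing
spaces are not stripped, and the /s modifier, so that "." also matches a
newline.

```perl
use strict;
use warnings;

# Hypothetical 64-byte record: 32 + 24 + 8 bytes.
my $str = ("x" x 32) . ("y" x 24) . ("z" x 8);

my @by_regexp = $str =~ /(.{32})(.{24})(.{8})/s;
my @by_unpack = unpack("a32 a24 a8", $str);
my @by_substr = (substr($str, 0, 32),
                 substr($str, 32, 24),
                 substr($str, 32 + 24, 8));

print "all three agree\n"
    if "@by_regexp" eq "@by_unpack" && "@by_unpack" eq "@by_substr";
```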


It's pretty easy to divine these yourself.  I basically did this
on all these cases to find what the differences were:

    $COUNT = 10000;
    ($u, $s) = times;
    for ($I = 0; $I < $COUNT; $I++) {
	# some operation
    }
    ($nu, $ns) = times;
    printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
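
A filled-in version of the same harness might look like the sketch
below, assuming Perl 5.  The helper time_it, the %ops table and the
sample record are hypothetical, and wrapping each operation in a sub adds
call overhead that the bare loop above does not have, so only the
relative numbers mean anything.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code $count times and return the
# (user, system) CPU seconds consumed, as reported by times.
sub time_it {
    my ($count, $code) = @_;
    my ($u, $s) = times;
    $code->() for 1 .. $count;
    my ($nu, $ns) = times;
    return ($nu - $u, $ns - $s);
}

my $str = ("x" x 32) . ("y" x 24) . ("z" x 8);   # made-up 64-byte record
my %ops = (
    'regexp' => sub { my ($f1, $f2, $f3) = $str =~ /(.{32})(.{24})(.{8})/s },
    'unpack' => sub { my ($f1, $f2, $f3) = unpack("a32 a24 a8", $str) },
    'substr' => sub { my ($f1, $f2, $f3) = (substr($str, 0, 32),
                                            substr($str, 32, 24),
                                            substr($str, 32 + 24, 8)) },
);

for my $name (sort keys %ops) {
    printf "%-8s %8.4fu %8.4fs\n", $name, time_it(10000, $ops{$name});
}
```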



--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	    "Perl is to sed as C is to assembly language."  -me

markb@agora.rain.com (Mark Biggar) (06/04/91)

In article <GNB.91Jun3135656@leo.bby.oz.au> gnb@bby.oz.au (Gregory N. Bond) writes:
>Given a string, what is the fastest way to calculate the XOR of all
>bytes?  Here is what I used:
>
>    $bcc = 0;
>    grep ($bcc ^= $_, unpack("C" x length($sc_data), $sc_data));

Try folding the string in half.  Assuming there is a vec() in the program
somewhere, the following should work:

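$i = length($sc_data);
while ($i > 1) {			# stop once a single byte is left
	$j = int(($i+1)/2);
	($sc_data, $a) = unpack("a$j a$j", $sc_data);	# "a" keeps trailing spaces and nulls
	$sc_data ^= $a;			# string XOR of the two halves
	$i = $j;
}
$bcc = ord($sc_data);			# numeric value of the remaining byte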
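
The folding idea can also be written as a self-contained function and
checked against the byte-by-byte loop.  This is a sketch assuming
Perl 5, where "^" on two strings works bytewise without needing vec();
it also assumes the "bitwise" feature is not enabled, since that makes
"^" numeric.  The names xor_fold and xor_loop are hypothetical.  It uses
lowercase "a" so trailing spaces and nulls survive, and stops explicitly
once one byte is left.

```perl
use strict;
use warnings;

# Fold the string in half repeatedly, XORing the halves as strings.
sub xor_fold {
    my ($data) = @_;
    my $len = length $data;
    while ($len > 1) {
        my $half = int(($len + 1) / 2);
        my ($front, $back) = unpack("a$half a$half", $data);
        # String XOR: the result is as long as the longer operand,
        # the shorter one being treated as if padded with NUL bytes.
        $data = $front ^ $back;
        $len  = $half;
    }
    return length($data) ? ord($data) : 0;
}

# Reference implementation: XOR one byte at a time.
sub xor_loop {
    my ($data) = @_;
    my $bcc = 0;
    $bcc ^= $_ for unpack("C*", $data);
    return $bcc;
}

my $sample = "hello world  ";    # trailing spaces on purpose
print "fold and loop agree\n" if xor_fold($sample) == xor_loop($sample);
```

Each pass halves the string, so the number of Perl-level operations
grows with the logarithm of the length rather than the length itself.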

--
Perl's maternal uncle
Mark Biggar
markb@agora.rain.com