[comp.virus] Authentication/Signature/Checksum Algorithms

RADAI1@HBUNOS.BITNET (Y. Radai) (01/04/90)

  In V2 #258 Ralph Merkle referred to previous discussions in this
group on authentication algorithms, and suggested his own one-way
hash function Snefru which has the advantage of not requiring secret
information and which has better performance than a DES-based system.
It's my understanding that the advantages of Snefru are a much
faster software implementation than DES, the possibility of a larger
key size, and several ways of trading off security against speed.
  So Ralph Merkle recommends Snefru and Bob Bosen recommends the DES-
based X9.9 (he also mentions ISO 8731-2).  And going outside of this
forum we find that Fred Cohen recommends an RSA-based algorithm, Ro-
bert Jueneman recommends his Quadratic Congruential MDC, Prof. Rabin
recommends the CRC algorithm using a randomly selected irreducible
polynomial, and so forth.  Interestingly, none of these authors seems
to offer any argument as to why his algorithm is better than the
others (except for Merkle's comparison of Snefru with DES).

  Bob Bosen says in Issue 265 that a sophisticated authentication al-
gorithm is clearly better than some programmer's guess at an authenti-
cation algorithm (he now adds the condition that both implementing
programs are well-written and convenient and "apply the algorithm
across all program code without exception").
  I can't say that this is incorrect, but even with this addition, I
still don't share his emphasis on sophistication of the algorithm.
The added sophistication of (say) X9.9, compared to CRC with a person-
ally chosen generator, may be well worth the added time in the case of
authentication of bank transfers and other conventional applications.
But have you considered the possibility, Bob, that this sophistication
may be wasted when dealing with viruses?
  If we were to try to draw a graph showing sophistication on the X-
axis and security on the Y-axis, I think everyone would agree that the
curve "levels off", i.e. there is a certain point, beyond which making
the algorithm more sophisticated has negligible value in increasing
security.  The difference between the positions of Bob, Ross Green-
berg, and myself is partly where we think the curve levels off.  Bob's
position is that this happens only at a late stage of sophistication,
while Ross seems to think it happens at a very early stage.  My opin-
ion is that for *anti-viral purposes* this point is reached much ear-
lier than for more ordinary uses of authentication algorithms, though
not quite as early as Ross believes.
  The reason I think the anti-viral situation is special is that in
this environment there are considerations which probably have no coun-
terpart in ordinary message authentication.  For example:
  (1) A virus must perform all its work, including circumvention of
anti-viral protection (e.g. forging of checksums) if that's what its
author desires, on the computer which it's attacking and in a very
short time (short enough so that it will not be noticed by the user).
  (2) By its very nature, a virus is designed to attack a large per-
centage of the users of a particular system.  From the point of view
of the virus writer, any technique which results in circumventing the
anti-viral protection of only a single user (or a very few users) is
not worthwhile, since if that were what he desired, he could achieve
the same thing much more easily by means of an immediate-acting Tro-
jan, and against that *no* alteration-detecting program, no matter how
sophisticated, can warn us in time.
  (3) Ordinarily, a virus writer is not in a position to know what
checksum algorithms may be in use on the computers on which his virus
is unleashed.  And any technique for forging the checksums of one al-
gorithm will probably not work against any other checksum algorithm.
Therefore a checksum-forging technique would increase the percentage
of disks infected by only a very small amount, making it a waste of
time and effort from his point of view.  (An exception would occur if
the virus author is satisfied with attacking an environment which is
sufficiently limited so that most computers in it use the same check-
sum program.)

  Taking these conditions, mainly (1), into account, it seems to me
that a checksum algorithm, expressed as a function F on files, will
provide sufficient security for anti-viral purposes if the following
two conditions are satisfied:
  (A) For any given file f, F(f) is (in general) different for each
computer.  (This condition amounts essentially to requiring a key
unique to each computer.)
  (B) Given a file f, it is computationally infeasible (without know-
ledge of the key) to construct a file f' from it containing the de-
sired addition of (or replacement of ordinary code by) viral code such
that F(f') = F(f).
  In making this claim, it is assumed that the checksum base, key and
program are kept offline (i.e. not located on an accessible disk or in
memory while a virus may be active), thus preventing discovery or com-
putation of the user's key or circumvention of the checksum program.
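
  Purely as an illustration of what (A) and (B) ask of a checksum
function, here is a rough sketch in Python.  It is not the code of
any actual checker, and every name in it is my own; the per-machine
key is simply a randomly chosen CRC generator polynomial:

    # Sketch only: a keyed checksum F(key, file) in the spirit of
    # conditions (A) and (B).  The key is a randomly chosen 32-bit CRC
    # generator, so F(f) differs from one computer to the next.
    import os

    def make_key():
        # Random 32-bit generator with its low bit set.  A careful
        # implementation would insist on an irreducible polynomial, as
        # Rabin suggests; this only shows where the per-machine secret
        # goes.
        return int.from_bytes(os.urandom(4), "big") | 1

    def checksum(key, path):
        crc = 0xFFFFFFFF                      # nonzero initial register
        with open(path, "rb") as f:
            while chunk := f.read(65536):
                for byte in chunk:
                    crc ^= byte << 24
                    for _ in range(8):        # bit-at-a-time CRC update
                        if crc & 0x80000000:
                            crc = ((crc << 1) ^ key) & 0xFFFFFFFF
                        else:
                            crc = (crc << 1) & 0xFFFFFFFF
        return crc

    # Generate the key once per machine, keep it (and the checksum base)
    # offline, and compare checksum(key, f) against the stored value.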

  If the above opinion is correct, then I think that all the algori-
thms mentioned in the second paragraph of this posting are adequate
for anti-viral purposes, including the relatively unsophisticated CRC
algorithm if the generator is selected randomly or personally, and
hence the added sophistication of algorithms such as X9.9 is wasted in
view of the extra run time which they require.
  Concerning speed, it can be argued that we shouldn't exclude slower
algorithms, since the more algorithms which are in use, the less
worthwhile it will be for anyone to try checksum-forging techniques.
On the other hand, if one algorithm has a clear superiority over the
others in speed, users are hardly likely to choose a slower algorithm
because of such an argument.  And when speed is the issue it seems to
me that CRC, because of its greater simplicity, is better than any of
the more sophisticated algorithms satisfying conditions (A) and (B).
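
  The reason CRC is so cheap is easiest to see in a table-driven
version of the sketch above: once a 256-entry table has been
precomputed from the generator, the inner loop does one table lookup,
one shift and one XOR per byte.  (Again this is only an illustration;
in an interpreted language the loop overhead dominates, but the
per-byte work shown is what an assembly-language implementation would
actually perform.)

    def make_table(key):
        # Precompute the 256-entry lookup table for the chosen generator
        # (the same per-machine key as in the earlier sketch).
        table = []
        for byte in range(256):
            crc = byte << 24
            for _ in range(8):
                if crc & 0x80000000:
                    crc = ((crc << 1) ^ key) & 0xFFFFFFFF
                else:
                    crc = (crc << 1) & 0xFFFFFFFF
            table.append(crc)
        return table

    def fast_checksum(table, path):
        crc = 0xFFFFFFFF
        with open(path, "rb") as f:
            while chunk := f.read(65536):
                for byte in chunk:
                    # one lookup, one shift, one XOR per byte
                    index = ((crc >> 24) ^ byte) & 0xFF
                    crc = ((crc << 8) & 0xFFFFFFFF) ^ table[index]
        return crc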

  It's possible, of course, that I'm wrong on one or more opinions ex-
pressed above:  Perhaps Dr. Merkle or some other crypto-expert can
show that conditions (A) and (B) are not sufficient even in the viral
environment expressed by (1)-(3) above, so that CRC, even with a ran-
dom generator, is insecure.  Or that I'm wrong in assuming that such
a CRC satisfies these conditions.  Or perhaps some other algorithm is
faster than CRC and no less secure.  But what is needed is evidence
and argumentation relative to the *viral environment*, and this I
have not yet seen in Bob's writings.

  I now come to Ross Greenberg's posting in Issue 266.  I fully agree
with him that sophisticated checkers may go unused because of the
time they require.  But Ross implies that users will always prefer a
"good enough" fast checker like that of FluShot+ over a slow,
sophisticated one.  Can we be so sure that FluShot+ is really good
enough?
How many of its users have the slightest idea how its security com-
pares with that of other programs?  I don't know whether his algorithm
satisfies condition (B) above, but it certainly does not satisfy (A),
i.e. for any given file all users will get the same checksum, and
that's a potential security hole, at least in the "limited environment"
situation mentioned at the end of (3) above.  But since this hole can
be plugged very simply and at no cost in speed, why not do so, Ross?

  One final remark.  Bob Bosen writes:
>In his mailing of Dec 07 '89, Y. Radai seems to be taking the position
>that since I am in favor of sophisticated authentication algorithms, I
>must be against sophisticated program implementations. Nothing could
>be further from the truth.

  Such a position may indeed be far from the truth.  But so is the no-
tion that I was taking such a position!  I was not assuming that you
were *against* sophisticated program implementations; I was merely
concerned by your not having *mentioned* the need for very careful im-
plementation in a given system.  That's a big difference.

                                     Y. Radai
                                     Hebrew Univ. of Jerusalem, Israel
                                     RADAI1@HBUNOS.BITNET

rsa@lll-crg.llnl.gov (RSA Data Security) (01/06/90)

In response to Y. Radai's post:

The best protection against viruses is obtained by using a fast
hashing algorithm together with an asymmetric cryptosystem (like
RSA).  This is also by far the most cost-effective (based on
compute-time) approach.

A good "message digest" should be a one-way function: it should be
impossible to invert the digest and it should be computationally
infeasible to find two useful messages producing the same digest in
any reasonable amount of time.  The algorithm must read every bit of
the message.  Therefore, the best one is the fastest one deemed to be
secure.  This should not be left to individual users to develop:
as Jueneman and Coppersmith, among others, have shown, this is not a
trivial undertaking.  Let's use Snefru and MD2 (Internet RFC 1113)
as examples of good ones.
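
To illustrate the idea of a fixed-size digest of an arbitrarily long
file, here is a short sketch in Python.  It uses SHA-256 from the
standard hashlib module purely as a stand-in, since neither Snefru
nor MD2 ships with that library:

    # Sketch: a fixed-size digest of a file of any length.  SHA-256
    # stands in here for a digest such as Snefru or MD2.
    import hashlib

    def file_digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(65536):   # the algorithm reads every bit
                h.update(chunk)
        return h.digest()                   # always 32 bytes, however big the file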

The digest attached to a program or message should then be encrypted
with the private half of a public-key pair.  What is the
computational cost of enhancing this process with public-key?

Since RSA can be securely used with small public-key exponents such
as 3 (see Internet RFCs 1113-1115 and/or CCITT X.509), a small number
of multiplies is required to perform a public-key operation such as
*signature verification*, where one decrypts an encrypted digest with
the public key of the sender (and then compares it to a freshly
computed digest).  Therefore, the "added" computational cost of using
RSA on an AT-type machine is approximately 80 milliseconds (raising a
512-bit number to the 3rd power mod a 512-bit number) REGARDLESS of
the size of the file being verified (the digest is fixed, and less
than 512 bits,
requiring one block exponentiation).  Of course the 80 ms gets
smaller on faster machines like Suns.  I think anyone would agree
that that is a fair tradeoff for signer identity verification.  Since
one "signs" files only once, this "cost" is irrelevant.  The cost of
verifying, over and over, is what is important.
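
To see where the 80 milliseconds comes from, here is a toy sketch of
the whole sign/verify cycle.  It is textbook RSA with no padding and
toy key generation, for illustration only; a real implementation
would follow the PEM RFCs and X.509.  Signing raises the digest to
the private exponent once, while verifying merely cubes the signature
mod n and compares it with a freshly computed digest, a single
fixed-size modular exponentiation no matter how large the file is.

    # Toy sketch of signing a digest with RSA, public exponent e = 3.
    # No padding, no real key management; for illustration only.
    import secrets

    def is_probable_prime(n, rounds=25):
        # Miller-Rabin probabilistic primality test.
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = secrets.randbelow(n - 3) + 2
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False
        return True

    def random_prime(bits):
        while True:
            p = secrets.randbits(bits) | (1 << (bits - 1)) | 1
            # require p % 3 == 2 so that e = 3 is coprime to p - 1
            if p % 3 == 2 and is_probable_prime(p):
                return p

    def make_keypair(bits=512):
        # roughly 512-bit modulus, as in the timing estimate above
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        n = p * q
        d = pow(3, -1, (p - 1) * (q - 1))   # private exponent for e = 3
        return n, d

    def sign(digest_bytes, n, d):
        m = int.from_bytes(digest_bytes, "big")   # digest is much smaller than n
        return pow(m, d, n)                       # done once, by the signer

    def verify(digest_bytes, signature, n):
        # The verifier's whole "extra" cost: cube one number mod n.
        return pow(signature, 3, n) == int.from_bytes(digest_bytes, "big")

    # Usage (file_digest is the sketch above):
    #   n, d = make_keypair()
    #   sig = sign(file_digest(path), n, d)
    #   assert verify(file_digest(path), sig, n)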

So what do you get for your milliseconds? You always know the source
of the digest (and you get non-repudiation, providing an incentive to
signers to make sure programs are clean before signing them).  No one
can change a program and recalculate the digest to spoof you.  If
schemes like this became widespread, the lack of signer
identification would be a hole people would quickly exploit.  You
also get a secure way to *distribute* software over networks.  Pretty
hard to do if everyone "does their own thing".  The Internet RFC's,
if widely adopted, provide a perfect mechanism for this.

Regardless of the hashing algorithm employed, there are powerful
benefits available if RSA is used with it.  And the computational
cost is negligible.

It may be true that simpler methods are adequate for some people.
That determination requires a risk analysis, and people will make
their own decisions.  It has been shown, however, that if a system
can be defeated, it will be.  Certainly secure software distribution
requires something more than an unprotected hash, since keys will
presumably not be shared.  This is where public-key has the most
value.

Using X9.9 is OK if (1) you trust DES, (2) you can live with its
speed, and (3) you don't need to distribute trusted software in a
large network.
X9.9 key management becomes a serious problem in a network like the
Internet.  It does have the advantage of being a standard, but it was
developed for a relatively small community of wholesale banks, not
for large networks.  Note about standards: RSA was named as a supported
algorithm in the NIST/OSI Implementor's Workshop Agreements (for
strong authentication, in the Directory SIG) of December 1989.

Jim Bidzos
RSA Data Security, Inc.

dunc@sun.com (duncs home) (01/10/90)

In article <0008.9001081228.AA09399@ge.sei.cmu.edu> you write:
>In response to Y. Radai's post:
>
>The best protection against viruses is obtained by using a fast
>hashing algorithm together with an asymmetric cryptosystem (like
>RSA).  This is also by far the most cost-effective (based on
>compute-time) approach...

With this scheme, what prevents a clever nasty from simply patching
the code that does the comparison so that it always returns an
all-clear?  Also, while the non-repudiation property appears to
provide accountability, it seems likely to be illusory.  Does the
signer of the program really know what is being signed, or was it
generated by some other program of uncertain honesty?
				--Dunc