[comp.virus] Naming/Identifying new viruses

PHYS169@csc.canterbury.ac.nz (Mark Aitchison, U of Canty; Physics) (02/21/91)

There is a continual problem with people finding what seems to be a
new virus, but not wishing to broadcast the whole (dangerous) boot
sector to identify it.  I have a solution to the identification
problem:

I have devised a "hashcode" algorithm specially designed for boot
sectors, that gives a reasonably short (12-character) code (of A-Z,
0-9, '#' and '.')  that pretty well uniquely identifies a boot sector,
is very difficult for virus writers to get around, and is useful in
its own right (i.e. you can look at the code and get a reasonable idea
of what the boot sector is like).

I can let anyone have the source to this (knowing the source doesn't
help virus writers), and I'm happy to make it public domain - in fact
I hope many people adopt the same standard encoding system.  For that
reason, I suggest some discussion of the format and method before it
is used in a serious way.

Briefly, what I have done is to generate a code which...
(1) can be passed through e-mail systems, etc. without distortion
(2) cannot be used to recreate a live virus
(3) is a valid DOS filename, and short enough to say over the telephone easily
(4) always starts with the same character, "#", so people can immediately
    recognise it as a hashcode
(5) has a built-in check against typos (including transposition errors), and
    avoids case distinction or confusing characters (like "/" and "\")
(6) is reasonably easy to calculate
(7) generates the same code on all systems (e.g. no floating point arithmetic
    subject to round-off error or different formats on different systems)
(8) includes four (and a half) bytes of high-order polynomial checksum, making
    it difficult for virus writers to give a bad boot sector the same code as
    a good one. (It would involve very lengthy trial-and-error methods)
(9) The last bytes include bit flags, indicating the presence of dubious code
    of various types, and the absence of important features (such as a reboot),
    making it useful in itself, and making it even harder for virus writers to
    circumvent!
(10)The size of messages and null bytes and code are also taken into account,
    since more sneaky viruses will need more code than a good boot sector, so
    encrypted boot sector viruses would have a tough time getting past!!
(11)DOS 4 diskettes (with serial numbers) get the same hashcode, irrespective
    of serial number (except in a small number of cases, where the serial
    number happens to contain forbidden instructions).
(12)Minor variations of the same virus get similar hashcodes (the last 3 bytes
    and first 3 bytes should be the same or close).
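
To make the shape of such a code concrete, here is a minimal sketch in
Python. It is NOT the BOOTID.PAS algorithm, which is not given in this
post. Everything in it is an assumption made for illustration: the 8.3
layout ('#', seven characters, '.', three characters), the use of CRC-32
as a stand-in for the polynomial checksum, the mod-37 check character,
and the names sketch_hashcode and is_well_formed. The two trailing '0'
characters are placeholders for the flag characters of point (9), which
are not computed here.

```python
import zlib

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # base-36 digits
CHECK_DIGITS = DIGITS + "#"  # 37 symbols, one per residue mod 37


def to_base36(value, width):
    """Render a non-negative integer as exactly `width` base-36 digits."""
    chars = []
    for _ in range(width):
        value, rem = divmod(value, 36)
        chars.append(DIGITS[rem])
    return "".join(reversed(chars))


def check_char(body):
    """Position-weighted sum mod 37 (a prime).

    Within `body`, any single wrong character and any swap of two
    different neighbouring characters changes the result.
    """
    total = sum((i + 1) * DIGITS.index(c) for i, c in enumerate(body))
    return CHECK_DIGITS[total % 37]


def sketch_hashcode(sector):
    """Hypothetical 12-character code for a 512-byte boot sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte boot sector")
    crc = zlib.crc32(sector) & 0xFFFFFFFF  # stand-in checksum (assumption)
    body = to_base36(crc, 7)               # 32 bits fit in 7 base-36 digits
    # "00" is a placeholder for the flag characters, not a real result.
    return "#" + body + "." + check_char(body) + "00"


def is_well_formed(code):
    """Check layout and check character, ignoring letter case."""
    code = code.upper()
    return (
        len(code) == 12
        and code[0] == "#"
        and code[8] == "."
        and all(c in DIGITS for c in code[1:8])
        and code[9] == check_char(code[1:8])
    )
```

With this layout, someone receiving a code by phone or e-mail could run
is_well_formed on it before comparing it against anything; a mistyped or
transposed character in the checksum part would be rejected at that
point. Integer arithmetic only is used, in keeping with point (7).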

The code is not...
A sure-fire way of indicating the presence of a virus. You could
simply look at the last byte of the code, and if it isn't '0' then it
is probably a virus. Not a great check, but old viruses (including
Stoned) are easy to spot that way. Or you could have a list of known
good and bad boot sectors, and ring alarms when it isn't a good disk.
But that isn't really the aim. It is intended to identify boot
sectors, so somebody can say "I know that disk"... whether you are
describing the disk over the net or over the phone.
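
The two rough checks mentioned above (the last character, and a list of
known good and bad boot sectors) could be combined along these lines. This
is only a sketch: the function name triage and the result strings are made
up for illustration and are not part of BOOTID or CHECKOUT.

```python
def triage(code, known_good, known_bad):
    """Classify a hashcode using known lists, then the last character.

    `known_good` and `known_bad` are sets of upper-case hashcodes that
    the user has collected; they are assumed inputs, not shipped data.
    """
    code = code.upper()  # the codes avoid case distinction
    if code in known_bad:
        return "known bad"
    if code in known_good:
        return "known good"
    if code[-1] != "0":
        # Some flag is set: probably worth a closer look, not proof.
        return "unknown, flagged"
    return "unknown"
```

A disk whose code is in neither list and ends in '0' simply comes back as
"unknown", which fits the point that the code identifies boot sectors
rather than detecting viruses.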

I can send the program, BOOTID.PAS, to anyone interested via e-mail;
hopefully it, and its big brother (CHECKOUT.EXE), will soon be
available via anonymous ftp.

Mark Aitchison, Physics, University of Canterbury, New Zealand.