[comp.virus] File format for virus signatures

FTHSMULD%rulgl.LeidenUniv.nl@CUNYVM.CUNY.EDU (Jeroen W. Pluimers) (03/08/91)

Dear readers,

A few digests ago, there was a question about standard formats of
data files for virus signatures. VIRSCAN and TBSCAN/TBSCANX use
the format below.

It has been copied from the documentation that came with TBSCANX v 2.1.
The format may be distributed freely and is fully in the public domain.

Jeroen W. Pluimers - Gorlaeus labs, Leiden University


-=-=-=-=-=-=-=-=- VIRSCAN.DAT / TBSCAN.DAT format -=-=-=-=-=-=-=-=-=-


FORMAT OF THE DATA FILE
-----------------------

    The data file (called TBSCAN.DAT or VIRSCAN.DAT) can be read and/or
    modified with any ASCII editor.

    All  lines  beginning  with  ";"  are comment lines. TbScanX ignores
    these lines  completely. When  the ";"  character is  followed by  a
    percent-sign the  remaining part  of the  line will  be displayed on
    the screen.  A maximum  of 15  lines can  be printed  on the screen.
    Nice for "HOT NEWS"...
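
As a rough illustration of the comment rule above, the sketch below collects the text of ";%" lines and caps the result at 15. The function name is hypothetical, and it assumes the ";" must sit in the first column, which the documentation does not state explicitly.

```python
MAX_DISPLAY_LINES = 15  # limit quoted in the documentation


def display_lines(lines):
    """Return the text of ';%' lines, capped at MAX_DISPLAY_LINES."""
    shown = []
    for line in lines:
        # Assumption: the marker must start in the first column.
        if line.startswith(";%") and len(shown) < MAX_DISPLAY_LINES:
            shown.append(line[2:].rstrip("\n"))
    return shown
```

Lines that start with ";" but not ";%" are plain comments and are simply ignored here.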

    The first line of each entry holds the name of the virus. The second
    line contains one or more of the following keywords:
                        BOOT SYS EXE COM HIGH LOW

    These words may be separated by spaces, tabs or commas.
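
A minimal sketch of reading the keyword line, splitting on spaces, tabs or commas as described above. The function name is hypothetical; treating keywords as case-insensitive and rejecting unknown words are assumptions, not behaviour taken from the TBSCANX documentation.

```python
import re

KNOWN_KEYWORDS = {"BOOT", "SYS", "EXE", "COM", "HIGH", "LOW"}


def parse_keywords(line):
    """Split a keyword line on spaces, tabs or commas and return a set."""
    # Assumption: keywords are upper-cased before comparison.
    words = [w.upper() for w in re.split(r"[ \t,]+", line.strip()) if w]
    unknown = [w for w in words if w not in KNOWN_KEYWORDS]
    if unknown:
        # Assumption: an unknown keyword is treated as an error.
        raise ValueError("unknown keyword(s): %s" % ", ".join(unknown))
    return set(words)
```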

    TbScanX will  only scan  for viruses  with the  keywords COM or EXE.
    The  other  keywords  will  be  ignored,  and  are  only used by the
    non-resident  version:  TBSCAN.  Also  note  that  TbScanX  will not
    distinguish between COM and EXE files. All executable files will  be
    scanned for both EXE and COM viruses. This saves some memory.

    BOOT means that the  virus is a bootsector  virus. SYS, EXE and  COM
    indicate the virus  can occur in  files with these  extensions. Also
    overlay files  (with the  extension OV?)  will be  searched for  EXE
    viruses. HIGH shows that the virus  can occur in the memory of  your
    PC, namely in  the memory located  above the TBSCAN  program itself.
    LOW means that the virus can occur in the memory of your PC,  namely
    in the memory located under the TBSCAN program itself.

    The third line holds the signature in ASCII-HEX. Each byte of the
    virus is described by two hex characters. Instead of two hex
    characters, two question marks (the wildcard) may also occur; any
    byte at that position then matches the signature. An example of a
    signature:
            A5E623CB??CD21??83FF3E

    You can also use an asterisk followed by one ASCII-HEX character to
    skip a variable number of bytes in the signature. The ASCII-HEX
    character specifies how many bytes should be skipped. The signature
    could be:
            A5E623CB*3CD2155??83FF3E
    The following sequence of bytes will then be recognised as a virus:
            A5E623CB142434CD21554583FF3E
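
One way to experiment with these signatures is to translate them into a bytes regular expression. The sketch below is not how TBSCAN itself works; the function name is hypothetical, and it assumes that "*N" skips exactly N bytes, which is what the example above shows (three bytes, 14 24 34, sit between CB and CD).

```python
import re


def compile_signature(sig):
    """Translate an ASCII-HEX signature into a compiled bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":
            # '*' plus one hex digit: skip that many bytes (assumed exact).
            count = int(sig[i + 1], 16)
            parts.append(b".{%d}" % count)
        elif sig[i:i + 2] == "??":
            # Wildcard: any single byte matches.
            parts.append(b".")
        else:
            # Literal byte written as two hex characters.
            parts.append(re.escape(bytes.fromhex(sig[i:i + 2])))
        i += 2
    # DOTALL so that '.' also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


pattern = compile_signature("A5E623CB*3CD2155??83FF3E")
data = bytes.fromhex("A5E623CB142434CD21554583FF3E")
found = pattern.search(data) is not None
```

With the signature and byte sequence from the text, `found` is true. If the real scanner treats the digit as an upper bound rather than an exact count, the `.{%d}` piece would become `.{0,%d}` instead.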


    Instead of a signature in ASCII-HEX you can also specify plain
    text. This should be put between double quotation marks. A correct
    signature is, for example:
            "I have got you!"

    This  series  of  three  lines  should  be repeated for every virus.
    Between all lines comment lines may occur.
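
Putting the pieces together, a reader for the three-line layout might look like the sketch below. The function name and the returned tuple shape are hypothetical; skipping blank lines and ignoring leading whitespace before ";" are assumptions that the documentation does not address.

```python
def parse_data_file(text):
    """Return a list of (name, keyword_line, signature) records."""
    # Drop comment lines (';' and ';%') and, by assumption, blank lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith(";")]
    if len(lines) % 3 != 0:
        raise ValueError("incomplete record: expected groups of three lines")
    records = []
    for i in range(0, len(lines), 3):
        name, keywords, sig = lines[i:i + 3]
        if len(sig) >= 2 and sig.startswith('"') and sig.endswith('"'):
            # Plain-text signature between double quotation marks.
            signature = ("text", sig[1:-1])
        else:
            # ASCII-HEX signature, possibly with '??' and '*N'.
            signature = ("hex", sig)
        records.append((name, keywords, signature))
    return records
```

For instance, a file holding a comment line followed by the three lines `Example-A`, `COM EXE` and `A5E623CB??CD21??83FF3E` yields a single record whose signature is tagged `"hex"`. The virus name here is made up for illustration.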

mrs@netcom.COM (Morgan Schweers) (03/11/91)

Greetings,
    Hmmm...  I'll point out that the VIRSCAN/TBSCAN file format is
similar enough to the ViruScan external data file that a conversion
utility SHOULD be relatively trivial.

    For reference, our strings are one line/one virus, no 'BOOT' or
'COM', etc. separators.  The string format is similar, but rather than
having a single hex-digit after the '*' you put a number in parentheses.
(I.E.  "01020304 *(4) 050607?090a" <virus name> )

    The '?' wildcard ignores that hex-byte, the '*' will detect the
next byte if it is within (x) bytes.
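
To give an idea of how small such a conversion utility could be, here is a sketch that rewrites a VIRSCAN/TBSCAN hex signature in the style of the example string above. It is guesswork built only on that one example: the function name is hypothetical, "??" is mapped to a single '?', "*N" becomes " *(N) ", and the output line is assumed to be the quoted string followed by the virus name. The keyword line is dropped, and quoted text signatures are not handled.

```python
def tbscan_to_viruscan(name, sig):
    """Rewrite a TBSCAN hex signature as a one-line ViruScan-style entry."""
    out = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":
            # '*N' (one hex digit) becomes ' *(N) ' with a decimal count.
            out.append(" *(%d) " % int(sig[i + 1], 16))
        elif sig[i:i + 2] == "??":
            # Assumption: one '?' stands for one wildcard byte.
            out.append("?")
        else:
            out.append(sig[i:i + 2])
        i += 2
    return '"%s" %s' % ("".join(out), name)
```

Note that the two skip operators may not mean quite the same thing: the TBSCAN example skips a fixed count, while '*' here is described as matching the next byte anywhere within (x) bytes, so a converted string could match more than the original did.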

    Now for another 'flame' from me...  "Unreadable/non-clear update
scan strings."  This makes it difficult for a user to add their own
strings.  These products might as well not have user-updatability, in
effect.  Unless the user has access to documentation on creating a
virus 'string' through that particular utility, they can't expand it.

    I've got an open mind on this subject, however.  (Not so open that
my brain falls out, but anyhow...)  If someone who uses this method
can explain the rationale to me, I'll respond.  I can think of two
products which do this, and MAYBE a third.
                                                     --  Morgan Schweers

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  I *AM* mrs@netcom.com, and ms@albert.ai.mit.edu.  I'd prefer you use |
|  the netcom.com address, since MIT is now a WEE bit further away from |
|  me than I like calling...  <Grin>  In any case, I don't represent my |
|  employers.  They don't listen to what I say, and I return the        |
|  compliment whenever possible.  <Grin>                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+