[comp.virus] Re universal detectors.

USERGSVE@LNCC.BITNET (GEORGE SVETLICHNY) (02/10/90)

Enough has been said on this list to convince most people that a
universal virus detector (UVD) is impossible, and I've given my own two
cent's worth in favor of this position. If by "virus" we mean to
include a whole range of "nasty" programs, then one also runs into
problems of correctness of code and bugs, and gets tangled up in
questions of human intent, which makes the discussion, though still
mathematically viable, (given sufficient abstraction) rather trivial in
the end. The answer is that no UVD exists.

Even defining viruses as some types (not all) of "self-duplicating
codes" and leaving trojans and other nasties aside will not improve the
situation much, as one still faces quite general undecidability
theorems. As I have tried to point out, almost any question about a
program's performance is undecidable.

In Virus-L v3, issue20 russ@Alliant.COM (Russell McFatter) argues that
one should not try to determine what a program *will* do but what it
*might* do and just not run those that might infect others. In
principle this sounds like a good idea, since a-priori such a
rephrasing of the problem could make it decidable. What makes it
totally unviable is the present hardware (at least on PC-type machines,
I'm not that familiar with Macs) as I tried to point out in Virus-L v3
issue22.

Now there just _*MAY*_ (note the triple enphasis) be a theorem of the
following type:

        I. Given such-and-such memory and microprocessor architecture,
        then any program containing a virus (however that is defined)
        will necessarily have a certain patern P in the object code.
        II. Any program that does not contain a virus can be written in
        a way that does not lead to pattern P in the object code.


This would not conflict with the undecidability theorem since the
presence of pattern P is not a UUV as having the pattern does not mean
that the program *is* a virus, only that it *might be*. And part II
allows any honest program to be written and not be dubbed dangerous.

I'm actually mixing mathematics and technology in the above. The purely
mathematical conjecture would be (having defined "virus" in some
appropriate useful way): There is a decidable set W such that 1) all
programs containing viruses are in W and 2) all programs that do not
contain viruses *have an equivalent* (identical input-output behaviour)
that is not in W. Program equivalence is undecidable so this does not
contradict the undecidability of virus detection. The technological
part would consiuAof expressing W in some convenient way through
hardware architecture.

Is this plausible? Frankly, it doesn't sound so to me, but I wouldn't
discard it off hand. It's the only hope I see of some general
mathematical result being useful for anti-viral strategies, so maybe we
should look at it more closely.


 ----------------------------------------------------------------------
 George Svetlichny                 | When it is not your mother who is
 Department of Mathematics         | in danger of being eaten by a
 Pontificia Universidade Catolica  | wild animal, the matter can wait
 Rio de Janeiro, Brasil            | until the morrow.
                                   |
 usergsve@lncc.bitnet Fido 4:4/998 |      Baganda Proverb
 ----------------------------------------------------------------------

lambert@cwi.nl (Lambert Meertens) (02/14/90)

USERGSVE@LNCC.BITNET (GEORGE SVETLICHNY) writes:
) I'm actually mixing mathematics and technology in the above. The purely
) mathematical conjecture would be (having defined "virus" in some
) appropriate useful way): There is a decidable set W such that 1) all
) programs containing viruses are in W and 2) all programs that do not
) contain viruses *have an equivalent* (identical input-output behaviour)
) that is not in W. Program equivalence is undecidable so this does not
) contradict the undecidability of virus detection. The technological
) part would consiuAof expressing W in some convenient way through
) hardware architecture.
)
) Is this plausible? Frankly, it doesn't sound so to me, but I wouldn't
) discard it off hand. It's the only hope I see of some general
) mathematical result being useful for anti-viral strategies, so maybe we
) should look at it more closely.

To make this amenable to mathematical analysis, we need a precise
definition of "virus", of course, but I think that even without such a
definition there is an argument that makes it plausible that this *is*
in fact mathematically possible.

The argument assumes that we have at least a decidable after-the-fact test
that determines whether a program displayed, in its actual behaviour during
a given run, undesirable (virus-like) behaviour (whether by a bug or by
malicious intent of its author).

(N.B.  I don't know if we can give a useful formal definition of what
constitutes undesirable behaviour, but if we cannot capture this notion,
then any hope of creating a universal virus detector is vain.)

Moreover, it assumes (typical mathematical assumption) that we have
unlimited storage.

Given a program P, we can create another program P' which works as
follows (sketch only):

1.  Make a copy of the store.

2.  Operate like P, without touching the copy.

3.  On termination of P, determine whether undesirable behaviour occurred.

4.  If so, restore the store from the copy and flash a red light;
    otherwise, erase the copy and display a green light.
    (There could be an option to have a bell rung instead of the red
    light:-)

(In case the program does not terminate, we can press an abort button
whereupon the store is restored as well.)

The point is that there is a computable function F such that F(P) = P' with
the above characteristics, such that the range of F is a recursive set.
(The argument that F is a recursive function is simply that it is obvious
how to write a program for it.  That the range is recursive follows from
the fact that we can easily construct F such that larger input -- in some
suitable ordering -- gives rise to larger output, so to test if Y is in the
range of F we can enumerate the domain of F until we find an X such that
|F(X)| >= |Y|.)  Take the set W to be the complement of the range of F.
All benign programs have an equivalent program in W-complement, since F
does not change their input-output behaviour, and all programs in W-
complement are by their construction harmless.

I do not see how this theoretical construction would have much relevance,
but it is amusing to realise that the gist of it is to make a full backup
before you run a non-trusted program, which is indeed a good thing.  Now if
only we could timely detect that infection did occur, then it would be easy
to restore and stop the spreading.

- --Lambert Meertens, CWI, Amsterdam; lambert@cwi.nl