[comp.ai.shells] KES Timing Result

srt@aero.org (Scott TCB Turner) (03/14/91)

I recently received an evaluation copy of KES from A&E software.  KES
is an expert system shell which includes the usual inference engine
and in addition a diagnosis system that tries to find best matches
between symptoms and disease descriptions.

The KES developer interface seemed a bit quirky at first, but I quickly
got accustomed to the style.  The interface is dual menu/editor, which
means the developer can abandon the menu system at any time in favor
of typing.  That's a big plus.  For any serious kb development, nobody
wants to be pulling down menus.

My main interest was speed.  I've been comparing a number of expert
systems.  The test consists of repeatedly firing an isolated rule.
This is hardly indicative of realistic knowledge bases, and is only 
intended to give a general feeling for the speed of the inference
engine.  I have neither the time nor energy to do more realistic tests,
but I encourage you to undertake the task if you are interested.
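
For anyone who wants to repeat the measurement, the harness amounts to a
timed loop and a little arithmetic.  The C sketch below shows the shape of
it; fire_one_rule() is a self-contained stand-in (here it just decrements a
counter) for however a particular shell is actually driven through one
firing of the isolated rule, and is not any shell's real API.

#include <stdio.h>
#include <time.h>

#define N_FIRINGS 100000L

/* Stand-in for whatever call drives the shell under test through one
   firing of the isolated rule.  It only decrements a counter so that
   the sketch is self-contained. */
static long counter = N_FIRINGS;

static void fire_one_rule(void)
{
    if (counter > 0)
        counter--;
}

int main(void)
{
    clock_t start = clock();
    double secs;
    long i;

    for (i = 0; i < N_FIRINGS; i++)
        fire_one_rule();

    secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs > 0.0)
        printf("%.0f rule firings/minute\n", (N_FIRINGS / secs) * 60.0);
    return 0;
}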

All tests were done on a VAXstation 3100 running VMS.

	Shell				Speed (rule firings/minute)

	CLIPS				49,000
	KES				11,500
	ART-IM				5500
	Nexpert 1.1			4000
	G2				1700

From this test, KES would appear to be one of the most efficient
commercial expert system shells.  Given the comprehensive user
interface and substantial documentation, I think KES bears
consideration for any project, and especially for projects where 
processing speed is a consideration (e.g., real-time diagnosis).

					-- Scott Turner

sfp@mars.ornl.gov (Phil Spelt) (03/19/91)

In article <7642@uklirb.informatik.uni-kl.de> srt@aero.org (Scott TCB Turner) writes:
	>I recently received an evaluation copy of KES from A&E software.  KES


	    [Lots of stuff deleted . . . ]

	>I have neither the time nor energy to do more realistic tests,
	>but I encourage you to undertake the task if you are interested.
	>
	>All test were done on a VAXstation 3100 running VMS.

	>
	>	Shell				Speed (rule firings/minute)
	>
	>	CLIPS				49,000
	>	KES				11,500
	>	ART-IM				5500
	>	Nexpert 1.1			4000
	>	G2				1700
	>
	>From this test, KES would appear to be one of the most efficient
	>commercial expert system shells.  Given the comprehensive user
	>interface and substantial documentation, I think KES bears
	>consideration for any project, and especially for projects where 
	>processing speed is a consideration (i.e., real-time diagnosis).
	>
	>					-- Scott Turner

Yes, but LOOK at the difference between KES & CLIPS!!!  That's a bigger
difference than anywhere else in the test set.  I can't understand why EVERYONE
is not using CLIPS.  With the source code available for self-modification,
it seems to me to be the best available.  (Of course I am TOTALLY unbiased.)
I am, however, NOT associated with NASA or JSC in any way, just a happy
user of CLIPS.

Phil Spelt, Cognitive Systems & Human Factors Group  sfp@epm.ornl.gov
============================================================================
Any opinions expressed or implied are my own, IF I choose to own up to them.

tenhagen@grautvornix.informatik.rwth-aachen.de (Klaus ten Hagen) (03/21/91)

   My main interest was speed.  I've been comparing a number of expert
   systems.  The test consists of repeatedly firing an isolated rule.
   This is hardly indicative of realistic knowledge bases, and is only 
   intended to give a general feeling for the speed of the inference
   engine. 

Unfortunately, such a ``benchmark'' does not even give ``a general
feeling'', since the speed-determining parts of a rule-based system are
not tested by such a crude trial.

Explanation: 

A forward-chaining rule-based system proceeds in
``recognize-act cycles'' (RACs).  Every RAC goes through a matching
phase, conflict-set resolution, and execution of the action part.  It is
well known that in more realistic examples around 97% of the runtime
is spent in the matching phase.  Thus the implementation of the matching
phase determines the overall speed of a rule-based system, and this
matching is rather trivial in the case of one or two objects (sometimes
called working memory elements, or WMEs) and merely one rule with only
one or two condition elements.
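
To make the three phases concrete, here is a toy recognize-act cycle in C.
It is only an illustration of the control structure (it is not how KES,
CLIPS or any other shell is implemented internally): working memory is an
array of integers and the single rule is ``if some element > 0, decrement
it''.

#include <stdio.h>

#define WM_SIZE 4

static int wm[WM_SIZE] = { 3, 0, 2, 0 };   /* toy working memory elements */

int main(void)
{
    long firings = 0;

    for (;;) {
        int conflict_set[WM_SIZE];
        int n = 0, i;

        /* 1. Match: find every WME that the rule's condition accepts.
              With many rules and many WMEs this scan (or the Rete net
              that replaces it) is where the time goes. */
        for (i = 0; i < WM_SIZE; i++)
            if (wm[i] > 0)
                conflict_set[n++] = i;

        if (n == 0)
            break;                 /* no instantiations left: stop */

        /* 2. Conflict resolution: pick one instantiation (here: the first). */
        i = conflict_set[0];

        /* 3. Act: execute the action part of the rule. */
        wm[i]--;
        firings++;
    }

    printf("%ld rule firings\n", firings);
    return 0;
}

With one rule and one or two WMEs the match phase collapses to a couple of
comparisons, which is exactly why a single-rule benchmark says almost
nothing about it.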

--
Klaus ten Hagen                | RWTH Aachen
tenhagen@ert.rwth-aachen.de    | ERT -5240-
phone:+49 241 807632           | Templergraben 55
Fax:+49 241 807631             | D-5100 Aachen

srt@aero.org (Scott TCB Turner) (03/23/91)

(Klaus ten Hagen) writes:
>Unfortunately, such a ``benchmark'' does not even give ``a general
>feeling'', since the speed-determining parts of a rule-based system are
>not tested by such a crude trial.

Nonsense.  Repeated firing of a single rule tests simple conditions,
rule activation, and the internal representation of rules and data (to
the extent that compiled representations will be faster than
interpreted ones).  Simple tests are simple; that doesn't mean they're
worthless.

>...It is well known that in more realistic examples around 97% of
>the runtime is spent in the matching phase.

The most famous expert system ever, MYCIN, did little or no pattern
matching.  The telemetry-based diagnosis system I work on does no
matching, and I suspect that diagnosis systems in general do little
matching.  What they do is test symptom values.  Conversely, the
ART-IM people claim that large expert systems are dominated by the
time to select rules for activation; hence their concentration on an
efficient RETE algorithm.  (Although to some extent that subsumes the
matching problem.)  In the case of an embedded expert system that is
called repeatedly from scratch, execution time may well be dominated
by initialization costs.  To me it isn't at all clear that the right
metric for testing the speed of expert systems is the speed of the
matcher.  Perhaps you can share in more detail the results to which
you refer?
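
To be concrete about ``testing symptom values'': in such systems the
conditions are typically threshold tests over a fixed set of attributes,
along the lines of the C fragment below.  The attribute names and the
thresholds are invented for the illustration; this is not code from our
telemetry system, nor from MYCIN.

#include <stdio.h>

/* Invented attributes, purely for illustration. */
struct telemetry {
    double bus_voltage;
    double battery_temp;
};

/* "IF bus voltage is low AND battery temperature is high
    THEN suspect the charger" -- a direct test of symptom values,
    with no search for matching working-memory elements. */
static int suspect_charger(const struct telemetry *t)
{
    return t->bus_voltage < 26.0 && t->battery_temp > 45.0;
}

int main(void)
{
    struct telemetry t = { 25.2, 48.5 };

    if (suspect_charger(&t))
        printf("diagnosis: suspect battery charger\n");
    return 0;
}

There is nothing here for a matcher to optimize, which is why matcher speed
tells you little about this class of system.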

gowj@gatech.edu (James Gow) (03/23/91)

In article <7642@uklirb.informatik.uni-kl.de> srt@aero.org (Scott TCB Turner) writes:
>The KES developer interface seemed a bit quirky at first, but I quickly
>got accustomed to the style.  The interface is dual menu/editor, which
>means the developer can abandon the menu system at any time in favor
>of typing.  That's a big plus.  For any serious kb development, nobody
>wants to be pulling down menus.
>
>					-- Scott Turner

There is a cut-down version of KES called Micro-PS that holds only 20 rules but
is good as an introduction to KES and knowledge engineering.
Micro-PS was distributed with a book by Nagy, Gault, and Nagy (1985).  For a low
price you can see how the kb is constructed and how variables and attachments
are handled.
james

kdw@sae.com (David Witten KES) (03/27/91)

In article <7668@uklirb.informatik.uni-kl.de> James Gow <uflorida!novavax!gowj@gatech.edu> writes:
>There is a cut-down version of KES called Micro-PS that holds only 20 rules but
>is good as an introduction to KES and knowledge engineering.  ...

Micro-PS is cute, and it is nice to see it mentioned, but it should in no way be
construed as indicating the capabilities of KES as it stands today.  Micro-PS was
crippled even compared to the KES of its day, which itself didn't have classes or
forward chaining.  KES has been improved continually, and fundamentally, since then.

-ttfn, David Witten	david@sae.com	  *All statements are solely my own*

acha@CS.CMU.EDU (Anurag Acharya) (04/04/91)

In article <7667@uklirb.informatik.uni-kl.de> srt@aero.org (Scott TCB Turner) writes:
   (Klaus ten Hagen) writes:
   >Unfortunately, such a ``benchmark'' does not even give ``a general
   >feeling'', since the speed-determining parts of a rule-based system are
   >not tested by such a crude trial.

   Nonsense.  Repeated firing of a single rule tests simple conditions,
   rule activation, and the internal representation of rules and data (to
   the extent that compiled representations will be faster than
   interpreted ones).  Simple tests are simple; that doesn't mean they're
   worthless.

Simple tests like repeated firing of a single trivial rule provide little
or no information that might help predict the performance of realistic
programs.  Furthermore, a "simple" rule in one production-system language
may not be all that "simple" in another.  Such undisciplined benchmarking
attempts yield data of zilch utility.


example 1:

take languages that do not provide pattern matching capabilities:

a typical repeatedly firing production in such languages might be

int foo = 10000;

(p 
  (foo > 0)
  -->
  (replace foo by (foo - 1)))

this is nothing more than a syntactically sugared version of the following
 C while loop

while (foo) foo--;

compare this with :

example 2:

(p decrement-value
  (foo ^value {<v> > 0})
  -->
  (modify 1 ^value (compute <v> - 1)))

which needs to exercise far more complex capabilities.  therefore the
comparison is grossly unfair to languages that are more expressive in
terms of the conditions that they can match.

i have two big gripes with the benchmarking results based on repeated firing 
of a single rule that are posted in this group from time to time.

1. productions used in these benchmarks do not contain multiple conditions
   and therefore do not perform consistency checks across conditions.
   most useful productions need this capability; very few productions are
   so widely applicable that they can make do with only one condition.
   (see the sketch after this list.)

2. benchmarks that compare languages with different expressive power must
   exercise equal capabilities if they are to be fair or useful.
   otherwise, the data gathered is not worth the CPU seconds spent gathering it.
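
to illustrate gripe 1, here is roughly what the cross-condition consistency
check costs in the naive case.  the two-condition rule, the WMEs and the C
rendering are all invented for the example; no particular shell's machinery
is being shown.

#include <stdio.h>
#include <string.h>

struct order { const char *part; };
struct stock { const char *part; int qty; };

/* toy working memory: two "order" WMEs and two "stock" WMEs */
static struct order orders[] = { { "bolt" }, { "gear" } };
static struct stock stocks[] = { { "gear", 0 }, { "bolt", 7 } };

int main(void)
{
    int i, j;

    /* match phase for a two-condition rule, roughly
         (order ^part <p>) (stock ^part <p> ^qty > 0) --> ship <p>
       the nested loop is the consistency check on the shared binding <p>
       that a one-condition benchmark rule never exercises */
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            if (strcmp(orders[i].part, stocks[j].part) == 0
                && stocks[j].qty > 0)
                printf("instantiation: ship %s\n", orders[i].part);

    return 0;
}

a real engine replaces the nested loop with a rete network, but the work is
this join on the shared binding, and it simply never occurs when the
benchmark rule has a single condition.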

anurag
 

srt@aero.org (Scott TCB Turner) (04/06/91)

Like some previous posters, Anurag Acharya recently wrote in this
newsgroup that he doesn't think much of very simple benchmarks of
inference engines.  He pointed out the importance of testing the
performance of rules with multiple conditions.  Other posters have
suggested other important features: pattern matching, kb lookup, and
so on.

All these are valid points.  You can't understand the performance
characteristics of an inference engine without knowing these things.
Unfortunately, I don't have the time to develop a realistic benchmark
suite and apply it to a variety of inference engines, so I've had to
make do with very simple benchmarks.  So far, the benchmarks have
corresponded fairly well with "real-life" performance of the engines
I've looked at, so perhaps the simple benchmark is more useful than it
appears.  And judging from the comments I've received, others have
found these benchmarks useful, and I think most of the readers of this
newsgroup are knowledgeable enough to recognize these benchmarks for
what they are.  I don't really need anyone else to tell me how simple
and useless these benchmarks are.

In the meantime, there's a big demand out there for thorough
benchmarking of expert system tools.  I've had several requests for an
article along those lines, and I've had companies soliciting me for
evaluations.  I encourage anyone who is interested to begin work on
developing a benchmarking suite, and to share your progress with this
group.  I'm sure you'll get plenty of suggestions of what's important
to test :-).  Once you've made some progress, I can put you in touch
with some people who'd be interested in an article on your results.

					-- Scott Turner