[mod.ai] Reimplementing in C

marick%ccvaxa@GSWD-VMS.ARPA (Brian Marick) (07/28/86)

I've been hearing and seeing something for the past couple years,
something that seems to be becoming a folk theorem.  The theorem goes
like this:

	Many expert systems are being reimplemented in C.

	If even the expert system companies are abandoning
"special-purpose AI languages" like Lisp and Prolog, surely nobody
else - other than academics and semi-academics - will use them.


I'm curious what the facts are.  Which companies are reimplementing in
C (or other languages)?  Why?  And what (roughly) does "reimplementing
in C" mean?  What languages are used for development of new products?
What will happen in the future?  Which companies are not reimplementing?
Why not?

(I'm concentrating on these particular companies because they're what the
"theorizers" concentrate on.  Comments from others welcome.)


Brian Marick, Wombat Consort
Gould Computer Systems -- Urbana && University of Illinois
...ihnp4!uiucdcs!ccvaxa!marick
ARPA:  Marick@GSWD-VMS

jc@cdx39.UUCP (08/20/86)

> I've been hearing and seeing something for the past couple years,
> something that seems to be becoming a folk theorem.  The theorem goes
> like this:
> 	Many expert systems are being reimplemented in C.
> I'm curious what the facts are.  

  [I program in C, and have reached the conclusion that most AI
  programming could be done in that language as easily as in LISP
  if libraries of list-oriented subroutines were available.  (They
  needn't be consed lists -- I use dynamically allocated arrays.)
  You do have to worry about storage deallocation, but that buys you
  considerable run-time efficiency.  You also lose the powerful
  LISP debugging environment, so fill your code with lots of
  argument checks and ASSERTs.  Tail recursion isn't optimized,
  so C code should use iteration rather than recursion for most
  array-based list traversals.  Data-driven and object-oriented
  coding are easy enough, but you can't easily build run-time
  "active objects" (i.e., procedures to be applied to message
  arguments); compiled subroutines have to do the work, and dynamic
  linking is not generally worth the effort.  I haven't tried much
  parsing or hierarchy traversal, but programs such as LEX, YACC,
  and MAKE show that it can be done.  -- KIL]
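
As an illustration of the array-based lists and argument checks
mentioned in the note above, here is a minimal sketch in present-day
ANSI C.  The names (IntList, list_push, list_sum, list_free) are
hypothetical and are not taken from any library the note refers to.
The list is a dynamically allocated array that doubles when full,
traversal is iterative, and the caller is responsible for freeing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of ints backed by a malloc'd array. */
typedef struct {
    int    *items;  /* storage, NULL while empty */
    size_t  len;    /* elements in use */
    size_t  cap;    /* elements allocated */
} IntList;

void list_init(IntList *l)
{
    assert(l != NULL);
    l->items = NULL;
    l->len = 0;
    l->cap = 0;
}

/* Append one element, doubling capacity when full.
 * Returns 0 on success, -1 if allocation fails (list is left unchanged). */
int list_push(IntList *l, int value)
{
    assert(l != NULL);
    if (l->len == l->cap) {
        size_t new_cap = l->cap ? l->cap * 2 : 8;
        int *p = realloc(l->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        l->items = p;
        l->cap = new_cap;
    }
    l->items[l->len++] = value;
    return 0;
}

/* Iterative traversal: a plain loop, no recursion. */
long list_sum(const IntList *l)
{
    long total = 0;
    size_t i;

    assert(l != NULL);
    for (i = 0; i < l->len; i++)
        total += l->items[i];
    return total;
}

/* Explicit deallocation; the list can be reused after this call. */
void list_free(IntList *l)
{
    assert(l != NULL);
    free(l->items);
    l->items = NULL;
    l->len = 0;
    l->cap = 0;
}
```

The asserts vanish when compiled with NDEBUG defined, so they cost
nothing in a production build.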


Well, now, I don't know about re-implementing in C, but I myself
have been doing a fair amount of what might be called "expert
systems" programming in C, and pretty much out of necessity.

This is because I've been working in the up-and-coming world
of networks and "intelligent" communication devices.  These
show much promise for the future; unfortunately they also
add a very "interesting" aspect to the job of an application
(much less a system) programmer.  

The basic problem is that such comm devices act like black
boxes with a very large number of internal states; the states
aren't completely documented; those that are documented are
invariably misunderstood by anyone but the people who built
the boxes; and worst of all, there is usually no reliable 
way to get the box into a known initial state.

As a result, there is usually no way to write a simple,
straightforward routine to deal with such gadgets.  Rather,
you are forced to write code that tries to determine 1)
what states a given box can have; 2) what state it appears
to be in now; and 3) what sort of command will get it from
state X to state Y.  The debugging process involves noting
unusual responses of the box to a command, discussing the
"new" behavior with the experts (the designers if they are
available, or others with experience with the box), and
adding new cases to your code to handle the behavior when
it shows up again.  

One of the simplest examples is an "intelligent ACU", which
we used to call a "dial-out modem".  These now contain their
own processor, plus enough ROM and RAM to amount
to small computer systems of their own.  Where such boxes
used to have little more than a status line to indicate the
state of a line (connected/disconnected), they now have an
impressive repertoire of commands, with a truly astonishing
list of responses, most of which you hope never to see.  But
your code will indeed see them.  When your code first talks
to the ACU, the responses may include any of:
	1. Nothing at all.
	2. Echo of the prompt.
	3. Command prompt (different for each ACU).
	4. Diagnostic (any of a large set).
Or the ACU may have been in a "connected" state, in which
case your message will be transmitted down the line, to be
interpreted by whatever the ACU was connected to by the most
recent user.  (This recursive case is really fun!:-)
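
A rough sketch of how the first response might be sorted into the
cases listed above.  Everything here is an assumption made for
illustration: the names (RespKind, classify_response), the reading of
"echo" as the bytes just sent coming straight back, and the idea that
the caller supplies the prompt it expects from one particular ACU.
The "connected" case cannot be told apart locally, so it falls into
the same bucket as diagnostics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    RESP_NONE,    /* nothing came back */
    RESP_ECHO,    /* our own bytes came back unchanged */
    RESP_PROMPT,  /* the expected command prompt was seen */
    RESP_OTHER    /* diagnostic, or a reply from further down the line */
} RespKind;

/* Byte-wise substring search; the data may contain NULs, so the
 * str* functions are not usable here. */
static int contains(const unsigned char *buf, size_t len,
                    const unsigned char *pat, size_t plen)
{
    size_t i;

    if (plen == 0 || plen > len)
        return 0;
    for (i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0)
            return 1;
    return 0;
}

RespKind classify_response(const unsigned char *resp, size_t resp_len,
                           const unsigned char *sent, size_t sent_len,
                           const unsigned char *prompt, size_t prompt_len)
{
    assert(resp != NULL && sent != NULL && prompt != NULL);

    if (resp_len == 0)
        return RESP_NONE;
    if (resp_len == sent_len && memcmp(resp, sent, sent_len) == 0)
        return RESP_ECHO;
    if (contains(resp, resp_len, prompt, prompt_len))
        return RESP_PROMPT;
    return RESP_OTHER;
}
```

In practice each new surprise from a box would add another test to
this function, which is the debugging process described earlier.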

The last point is crucial:  In many cases, you don't know
who is responding to your message.  You are dealing with 
chains of boxes, each of which may respond to your message
and/or pass it on to the next box.  Each box has a different
behaviour repertoire, and even worse, each has a different
syntax.  Furthermore, at any time, for whatever reason
(such as power glitches or commands from other sources),
any box may reset its internal state to any other state.
You can be talking to the 3rd box in a chain, and suddenly
the 2nd breaks in and responds to a message not intended
for it.

The best way of handling such complexity is via an explicit 
state table that says what was last sent down the line, what
the response was, what sort of box we seem to be talking to,
and what its internal state seems to be.  The code to use such 
info to elicit a desired behavior rapidly develops into a real 
piece of "expert-systems" code.
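
One possible shape for such a state table, sketched in C.  This is
not the poster's code.  The box kinds, states, and command bytes are
all made up for illustration, and a real table would be far larger.
The sketch keeps the four items named above (last message sent, last
response, apparent box, apparent state) in one record, and keeps the
"what gets it from state X to state Y" knowledge in a separate table
of rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical box kinds and states. */
typedef enum { BOX_UNKNOWN, BOX_ACU, BOX_MUX } BoxKind;
typedef enum { ST_UNKNOWN, ST_COMMAND, ST_CONNECTED } BoxState;

/* What we currently believe about the far end of the line. */
typedef struct {
    unsigned char last_sent[64];  size_t sent_len;
    unsigned char last_resp[64];  size_t resp_len;
    BoxKind  kind;   /* best guess at who is answering */
    BoxState state;  /* best guess at its internal state */
} LineView;

/* One rule: for this kind of box, these bytes should move it from -> to. */
typedef struct {
    BoxKind     kind;
    BoxState    from, to;
    const char *cmd;      /* raw bytes, may contain NULs */
    size_t      cmd_len;  /* explicit length, no terminator implied */
} Rule;

/* Made-up commands, for illustration only. */
static const Rule rules[] = {
    { BOX_ACU, ST_CONNECTED, ST_COMMAND,   "\0\0BREAK\r", 8 },
    { BOX_ACU, ST_COMMAND,   ST_CONNECTED, "DIAL\r\n",    6 },
    { BOX_MUX, ST_CONNECTED, ST_COMMAND,   "\n.\n",       3 },
};

/* Look up the command for a desired transition; NULL if none is known. */
const Rule *find_rule(BoxKind kind, BoxState from, BoxState to)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (rules[i].kind == kind && rules[i].from == from &&
            rules[i].to == to)
            return &rules[i];
    return NULL;
}

/* Record one exchange and the revised guess about the box. */
void note_exchange(LineView *v,
                   const void *sent, size_t sent_len,
                   const void *resp, size_t resp_len,
                   BoxKind kind, BoxState state)
{
    assert(v != NULL && sent != NULL && resp != NULL);
    assert(sent_len <= sizeof v->last_sent);
    assert(resp_len <= sizeof v->last_resp);
    memcpy(v->last_sent, sent, sent_len);
    v->sent_len = sent_len;
    memcpy(v->last_resp, resp, resp_len);
    v->resp_len = resp_len;
    v->kind = kind;
    v->state = state;
}
```

When find_rule returns NULL the caller has no known route to the
desired state and must fall back to probing the box.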

So far, there's no real need for C; this is all well within the
powers of Lisp or Smalltalk or Prolog.  So why C?  Well, when
you're writing comm code, you have one extra goodie.  It's very
important that you have precise control over every bit of every
character.  The higher-level languages always seem to want to
"help" by tokenizing the input and putting the output into some
sort of standard format.  This is unacceptable.  

For instance, the messages transmitted often don't have any 
well-defined terminators.  Or, rather, each box has its own
terminator(s), but you don't know beforehand which box will
respond to a given message.  They often require nulls.  It's 
often very important whether you use CR or LF (or both, in
a particular order).  And you have to timeout various inputs,
else your code just hangs forever.  Such things are very awkward,
if not impossible to express in the typical AI languages. 
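
For the timeout and raw-byte points, a small sketch in C.  It assumes
a POSIX-style system with select() and read(); the function name
read_with_timeout and the READ_TIMEOUT value are hypothetical.  It
waits a bounded time for input and then hands back whatever bytes
arrived, untouched, so that nulls, CR, and LF all come through as
sent.  Putting a real serial line into raw mode is a separate step
that is not shown, and a select() interrupted by a signal is simply
reported as an error here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMEOUT (-2L)  /* hypothetical "nothing arrived in time" code */

/* Wait up to timeout_ms for input on fd, then read what is there.
 * Returns the number of bytes read (0 means end of file),
 * -1 on error, or READ_TIMEOUT if nothing arrived in time. */
long read_with_timeout(int fd, unsigned char *buf, size_t cap,
                       long timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    assert(buf != NULL && cap > 0 && timeout_ms >= 0);

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return READ_TIMEOUT;

    /* No tokenizing and no line discipline of our own:
     * the caller gets the bytes exactly as they came in. */
    return (long)read(fd, buf, cap);
}
```

Because the result is a byte count rather than a string, the caller
can look for whichever terminator the responding box happens to use.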

This isn't to say that C is the world's best AI language; quite
the contrary.  I'd love to get a chance to work on a better one.
(Hint, hint....)  But given the languages available, it seems
to be the best of a bad lot, so I use it.

If you think doing it in C is weird, just wait 'til 
you see it in Ada....

Denber.wbst@XEROX.COM (09/16/86)

	"Such things are very awkward, if not impossible to express in the
typical AI languages"

Well, maybe I've been using an atypical AI language, but Interlisp-D has
all that stuff - byte I/O, streams, timers, whatever.  It's real e-z to
use.  Check it out.

			- Michel