[comp.parallel] Research for RETE-based systems on Intel hypercube

carrow@itd.nrl.navy.mil (Steven Carrow) (01/02/91)

	I am currently doing research on implementing a Rete-based system on
a hypercube architecture. I have done some searching of the literature here
at Naval Research Laboratory, but I have not found much beyond the 1988 AAAI
article written by Gupta and Tambe. I was wondering if anyone knew of further
research in or implementations of Rete-based systems on hypercubes. I am also
interested in exploring possible applications of vectorization techniques in
this field; any information would be appreciated, as my search here came up
dry.  Pointers on where else to look would also be welcome, as my research
skills are rusty.  Although I used the NRL library, I am not sure I used it
well.  My US Mail address is:

	Steve Carrow
	NRL, Code 5570
	4555 Overlook Ave., SW
	Washington, DC 20375-5000

Thank you for your time.
*****************************************************************************
* Steven A Carrow               *      ARPA: carrow@itd.nrl.navy.mil        *
* Code 5570                     *      UUCP: {backbone}!uunet!              *
* Naval Research Lab            *            nrl-css!carrow (I think)       *
* Washington, DC 20375          *      Disclaimer: I can barely speak for   *
* (202) 767-9125                *      myself, much less the government.    *
*****************************************************************************

moskowit@paul.rutgers.edu (Len Moskowitz) (01/04/91)

Though they're not working specifically on the hypercube, Sal Stolfo
at Columbia University and Anurag Acharya at CMU have been doing much
of the work on parallelizing OPS5-type languages.

We at Bendix have CMU's CParaOPS5 running here on our Suns and with a
little luck it'll soon be running on our new 8 node Tadpole 88000 RISC
multi-processor.  CParaOPS5 uses a truly parallel Rete.
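
For readers who haven't looked inside Rete: what a parallel Rete
distributes is the incremental node activations of the match network.
Here is a highly simplified sketch of just the constant-test (alpha)
layer, with invented rule and attribute names -- this is an illustration
of the idea, not CParaOPS5 code:

```python
# Simplified sketch of the constant-test (alpha) layer of a Rete network.
# All rule and attribute names are invented for illustration; a real Rete
# also shares tests between rules and maintains beta (join) memories
# incrementally, and a parallel Rete spreads node activations over CPUs.

def make_alpha_node(attr, value):
    """An alpha node passes working-memory elements whose attribute
    equals a compiled-in constant."""
    def test(wme):
        return wme.get(attr) == value
    return test

# Condition elements of two toy rules, compiled into alpha tests.
alpha_network = {
    "goal-is-sort":  make_alpha_node("type", "goal"),
    "item-is-block": make_alpha_node("class", "block"),
}

def match(wme, network):
    """Return the alpha nodes activated by one working-memory element."""
    return [name for name, test in network.items() if test(wme)]

activated = match({"type": "goal", "task": "sort"}, alpha_network)
```

Because each working-memory change only touches the nodes whose tests it
satisfies, independent activations are natural units of parallel work.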

We've spoken recently with Stolfo and his group is doing very
interesting work on changing OPS5's semantics to allow truly parallel
programming.  Might pay to contact him.

I'd like to hear what you're doing.  Have you looked into other
approaches than parallel Rete, such as rule level parallelism or
multiple independent rule systems?

========================================================================
Len Moskowitz Allied-Signal Aerospace Bendix Test Systems Division
Teterboro, NJ 07608
moskowitz@bendix.com moskowitz%TSD@atc.bendix.com moskowit@paul.rutgers.edu

schmolze%cs.tufts.edu@RELAY.CS.NET (01/10/91)

I am not on the parallel bulletin board, but learned of several messages
concerning the above subject, an area in which I have some research
results.  Would
you please post the message below?  I think it will be of interest to those
involved in the ongoing discussion.  I'm not ready to join this bulletin
board just yet, so I've included a note to respond directly to me.  Thanks --
Jim
			-----
I am not a member of this bulletin board, but have seen some messages from it
and wanted to respond to requests for information.  There is much research on
parallelizing OPS5-like languages for distributed-memory machines that is
ongoing, and the messages I've seen regarding this topic did not mention all
of it.  So, here is some more of it, which includes work of mine.

1. My own work has focused on the PARS system, which runs OPS5 on a hypercube
   (by NCUBE) where: 
 - the working memory (WM) and production memory (PM) are distributed over
   the processors,
 - each processor runs its own basic loop (MATCH-SELECT-ACT),
 - the processors run asynchronously for the most part,
 - actions that affect the WM or PM stored in other processors are forwarded
   as needed,
 - certain communications are performed during the basic loop so as to
   guarantee: 
    * serializability -- the resulting overall WM is one that could have been
      produced on a serial machine running OPS5 from the original WM and PM.
      This is my correctness criterion.  Note that PARS does not guarantee
      any particular conflict resolution strategy.
    * no errors due to temporary WM inconsistencies -- while actions are
      being communicated between processors, temporary WM inconsistencies can
      occur which can lead to non-serializable results.  These problems are
      prevented. 
    * no deadlocks -- in order to guarantee serializability, certain rules
      must be prevented from executing concurrently.  I do this in a way
      that prevents deadlocks.
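
The deadlock-prevention point above can be illustrated with a textbook
technique -- this is only an illustration, not the actual PARS protocol,
which is described in the AAAI-90 paper: if every processor acquires the
locks guarding working-memory partitions in one fixed global order, no
cycle of waiting processors can form.

```python
# Textbook lock-ordering illustration of deadlock prevention: rules that
# touch overlapping working-memory partitions must not co-execute, so a
# processor acquires every partition lock it needs in ascending id order.
# With one global order, no cycle of waiting processors can form.
# (This is NOT the actual PARS protocol; see the AAAI-90 paper.)

import threading

# One lock per working-memory partition, indexed by partition id.
partition_locks = {pid: threading.Lock() for pid in range(8)}

def fire_rule(needed_partitions, action):
    """Acquire the needed partition locks in global order, run the
    rule's action, then release in reverse order."""
    ordered = sorted(needed_partitions)   # the single global order
    for pid in ordered:
        partition_locks[pid].acquire()
    try:
        return action()
    finally:
        for pid in reversed(ordered):
            partition_locks[pid].release()

result = fire_rule({3, 1}, lambda: "fired")
```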

The theoretical foundation of PARS is described in:
  "A Parallel Asynchronous Distributed Production System", by Schmolze and
  Goel, Proceedings of the Eighth National Conference on Artificial
  Intelligence (AAAI-90), Boston, MA, July 1990.

Not all of the proofs are included in that paper, but the missing ones
are nearly identical to proofs appearing in the following paper, which
builds upon earlier work of Ishida and Stolfo and runs multiple rules in
parallel in a synchronous fashion on shared-memory machines.
  "Guaranteeing Serializable Results in Synchronous Parallel Production
  Systems", Technical Report 89-5, Dept. of Computer Science, Tufts
  University, October 1989.

(For those interested in shared memory approaches with multiple rule firings,
I am in the midst of collecting benchmark measurements for a variety of
algorithms.  This work should be completed by February 1.)

The loss of a conflict resolution strategy in PARS (and in pretty much all
similar work) is unfortunate but necessary.  To make up for some of its
impact, I have extended OPS5 slightly in PARS to include a special provision
for forming rule groups.  Basically, only the rules in the "current" group
can be executed.  This addresses the main problem that the loss of a conflict
strategy causes.
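
As a rough sketch of the rule-group idea (group and rule names are
invented; this is not the actual PARS syntax), the conflict set is
simply filtered by the current group before selection:

```python
# Sketch of the rule-group idea: only rules in the "current" group are
# eligible to fire. Group and rule names are invented for illustration;
# this is not the actual PARS extension syntax.

rule_groups = {
    "init": ["create-goals"],
    "sort": ["swap-adjacent", "mark-sorted"],
}

current_group = "sort"

def eligible(conflict_set):
    """Filter matched instantiations down to rules in the current group."""
    allowed = set(rule_groups[current_group])
    return [(rule, bindings) for rule, bindings in conflict_set
            if rule in allowed]

conflict_set = [("create-goals", {}), ("swap-adjacent", {"x": 1})]
runnable = eligible(conflict_set)
```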

The PARS system itself is still being completed.  It will run on the
NCUBE hypercube and is based on OPS5C, the C version of serial OPS5
written by Miranker et al. at U. Texas.  The only dependence on NCUBE
(versus, say, Intel) is the manner of starting up and of sending
messages, so it could be ported fairly easily.

2. Work that is similar to my PARS work but done independently is that of
Ishida and Yokoo (of NTT Labs in Japan) and Gasser (of USC).  Their system
also addresses the problem of running OPS5 on a distributed memory machine in
an asynchronous fashion.  The resulting system is very similar to PARS except
that: 
 - it does not prevent deadlocks nor problems arising from temporary WM
   inconsistencies, 
 - it offers a dynamic distribution algorithm, whereas PARS offers a static
   distribution algorithm.
The best parts of PARS and this system can easily be combined, something that
Ishida and I have discussed.  As far as I know, Ishida et al are not
currently constructing an implementation.  

Contact: shochi.ishida@ntt-20.ntt.jp
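
To make the static-versus-dynamic distinction concrete, here is an
invented sketch (not code from PARS or from Ishida et al.'s system): a
static scheme fixes the rule-to-processor assignment once, while a
dynamic scheme picks a processor at run time based on current load.

```python
# Invented contrast between static and dynamic rule distribution (not
# taken from PARS or from Ishida et al.'s system). A static scheme fixes
# the assignment once, at load time; a dynamic scheme rebalances at run
# time based on the current load of each processor.

def static_assign(rule_names, n_procs):
    """Fixed assignment, decided once: round-robin over sorted names."""
    return {r: i % n_procs for i, r in enumerate(sorted(rule_names))}

def dynamic_assign(loads):
    """Runtime assignment: place the next activation on the least-loaded
    processor and record the added load."""
    proc = min(range(len(loads)), key=lambda p: loads[p])
    loads[proc] += 1
    return proc

static = static_assign(["p1", "p2", "p3", "p4"], 2)
loads = [2, 0, 1]
chosen = dynamic_assign(loads)
```

The trade-off is the usual one: static assignment costs nothing at run
time but cannot adapt when a few rules dominate the match work.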
 
3. Work on distributed memory machines is also being pursued at U Texas by
Miranker et al.  At my last reading of their work, they were still in the
midst of designing their system.  Most of their results at that time were
with regard to re-writing rule sets so as to take advantage of parallel
hardware.  (Of course, this group has a host of other results, ranging
from TREAT to lazy serial matching algorithms.)

Contact: miranker@cs.utexas.edu


That's all for now.  If you wish to contact me, please do so directly, as I
am not a regular member of this bulletin board.

Jim Schmolze
Dept. of Computer Science       Phone: 617/381-3681
Tufts University                Internet: schmolze@cs.tufts.edu
Medford, MA 02155  USA

Anurag.Acharya@dravido.soar.cs.cmu.edu (01/10/91)

Distributed memory machines have not been popular platforms for
implementing Rete and other production system match algorithms.
There have been several efforts to place production systems on
distributed memory machines but most of them were either
not pursued to the point of actual implementation or did not 
yield encouraging results.

Here is a list of some of the projects that I know about with a 
reference to each one of them. 

1. The Dado project at Columbia : Salvatore Stolfo, Dan Miranker [1]
2. The Non-Von project at Columbia : Bruce Hillyer and David Shaw [2]
3. The Pesa-1 project at the University of Kaiserslautern, Germany :
	Franz Schreiner and Gerhard Zimmermann [3]
4. The Rubic project at the University of Southern California : Dan Moldovan [4]
5. The Cupid project at the University of Waterloo : Michael Kelly and Rudolph
	Seviora [5]
6. Mapping Rete onto a dynamically reconfigurable tree-structured
   architecture : BT Smith and D Middleton [6]
7. Mapping Rete onto the Connection machine using Oflazer's algorithm :
 	Rex Flynn [7]
8. Compiling constant productions into Activity Flow Networks and
   mapping them onto the Connection machine : Guy Blelloch [8]
9. Mapping Rete onto the MIT tagged token dataflow architecture :
	Jean-Luc Gaudiot, Sukhan Lee and Andrew Sohn [9]

In addition, there was some work done by a couple of Japanese researchers,
published in a large-scale database/knowledgebase conference -- the names
and details of the work escape me at the moment, but I have a hazy
recollection that the approach resembled that taken by the CUPID
project.

Most of these efforts were based on custom hardware. Milind Tambe
and Anoop Gupta published a paper in AAAI that described a mapping
for fine-grained stock hardware, i.e. with no special support for
production systems. [10]

Milind Tambe and I followed up on this.  We adapted the original
mapping, which was intended for fine-grained machines, to medium-grained
machines (that is, machines with 2-25 processors).  We performed simulations
to evaluate the mapping and to study the effect of varying the
computation/communication ratio on the speedups.  The simulations assumed
a crossbar interconnection network and Sparc-based processing elements.
Results indicated that it was possible to achieve reasonable speedups
(where reasonable means between 4 and 10 fold) on such machines.
This work was reported at ICPP'89 [11].  A more detailed version will
soon appear in the IEEE Transactions on Parallel and Distributed
Systems.
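
A toy model (parameters invented, not measurements from [11]) shows why
the computation/communication ratio bounds the achievable speedup: the
per-cycle communication cost does not shrink as processors are added.

```python
# Toy speedup model (parameters invented, not measurements from [11]).
# Per recognize-act cycle on p processors: the match work divides by p,
# but a fixed per-cycle communication cost does not.

def speedup(p, t_match, t_comm):
    serial = t_match                      # one processor, no communication
    parallel = t_match / p + t_comm       # p processors plus message cost
    return serial / parallel

# Match work 20x the communication cost: 10 processors give ~6.7x.
high_ratio = speedup(10, t_match=20.0, t_comm=1.0)
# Match work only 2x the communication cost: speedup saturates at ~1.7x.
low_ratio = speedup(10, t_match=2.0, t_comm=1.0)
```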

I had some communication with Bill Harding, who was a grad student at the 
Air Force Institute of Technology (at the Wright-Patterson Air Force Base)
about placing production systems on hypercubes. He was working with Gary
Lamont who is on the faculty of the institute. However, I have not been
able to get in touch with him since Feb last or thereabouts.

Attractive as they might seem for parallelization, production systems have
not been effectively parallelized.  To the best of my knowledge, only two
parallel implementations have been made available: ParaOPS5 and CParaOPS5.
Both run on shared-memory machines and are available from the School of
Computer Science, Carnegie Mellon University.

[1] Salvatore Stolfo and Dan Miranker, "The Dado Production System Machine",
    Journal of Parallel and Distributed Computing (JPDC), volume 3,  1986, 
    pp 269-296

[2] Bruce Hillyer and David Shaw, "Execution of OPS5 Production Systems on
    a Massively Parallel Machine", JPDC, vol 3, 1986, pp 236-268

[3] Franz Schreiner and Gerhard Zimmermann, "Pesa-1 - A Parallel Architecture 
    for Production Systems", ICPP'87, pp 167-169

[4] Dan Moldovan, "RUBIC: A Multiprocessor for Rule-based Systems",
    IEEE Transactions on Systems, Man and Cybernetics, vol 19 (4), Jul/Aug 1989

[5] Michael Kelly and Rudolph Seviora, "Performance of OPS5 Matching on CUPID",
    Microprocessing and Microprogramming, vol 27, 1989, pp 397-404

[6] BT Smith and D Middleton, "Exploiting Fine-grained Parallelism in 
    Production Systems", National Conf of the Canadian Society for the 
    Computational Studies of Intelligence, 1987, pp 262-270

[7] Rex Flynn, "Placing Soar on the Connection Machine", AAAI workshop
    on "How Slow Elements can Think so Fast".

[8] Guy Blelloch, "CIS: A Massively Concurrent Rule-based System",
    Probably in AAAI -- I have the paper but don't have the reference

[9] Jean-Luc Gaudiot, Sukhan Lee and Andrew Sohn, "Data-driven Multiprocessor
    Implementation of the Rete Match Algorithm", ICPP'88

[10] Anoop Gupta and Milind Tambe, "Suitability of Message Passing Computers
     for Implementing Production Systems", NCAI'88, pp 687-692

[11] Anurag Acharya and Milind Tambe, "Production Systems on Message Passing
     Computers: Simulation Results and Analysis", ICPP'89.


anurag

ps: Steve, I tried to get this message to you but my mail either hung in the 
    mail queue or bounced right back at me after a couple of days.