carrow@itd.nrl.navy.mil (Steven Carrow) (01/02/91)
I am currently doing research on implementing a Rete-based system on
a hypercube architecture. I have done some searching of the literature here
at the Naval Research Laboratory, but I have not found much beyond the 1988 AAAI
article written by Gupta and Tambe. I was wondering if anyone knew of further
research in or implementations of Rete-based systems on hypercubes. I am also
interested in exploring possible applications of vectorization techniques in
this field; any info on this would be appreciated, as my search here came up
dry. Pointers on where else to look will be appreciated as well, as my
research skills are rusty. Although I used the NRL library, I do not know that
I used it well. My US Mail address is:
Steve Carrow
NRL, Code 5570
4555 Overlook Ave., SW
Washington, DC 20375-5000
Thank you for your time.
*****************************************************************************
* Steven A Carrow * ARPA: carrow@itd.nrl.navy.mil *
* Code 5570 * UUCP: {backbone}!uunet! *
* Naval Research Lab * nrl-css!carrow (I think) *
* Washington, DC 20375 * Disclaimer: I can barely speak for *
* (202) 767-9125 * myself, much less the government. *
*****************************************************************************

moskowit@paul.rutgers.edu (Len Moskowitz) (01/04/91)
Though they're not working specifically on the hypercube, Sal Stolfo at
Columbia University and Anurag Acharya at CMU have been doing much of the
work on parallelizing OPS5-type languages. We at Bendix have CMU's CParaOPS5
running here on our Suns, and with a little luck it'll soon be running on our
new 8-node Tadpole 88000 RISC multiprocessor. CParaOPS5 uses a truly
parallel Rete.

We've spoken recently with Stolfo, and his group is doing very interesting
work on changing OPS5's semantics to allow truly parallel programming. It
might pay to contact him.

I'd like to hear what you're doing. Have you looked into approaches other
than parallel Rete, such as rule-level parallelism or multiple independent
rule systems?

========================================================================
Len Moskowitz                        Allied-Signal Aerospace
moskowitz@bendix.com                 Bendix Test Systems Division
moskowitz%TSD@atc.bendix.com         Teterboro, NJ 07608
moskowit@paul.rutgers.edu
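[Editorial sketch: Moskowitz's closing question -- rule-level parallelism as an alternative to a parallel Rete -- can be made concrete with a small example. Nothing here is taken from CParaOPS5 or any system in this thread; the rules and names are invented. Each worker owns a disjoint slice of the rule set and matches it independently against the same working memory:]

```python
# Hypothetical sketch of rule-level parallelism: partition the RULES,
# not the working memory, across workers.
from concurrent.futures import ThreadPoolExecutor

# A rule is a name plus a predicate over one working-memory element (WME).
RULES = [
    ("big",   lambda wme: wme["size"] > 10),
    ("red",   lambda wme: wme["color"] == "red"),
    ("small", lambda wme: wme["size"] <= 10),
]

def match_partition(rules, wm):
    """Match one worker's rule subset against the whole WM; the result
    is that worker's share of the conflict set."""
    return [(name, wme) for name, test in rules for wme in wm if test(wme)]

def parallel_match(wm, n_workers=2):
    # Deal the rules out round-robin; the global conflict set is just
    # the union of the per-worker ones.
    parts = [RULES[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(match_partition, parts, [wm] * n_workers)
    return [inst for part in results for inst in part]

wm = [{"size": 12, "color": "red"}, {"size": 3, "color": "blue"}]
conflict_set = parallel_match(wm)
```

The appeal of this scheme is that workers need no communication during the match; the known drawback is load imbalance when a few expensive rules dominate.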
schmolze%cs.tufts.edu@RELAY.CS.NET (01/10/91)
I am not on the parallel bulletin board, but learned of several messages
concerning the above subject, in which I have some research results. Would
you please post the message below? I think it will be of interest to those
involved in the ongoing discussion. I'm not ready to join this bulletin
board just yet, so I've included a note to respond directly to me. Thanks --
Jim
-----
I am not a member of this bulletin board, but have seen some messages from it
and wanted to respond to requests for information. There is much research on
parallelizing OPS5-like languages for distributed-memory machines that is
ongoing, and the messages I've seen regarding this topic did not mention all
of it. So, here is some more of it, which includes work of mine.
1. My own work has focused on the PARS system, which runs OPS5 on a hypercube
(by NCUBE) where:
- the WM and PM are distributed over the processors,
- each processor runs its own basic loop (MATCH-SELECT-ACT),
- the processors run asynchronously for the most part,
- actions that affect the WM or PM stored in other processors are forwarded
as needed,
- certain communications are performed during the basic loop so as to
guarantee:
* serializability -- the resulting overall WM is one that could have been
produced on a serial machine running OPS5 from the original WM and PM.
This is my correctness criterion. Note that PARS does not guarantee
any particular conflict resolution strategy.
* no errors due to temporary WM inconsistencies -- while actions are
being communicated between processors, temporary WM inconsistencies can
occur which can lead to non-serializable results. These problems are
prevented.
* no deadlocks -- in order to guarantee serializability, certain rules
must be prevented from executing at the same time. I do this in a
way that prevents deadlocks.
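[Editorial sketch: the organization in the list above can be caricatured in a few dozen lines. This is a toy, single-threaded simulation under invented simplifications (a hash-based static distribution, first-match selection, no locking); it is NOT the real PARS, which adds the serializability and deadlock-avoidance machinery just described:]

```python
# Toy simulation of a distributed production system: WM is partitioned
# across processors, each runs its own MATCH-SELECT-ACT loop, and
# actions on remotely-owned WM are forwarded as messages.
from collections import deque

class Processor:
    """One node: a slice of working memory (WM), its own rules, and a
    mailbox for actions forwarded from other nodes."""
    def __init__(self, pid, wm, rules):
        self.pid, self.wm, self.rules = pid, set(wm), rules
        self.inbox = deque()

    def owns(self, wme, n_procs):
        # Static distribution: a WME lives on processor hash(wme) mod n.
        return hash(wme) % n_procs == self.pid

    def step(self, procs):
        """One MATCH-SELECT-ACT iteration; True if a rule fired."""
        while self.inbox:                      # apply forwarded actions first
            op, wme = self.inbox.popleft()
            (self.wm.add if op == "make" else self.wm.discard)(wme)
        for rule in self.rules:                # MATCH + SELECT (first match)
            actions = rule(self.wm)
            if actions:
                for op, wme in actions:        # ACT, forwarding remote effects
                    owner = next(p for p in procs if p.owns(wme, len(procs)))
                    if owner is self:
                        (self.wm.add if op == "make" else self.wm.discard)(wme)
                    else:
                        owner.inbox.append((op, wme))
                return True
        return False

# One illustrative rule: if "a" is in the local WM, replace it with "b".
fire = lambda wm: [("remove", "a"), ("make", "b")] if "a" in wm else []
p0, p1 = Processor(0, [], [fire]), Processor(1, [], [fire])
procs = [p0, p1]
next(p for p in procs if p.owns("a", 2)).wm.add("a")   # place "a" on its owner
while any(p.step(procs) for p in procs):               # run to quiescence
    pass
```

The serializability problems Schmolze describes show up exactly in the window between `owner.inbox.append(...)` and the remote `step()` that drains the inbox; the sketch simply ignores them.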
The theoretical foundation of PARS is described in:
"A Parallel Asynchronous Distributed Production System", by Schmolze and
Goel, Proceedings of the Eighth National Conference on Artificial
Intelligence (AAAI-90), Boston, MA, July 1990.
Not all of the proofs are included in that paper, but the missing ones
are nearly identical to the proofs appearing in the following paper that
builds upon earlier work of Ishida and Stolfo and runs multiple rules in
parallel in a synchronous fashion for shared-memory machines.
"Guaranteeing Serializable Results in Synchronous Parallel Production
Systems", Technical Report 89-5, Dept. of Computer Science, Tufts
University, October 1989.
(For those interested in shared memory approaches with multiple rule firings,
I am in the midst of collecting benchmark measurements for a variety of
algorithms. This work should be completed by February 1.)
The loss of a conflict resolution strategy in PARS (and pretty much all
similar works) is unfortunate but necessary. To make up for some of its
impact, in PARS I have extended OPS5 slightly to include a special provision
for forming rule groups. Basically, only the rules in the "current" group
can be executed. This addresses the main problem that the loss of a conflict
strategy causes.
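[Editorial sketch: the rule-group idea might look like the following; the group names and rules are invented for the example, and the real PARS provision is only "slightly extended OPS5", not this code:]

```python
# Rule groups: only rules in the "current" group are eligible to fire,
# recovering some control after conflict resolution is given up.
RULES = [
    # (rule name, group, condition over working memory)
    ("init",  "setup", lambda wm: "start" in wm),
    ("work",  "run",   lambda wm: "task" in wm),
    ("clean", "setup", lambda wm: "done" in wm),
]

def eligible(rules, wm, current_group):
    """The conflict set, restricted to rules in the current group."""
    return [name for name, group, test in rules
            if group == current_group and test(wm)]

wm = {"start", "task"}
setup_fires = eligible(RULES, wm, "setup")  # "work" matches WM but is gated out
```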
The PARS system is itself in the midst of being completed. It will run on
the NCUBE hypercube and is based on the C version of serial OPS5 written by
Miranker et al at U. Texas, which is called OPS5C. The only dependence on
NCUBE (versus, say, Intel) is the manner of starting it up and of sending
messages, thus it could be ported fairly easily.
2. Work that is similar to my PARS work but done independently is that of
Ishida and Yokoo (of NTT Labs in Japan) and Gasser (of USC). Their system
also addresses the problem of running OPS5 on a distributed memory machine in
an asynchronous fashion. The resulting system is very similar to PARS except
that:
- it does not prevent deadlocks nor problems arising from temporary WM
inconsistencies,
- it offers a dynamic distribution algorithm, whereas PARS offers a static
distribution algorithm.
The best parts of PARS and this system can easily be combined, something that
Ishida and I have discussed. As far as I know, Ishida et al are not
currently constructing an implementation.
Contact: shochi.ishida@ntt-20.ntt.jp
3. Work on distributed memory machines is also being pursued at U Texas by
Miranker et al. At my last reading of their work, they were still in the
midst of designing their system. Most of their results at that time were
with regard to re-writing rule sets so as to take advantage of parallel
hardware. (Of course, this group has a host of other results, from TREAT
to lazy serial matching algorithms and beyond.)
Contact: miranker@cs.utexas.edu
That's all for now. If you wish to contact me, please do so directly, as I am
not a regular member of this bulletin board.
Jim Schmolze
Dept. of Computer Science Phone: 617/381-3681
Tufts University Internet: schmolze@cs.tufts.edu
Medford, MA 02155 USA

Anurag.Acharya@dravido.soar.cs.cmu.edu (01/10/91)
Distributed memory machines have not been popular platforms for
implementing Rete and other production system match algorithms.
There have been several efforts to place production systems on
distributed memory machines but most of them were either
not pursued to the point of actual implementation or did not
yield encouraging results.
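[Editorial sketch: for readers who have not looked inside the match step this whole thread is about, here is a deliberately tiny, sequential caricature of Rete's central idea -- memories at every node, so that adding one WME does incremental work instead of rematching everything. It is a two-condition network with invented names, not any of the implementations listed below:]

```python
# Minimal Rete-style network: alpha nodes do constant tests and store
# matching WMEs; a join node combines two alpha memories incrementally.
class AlphaNode:
    """Stores WMEs passing a constant test; feeds its join successors."""
    def __init__(self, test):
        self.test, self.memory, self.successors = test, [], []
    def add(self, wme):
        if self.test(wme):
            self.memory.append(wme)
            for join in self.successors:
                if join.right is self:
                    join.right_activate(wme)
                else:
                    join.left_activate(wme)

class JoinNode:
    """Joins a left and right alpha memory under a consistency test;
    self.matches accumulates complete rule instantiations."""
    def __init__(self, left, right, consistent):
        self.left, self.right, self.consistent = left, right, consistent
        self.matches = []
        left.successors.append(self)
        right.successors.append(self)
    def left_activate(self, l):
        self.matches += [(l, r) for r in self.right.memory
                         if self.consistent(l, r)]
    def right_activate(self, r):
        self.matches += [(l, r) for l in self.left.memory
                         if self.consistent(l, r)]

# Rule sketch: (on ?x ?y) and (on ?y table) -- ?x sits on a block that
# itself sits on the table.
a1 = AlphaNode(lambda w: w[0] == "on")
a2 = AlphaNode(lambda w: w[0] == "on" and w[2] == "table")
join = JoinNode(a1, a2, lambda l, r: l[2] == r[1])
for wme in [("on", "b", "table"), ("on", "a", "b")]:
    a1.add(wme)
    a2.add(wme)
```

The per-node memories are what the parallel mappings below fight over: they are state that must be partitioned, replicated, or synchronized.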
Here is a list of some of the projects that I know about with a
reference to each one of them.
1. The Dado project at Columbia : Salvatore Stolfo, Dan Miranker [1]
2. The Non-Von project at Columbia : Bruce Hillyer and David Shaw [2]
3. The Pesa-1 project at the University of Kaiserslautern, Germany :
Franz Schreiner and Gerhard Zimmermann [3]
4. The Rubic project at the University of Southern California : Dan Moldovan [4]
5. The Cupid project at the University of Waterloo : Michael Kelly and Rudolph
Seviora [5]
6. Mapping Rete onto a dynamically reconfigurable tree-structured
architecture : BT Smith and D Middleton [6]
7. Mapping Rete onto the Connection machine using Oflazer's algorithm :
Rex Flynn [7]
8. Compiling constant productions into Activity Flow Networks and
mapping them onto the Connection machine : Guy Blelloch [8]
9. Mapping Rete onto the MIT tagged token dataflow architecture :
Jean-Luc Gaudiot, Sukhan Lee and Andrew Sohn [9]
In addition, there was some work done by a couple of Japanese researchers
which was published in a large-scale database/knowledgebase conference --
the names and details of the work escape me at the moment, but I have a hazy
recollection that the approach taken resembled that taken by the CUPID
project.
Most of these efforts were based on custom hardware. Milind Tambe
and Anoop Gupta published a paper in AAAI that described a mapping
for fine-grained stock hardware, i.e. with no special support for
production systems. [10]
Milind Tambe and I followed up on this. We modified the original
mapping, which was intended for fine-grained machines, for medium-grained
machines (that is, machines with 2-25 processors). We performed simulations
to evaluate the mapping and study the effect of varying the
computation/communication ratio on the speedups. The simulations assumed
a crossbar interconnection network and Sparc-based processing elements.
Results indicated that it was possible to achieve reasonable speedups
(where reasonable means between 4 and 10 fold) on such machines.
This work has been reported at ICPP'89 [11]. A more detailed version of the
report is soon going to be published in the IEEE Transactions on
Parallel and Distributed Systems.
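[Editorial sketch: a back-of-the-envelope model -- mine, not from the paper -- of why the computation/communication ratio bounds the achievable speedup on such machines:]

```python
# Per-cycle match work divides across p processors, but each cycle also
# pays a fixed communication cost that does not shrink with p.
def speedup(p, comp, comm):
    """Serial cost comp versus parallel cost comp/p + comm."""
    return comp / (comp / p + comm)

# With a 20:1 computation/communication ratio, ten processors give
# roughly a 6.7-fold speedup -- inside the 4-10 fold range cited above.
ten_proc = speedup(10, comp=100.0, comm=5.0)
```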
I had some communication with Bill Harding, who was a grad student at the
Air Force Institute of Technology (at the Wright-Patterson Air Force Base)
about placing production systems on hypercubes. He was working with Gary
Lamont who is on the faculty of the institute. However, I have not been
able to get in touch with him since Feb last or thereabouts.
Attractive as they might seem for parallelization, production systems have
not been effectively parallelized. To the best of my knowledge,
only two parallel implementations have been made available : ParaOPS5 and
CParaOPS5. Both of these are on shared memory machines and are available from
the School of Computer Science, Carnegie Mellon University.
[1] Salvatore Stolfo and Dan Miranker, "The Dado Production System Machine",
Journal of Parallel and Distributed Computing (JPDC), volume 3, 1986,
pp 269-296
[2] Bruce Hillyer and David Shaw, "Execution of OPS5 Production systems on
a Massively Parallel Machine", JPDC, vol 3, 1986, pp 236-268
[3] Franz Schreiner and Gerhard Zimmermann, "Pesa-1 - A Parallel Architecture
for Production Systems", ICPP'87, pp 167-169
[4] Dan Moldovan, "RUBIC: A Multiprocessor for Rule-based Systems",
IEEE Transactions on Systems, Man and Cybernetics, vol 19 (4), Jul/Aug 1989
[5] Michael Kelly and Rudolph Seviora, "Performance of OPS5 Matching on CUPID",
Microprocessing and Microprogramming, vol 27, 1989, pp 397-404
[6] BT Smith and D Middleton, "Exploiting Fine-grained Parallelism in
Production Systems", National Conf of the Canadian Society for the
Computational Studies of Intelligence, 1987, pp 262-270
[7] Rex Flynn, "Placing Soar on the Connection Machine", AAAI workshop
on "How Slow Elements can Think so Fast".
[8] Guy Blelloch, "CIS : A Massively Concurrent Rule-based System",
Probably in AAAI -- I have the paper but don't have the reference
[9] Jean-Luc Gaudiot, Sukhan Lee and Andrew Sohn, "Data-driven Multiprocessor
Implementation of the Rete Match Algorithm", ICPP'88
[10] Anoop Gupta and Milind Tambe, "Suitability of Message Passing Computers
for Implementing Production Systems", NCAI'88, pp 687-692
[11] Anurag Acharya and Milind Tambe, "Production Systems on Message Passing
Computers: Simulation Results and Analysis", ICPP'89.
anurag
ps: Steve, I tried to get this message to you but my mail either hung in the
mail queue or bounced right back at me after a couple of days.